AI-era talent, evaluated in a new way
Don't measure tool fluency — measure the thinking: how someone explains a problem to AI, validates the output, and chooses the right tool. Beyond the deliverable, “how they solved it” is captured automatically, so reviewers can see the whole process at a glance.
- Evaluation areas: 5
- Task time: 60–180 min
- Auto-collected: 100%

Looking only at the deliverable isn't enough anymore
Anyone can produce output with AI now. The same deliverable can mean very different things depending on how it was made, and evaluation has to see that difference.
The deliverable alone won't tell you
From the final output, you can't distinguish between an AI-only result and one a person crafted with AI.
Tool fluency ≠ ability
A heavy Cursor user isn't necessarily a strong collaborator. The real signal is the thinking — decomposing a problem, validating the result.
Fairness gets shaky
If having a paid subscription or premium tooling moves the needle, you'll miss high-potential candidates.
Candidates solve; reviewers see the process
Candidates get a smooth task environment; reviewers get the deliverable plus a timeline on one screen. Neither side has to write extra narrative.
- Step 1 · Task starts
  Open the browser IDE and start solving. Built-in chat and terminal, plus top models provided free.
  Web IDE · Built-in AI models · Connect external tools via MCP
- Step 2 · Process auto-captured
  Prompts, tool switches, test runs, and git diffs are all recorded on the timeline (sketched below, after these steps). No write-up is required from the candidate.
  Prompt log · Tool switching · Test results · git diff
- Step 3 · Reviewer evaluates
  The deliverable, auto-collected signals across 5 areas, and qualitative notes come together on one screen. The pass/no-pass decision can be made elsewhere.
  Timeline view · 5-area scoring · Tags & notes
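To make the captured record concrete, one timeline event might be modeled roughly like this. This is a hypothetical TypeScript sketch; the type and field names are illustrative, not the platform's actual schema.

```typescript
// Hypothetical shape of one auto-captured timeline event.
// Field and type names are illustrative, not the platform's actual schema.
type TimelineEventKind = "prompt" | "tool_switch" | "test_run" | "git_diff";

interface TimelineEvent {
  id: string;                // unique event id
  kind: TimelineEventKind;   // what was captured
  timestamp: string;         // ISO 8601 time of capture
  payload: {
    prompt?: string;         // prompt text, for "prompt" events
    model?: string;          // e.g. a Claude, GPT, or Gemini model
    fromTool?: string;       // previous tool, for "tool_switch" events
    toTool?: string;         // new tool, for "tool_switch" events
    passed?: number;         // passing tests, for "test_run" events
    failed?: number;         // failing tests, for "test_run" events
    diff?: string;           // unified diff text, for "git_diff" events
  };
}
```

Modeling every capture as one event type is what lets the reviewer's timeline stay a single chronological stream rather than four separate logs.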
Five areas of AI evaluation
For each area, we map which signals an evaluator can read from automatic capture and which need a human reviewer.
Problem framing
Track decomposition of requirements in the initial prompt and how sub-questions are split.
- # of decomposition steps
- Sub-question split pattern
- Whether constraints/assumptions are stated
Prompt design
Whether constraints and the output format are specified, context is attached, and a system prompt is used.
- Output format specified
- Context attached
- System prompt usage
Validation · critical thinking
Follow-up rate, edits to AI output, edits before paste, and test re-runs (see the sketch after this section).
- Follow-up rate
- AI output edit count
- Edits before paste
- Test re-runs
Tool use efficiency
Model choice patterns, tool switch frequency, completeness vs. prompt count.
- Model choice patterns
- Tool switch frequency
- Completeness vs. prompts
Final output
git diff, test pass rate, and the result file — read directly. No automatic scoring.
- git diff
- Test pass rate
- Result file
Timeline integration
Signals from all 5 areas in chronological order on one screen. Click an event → jump to the original prompt/diff/log.
- Event → original jump
- 1–5 scoring per area
- Tags & notes
Same starting line for everyone, and we collect only what's visible
The platform provides top Claude, OpenAI, and Gemini models for free, so whether a candidate has a paid subscription doesn't change the outcome. Before submission, candidates see exactly which records will be sent and can opt out per item (a sketch of that choice follows the list below).
- Models provided: Free
- Pre-submit preview: 100%
- Auto-deletion: 90 days
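One way the per-item opt-out could be represented at submission time is a simple manifest the candidate reviews before sending. This is a hypothetical TypeScript sketch; the manifest shape and field names are assumptions, not the platform's actual preview format.

```typescript
// Hypothetical pre-submit manifest: the candidate reviews every record
// and can exclude individual items before anything is sent.
interface SubmissionItem {
  eventId: string;   // references a captured timeline event
  label: string;     // human-readable summary shown to the candidate
  include: boolean;  // candidate flips this to opt out of an item
}

interface SubmissionManifest {
  items: SubmissionItem[];
  retentionDays: number; // e.g. 90, matching the auto-deletion window
}

// Only items the candidate left checked are actually submitted.
function finalizeSubmission(manifest: SubmissionManifest): SubmissionItem[] {
  return manifest.items.filter((item) => item.include);
}
```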
- Candidate A · No paid subscription · Same environment: Claude, GPT-5, Gemini Pro
- Candidate B · Cursor Pro user · Same environment: Claude, GPT-5, Gemini Pro
- Candidate C · First time with AI tools · Same environment: Claude, GPT-5, Gemini Pro

