
Post: How to Audit Your Screening-to-Hire Correlation: A 20-Hire Method
In one afternoon, this audit tells you whether your hiring filter produces signal or noise. You pull your last 20 successful hires, find where each ranked in initial screening, and look at the distribution. If your best people didn’t rank at the top, your filter is misfiring. It’s the highest-leverage move in the AI resume screening rebuild.
Before You Start
Gather two things from your ATS: a list of your last 20 hires you’d happily hire again, and each one’s original screening score or rank. If your ATS can’t export stage data, that’s its own finding; see ATS features that resist AI gaming for what to demand from a tool. Block off an uninterrupted afternoon and open a single spreadsheet; you need no statistician and no new software for this. The one prerequisite that trips teams up is honesty about the quality bar, so decide before you open the data what “would happily hire again” means in your shop and write that definition down. Fixing the standard in advance keeps you from quietly grading on a curve once the names appear and a flattering correlation starts to tempt you.
Step 1: Pick Your 20 Hires
Choose the last 20 people you hired who turned out well, not the last 20 hires regardless of outcome. Quality is the variable you’re testing against, and the distinction matters more than it sounds. A sample of all recent hires only tests whether your filter predicts “got hired,” which it does by definition and which tells you nothing. By restricting the sample to people you would enthusiastically hire again, you make performance the thing your screen has to predict. Use a concrete bar to pick them: would you fight to keep this person, and would you clone the role with them in it? If you have fewer than 20 who clear that bar, use what you have and note the smaller sample, because a clear signal from 12 strong hires beats a muddy one from 20 mixed ones.
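If your hire list lives in an export rather than in your head, the selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not a required tool: the `Hire` record, the `would_rehire` flag, and the `pick_sample` helper are hypothetical names, and the flag is assumed to be set by a person applying the written quality bar before any screening scores are opened.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hire:
    name: str
    hired_on: date
    would_rehire: bool  # hypothetical flag: set by a human against the written quality bar


def pick_sample(hires, size=20):
    """Return the most recent `size` hires who clear the quality bar."""
    strong = [h for h in hires if h.would_rehire]
    # Most recent first, so the sample reflects the current filter, not an old one.
    strong.sort(key=lambda h: h.hired_on, reverse=True)
    return strong[:size]


if __name__ == "__main__":
    # Illustrative placeholder records, not real data.
    roster = [
        Hire("Hire A", date(2024, 1, 5), True),
        Hire("Hire B", date(2024, 3, 1), False),
        Hire("Hire C", date(2024, 2, 1), True),
    ]
    sample = pick_sample(roster)
    print(f"Sample size: {len(sample)} (note it if it is under 20)")
```

If fewer than 20 people clear the bar, the function simply returns the shorter list, which matches the advice to use what you have and record the smaller sample.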
Step 2: Pull Each One’s Original Screening Rank
For every hire, find where they sat in the initial screen: ATS keyword score, assessment score, or recruiter rank at the application stage. Record it next to their name. This is the number your filter assigned them before anyone met them. Be strict about using the original number, not a reconstruction. The whole audit hinges on capturing what the screen actually said at application time, before any human judgment touched the candidate, so resist the urge to estimate where someone “would have” ranked. If your strongest performer scored a 71 when your typical cutoff is 80, that 71 is the finding — pull it exactly as recorded. Build a simple two-column list: name and original screening rank. That list is the entire raw material of the audit, and its value depends on the second column being the filter’s untouched verdict.
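For teams that can export stage data, a short script can build the two-column list and enforce the "no reconstruction" rule at the same time. The sketch below assumes a CSV export with columns named `name` and `original_score`; both column names and the `load_original_scores` helper are hypothetical, so adjust them to whatever your ATS actually produces. Rows with no recorded score are set aside rather than estimated.

```python
import csv
import io


def load_original_scores(csv_text):
    """Split an export into (name, score) pairs and names with no recorded score."""
    recorded, missing = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["name"].strip()
        raw = (row.get("original_score") or "").strip()
        if raw:
            recorded.append((name, float(raw)))
        else:
            # Never estimate where someone "would have" ranked; flag the gap instead.
            missing.append(name)
    return recorded, missing


if __name__ == "__main__":
    # Illustrative placeholder export, not real data.
    export = "name,original_score\nHire A,71\nHire B,\nHire C,88\n"
    recorded, missing = load_original_scores(export)
    print("Recorded:", recorded)
    print("No original score on file:", missing)
```

Keeping the missing names in a separate list makes the gap visible in the audit instead of quietly shrinking the sample.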
Step 3: Plot the Distribution
Sort your 20 hires by their screening rank. The question is simple: did your best people cluster at the top of the screen, or scatter through the middle and bottom? A working filter puts strong hires up top. A broken one spreads them everywhere. You do not need statistics software for this; a single sorted column tells the story at a glance. Lay the 20 ranks out from highest to lowest and look at the shape. If the filter is feeding you signal, your proven performers bunch near the top because the screen rewarded the same thing performance later rewarded. If instead they are sprinkled evenly from top to bottom, the screen ranked them as if at random, which is the visual signature of a filter measuring something disconnected from job performance. The scatter is the diagnosis, and it is visible without a single formula.
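A sorted spreadsheet column is all the step requires, but the same view can be sketched as a text chart if a script is more convenient. The example below assumes scores on a 0 to 100 scale where higher means a better screen result; the `distribution_lines` helper and the sample values are hypothetical. If your ATS records a rank where 1 is best, sort ascending instead.

```python
def distribution_lines(scores, width=40, top=100):
    """Return one text bar per hire, sorted from highest to lowest screening score."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    lines = []
    for name, score in ranked:
        # Bar length is proportional to the score on the assumed 0..top scale.
        bar = "#" * round(score / top * width)
        lines.append(f"{score:5.0f}  {bar}  {name}")
    return lines


if __name__ == "__main__":
    # Illustrative placeholder scores, not real data.
    scores = [("Hire A", 71), ("Hire B", 93), ("Hire C", 55), ("Hire D", 88)]
    for line in distribution_lines(scores):
        print(line)
```

Bars of similar length bunched at the top suggest clustering; bars that step down evenly from long to short are the scatter described above.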
Step 4: Count the Would-Have-Been-Filtered
Mark every hire who ranked below your typical screening cutoff — the ones your filter would have buried if you’d trusted it fully. This count is your headline number. Hearing “six of our last twenty great hires would have been filtered out” is what changes a hiring manager’s mind. The reason this single count carries the argument is that it translates an abstract worry into a loss the team already absorbed. “AI is gaming our filter” is a debate; “we nearly rejected six people we now consider essential” is a fact about your own roster. Draw the cutoff line where you actually trust the screen in practice, then count every strong hire who fell below it. Keep the names attached during your own review, because nothing lands harder in the briefing than a manager realizing their best report sat under the line the filter would have enforced.
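The headline count is simple arithmetic, sketched below for anyone who wants it scripted. It assumes a numeric score where higher is better and a single cutoff you choose yourself; the `would_have_been_filtered` helper and the sample rows are hypothetical, and the cutoff of 80 with a score of 71 simply echoes the illustrative figures used earlier in this post.

```python
def would_have_been_filtered(scores, cutoff):
    """Return the (name, score) pairs that fall below the screening cutoff."""
    return [(name, score) for name, score in scores if score < cutoff]


def headline(scores, cutoff):
    """Format the count as the one-sentence number to bring to the briefing."""
    missed = would_have_been_filtered(scores, cutoff)
    return f"{len(missed)} of {len(scores)} strong hires scored below the cutoff of {cutoff}"


if __name__ == "__main__":
    # Illustrative placeholder scores, not real data.
    scores = [("Hire A", 71), ("Hire B", 93), ("Hire C", 55), ("Hire D", 88)]
    print(headline(scores, cutoff=80))
    for name, score in would_have_been_filtered(scores, cutoff=80):
        print(f"  below the line: {name} ({score})")
```

Set `cutoff` to the line you trust in practice, not the one written in a policy document, so the count reflects what the filter would really have enforced.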
Step 5: Brief Your Team With the Number
Bring the distribution to a 30-minute alignment conversation. Show managers that application quality alone would have rejected real performers. Then connect it to the fix: move judgment earlier with a structured phone screen and a judgment-based application question. Run the meeting in that order on purpose. Lead with the count and the scatter, sit with the discomfort for a moment, and only then introduce the fix, because a room that has just seen its own near-misses is ready to change in a way a room hearing a theory is not. Name the specific hires if your culture allows it, since “remember how close we came to passing on this person” converts a skeptic faster than any deck. Close by assigning the two cheapest next moves so the energy from the number does not dissipate before anyone acts on it.
How to Know It Worked
You’ll have a single distribution chart and a count of strong hires who ranked below cutoff. If more than two or three of your twenty strong hires fall below the cutoff, your filter is producing noise and the rebuild is justified by your own data, not a theory. The deeper sign the audit worked is a changed conversation: hiring managers stop arguing about whether the filter is sound and start asking where to move judgment earlier. A clean result is informative too. If your strong hires genuinely cluster at the top, your screen is earning its place, and you have spent an afternoon to confirm it rather than assuming it. Either outcome replaces opinion with evidence, which is the entire point of running the audit before you touch a setting.
Common Mistakes
- Using all recent hires, not your best ones. You’re testing the filter against quality, so the sample has to be quality hires.
- Stopping at the chart. The audit’s value is the conversation it enables — book the manager briefing.
- Changing filter settings instead of the approach. If the filter produces noise, tuning it produces tuned noise. Move evaluation to judgment.
Expert Take
Every team I’ve run this with expected a clean correlation and found scatter. That first chart is uncomfortable, which is exactly why it works — it converts “AI is gaming us” from an abstraction into a number you can’t unsee. Run it before you touch a single filter setting. The data will tell you the filter isn’t the thing to optimize.
Next Step
Once the audit confirms the gap, add the cheapest fix first: a single judgment question on your application. Then read the pillar guide for the full screening rebuild.

