7 Duplicate Candidate Filters to Build in Make for Cleaner Talent Pipelines in 2026
Duplicate candidate records are a data integrity failure that compounds daily. Each duplicate inflates your pipeline counts, distorts source-of-hire analytics, and risks sending the same candidate two outreach sequences from two different recruiters — a candidate experience problem that top talent notices immediately. Your ATS won’t catch most of them. Candidates applying across job boards, referral portals, and direct web forms arrive with enough field-level variation to pass native duplicate detection without a flag.
The fix is an automation-layer filter that intercepts candidate data before it touches your ATS. This is exactly what the parent pillar — Master Data Filtering and Mapping in Make for HR Automation — establishes as the foundational principle: enforce data integrity at the intake layer, not after the fact. The 7 filter patterns below implement that principle specifically for candidate deduplication, ranked from the simplest entry point to the most comprehensive multi-signal approach.
1. Email Exact-Match Filter Using Make™’s Built-In Condition
The email exact-match filter is the fastest, cheapest deduplication check available and should be the first layer in any pipeline.
- How it works: A Make™ filter condition placed immediately after the trigger module checks whether the incoming email address field is populated and evaluates it against a stored reference — typically a Data Store or a lookup in your ATS via an HTTP module.
- Trigger point: Fires on every new candidate submission regardless of source (web form, job board webhook, CSV import).
- Match behavior: If the email exists in the reference store, the record routes to a review branch. If no match, it passes through to the ATS write module.
- Limitation: Fails when the same individual uses different email addresses across applications — which is common for candidates with both personal and professional email accounts.
- Verdict: Non-negotiable first layer. Stops the majority of straightforward duplicates with zero additional API calls beyond the lookup.
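The routing logic reduces to a simple membership check. The sketch below expresses it in Python for clarity; in Make™ this is a filter condition plus a Data Store lookup, and the in-memory set, field names, and branch labels here are illustrative stand-ins:

```python
# Stands in for the Data Store / ATS lookup; in production this is a keyed store.
seen_emails = {"jane.smith@email.com", "raj@example.org"}

def route_candidate(record: dict) -> str:
    """Return the branch a record takes: 'review' or 'write_to_ats'."""
    email = record.get("email")
    if not email:                 # unpopulated email field: route to review, not the ATS
        return "review"
    if email in seen_emails:      # exact match against the reference store
        return "review"
    seen_emails.add(email)        # register the fingerprint as the record passes through
    return "write_to_ats"

print(route_candidate({"email": "jane.smith@email.com"}))  # review
print(route_candidate({"email": "new.hire@example.com"}))  # write_to_ats
```

The same shape holds whether the reference is a Data Store, a spreadsheet, or a cached ATS export; only the lookup call changes.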
2. Normalized Email Filter (Lowercase + Trim Before Match)
Case and whitespace variations cause exact-match filters to miss duplicates they should catch. Normalization closes that gap.
- How it works: Before the filter condition runs, a Make™ text function module applies lower() and trim() to the incoming email string and to the stored reference value. The comparison then runs on the normalized strings.
- Use case: Catches “Jane.Smith@Email.com” and “jane.smith@email.com” as the same record, a pair that raw exact-match logic misses.
- Build location: Insert a Set Variable or a Tools > Set Multiple Variables module between the trigger and the filter condition.
- Pairing recommendation: Always run normalization before any string-comparison filter in a recruiting pipeline. It adds one module and costs nothing.
- Verdict: A two-minute addition to Filter #1 that meaningfully improves catch rate. There is no reason to skip this step.
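The normalization step itself is one line of logic. Assuming lowercase-and-trim text functions on both sides of the comparison, a Python equivalent looks like this:

```python
def normalize_email(raw: str) -> str:
    """Lowercase and trim, mirroring the text-function step before the filter."""
    return raw.strip().lower()

# Case and whitespace variants now compare equal
print(normalize_email("  Jane.Smith@Email.com "))  # jane.smith@email.com
```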
3. Composite Key Filter (Name + Phone Combination)
When email alone is insufficient, a composite key built from multiple fields creates a more resilient deduplication signal.
- How it works: Make™ concatenates normalized first name, last name, and phone number into a single string key (e.g., “johnsmith5551234567”). That composite key is checked against a Data Store where prior records have been stored in the same format.
- When to deploy: Use as a secondary check after the email filter passes — meaning the email is new, but you want to confirm the person isn’t already in the system under a different address.
- Data Store structure: Store composite keys as primary keys in a dedicated “candidate fingerprint” Data Store, written at the same moment the record is passed to the ATS.
- Edge case: Phone number formatting must be normalized before concatenation (strip spaces, dashes, parentheses, country codes) or the composite key will produce false negatives.
- Verdict: The right second layer for organizations receiving high referral volume, where the same individual is frequently submitted with a personal email by one source and a work email by another.
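A sketch of the fingerprint construction, including the phone normalization the edge-case note above calls for. The leading-country-code rule below assumes North American numbers; adjust for your candidate population:

```python
import re

def phone_digits(raw: str) -> str:
    """Strip everything but digits; drop a leading '1' country code (NANP assumption)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits

def composite_key(first: str, last: str, phone: str) -> str:
    """Normalized name + phone fingerprint, stored as the Data Store primary key."""
    return f"{first.strip().lower()}{last.strip().lower()}{phone_digits(phone)}"

# "(555) 123-4567" and "+1 555.123.4567" produce the same fingerprint
print(composite_key("John", "Smith", "(555) 123-4567"))  # johnsmith5551234567
```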
4. ATS Lookup Filter via HTTP Module
For teams whose ATS exposes a search or candidate-lookup API endpoint, a live ATS query during the Make™ scenario run is the most authoritative duplicate check possible.
- How it works: After the trigger fires, an HTTP > Make a Request module sends a GET request to the ATS API’s candidate search endpoint with the incoming email as a query parameter. The response is parsed; if a candidate ID is returned, a filter routes the record to the review branch.
- Advantage over Data Store: The ATS is the system of record. This approach catches duplicates created manually by recruiters inside the ATS that were never written through the automation pipeline — a gap that internal Data Store approaches cannot see.
- Latency consideration: An API call adds processing time per scenario run. For high-volume intake (hundreds of applications per hour), test throughput against Make™ rate limits and ATS API throttle limits before deploying at scale.
- Authentication: Most ATS platforms use OAuth 2.0 or API key auth. Store credentials in Make™’s connection manager, never in plain text inside the scenario.
- Verdict: The gold standard for organizations where recruiter-entered records and automated records must be checked against a single shared truth. Requires ATS API access — confirm availability before building.
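The request construction and response parsing can be sketched as below. The endpoint URL, query parameter, and response shape are all hypothetical; substitute your ATS vendor's actual candidate-search API and auth scheme:

```python
import json
import urllib.parse
import urllib.request

ATS_SEARCH_URL = "https://ats.example.com/api/v1/candidates/search"  # hypothetical endpoint

def build_search_request(email: str, api_key: str) -> urllib.request.Request:
    """GET request against the assumed candidate-search endpoint."""
    url = f"{ATS_SEARCH_URL}?{urllib.parse.urlencode({'email': email})}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})

def is_duplicate(response_body: str) -> bool:
    """Assumed response shape: {"results": [{"id": "..."}]}. Non-empty results = duplicate."""
    return bool(json.loads(response_body).get("results"))

req = build_search_request("jane@example.com", "API_KEY")
print(req.full_url)  # https://ats.example.com/api/v1/candidates/search?email=jane%40example.com
print(is_duplicate('{"results": [{"id": "cand_123"}]}'))  # True
print(is_duplicate('{"results": []}'))                    # False
```

Keeping the response parsing separate from the network call makes the duplicate decision testable without hitting the ATS, which also helps when tuning against rate limits.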
5. Fuzzy-Match Filter with String Normalization and Similarity Threshold
Exact-match logic fails on “Jon Smith” vs. “John Smith.” Fuzzy matching catches the near-duplicates that account for a significant share of undetected entries in high-volume pipelines.
- How it works: Make™ retrieves all existing candidate records from a Data Store, then uses a formula or a connected module to compute string similarity between the incoming name fields and stored names. Records above a defined similarity threshold (typically 85–90%) are flagged for human review rather than auto-processed.
- Implementation options: Make™ native string functions can handle basic normalization and substring matching. For true similarity scoring, route the candidate data through a lightweight AI module (e.g., a structured prompt asking “Are these two names likely the same person?”) and parse the binary response as the filter condition.
- False positive risk: A threshold set too low will flag legitimately different candidates with common names. Route fuzzy-match positives to human review — never to automatic discard.
- Performance note: Fuzzy matching against a large Data Store on every trigger is computationally heavier. Index by last name first to reduce the comparison set before running similarity logic.
- Verdict: Deploy this as a third layer, not a first. It catches what layers 1–3 miss, but adds complexity that is only justified after the simpler filters are in place and confirmed gaps remain.
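For teams prototyping the threshold logic outside Make™, Python's standard-library difflib gives a quick similarity score without an external service. This is an illustrative stand-in for whatever scoring module the scenario actually calls:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] on normalized strings."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def needs_review(incoming: str, stored: str, threshold: float = 0.85) -> bool:
    """Above the threshold: route to human review, never auto-discard."""
    return similarity(incoming, stored) >= threshold

print(round(similarity("Jon Smith", "John Smith"), 2))
print(needs_review("Jon Smith", "John Smith"))   # True: flagged for human review
print(needs_review("Jon Smith", "Priya Patel"))  # False: clearly different
```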
6. Source-Tag Filter to Prevent Cross-Channel Re-Entry
A candidate who applied via a job board and is later referred internally isn’t always a problem — but they should never enter the pipeline as a net-new candidate. A source-tag filter controls that distinction.
- How it works: Every record written to the Data Store or ATS carries a source tag (e.g., “LinkedIn,” “Referral,” “Direct Web Form”). When a new application arrives, the duplicate check retrieves not just whether the candidate exists but also what source they originally came through. The routing logic then applies source-specific rules: a referral on an existing job-board candidate triggers a “merge and note” action rather than a standard duplicate discard.
- Business value: Source-of-hire data is a primary recruiting analytics input. McKinsey Global Institute research on workforce data quality consistently links clean source attribution to more accurate sourcing ROI measurement. A source-tag filter protects that attribution layer from contamination.
- Build tip: Add a “source” field to the Data Store schema from day one. Retrofitting source tags onto an existing Data Store requires a full re-migration of historical records.
- Verdict: Essential for any organization using source-of-hire data to make channel investment decisions. Treats duplicates as an analytics integrity problem, not just a data cleanliness problem.
7. Duplicate Review Queue with Recruiter Checkpoint
No automated filter is perfect. The seventh pattern is not another detection layer — it is the safety net that makes all six preceding layers safe to operate at scale.
- How it works: Every record flagged as a potential duplicate by any of the preceding filters routes to a dedicated review queue — a Google Sheet tab, an Airtable base, or a Slack channel with structured message formatting — rather than being silently discarded. A recruiter reviews the flagged record, makes a determination, and the queue item is marked resolved.
- Why deletion is dangerous: A record dropped without review could be a legitimate re-application after a six-month gap, a candidate referred for a completely different role, or a name-change scenario. Silent deletion is also a potential compliance exposure under data subject rights frameworks. SHRM guidance on candidate data handling consistently emphasizes documented disposition over untracked deletion.
- Queue hygiene: Build a Make™ scheduled scenario that flags review-queue items older than 48 hours with an escalation alert. Unreviewed queues become the same bottleneck they were designed to prevent.
- Asana research context: Asana’s Anatomy of Work research documents that knowledge workers lose significant productive time to work about work — tracking status, following up on unresolved items. A structured queue with a clear SLA eliminates the informal “is this a dup?” Slack message that interrupts recruiters mid-task.
- Verdict: Non-negotiable. Every deduplication pipeline needs a human checkpoint for ambiguous cases. This pattern is what converts a brittle automation into a production-grade workflow that recruiting teams trust.
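The 48-hour escalation check from the queue-hygiene note is easy to prototype. A sketch, assuming each queue item carries a flagged_at timestamp and a resolved flag (field names are illustrative):

```python
from datetime import datetime, timedelta, timezone

SLA = timedelta(hours=48)  # escalation threshold from the queue-hygiene note

def stale_items(queue: list[dict], now: datetime) -> list[dict]:
    """Unresolved items older than the SLA that need an escalation alert."""
    return [item for item in queue
            if not item["resolved"] and now - item["flagged_at"] > SLA]

now = datetime(2026, 1, 10, 12, 0, tzinfo=timezone.utc)
queue = [
    {"id": 1, "flagged_at": now - timedelta(hours=72), "resolved": False},  # stale
    {"id": 2, "flagged_at": now - timedelta(hours=12), "resolved": False},  # within SLA
    {"id": 3, "flagged_at": now - timedelta(hours=96), "resolved": True},   # handled
]
print([item["id"] for item in stale_items(queue, now)])  # [1]
```

In Make™ the equivalent is a scheduled scenario that reads the queue, filters on the timestamp, and posts the alert.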
Every team I’ve worked with underestimates the cost of duplicate candidates because they measure recruiter time in obvious places — interviews, screens, offers. They don’t measure the silent tax: the minutes lost every time someone opens a duplicate record, the skewed source-of-hire data that sends sourcing budget to the wrong channels, the candidate who gets two outreach emails and loses confidence in the organization before the first call. That accumulating cost dwarfs the one-time effort of building a proactive filter. Build the filter once. The cleanup cycle never ends.
In production pipelines, a single email-address deduplication check stops the large majority of duplicate records in typical multi-source recruiting environments. The remaining cases — candidates with multiple email addresses, name variations, or manual referral entries — require a composite key check or a fuzzy-match step. The right sequence: run the email check first (fast, cheap), then run the composite-key check on records that pass, then escalate only genuinely ambiguous records to the human review queue. Don’t build fuzzy logic before confirming exact-match gaps actually exist in your pipeline.
Teams new to automation often want a binary outcome: duplicate gets deleted, non-duplicate gets processed. In practice, silent deletion creates a different problem — a legitimate re-application or a referral for a new role disappears without recruiter visibility. Every pipeline we’ve built routes flagged duplicates to a lightweight review queue: a dedicated Slack channel, a Google Sheet with a ‘Review’ tag, or an email digest. The recruiter spends 30 seconds making the call. That 30 seconds prevents the data loss, the compliance exposure, and the candidate relationship damage that silent deletion causes.
How to Sequence These Filters in a Single Scenario
The seven patterns above are not mutually exclusive — the most resilient pipelines layer multiple checks in sequence. A practical production sequence looks like this:
- Normalize email (Filter #2) → Email exact-match lookup (Filter #1): catches the large majority of duplicates at the lowest cost.
- If email is new → Composite key check (Filter #3): catches same-person, different-email scenarios.
- If composite key is new → ATS live lookup (Filter #4) if API access is available: confirms against the authoritative system of record.
- Flag remaining ambiguous records for fuzzy-match review (Filter #5): reserve for high-volume pipelines where data quality analysis confirms near-duplicate gaps.
- Tag every record with a source label (Filter #6) on write, regardless of duplicate status.
- Route all flagged records to the review queue (Filter #7): no silent deletions.
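Assembled end to end, the sequence above reduces to a short routing function. This sketch layers Filters #2, #1, #3, and #6; the ATS live lookup (Filter #4) would slot in just before the final write, and all field and branch names are illustrative:

```python
def dedup_pipeline(record: dict, seen_emails: set, fingerprints: set) -> str:
    """Layered sequence: normalize -> email match -> composite key -> tag and write."""
    email = record["email"].strip().lower()                          # Filter #2: normalize
    if email in seen_emails:                                         # Filter #1: exact match
        return "review_queue"
    key = (record["first"].strip().lower() + record["last"].strip().lower()
           + "".join(ch for ch in record["phone"] if ch.isdigit()))  # Filter #3: composite key
    if key in fingerprints:
        return "review_queue"
    seen_emails.add(email)
    fingerprints.add(key)
    record["source_tag"] = record.get("source", "Unknown")           # Filter #6: tag on write
    return "write_to_ats"

emails, keys = set(), set()
a = {"email": "Jane@Ex.com", "first": "Jane", "last": "Doe",
     "phone": "555-0100", "source": "LinkedIn"}
b = {"email": "jane.doe@work.com", "first": "jane", "last": "Doe",
     "phone": "(555) 0100"}
print(dedup_pipeline(a, emails, keys))  # write_to_ats
print(dedup_pipeline(b, emails, keys))  # review_queue (same person, different email)
```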
Parseur’s Manual Data Entry Report documents the per-employee cost of manual data handling at $28,500 per year — a figure that reflects both direct labor time and downstream error correction. A layered deduplication pipeline eliminates a measurable share of that cost specifically for recruiting intake workflows.
For more on how filtering logic integrates with the full HR data layer, see the sibling guides on essential Make™ filters for recruitment data and precision hiring filter logic in Make™. For the module-level mechanics that support these scenarios, the guide on Make™ modules for HR data transformation covers the Data Store, HTTP, and text function modules used throughout the patterns above.
What to Do After the Filters Are Live
Building the filters is step one. Maintaining their effectiveness requires two ongoing practices:
Monitor the review queue weekly. A queue that fills without being cleared defeats its own purpose. Track the volume of flagged records and the resolution rate. If more than 15% of records are hitting the review queue, the filter thresholds or normalization logic need recalibration — not a more aggressive discard rule.
Audit Data Store accuracy quarterly. Data Stores are not self-cleaning. Candidate records that were entered manually into the ATS and never flowed through the automation layer will not exist in the Data Store. Run a periodic reconciliation between the ATS candidate list and the Data Store to close that gap. The guide on mapping resume data to ATS custom fields covers the field-mapping logic that keeps ATS and Data Store schemas synchronized.
For the full framework connecting deduplication to the broader data integrity architecture, return to the parent pillar: Master Data Filtering and Mapping in Make for HR Automation. The deduplication layer covered here is one component of a complete clean-data strategy — the pillar maps how all the components connect.
Additional context on clean pipeline strategy is covered in the guides on clean HR data workflows for strategic advantage and unifying HR data with filtering and mapping.