How to Prevent HR Data Duplication with Make.com Mailhooks: A Step-by-Step Guide

Published On: December 10, 2025


HR data duplication is not a cleanup problem. It is a workflow architecture problem — and the place to solve it is at the point of capture, not after the damage is done. If your Make.com™ mailhook scenarios are writing new records every time an email arrives without first checking whether a record already exists, duplicates are inevitable regardless of how carefully your team manages data downstream.

This guide walks through a complete, production-ready deduplication pattern for HR mailhook scenarios: how to parse incoming emails, extract a reliable unique identifier, query your system of record, and branch cleanly between update and create paths. For the broader context on why mailhooks are the right trigger layer for email-driven HR workflows, start with the parent pillar: Webhooks vs Mailhooks: Master Make.com HR Automation. If you are new to the trigger type, read how mailhooks work in Make.com before proceeding.


Before You Start

Before building the deduplication scenario, confirm you have the following in place.

  • A Make.com™ account with an active subscription that supports the number of operations your email volume requires.
  • A dedicated mailhook email address configured in Make.com™ for the HR data type you are deduplicating (candidates, employees, or contractors — one mailhook per data type is the recommended architecture).
  • API access to your target system — your ATS, HRIS, or the spreadsheet/database acting as your system of record. You need both a search/lookup endpoint and a create/update endpoint.
  • A defined unique identifier for each record type. For candidates, this is typically email address or ATS application ID. For employees, this is the HRIS employee ID. Decide this before building — changing it later requires rearchitecting the lookup logic.
  • An error destination — a Slack channel, email address, or shared log where failed parse attempts will be routed. Without this, parse failures silently drop records.
  • Time estimate: Allow 2–3 hours to build, test, and validate the full scenario including error handling. Skipping the testing phase is the most common reason deduplication scenarios fail in production.

Step 1 — Create the Mailhook Trigger and Capture the Raw Email

The mailhook is the entry point. Every deduplication decision flows from what arrives here, so the trigger configuration determines the reliability of everything downstream.

Inside Make.com™, create a new scenario and add a Mailhook module as the trigger. Make.com™ will generate a unique email address for this mailhook. Copy that address — it is what HR staff, applicant portals, or forwarding rules will send to in order to trigger the scenario.

Configure the mailhook to capture:

  • Sender email address (this is often your primary unique identifier for candidate records)
  • Email subject line (useful for routing logic if you process multiple record types through one mailhook — though a separate mailhook per type is cleaner)
  • Email body in plain text (easier to parse reliably than HTML)
  • Any attachments, if your HR process includes resume or document ingestion
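To make the captured fields concrete, here is a hypothetical example of what one captured email might look like, modeled as a plain Python dict. The key names are illustrative stand-ins, not Make.com's exact bundle structure:

```python
# Illustrative model of a captured mailhook bundle. Field names here are
# assumptions for the sketch; map them to the actual fields Make.com exposes.
captured_email = {
    "from_address": "sarah@company.com",            # sender: often the primary identifier
    "subject": "Application ID: APP-00482",         # useful for routing logic
    "text_body": "Name: Sarah Johnson\nApplication ID: APP-00482\n",
    "attachments": [],                              # e.g. resume PDFs, if ingested
}
```

Every downstream sketch in this guide assumes a structure like this one.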

Send a test email to the mailhook address now. Make.com™ will capture the structure of the incoming data, which you will use to map fields in subsequent modules. Do not proceed until you have a successful test capture — every downstream module depends on the field structure established here.

For a full walkthrough of initial mailhook configuration, see the guide to set up your first Make.com™ mailhook.


Step 2 — Parse the Email and Extract the Unique Identifier

Raw email content is not queryable. Before you can run a lookup, you must extract a structured unique identifier from the email body or headers. This step is where most deduplication scenarios fail — not because the logic is wrong, but because the parsing is unreliable.

Add a Text Parser module immediately after the mailhook trigger. Use a regular expression or keyword extraction pattern to pull the unique identifier from the email body. Common patterns include:

  • Candidate email address: Extractable from the sender field directly — no body parsing required. Use {{1.from.address}} as your identifier without additional parsing.
  • ATS Application ID: Typically appears in a structured format in the email body (e.g., “Application ID: APP-00482”). Use a regex pattern like Application ID:\s*(\S+) to capture it.
  • Employee ID: If your HRIS sends notification emails, the employee ID is usually in a fixed position in the subject line or body. Build your regex around that fixed position.
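The ATS Application ID pattern above can be sketched as a small extraction function. This is a minimal illustration assuming the "Application ID:" label format from the example; adjust the regex to match your ATS's actual email template:

```python
import re

def extract_application_id(body: str):
    """Pull an ATS application ID like 'APP-00482' out of an email body.

    Assumes the 'Application ID: <value>' label format used as an example
    above; returns None when no identifier is found, so the validation
    step can stop the run instead of creating a record from a bad parse.
    """
    match = re.search(r"Application ID:\s*(\S+)", body)
    return match.group(1) if match else None

extract_application_id("Name: Sarah\nApplication ID: APP-00482\n")  # -> "APP-00482"
extract_application_id("no identifier in this email")               # -> None
```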

After parsing, apply standardization immediately:

  • Convert email addresses to lowercase ({{toLowerCase(parsedEmail)}})
  • Strip leading/trailing whitespace from all extracted fields
  • Normalize name fields to Title Case if you are using name as a secondary matching field

Standardization before lookup is mandatory. A lookup for sarah@company.com will not match an existing record stored as Sarah@Company.com unless you normalize first.
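The three standardization rules can be sketched as two small helpers. These are illustrative implementations, not Make.com functions; in a scenario you would express the same logic with Make.com's built-in formulas such as toLowerCase:

```python
def normalize_identifier(raw: str) -> str:
    # Strip whitespace and lowercase so 'Sarah@Company.com ' matches
    # a stored 'sarah@company.com'.
    return raw.strip().lower()

def normalize_name(raw: str) -> str:
    # Title Case for secondary name matching: 'sarah JOHNSON' -> 'Sarah Johnson'.
    return " ".join(part.capitalize() for part in raw.strip().split())
```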

For more complex email formats — forwarded resumes, freeform candidate replies, multi-section HR forms — see the guide on advanced mailhook parsing techniques.


Step 3 — Validate the Extracted Identifier Before the Lookup

A null or malformed identifier must stop the scenario before it reaches the lookup. If it does not, the lookup returns no match, the scenario routes to the create branch, and a duplicate record is created from a bad parse — the worst possible outcome.

Add a Filter module immediately after the parser. Configure the filter with a single condition:

Condition: Extracted identifier is not empty AND matches expected format (e.g., valid email pattern or non-zero length string for IDs)

If the filter condition fails — meaning the identifier is empty or malformed — the scenario stops. Route the stopped execution to your error destination using Make.com™’s error handler or a parallel notification branch:

  • Log the raw email subject, sender, and timestamp to your audit log
  • Send an alert to the HR ops Slack channel or email address with the message “Mailhook parse failed — manual review required”
  • Do not create a record, do not update a record. Stop.
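The filter condition can be rehearsed in code. This sketch assumes email-shaped identifiers by default and treats IDs as valid when non-empty after trimming; the email pattern is a deliberately simple illustration, not a full RFC-compliant validator:

```python
import re

# Simple illustrative pattern: one '@', no whitespace, a dot in the domain.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_valid_identifier(value, kind: str = "email") -> bool:
    """Mirror of the Step 3 filter: not empty AND matches expected format."""
    if not value or not value.strip():
        return False                      # empty or whitespace-only: stop the run
    if kind == "email":
        return bool(EMAIL_PATTERN.match(value))
    return True                           # IDs: non-empty after trimming is enough here
```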

This filter is the most important single module in the entire scenario. Every duplicate created by a failed parse traces back to a missing or incorrectly configured validation step here.


Step 4 — Search Your System of Record for an Existing Record

With a validated, standardized identifier in hand, query your target system to determine whether a record already exists. This is the search-before-create gate — the core of the deduplication pattern.

Add a Search Records module (or the equivalent lookup action for your specific ATS, HRIS, or database connector). Configure it to query by your unique identifier field:

  • ATS example: Search candidates where email = {{extractedEmail}}
  • HRIS example: Search employees where employee_id = {{extractedEmployeeId}}
  • Spreadsheet example: Search rows where column A = {{extractedIdentifier}}

Configure the search to return the record ID if a match exists, or return empty/null if no match is found. Make.com™ passes this result to the next module, which uses it to determine the routing path.

Important: Set the search to return at most one result. If your system of record already contains duplicates, a multi-result search can cause the Router logic in the next step to behave unpredictably. If multiple records are returned, treat it as an error condition and route to your error notification path for manual review.
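The search-before-create gate, including the multi-match error condition, can be sketched against an in-memory list standing in for the system of record. The record shape is an assumption for illustration:

```python
def lookup_record(records, identifier):
    """Return the single matching record, None if no match, or raise when
    the system of record itself contains duplicates (manual review case)."""
    matches = [r for r in records if r["email"] == identifier]
    if len(matches) > 1:
        # Pre-existing duplicates: do not let the Router guess.
        raise ValueError(f"Multiple records match {identifier}: manual review required")
    return matches[0] if matches else None
```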


Step 5 — Route to Update or Create Based on the Search Result

Add a Router module after the search step. Configure two routes:

Route A — Record Exists (Update Path)

Filter condition: Search result record ID is not empty

This route runs when the lookup found a match. The correct action is to update the existing record with any new information from the email — do not create a new record.

  • Map only the fields that should be updated from the email content. Do not overwrite fields that the email does not contain — leave those unchanged in the existing record.
  • Include a “last updated via” field update, setting it to “mailhook” and the current timestamp. This creates an audit trail for every email-driven update.
  • Log the action: existing record ID, identifier value, timestamp, and action = “UPDATED”.

Route B — No Record Found (Create Path)

Filter condition: Search result record ID is empty

This route runs only when no matching record exists. Create a new record using all fields extracted from the email.

  • Map all extracted and standardized fields to the appropriate columns/fields in your target system.
  • Set a “record source” field to “mailhook” so your HR team can identify email-originated records in reports.
  • Log the action: new record ID (returned by the create module), identifier value, timestamp, and action = “CREATED”.

Both routes write to the same audit log. This shared log gives HR an accurate count of creates vs. updates over time, a metric that shows whether deduplication is working as expected.
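The two routes can be sketched as a single decision function over an in-memory stand-in for the target system. Field and key names are illustrative assumptions, not your system's schema:

```python
def route_email(records, fields, identifier):
    """Route A (update) when a match exists, Route B (create) otherwise."""
    existing = next((r for r in records if r["email"] == identifier), None)
    if existing is not None:
        # Route A: update only the fields the email actually contained.
        for key, value in fields.items():
            existing[key] = value
        existing["last_updated_via"] = "mailhook"       # audit trail
        return {"action": "UPDATED", "record_id": existing["id"]}
    # Route B: create a new record from all extracted fields.
    record = {"id": len(records) + 1, "email": identifier,
              "record_source": "mailhook", **fields}
    records.append(record)
    return {"action": "CREATED", "record_id": record["id"]}
```

Note that Route A only touches keys present in `fields`, matching the "do not overwrite fields the email does not contain" rule above.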


Step 6 — Build the Audit Log

Every deduplication decision — create or update — must be recorded. An audit log is not overhead; it is the evidence layer that answers the question “why does this record look the way it does?” when HR needs to investigate a data discrepancy weeks later.

At the end of both Route A and Route B, add an Add Row (Google Sheets), Create Record (Airtable), or equivalent logging module. Each log entry should capture:

  • Timestamp of execution
  • Source email sender address
  • Email subject line
  • Extracted unique identifier
  • Action taken: CREATED or UPDATED
  • Target system record ID
  • Scenario name and Make.com™ execution ID (for cross-referencing Make.com™ execution history)
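As a sketch, one log row built from the fields above might look like this. Column names are illustrative, and the execution ID would come from Make.com at runtime:

```python
from datetime import datetime, timezone

def audit_entry(sender, subject, identifier, action, record_id,
                scenario="hr-dedup-mailhook", execution_id=None):
    """Build one audit-log row covering the fields listed above."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sender": sender,
        "subject": subject,
        "identifier": identifier,
        "action": action,            # CREATED | UPDATED | PARSE FAILED
        "record_id": record_id,
        "scenario": scenario,
        "execution_id": execution_id,
    }
```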

APQC research consistently finds that HR teams with documented data governance processes spend significantly less time on error correction and data reconciliation. An automated audit log built into every mailhook scenario is a lightweight but durable data governance control.


Step 7 — Configure Error Handling for the Full Scenario

Error handling is not a finishing touch — it is a structural requirement. Unhandled errors in Make.com™ scenarios silently drop the triggering email from the processing queue. In HR workflows, a dropped record means a candidate not followed up with, an employee record not updated, or a compliance-relevant change that never made it to the system of record.

Configure error handlers at three points in the scenario:

  1. After the parser: Handle cases where the text parser module fails entirely (malformed email, encoding issues). Route to the error notification path and log the raw email for manual review.
  2. After the search module: Handle API timeout or connection errors to the target system. Use Make.com™’s built-in retry logic (up to three retries with exponential backoff) before routing to the error notification path.
  3. After the create/update modules: Handle write failures — permission errors, field validation failures, system downtime. Log the failed payload so a manual re-submission is possible without losing the original email data.
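The retry-with-exponential-backoff behavior described in point 2 can be sketched as a small wrapper. This is an illustration of the pattern, not Make.com's internal implementation; the delay values are assumptions:

```python
import time

def with_retries(operation, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry a flaky zero-argument callable with exponential backoff
    (1s, 2s, 4s by default) before letting the error escape to the
    error-notification path."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise                      # exhausted: route to error handler
            sleep(base_delay * (2 ** attempt))
```

Injecting `sleep` makes the backoff schedule testable without real waiting.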

For a comprehensive treatment of all mailhook failure modes and their handlers, see the guide on mailhook error handling patterns.


How to Know It Worked

Run the following three-test validation sequence before activating the scenario for live traffic.

Test 1 — New Record Creation

Send an email to the mailhook address with a unique identifier that does not exist in your target system. Verify that exactly one new record is created in the target system and that the audit log captures a CREATED entry with the correct identifier and timestamp.

Test 2 — Duplicate Prevention (Update Path)

Send a second email to the mailhook address using the same identifier from Test 1. Verify that no new record is created in the target system. Verify that the existing record is updated with any new field values from the second email. Verify that the audit log captures an UPDATED entry — not a second CREATED entry.

Test 3 — Parse Failure Handling

Send a deliberately malformed email — one with a missing or garbled identifier field — to the mailhook address. Verify that no record is created or updated in the target system. Verify that the error notification is sent to your designated error destination. Verify that the audit log captures a PARSE FAILED entry.

All three tests must pass before the scenario handles live HR data. If Test 2 produces a second CREATED entry, the Router filter condition on Route A is misconfigured — revisit Step 5. If Test 3 produces a record in the target system, the validation filter in Step 3 is missing or incorrectly scoped.
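The three-test sequence can be rehearsed end to end against a minimal in-memory stand-in for the whole scenario. This sketch compresses validate, standardize, search, route, and log into one function with assumed names, purely to show the expected outcomes of the tests:

```python
def process_email(records, log, identifier, fields):
    """Minimal stand-in for the scenario: validate -> standardize ->
    search -> update-or-create -> log. All names are illustrative."""
    if not identifier or not identifier.strip():
        log.append("PARSE FAILED")                  # Step 3 filter stops the run
        return None
    ident = identifier.strip().lower()              # Step 2 standardization
    match = next((r for r in records if r["email"] == ident), None)
    if match is not None:
        match.update(fields)                        # Route A: update
        log.append("UPDATED")
        return match
    record = {"email": ident, **fields}             # Route B: create
    records.append(record)
    log.append("CREATED")
    return record
```

Running Tests 1-3 in order against this stand-in should yield exactly one record and the log sequence CREATED, UPDATED, PARSE FAILED.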


Common Mistakes and How to Avoid Them

Using Name as the Primary Unique Identifier

Names are not unique and they change. Two candidates named “Sarah Johnson” will collide. An employee who changes their name after marriage will generate a new record instead of updating the existing one. Always use a system-assigned ID or email address as the primary key.

Skipping Standardization Before the Lookup

Based on our testing, the most common cause of false “no match” results — which incorrectly route to the create path — is a capitalization or whitespace mismatch between the incoming data and the stored value. Normalize before you query. Always.

Running One Scenario for All HR Data Types

A single mailhook scenario that handles candidates, employees, and contractors with internal branching logic becomes difficult to debug and fragile to maintain. Separate scenarios with separate mailhook addresses are faster to troubleshoot, easier to version, and simpler to hand off to another operator.

Not Monitoring the Audit Log

The audit log only delivers value if someone reviews it. Set a weekly reminder for an HR ops team member to scan the log for anomalies: an unexpected spike in CREATED entries may indicate a new email source that is bypassing the scenario, while an unexpected spike in PARSE FAILED entries may indicate a change in the format of incoming emails that requires a parser update.

Treating Deduplication as a One-Time Build

Email formats change. Upstream systems change. New data sources come online. The deduplication scenario requires periodic review — at minimum whenever a new HR email source is added or a connected system updates its API. Build the review into your regular automation maintenance cadence.


Why Deduplication Is a Data Governance Imperative, Not Just an Operations Fix

Gartner research has established that poor data quality costs organizations an average of $12.9 million per year. In HR, the stakes are amplified: duplicate records affect headcount reporting, recruiting pipeline metrics, payroll accuracy, and regulatory compliance simultaneously.

Parseur’s Manual Data Entry Report documents that manual data entry error rates reach 1–5% under normal conditions — and those errors compound when the same data is entered multiple times across multiple systems. The search-before-create pattern eliminates the redundant entry problem at the source, rather than attempting to reconcile errors after the fact.

SHRM data consistently shows that HR data errors create downstream costs in payroll corrections, compliance remediation, and employee relations — costs that dwarf the time investment required to architect the deduplication logic correctly from the start. The MarTech 1-10-100 rule applies directly here: it costs $1 to prevent a data error, $10 to correct it, and $100 to recover from the consequences of acting on bad data.

The architecture described in this guide is the prevention layer. It costs one afternoon to build and runs indefinitely without human intervention.


Next Steps

With a deduplication scenario in place, the next layer of HR email automation is knowing when mailhooks are the right trigger and when a webhook-based approach gives you better control. For that decision framework, read when to use webhooks instead of mailhooks — and if your team is still processing HR emails manually, the case for stopping that immediately is laid out in full at stop processing HR emails manually.

The deduplication pattern described here is one of the nine automation opportunities identified in a standard OpsMap™ engagement. If you want to map the full scope of your HR automation gaps before building, that is the place to start.