
How to Build Unbreakable ATS Data Syncs with Make.com™ Error Handling
ATS data syncs break for predictable reasons — missing fields, API timeouts, schema drift, and token expiration. The difference between a team that discovers these failures in a Monday morning audit and a team that never notices them is error architecture. This guide walks through how to build that architecture in Make.com™ — before the first production bundle moves. For the broader strategic framework this guide fits into, see our parent resource on advanced error handling in Make.com™ HR automation.
Before You Start
Before building any error handling layer, confirm you have these prerequisites in place.
- ATS API documentation: Know which fields are required, which endpoints are rate-limited, and what error codes the ATS returns on failure. Most ATS vendors publish this in a developer portal.
- Unique identifier mapped: Every candidate record in your ATS has a unique ID (applicant ID, email, or composite key). Identify it before building any write module.
- Destination system credentials: HRIS, CRM, or communication tool API tokens with write permissions. Confirm token expiration windows — short-lived tokens need a refresh module in the scenario.
- A persistent log destination: A Google Sheet, Airtable base, or database table dedicated to error logs. Make.com™ execution history has a limited retention window and is not a substitute for a persistent audit log.
- Notification channel: An email address or Slack channel the HR ops team actively monitors, designated specifically for automation error alerts — separate from general team notifications.
- Estimated time: 3–5 hours to build a production-ready ATS sync scenario with all three error handling layers. Scenarios without error architecture take 1 hour. The extra time is the difference between a prototype and a production system.
Step 1 — Map Every Failure Point Before Building
Identify every location in the sync where data can fail before writing a single module. This is non-negotiable.
Open a blank document and list each stage of the sync: webhook receipt, field parsing, validation, lookup, write, confirmation. For each stage, answer three questions: What can go wrong here? What does the downstream system do if this stage sends bad data? What is the business cost if this fails silently?
For a standard ATS-to-HRIS sync, the failure points typically include:
- Webhook payload: Missing required fields, unexpected null values, wrong data types
- Authentication: Expired token on the receiving system
- Lookup: ATS applicant ID not found in the HRIS — new record vs. duplicate ambiguity
- Write: Rate limit exceeded, 500-series server error, field validation rejection from the destination system
- Confirmation: No acknowledgment returned from the receiving system — was the write actually committed?
Document this map. It drives every design decision in the steps that follow. Asana’s Anatomy of Work research consistently finds that teams working without documented process maps spend significantly more time on unplanned rework — failure-point mapping is the fastest way to front-load that thinking.
Step 2 — Build the Input Validation Gate
Validation gates catch bad data at the source — before it contaminates any downstream system. Build this as the first functional layer in the scenario, immediately after the webhook trigger or ATS polling module.
In Make.com™, implement the validation gate using a Router module with filter conditions. Configure the router with two paths:
- Valid path: All required fields are present, formatted correctly, and meet business rules (e.g., email passes regex check, start date is in the future, applicant ID is non-null)
- Invalid path: Any required field is missing, null, or malformed
On the invalid path, add two modules in sequence: a datastore or Google Sheets write module that logs the failed payload with a timestamp and the specific validation failure reason, followed by a notification module that alerts the HR ops team with the candidate name, ATS ID, and what field failed.
Do not use the Ignore directive here. Ignore tells Make.com™ to proceed silently — which means malformed data either gets written incorrectly or the module fails downstream without context. A Router-based validation gate is explicit and auditable.
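The router's filter conditions map to simple predicate checks. Here is a minimal Python sketch of the same validation logic — field names like applicant_id and the specific business rules are illustrative, not taken from any particular ATS:

```python
import re
from datetime import date

# Illustrative required fields -- replace with your ATS's actual schema.
REQUIRED_FIELDS = ("applicant_id", "email", "start_date")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_payload(payload: dict) -> list:
    """Return a list of failure reasons; an empty list means the payload
    takes the valid path. Each reason string becomes the logged
    'validation failure reason' on the invalid path."""
    failures = []
    for field in REQUIRED_FIELDS:
        if payload.get(field) in (None, ""):
            failures.append(f"missing required field: {field}")
    email = payload.get("email")
    if email and not EMAIL_RE.match(email):
        failures.append(f"malformed email: {email}")
    start = payload.get("start_date")
    if start and date.fromisoformat(start) <= date.today():
        failures.append(f"start date not in the future: {start}")
    return failures
```

A payload routes to the valid path only when the failure list is empty; anything else carries its specific reasons straight into the error log.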
For a detailed walkthrough of validation logic patterns, see our guide on Make.com™ data validation for HR recruiting.
The 1-10-100 rule, documented by Labovitz and Chang and cited by MarTech, applies directly here: catching a malformed field at entry costs a fraction of the effort required to reconcile a corrupted HRIS record weeks later.
Step 3 — Implement the Upsert Pattern on Every Write Module
The upsert pattern eliminates the duplicate record problem that makes teams reluctant to enable retries. Build it before configuring any retry logic.
For every write operation in the scenario (creating or updating a candidate record, posting a status change, triggering a downstream workflow), structure the modules as follows:
- Lookup module: Search the destination system for a record matching the unique identifier (ATS applicant ID or candidate email). Configure it to return zero results gracefully — do not let a “not found” result throw an error.
- Router module: Branch based on the lookup result. If a record is found, route to an update path. If no record is found, route to a create path.
- Write module (update or create): Execute the appropriate write with error handling configured on the module itself (covered in Step 4).
With the upsert pattern in place, retries are safe. Whether a bundle retries once or five times, the outcome is the same: one record in the destination system, correctly updated.
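The lookup-then-branch logic reduces to a few lines of code. The sketch below uses a toy in-memory destination system purely to demonstrate why retries become safe — the method names (find_by_applicant_id, create, update) are illustrative stand-ins, not a real HRIS API:

```python
class InMemoryHRIS:
    """Toy destination system used only to demonstrate the pattern."""
    def __init__(self):
        self.records = {}
        self._next_id = 1

    def find_by_applicant_id(self, applicant_id):
        # Return None on "not found" instead of raising -- mirrors Step 3's
        # advice to configure the lookup to return zero results gracefully.
        for rid, rec in self.records.items():
            if rec["applicant_id"] == applicant_id:
                return {"id": rid, **rec}
        return None

    def create(self, record):
        self.records[self._next_id] = dict(record)
        self._next_id += 1

    def update(self, rid, record):
        self.records[rid].update(record)

def upsert_candidate(hris, record):
    """Lookup, then branch: update if found, create if not."""
    existing = hris.find_by_applicant_id(record["applicant_id"])
    if existing is not None:
        hris.update(existing["id"], record)
        return "updated"
    hris.create(record)
    return "created"
```

Running the same payload through upsert_candidate twice produces one record with the latest values — exactly the idempotency property that makes retries safe.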
Step 4 — Configure Module-Level Error Routes on Every API Write
Module-level error routes are the correct mechanism for handling API write failures in Make.com™. Right-click any module in the scenario builder, select “Add error handler,” and choose the appropriate directive. For ATS sync scenarios, the directive logic is:
- Break: Use on any write module where the failure should trigger a retry. Break stops execution for this bundle and queues it for retry. This is the correct directive for 429 rate-limit errors and 5xx server errors.
- Resume: Use only when the failure of one module should not block subsequent bundles from processing — for example, a non-critical status update that can be skipped without downstream impact. Use sparingly.
- Rollback: Use when a partial write would leave data in an inconsistent state across systems. Rollback reverses all committed operations in the current execution cycle. Required for any multi-system sync where atomicity matters.
- Ignore: Avoid. Ignore produces silent failures with no log and no alert. It is appropriate only for truly inconsequential operations — which rarely exist in ATS syncs.
For every module set to Break, attach an error route (a separate path connected to the module’s error handler output) that leads to the error logging and alert modules built in Step 5. This ensures that even failed-and-retried bundles generate an audit log entry.
Our sibling guide on error handling patterns for resilient HR automation covers additional directive combinations for complex multi-branch scenarios.
Step 5 — Build the Error Logging and Alert Layer
Every error route in the scenario should terminate at a standardized logging and alert module pair. Build this as a reusable sub-scenario or a consistent terminal block that all error routes connect to.
Log entry fields (minimum required):
- Timestamp (ISO 8601 format)
- Make.com™ scenario name and scenario ID
- Execution ID (available via the {{executionId}} variable)
- Module name and position where the failure occurred
- HTTP error code and error message returned by the API
- Candidate name, ATS applicant ID, and email (the minimum identifiers needed to find the record manually)
- Raw payload excerpt (truncated to avoid PII overexposure in log stores)
- Error severity level: Critical (data not written) vs. Warning (retry succeeded, no action needed)
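Assembled as a single object, a log entry matching this schema might look like the sketch below — the field names mirror the list above, and the truncation length is an assumed example, not a fixed requirement:

```python
from datetime import datetime, timezone

def build_log_entry(scenario, execution_id, module, error, candidate, raw_payload, severity):
    """Assemble a standardized error log entry. `severity` should be
    'Critical' (data not written) or 'Warning' (retry succeeded)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
        "scenario_name": scenario["name"],
        "scenario_id": scenario["id"],
        "execution_id": execution_id,
        "module": module,
        "http_code": error.get("code"),
        "error_message": error.get("message"),
        "candidate_name": candidate.get("name"),
        "ats_applicant_id": candidate.get("applicant_id"),
        "candidate_email": candidate.get("email"),
        "payload_excerpt": raw_payload[:500],  # truncate to limit PII exposure
        "severity": severity,
    }
```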
Alert routing by severity:
- Critical: Immediate notification to the HR ops team via email or Slack. Include the log entry link and the manual resolution steps for the specific error type.
- Warning: Append to a daily digest log. Do not send real-time alerts for warnings — alert fatigue causes teams to ignore all alerts, including the critical ones.
For a deeper look at how structured error reporting feeds into monitoring workflows, see our resource on error reporting that makes HR automation unbreakable.
Step 6 — Configure Retry Logic with Appropriate Intervals
Retry logic resolves the majority of transient ATS sync failures — rate limit exhaustion, momentary API unavailability, network timeouts — without any human intervention. Configure it correctly or it creates new problems.
In Make.com™, retry behavior for modules set to Break is configured at the scenario settings level. Set the following:
- Maximum retry attempts: 3 attempts for most ATS API integrations. More than 5 retries rarely succeeds for persistent errors and delays human intervention unnecessarily.
- Retry interval: Start at 5 minutes for the first retry. Use increasing intervals (5 min, 15 min, 30 min) rather than fixed intervals. Fixed short intervals hammer a rate-limited API and worsen the situation. For a detailed breakdown of interval strategy, see our guide on rate limits and retry logic in Make.com™.
- After exhausted retries: Route to the error logging and alert layer built in Step 5. A bundle that fails all retry attempts must generate a Critical-level alert — this is the signal that a human needs to intervene.
Gartner’s automation research consistently identifies transient connectivity failures as the leading cause of automation scenario failures — proper retry logic eliminates this entire failure category from requiring human attention.
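Make.com™ handles this retry loop internally, but the behavior being configured is worth seeing explicitly. This minimal sketch shows one initial attempt plus three retries at increasing intervals, re-raising the last error once retries are exhausted — the point where the Critical alert must fire. TransientError is an illustrative stand-in for a 429 or 5xx response:

```python
import time

class TransientError(Exception):
    """Stand-in for a 429 rate-limit or 5xx server error from the destination API."""

RETRY_INTERVALS_MIN = (5, 15, 30)  # first retry after 5 min, then 15, then 30

def run_with_retries(write_fn, payload, sleep=time.sleep):
    """One initial attempt plus up to three retries with increasing waits.
    Re-raises the last transient error after retries are exhausted."""
    last_err = None
    for wait_min in (0,) + RETRY_INTERVALS_MIN:
        if wait_min:
            sleep(wait_min * 60)  # increasing backoff, never a fixed short interval
        try:
            return write_fn(payload)
        except TransientError as err:
            last_err = err
    raise last_err  # exhausted -> route to Critical alert (Step 5)
```

The injectable sleep parameter exists so the backoff schedule can be verified in tests without waiting real minutes — the same reason Step 8 recommends shortening intervals during go-live testing.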
Step 7 — Handle Webhook-Specific Failure Modes
ATS webhooks introduce failure modes that polling-based triggers do not. Webhooks can arrive out of order, carry unexpected payload structures after an ATS vendor update, or time out before Make.com™ can process them.
Implement these webhook-specific safeguards:
- Immediate acknowledgment: Configure the webhook receiver to return a 200 OK response to the ATS as quickly as possible — before any processing logic runs. Most ATS systems will retry webhook delivery if they do not receive an acknowledgment within a timeout window, creating duplicate processing. Return the 200 first, then process.
- Payload structure validation: Add a JSON validation step immediately after the webhook trigger. Check that the payload contains the expected top-level keys before passing data to any downstream module. Route invalid payloads to the error log immediately.
- Idempotency key check: If the ATS includes a webhook event ID in the payload, log it and check for duplicates before processing. ATS systems that retry webhook delivery will send the same event multiple times — the idempotency check prevents duplicate processing.
Our sibling guide on preventing and recovering from webhook errors in recruiting workflows covers webhook-specific architecture in detail.
Step 8 — Test Every Error Path Before Go-Live
Happy-path testing is not sufficient for production ATS sync scenarios. Every error route, retry path, and alert trigger must be tested deliberately before the scenario goes live.
Run these specific tests:
- Missing required field: Send a payload with a required field deliberately removed. Confirm the validation gate catches it, logs it, and sends an alert. Confirm no partial data reaches the destination system.
- Invalid data type: Send a payload with a date field containing a string value. Confirm the validation gate catches it.
- Simulated API failure: Temporarily change the destination API endpoint to an invalid URL to force a connection error. Confirm the Break directive triggers, the retry queue activates, and the error log captures the failure.
- Duplicate payload: Send the same webhook payload twice. Confirm the upsert pattern results in a single updated record — not two created records.
- Exhausted retries: Configure the scenario to retry 1 time with a 1-minute interval for testing purposes. Force a persistent failure. Confirm the Critical alert fires after the retry is exhausted. Reset retry settings to production values afterward.
Document the results of each test. This documentation becomes the runbook for the HR ops team when a production alert fires.
How to Know It Worked
A properly built ATS sync error handling architecture produces these observable outcomes within the first two weeks of production operation:
- Zero silent failures: Every execution that does not complete successfully generates a log entry and — if data was not written — a Critical alert. If the HR ops team is not receiving any alerts and no log entries appear, run a deliberate test failure to confirm alerting is live.
- Retry success rate above 80%: Most transient failures resolve on the first retry. If more than 20% of retried bundles are exhausting all attempts and generating Critical alerts, investigate the root cause — likely a rate limit being hit consistently, indicating the scenario trigger frequency needs adjustment.
- Zero duplicate records in the destination system: Spot-check 10 candidate records in the HRIS or CRM against the ATS weekly for the first month. Any duplicate indicates the upsert pattern is not applied consistently across all write modules.
- Log entries match execution history: Cross-reference the persistent error log against Make.com™ execution history for the same period. Every failed execution in Make.com™ history should have a corresponding entry in the log store.
Common Mistakes and Troubleshooting
Mistake: Using Ignore on API write modules
Ignore produces silent failures. Replace Ignore directives on write modules with Break plus an error route to the log and alert layer. The extra 10 minutes of setup eliminates hours of manual reconciliation.
Mistake: Logging to Make.com™ execution history only
Make.com™ execution history has a retention window that varies by plan. Teams on lower-tier plans may lose historical error data within days. Always write to an external persistent store — Google Sheets, Airtable, or a database module — in addition to execution history.
Mistake: Fixed short retry intervals
A 30-second fixed retry interval hammers a rate-limited ATS API with repeated failed requests, worsening the rate limit situation. Use increasing intervals. Most transient API issues resolve within 5–15 minutes.
Mistake: No idempotency check on webhook-triggered scenarios
ATS systems retry webhook delivery when they do not receive a timely acknowledgment. Without an idempotency check, a single candidate application event can trigger multiple write operations. The result is duplicate records or conflicting status updates that require manual cleanup.
Mistake: Testing only the happy path before go-live
If the error routes have never been triggered in testing, you do not know they work. Always run deliberate failure tests against every error path before the scenario handles production data.
The Candidate Experience Downstream
Every unhandled ATS sync failure has a candidate on the other end. A missed application write means a candidate’s resume never reaches the hiring team. A failed status update means an automated follow-up email never sends. A duplicate record means two hiring managers may independently reach the same candidate with conflicting next steps.
Forrester’s research on automation ROI consistently identifies candidate experience degradation as one of the hardest-to-quantify but most impactful costs of automation failures. Our sibling post on how error handling protects the candidate experience covers the downstream effects in detail.
The error architecture built in this guide is not just a technical safeguard — it is a direct input into the quality of the candidate experience your recruiting team delivers.
Next Steps: Monitoring and Continuous Improvement
A production ATS sync scenario with full error architecture requires ongoing monitoring — not daily intervention, but a structured weekly review of the error log. Look for patterns: the same module failing repeatedly, a specific ATS field generating consistent validation rejections, a destination API returning 429 errors at predictable times of day.
Patterns in the error log are the inputs for scenario optimization. A rate limit error that fires every morning at 9:00 AM indicates the scenario trigger time should shift to off-peak hours. A validation rejection on a specific field that appears three times per week indicates the ATS data entry process upstream needs a fix.
For structured monitoring workflows, see our guides on proactive error log monitoring for recruiting automation and error handling patterns for resilient HR automation.
The full strategic framework — covering how ATS sync error architecture fits into an enterprise-wide HR automation resilience program — is in our parent pillar on advanced error handling in Make.com™ HR automation.