How to Clean HR Data with RegEx in Make™: A Step-by-Step Guide
HR automation breaks at the data layer — not at the AI layer. Duplicate candidate records, misrouted resumes, payroll fields populated with raw free-text: every one of these failures traces back to a format problem that entered the pipeline unchecked. Regular Expressions (RegEx) inside Make™ scenarios fix that problem at the source, before dirty data can propagate downstream. This guide is the tactical complement to our parent pillar on Master Data Filtering and Mapping in Make™ for HR Automation — where the pillar covers the full data integrity framework, this satellite gives you the exact steps to implement RegEx-based cleaning inside your scenarios today.
Gartner estimates that poor data quality costs organizations an average of $12.9 million per year. MarTech’s 1-10-100 rule makes the math concrete: correcting a data error at entry costs $1; correcting it after it has propagated to downstream systems costs $10 to $100. RegEx at the intake stage is the cheapest data quality control available inside any automation platform.
Before You Start
Before building your first RegEx scenario, confirm the following:
- Make™ account with an active scenario: Any plan that supports text functions works. RegEx is available on all tiers.
- A defined data source: Know which module triggers your scenario — a form submission, a webhook from your ATS, a spreadsheet row, or an email parser output.
- A sample of real (or realistic) dirty data: You need actual messy values to test your patterns against. Pull 10–20 real records from your most inconsistent source.
- Five core RegEx tokens memorized: \d (any digit), \s (any whitespace), . (any character), + (one or more), * (zero or more). Everything else builds on these.
- Time estimate: A single field standardization step takes 10–20 minutes end-to-end including testing. A full five-field cleaning module takes 60–90 minutes the first time; less than 30 minutes once you have reusable patterns.
- Risk awareness: A RegEx pattern that is too broad can strip legitimate characters. Always test with a dedicated test scenario before activating on live data. UC Irvine research on attention and task-switching reinforces the cost of catching errors late — build verification into the workflow, not as an afterthought.
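Make™'s formula language uses the same token semantics as most RegEx engines, so you can rehearse the five core tokens anywhere. As a quick illustration outside of Make™, here is how they behave in Python's re module:

```python
import re

# The five core tokens, demonstrated with Python's re module:
assert re.findall(r"\d", "a1b2") == ["1", "2"]      # \d — any digit
assert re.sub(r"\s", "_", "a b\tc") == "a_b_c"      # \s — any whitespace
assert re.fullmatch(r"a.c", "axc") is not None      # .  — any single character
assert re.fullmatch(r"\d+", "12345") is not None    # +  — one or more
assert re.fullmatch(r"ab*", "a") is not None        # *  — zero or more
```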
Step 1 — Map Your Dirtiest HR Fields Before Writing a Single Pattern
Identify the specific fields causing downstream errors before opening Make™. Fixing the wrong fields wastes build time and creates false confidence.
Open your ATS or HRIS export from the last 90 days and run a quick frequency analysis on five field types: phone numbers, email addresses, candidate names, job titles, and hire/application dates. Look for:
- Phone numbers with mixed delimiters: (555) 867-5309, 555.867.5309, 5558675309
- Names in inverted format: Doe, Jane vs. Jane Doe
- Email addresses with trailing spaces or uppercase domains: Jane@Company.COM
- Job titles with inconsistent capitalization or abbreviations: Sr. Software Eng. vs. Senior Software Engineer
- Dates in mixed ISO and regional formats: 2024-03-15 vs. 03/15/24
Rank these by frequency of error and downstream impact. Payroll-adjacent fields (hire date, compensation band) warrant immediate attention — a single format error here is the type of data-integrity failure that, as Harvard Business Review notes, makes machine learning and analytics tools unreliable. Document your priority list before moving to Step 2.
This mapping step also directly informs the mapping of resume data to ATS custom fields — clean source data is the prerequisite for reliable field mapping, not an optional enhancement.
Step 2 — Build a Dedicated Cleaning Module in Your Make™ Scenario
Create a separate logical block inside your scenario specifically for data cleaning. Do not mix cleaning logic with routing or API call modules — separation makes debugging faster and patterns reusable.
In your Make™ scenario, after your trigger module (the one that receives raw data), insert a Tools > Set Variable or Text Parser module for each field you are cleaning. Here is the pattern for each of the five priority field types:
Phone Numbers
Use Make™’s replace() function with the RegEx pattern [^\d] to strip every non-digit character from the raw phone string:
replace({{1.phone}}; /[^\d]/g; "")
This converts (555) 867-5309, 555.867.5309, and 555-867-5309 to 5558675309 — a consistent digit-only string your downstream system can format to whatever standard it requires. For international numbers with a leading plus sign, use [^\d+] instead.
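The same strip-everything-but-digits operation can be prototyped in Python to test the pattern against sample records before building the Make™ module. This sketch uses the [^\d+] variant, which also preserves a leading plus sign:

```python
import re

def clean_phone(raw: str) -> str:
    """Strip every character that is not a digit or a plus sign."""
    return re.sub(r"[^\d+]", "", raw.strip())

assert clean_phone("(555) 867-5309") == "5558675309"
assert clean_phone("555.867.5309") == "5558675309"
assert clean_phone("+44 20 7946 0958") == "+442079460958"
```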
Email Addresses
Apply two sequential operations: trim() to remove leading/trailing whitespace, then lower() to normalize casing:
lower(trim({{1.email}}))
If you need to validate that the result is a structurally valid email before passing it downstream, add a Filter condition using Make™’s built-in “Matches pattern” operator with the pattern ^[^@\s]+@[^@\s]+\.[^@\s]+$. Records that fail validation route to an alert branch rather than into your ATS.
Candidate Names
Inverted names (Doe, Jane) require extraction rather than substitution. Use match() to capture the two components:
match({{1.name}}; /^([^,]+),\s*(.+)$/)
This returns an array where index [1] is the last name and [2] is the first name. Reconstruct as {{result[2]}} {{result[1]}} to produce Jane Doe. For names already in first-last format, the match returns empty — use a Router to handle both cases and apply a passthrough for correctly formatted names.
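The capture-and-reorder logic, including the passthrough branch for names already in first-last order, can be sketched in Python like this:

```python
import re

INVERTED_NAME = re.compile(r"^([^,]+),\s*(.+)$")

def normalize_name(raw: str) -> str:
    """Reorder 'Doe, Jane' to 'Jane Doe'; pass through names already first-last."""
    m = INVERTED_NAME.match(raw.strip())
    if m:
        return f"{m.group(2)} {m.group(1)}"   # first name, then last name
    return raw.strip()                        # the passthrough branch

assert normalize_name("Doe, Jane") == "Jane Doe"
assert normalize_name("Jane Doe") == "Jane Doe"
```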
Job Titles
Abbreviation normalization requires a lookup approach rather than pure RegEx. Build a Make™ Data Store with a two-column table: raw abbreviation and standardized title. In your scenario, query the data store with the raw title and return the standardized version. Use RegEx at the filter layer to catch obvious patterns — Sr\. → “Senior”, Eng\. → “Engineer” — using chained replace() calls before the data store lookup as a first-pass clean.
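The first-pass chained replacements can be prototyped in Python; the abbreviation list here is a small illustrative stand-in for the Data Store lookup table described above:

```python
import re

# Illustrative first-pass expansions; a production build backs this
# with the Make Data Store lookup described above.
ABBREVIATIONS = [
    (re.compile(r"\bSr\."), "Senior"),
    (re.compile(r"\bJr\."), "Junior"),
    (re.compile(r"\bEng\."), "Engineer"),
]

def expand_title(raw: str) -> str:
    title = raw.strip()
    for pattern, replacement in ABBREVIATIONS:
        title = pattern.sub(replacement, title)
    return title

assert expand_title("Sr. Software Eng.") == "Senior Software Engineer"
```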
Dates
Use Make™’s parseDate() function to normalize mixed date formats to ISO 8601:
formatDate(parseDate({{1.hire_date}}; "MM/DD/YY"); "YYYY-MM-DD")
For ambiguous formats where both MM/DD/YY and DD/MM/YY are possible, apply a RegEx pre-check to determine which format is present before calling parseDate(). Route ambiguous records to a manual review queue rather than guessing.
For broader context on the Make™ modules that power HR data transformation, the sibling satellite covers the full module ecosystem these steps plug into.
Step 3 — Chain Your Cleaning Steps in the Correct Sequence
Sequence matters. Apply cleaning operations in this order to avoid one step undoing another:
- Trim whitespace first. Leading and trailing spaces break every subsequent pattern. Apply trim() to every field before any RegEx operation.
- Normalize casing. Apply lower() or upper() before pattern matching so your RegEx patterns do not need case-insensitive flags on every expression.
- Strip unwanted characters. Remove delimiters, punctuation, and special characters using replace() with a character class pattern.
- Extract or restructure. Use match() to pull components from complex strings (inverted names, composite fields) after the field is trimmed and normalized.
- Validate the result. Apply a filter with a “Matches pattern” condition to confirm the cleaned value meets your target format before passing it downstream.
- Branch on failure. Any record that fails validation routes to an alert — a Slack message, an email notification, or a row appended to a review spreadsheet — rather than silently entering your ATS with a blank or malformed value.
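The sequence above can be mirrored in Python for local pattern testing. This sketch assumes a US ten-digit phone format as the validation target; the caller is responsible for the branch-on-failure step:

```python
import re

VALID_PHONE = re.compile(r"^\d{10}$")   # assumed target: US 10-digit format

def clean_and_validate_phone(raw: str) -> tuple[str, bool]:
    value = raw.strip()                        # 1. trim whitespace first
    value = re.sub(r"[^\d]", "", value)        # 3. strip unwanted characters
    is_valid = bool(VALID_PHONE.match(value))  # 5. validate the result
    return value, is_valid                     # 6. caller branches on failure

assert clean_and_validate_phone(" (555) 867-5309 ") == ("5558675309", True)
assert clean_and_validate_phone("N/A") == ("", False)
```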
This sequence is the practical implementation of the essential Make™ filters for recruitment data covered in detail in the sibling listicle. Filters and RegEx are complementary — RegEx transforms the value; filters decide what happens based on the transformed result.
Proper error handling in Make™ workflows is the structural companion to this step — every validation failure in your cleaning module needs a defined error path, not an unhandled empty value.
Step 4 — Test with a Deliberately Malformed Sample Dataset
Testing with clean data tells you nothing. Build a test dataset that includes every format variant your patterns are designed to handle — plus at least three adversarial cases your pattern should catch and reject.
For phone number cleaning, your test set should include:
- A correctly formatted number: 5558675309
- A hyphenated number: 555-867-5309
- A number with parentheses and spaces: (555) 867-5309
- An international number: +44 20 7946 0958
- A completely invalid value: N/A
- An empty string
- A number with an accidental letter: 555-86O-5309 (letter O instead of zero)
Run each record through your Make™ test scenario using the “Run once” function with manual input. Inspect the output of every module in the execution log. Confirm:
- Valid records reach the downstream module with correctly formatted values.
- Invalid records route to your alert branch — not to the downstream module with blank values.
- The adversarial cases (empty string, N/A, letter-for-digit substitution) trigger the correct error path.
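Before wiring the samples into Make™’s “Run once” flow, the same test set can be exercised locally. A Python sketch under a US ten-digit validation assumption (where the international number is expected to fail and route to review):

```python
import re

# (raw input, should the record pass validation?)
SAMPLES = [
    ("5558675309", True),
    ("555-867-5309", True),
    ("(555) 867-5309", True),
    ("+44 20 7946 0958", False),   # fails the 10-digit rule; route to review
    ("N/A", False),
    ("", False),
    ("555-86O-5309", False),       # letter O is stripped, leaving only 9 digits
]

VALID_PHONE = re.compile(r"^\d{10}$")

for raw, should_pass in SAMPLES:
    cleaned = re.sub(r"[^\d]", "", raw.strip())
    passed = bool(VALID_PHONE.match(cleaned))
    assert passed == should_pass, f"{raw!r}: expected {should_pass}, got {passed}"
```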
This testing discipline prevents the class of errors Parseur’s Manual Data Entry Report quantifies at $28,500 per employee per year in rework cost — errors that RegEx was designed to prevent but which reappear when patterns are deployed untested.
Step 5 — Activate, Monitor, and Refine
After test validation, activate the scenario with a narrow scope: run it on a single day’s incoming records before expanding to full volume. Monitor the execution log for the first 48 hours, specifically watching:
- Alert branch volume: How many records are hitting the validation failure path? A rate above 5% suggests your source data has a format variant your patterns did not anticipate.
- Empty field outputs: Any field arriving downstream as empty when it should carry a value indicates a pattern that is too strict or a match() that found no result.
- Downstream system errors: Check your ATS and HRIS for import errors in the 48 hours following activation. Format-related import rejections confirm which fields still need pattern refinement.
When you identify a new format variant, add it to your test dataset first, adjust the pattern, re-run the full test set, and then update the live scenario. Never edit a pattern in a live scenario without first validating in test — the risk of a broad pattern accidentally stripping valid data from records already in the pipeline is real.
For data flows where compliance is a factor, review the GDPR-compliant data filtering with Make™ guidance alongside this cleaning implementation — RegEx standardization and privacy-by-design filtering operate on the same records and should be designed together.
How to Know It Worked
Three signals confirm your RegEx cleaning module is functioning correctly in production:
- Zero format-related ATS import errors in the 30 days following activation. If your ATS previously logged field validation rejections and those disappear after deploying your cleaning scenario, the patterns are working.
- Consistent field values in downstream analytics. Run a frequency distribution on the cleaned fields in your reporting tool. You should see a small number of standardized formats, not dozens of variants. A phone number field that previously had 40 format variants should show one or two after cleaning.
- Alert branch volume below 2%. A healthy cleaning scenario catches edge cases without being so strict that it rejects valid records. If your alert rate is higher than 2%, your patterns need tuning. If it drops to zero, verify that your validation step is actually running — a silent pass-through is not the same as a successful validation.
Common Mistakes and How to Avoid Them
Mistake 1 — Using Greedy Patterns Without Anchors
A pattern like .* matches everything including characters you want to preserve. Anchor your patterns with ^ (start) and $ (end) when matching entire field values, and use specific character classes rather than wildcards when extracting substrings.
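The difference is easy to see in Python, where fullmatch applies the implicit ^ and $ anchors:

```python
import re

junk = "call 5558675309 maybe"

# Unanchored: a valid-looking fragment inside a junk value still matches.
assert re.search(r"\d{10}", junk) is not None
# Anchored (^...$ / fullmatch): only an exactly-ten-digit field passes.
assert re.fullmatch(r"\d{10}", junk) is None
assert re.fullmatch(r"\d{10}", "5558675309") is not None
```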
Mistake 2 — Forgetting to Escape Special Characters
A period in RegEx matches any character unless escaped as \.. An email validation pattern that uses an unescaped period will accept name@domainxcom as valid, because the unescaped dot happily matches the letter x where a literal dot was intended. Audit every literal special character in your patterns for proper escaping before deployment.
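A minimal Python demonstration of the escaped versus unescaped period, using the email pattern from Step 2:

```python
import re

UNESCAPED = re.compile(r"^[^@\s]+@[^@\s]+.[^@\s]+$")   # '.' matches any character
ESCAPED = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")    # '\.' matches a literal dot

assert UNESCAPED.match("name@domainxcom") is not None  # dotless value slips through
assert ESCAPED.match("name@domainxcom") is None        # correctly rejected
assert ESCAPED.match("name@domain.com") is not None
```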
Mistake 3 — Cleaning Without a Failure Path
A replace() that strips invalid characters and a match() that finds no result both produce outputs — an empty string or an empty array. Without a downstream filter checking for empty values, those outputs silently create blank fields in your ATS. Every cleaning step needs a paired validation check.
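A minimal Python illustration of the silent-empty problem and its paired check:

```python
import re

raw = "N/A"
cleaned = re.sub(r"[^\d]", "", raw)        # replace() always "succeeds"...
assert cleaned == ""                       # ...but here it yields an empty string

# Pair the transform with an explicit validation before the value moves on:
is_valid = bool(re.fullmatch(r"\d{10}", cleaned))
assert is_valid is False                   # this record belongs on the alert branch
```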
Mistake 4 — Over-Engineering the Pattern
A RegEx pattern that tries to handle every possible edge case in one expression becomes unmaintainable and fragile. Write simple, composable patterns — one for trimming, one for stripping, one for extracting — and chain them. Simpler patterns are easier to test, debug, and update when your data sources change.
Mistake 5 — Ignoring the Source System
If the same format inconsistency appears in every batch from a specific source — a particular job board, a resume parser, a form builder — the most efficient fix is upstream configuration, not downstream RegEx. Use RegEx as the last line of defense, not the first. Configure your source systems to enforce format constraints where possible, and let RegEx catch what slips through.
Next Steps
RegEx cleaning is the foundation that makes every downstream automation reliable. Once your five core fields are standardized, the clean HR data workflows for strategic decisions become executable — because the data feeding them is consistent. For teams also managing duplicate candidate records, the guide to filtering candidate duplicates in Make™ is the logical next implementation after standardizing field formats.
The OpsMap™ audit process we run at 4Spot Consulting identifies exactly which fields in your current HR tech stack are entering pipelines uncleaned. If your automation is producing inconsistent outputs despite correct routing logic, a data format problem is almost always the root cause — and a RegEx-based cleaning layer is almost always the fastest fix.