
Post: How to Clean HR Data with RegEx in Make: A Step-by-Step Guide
RegEx patterns inside Make scenarios strip dirty characters, normalize phone formats, fix name inversions, and standardize dates before bad data reaches your ATS or payroll system. Map your highest-error fields first, write one pattern per field, test against real records, and deploy inside a dedicated cleaning module — before any data touches a downstream app.
HR automation breaks at the data layer — not at the AI layer. Duplicate candidate records, misrouted resumes, payroll fields populated with raw free-text: every one of those failures traces back to a format problem that entered the pipeline unchecked. Regular Expressions (RegEx) inside Make scenarios fix that problem at the source, before dirty data propagates downstream. This post is the tactical complement to the parent pillar on Master Data Filtering and Mapping in Make for HR Automation — where the pillar covers the full data integrity framework, this satellite gives you the exact steps to implement RegEx-based cleaning inside your scenarios today.
Gartner estimates that poor data quality costs organizations an average of $12.9 million per year. The MarTech 1-10-100 rule makes that concrete: correcting a data error at entry costs $1; correcting it after it propagates to downstream systems costs $10 to $100. RegEx at the intake stage is the cheapest data quality control available inside any automation platform.
Before You Start
Confirm the following before building your first RegEx scenario:
- Make account with an active scenario. RegEx is available on all tiers — any plan that supports text functions works.
- A defined data source. Know which module triggers your scenario — a form submission, a webhook from your ATS, a spreadsheet row, or an email parser output.
- A sample of real dirty data. Pull 10–20 actual messy records from your most inconsistent source. You need real values to test patterns against, not invented examples.
- Five core RegEx tokens.
\d(any digit),\s(any whitespace),.(any character),+(one or more),*(zero or more). Everything else builds on these five. - Time budget. A single field standardization step takes 10–20 minutes including testing. A five-field cleaning module takes 60–90 minutes the first time; under 30 minutes once you have reusable patterns.
- A test scenario. A RegEx pattern that is too broad strips legitimate characters. Always test in a dedicated test scenario before activating on live data. UC Irvine research on attention and task-switching documents the cost of catching errors late — build verification into the workflow from the start, not as an afterthought.
Step 1 — Map Your Dirtiest HR Fields Before Writing a Single Pattern
Identify the specific fields causing downstream errors before opening Make. Fixing the wrong fields wastes build time and creates false confidence in a scenario that is still letting bad data through.
Open your ATS or HRIS export from the last 90 days and run a frequency analysis on five field types: phone numbers, email addresses, candidate names, job titles, and hire or application dates. Look for:
- Phone numbers with mixed delimiters:
(555) 867-5309,555.867.5309,5558675309 - Names in inverted format:
Doe, Janevs.Jane Doe - Email addresses with trailing spaces or uppercase domains:
Jane@Company.COM - Job titles with inconsistent capitalization or abbreviations:
Sr. Software Eng.vs.Senior Software Engineer - Dates in mixed ISO and regional formats:
2024-03-15vs.03/15/24
Rank fields by error frequency and downstream impact. Payroll-adjacent fields — hire date, compensation band — warrant immediate attention. A single format error there creates reconciliation work that compounds every pay period. Names and email addresses break CRM deduplication logic and generate duplicate records that require manual merging later.
Document your findings in a simple table: field name, example of dirty input, target clean format, and frequency score. That table is your build spec for Steps 2 through 5.
Step 2 — Write a RegEx Pattern for Each Target Field
Write one pattern per field. A single RegEx that attempts to handle all five field types simultaneously becomes unmaintainable and breaks without obvious error messages.
Phone Numbers
To strip all non-numeric characters and produce a 10-digit string:
replace({{trigger.phone}}; /[^0-9]/g; "")
This removes parentheses, dashes, dots, and spaces in a single pass. The output is always 5558675309. If your downstream system expects formatted output — (555) 867-5309 — add a second replace call to reinsert delimiters at positions 3 and 6.
Email Addresses
To trim trailing whitespace and force lowercase:
lower(trim({{trigger.email}}))
For cases where upstream sources inject non-standard characters:
replace(lower(trim({{trigger.email}})); /[^a-z0-9@._\-]/g; "")
Candidate Names (Inverted Format)
Run trim({{trigger.name}}) first to remove surrounding whitespace. Then check for a comma using contains({{trigger.name}}; ",") inside a Make router before applying a split-and-rejoin transformation — this prevents false positives on names that do not follow the Last, First convention.
Job Titles
Abbreviation normalization is a lookup problem, not a RegEx problem. Build a Make data store or Google Sheets reference table mapping Sr. → Senior, Eng. → Engineer, Mgr. → Manager. Apply trim(lower()) before the lookup runs so the match is case-insensitive.
Dates
Make’s native parseDate and formatDate functions handle most ISO and regional format conversions without RegEx. Use RegEx only to strip non-date characters that prevent parseDate from recognizing the input:
replace({{trigger.hire_date}}; /[^0-9\/\-]/g; "")
Then pass the cleaned string into parseDate with the appropriate format token.
Step 3 — Build a Dedicated Cleaning Module in Make
Cleaning logic belongs in its own module — not scattered across action steps. A dedicated text transformer module keeps the scenario readable and makes the cleaning logic easy to update without touching downstream logic.
In Make, use a Set Variable or Tools > Set Multiple Variables module immediately after your trigger. Name each variable with a clean_ prefix — clean_phone, clean_email, clean_name — so every downstream module references the cleaned version, never raw trigger output.
Module naming matters for maintenance. Label this module Clean: Normalize Intake Fields — not “Tools 1” or “Set Variables.” When a pattern breaks six months from now, anyone debugging the scenario finds the right module in under ten seconds.
Add a note to the module listing the last-updated date and which fields it handles. That note becomes the audit trail for pattern changes.
Step 4 — Test Against Real Records Before Activating
Run the cleaning module against your 10–20 sample records from Step 1 in a test scenario with a manual trigger. Check for three failure modes:
- Over-stripping. The pattern removes characters that belong in the output — for example, a RegEx targeting non-alphanumeric characters strips the hyphen from a hyphenated last name like
Smith-Jones. - Under-stripping. The pattern misses a dirty variant not present in your sample. Common with phone numbers that include country codes (
+1) or extensions (x204). - Format drift. The output matches your target format but fails downstream validation — for example, an email field that accepts lowercase only rejects a value because a trim step ran before the lowercase conversion instead of after.
Log every failure. Adjust the pattern. Re-run the full sample set. Do not move to Step 5 until zero failures remain across all records.
Step 5 — Deploy the Cleaning Module Into Your Live Scenario
Copy the tested cleaning module from your test scenario into the live scenario using Make’s copy-paste module feature — do not rebuild from scratch. Small transcription errors in RegEx patterns produce silent failures that are difficult to diagnose under production load.
Place the cleaning module immediately after the trigger and before any router, filter, or action step. Every downstream path in the scenario references clean_ variables, never raw trigger output.
After deploying, run a five-record live test using real but non-production data if your environment allows it. Confirm each output field matches the expected format. Enable the scenario for full production traffic only after that confirmation passes.
Quick-Reference: Five RegEx Patterns for HR Fields
| Field | Pattern | What It Does |
|---|---|---|
| Phone — strip non-digits | /[^0-9]/g |
Removes dashes, dots, parentheses, spaces |
| Email — strip invalid chars | /[^a-z0-9@._\-]/g |
Use after lower(trim()) |
| Date — strip non-date chars | /[^0-9\/\-]/g |
Pre-processes before parseDate |
| Name — trim whitespace | trim() |
Removes leading and trailing spaces |
| Free text — strip line breaks | /[\r\n]+/g |
Collapses multi-line fields to a single line |
What to Build Next
RegEx cleaning handles format normalization. The next layer is structural validation — confirming that a field contains a valid email pattern, not just lowercase characters, or that a hire date falls within an acceptable range before the record reaches payroll. That logic lives in Make’s filter and router modules, covered in the Master Data Filtering and Mapping pillar.
For HR teams running multiple intake sources — ATS webhooks, form submissions, spreadsheet imports — a cleaning module per source type scales better than a single universal pattern set. Each source has different dirty-data signatures, so matching the cleaning logic to the source keeps patterns tight and failures traceable. See how non-technical HR teams are building and maintaining these scenarios directly in Make without developer support.
If you are mapping which HR processes to automate before writing a single scenario, the OpsMap™ discovery process identifies the highest-leverage starting points — so cleaning effort goes toward fields that are actively blocking downstream value, not fields that feel messy but rarely cause errors in production.

