
Master HR AI: Key Data and Analytics Terms Defined
HR leaders are being asked to evaluate, implement, and govern AI tools at a pace that outstrips most teams’ technical vocabulary. The consequence is predictable: expensive vendor decisions made on vague assurances, data quality problems discovered after go-live, and AI-driven outcomes that cannot be explained or defended. This glossary exists to close that gap.
These 16 terms are the vocabulary layer beneath every AI tool in your HR stack. They map directly to the AI and ML in HR transformation framework — the automation-first, AI-second sequence that separates sustained workforce strategy from expensive failed pilots. Learn them in order; each builds on the ones before it.
What Is Artificial Intelligence (AI)?
Artificial intelligence is the broad capability of a computer system to simulate human decision-making — perceiving inputs, recognizing patterns, and generating outputs that would otherwise require human judgment.
In HR, AI is the umbrella term covering every “smart” tool in your stack: resume screening engines, candidate ranking systems, chatbots, attrition prediction models, and compensation benchmarking tools. AI does not mean the system thinks like a human — it means the system has been designed to approximate specific human judgments at speed and scale.
Why it matters: AI is a category, not a product. When a vendor says their platform uses “AI,” that tells you almost nothing about how it works, what it was trained on, or how it fails. The terms below decode the actual mechanism.
What Is Machine Learning (ML)?
Machine learning is a subset of artificial intelligence in which a system learns patterns from historical data and improves its predictions over time — without being manually reprogrammed for each new scenario.
Instead of following explicit rules (“if candidate has X degree, score them Y”), an ML model is trained on thousands of past hiring outcomes and learns to weight variables based on which combinations correlated with success. Over time, with more data, it recalibrates those weights.
In HR practice: ML powers attrition prediction, candidate ranking, time-to-fill forecasting, and personalized learning path recommendations. The critical governance question is always: what data was this model trained on, and does that data reflect the workforce patterns you want to replicate?
Related: See the guide on predicting and stopping high-risk employee turnover for a step-by-step application of ML-powered attrition models.
What Is a Large Language Model (LLM)?
A large language model is a type of machine learning model trained on massive text datasets — books, articles, web content, documentation — that can generate, summarize, classify, and translate natural language at scale.
LLMs are the technology behind generative AI tools: the tools that draft job descriptions, summarize performance reviews, generate candidate outreach emails, or produce policy summaries on demand. They do not retrieve stored facts — they generate probabilistic text based on learned patterns.
Why it matters: LLM outputs are confident-sounding but not always accurate. In HR, where job descriptions create legal exposure and policy language must be precise, human review of LLM-generated content is not optional. The tool accelerates drafting; the HR professional owns the output.
What Is Natural Language Processing (NLP)?
Natural language processing is the branch of AI that enables computers to understand, interpret, and generate human language — converting unstructured text into structured, analyzable signals.
NLP is the engine behind resume parsing (extracting skills and experience from free-text documents), candidate chatbots (interpreting questions and generating relevant answers), sentiment analysis (classifying open-text survey responses as positive, neutral, or negative), and job description optimization (flagging gender-coded or exclusionary language).
In HR practice: Most HR teams interact with NLP dozens of times per day without realizing it. Understanding NLP clarifies why resume parsers fail on non-standard formats, why chatbots misinterpret regional phrasing, and why sentiment scores need trend context rather than point-in-time readings.
What Is Structured vs. Unstructured Data?
Structured data lives in defined, queryable fields — employee ID, hire date, salary band, department code, performance rating. Unstructured data has no fixed schema — exit interview transcripts, performance review narratives, email threads, resume text, Slack messages.
The HR implication: Nearly every AI analytics tool requires structured data as input. Unstructured data must be converted — via NLP or manual tagging — before it can feed a model. Organizations that rely heavily on narrative performance reviews or free-text offer letters have a structural data problem that no AI vendor can solve without a cleanup effort first.
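As a minimal illustration of that conversion step, the sketch below pulls two structured fields out of a free-text snippet using regular expressions. The field names and sample text are invented, and production resume parsers use full NLP pipelines rather than regex, but the shape of the problem is the same:

```python
import re

# Structured record: fixed, queryable fields (field names are illustrative).
structured = {"employee_id": "E-1042", "hire_date": "2021-03-15", "salary_band": "B3"}

# Unstructured input: free text with no schema, e.g. a resume snippet.
resume_text = "Jordan Lee - 6 years of experience in payroll operations. Skills: Excel, SQL."

# A toy extraction step: pull years of experience and a skills list.
years = re.search(r"(\d+)\s+years of experience", resume_text)
skills = re.search(r"Skills:\s*(.+)$", resume_text)

parsed = {
    "years_experience": int(years.group(1)) if years else None,
    "skills": [s.strip().rstrip(".") for s in skills.group(1).split(",")] if skills else [],
}
print(parsed)  # {'years_experience': 6, 'skills': ['Excel', 'SQL']}
```

Only after this conversion does the free text become something a model can consume, which is exactly why unstructured-heavy HR processes stall AI projects.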
This is the foundational argument for automation-before-AI: structured workflows produce structured data. Structured data enables reliable AI. The sequence is not negotiable.
What Is Data Quality and Why Does It Govern Everything?
Data quality is the degree to which HR data is accurate, complete, consistent, and current across all source systems.
The 1-10-100 rule — validated by Labovitz and Chang and cited across quality management literature — quantifies the cost curve: it costs $1 to prevent a bad data record at entry, $10 to correct it after the fact, and $100 to remediate the downstream consequences. In HR, those downstream consequences include payroll errors, compliance exposure, incorrect analytics, and AI models trained on corrupted baselines.
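The cost curve is easy to make concrete. Assuming a hypothetical batch of 500 flawed records, the rule's per-record figures scale as follows:

```python
# The 1-10-100 rule as a simple cost model: per-record dollar cost of
# handling a bad data record at each stage (figures from the rule itself).
COST = {"prevent_at_entry": 1, "correct_later": 10, "remediate_downstream": 100}

bad_records = 500  # hypothetical count of flawed records in an HRIS

for stage, unit_cost in COST.items():
    print(f"{stage}: ${bad_records * unit_cost:,}")
# prevent_at_entry: $500
# correct_later: $5,000
# remediate_downstream: $50,000
```

The two-orders-of-magnitude spread is the entire business case for validating data at the point of entry.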
Gartner research consistently identifies poor data quality as the leading cause of failed HR analytics initiatives — not model selection, not tool selection, not budget. Data quality.
Related: The key HR metrics tracked with AI guide details which data fields must be clean before each metric category becomes trustworthy.
What Is a Data Warehouse?
A data warehouse is a centralized, structured repository that consolidates HR data from multiple source systems — HRIS, ATS, payroll, LMS, benefits platforms, engagement survey tools — into a single, query-ready environment.
It is the infrastructure prerequisite for people analytics. Without a data warehouse (or equivalent integration layer), HR data lives in siloed systems that cannot speak to each other, making cross-functional analysis impossible and AI models unreliable.
Common confusion: A data lake stores raw, unprocessed data in its native format. A data warehouse stores cleaned, structured, purpose-organized data. Most AI analytics tools require a warehouse, not a lake, as their upstream data source.
What Is People Analytics?
People analytics is the discipline of applying quantitative data analysis to human capital decisions — using workforce data to inform hiring, retention, performance management, compensation, and workforce planning.
McKinsey research links mature people analytics programs to measurably better hiring quality and lower voluntary attrition rates compared to organizations relying on managerial intuition alone. Harvard Business Review research identifies people analytics as one of the highest-ROI HR investments available to mid-market and enterprise organizations.
People analytics is the strategic layer. The data warehouse is the infrastructure layer. ML is the prediction engine. All three must be in place for the system to function.
Related: The AI-powered workforce planning and talent forecasting guide shows how people analytics feeds long-range headcount strategy.
What Is Predictive Analytics?
Predictive analytics uses historical data, statistical algorithms, and machine learning to forecast the probability of future outcomes — answering the question “what is likely to happen next?”
In HR, predictive analytics answers questions like: Which employees have the highest probability of resigning in the next 90 days? Which candidates are most likely to succeed in this role based on past hire data? Which teams are trending toward burnout based on workload and engagement signals?
Predictive analytics produces a probability score or forecast. It does not prescribe action — that is the role of prescriptive analytics. The distinction matters because teams that conflate the two often stop at prediction without building the response playbook needed to act on it.
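To make the "probability score" concrete, here is a minimal sketch of a logistic attrition model. The features and weights are invented for illustration; a real ML model learns its weights from historical outcome data rather than having them hardcoded:

```python
import math

# Hypothetical features and weights. In a trained model, these weights
# would be learned from thousands of past attrition outcomes.
WEIGHTS = {"months_since_promotion": 0.04, "engagement_score": -0.9, "manager_changes_12mo": 0.5}
BIAS = -1.0

def attrition_probability(features: dict) -> float:
    """Logistic function over a weighted sum: output is a probability in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-z))

employee = {"months_since_promotion": 30, "engagement_score": 2.1, "manager_changes_12mo": 2}
p = attrition_probability(employee)
print(f"Estimated attrition probability: {p:.0%}")
```

Note what the function returns: a probability, not a recommendation. Deciding what to do with that number is the prescriptive layer described next.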
What Is Prescriptive Analytics?
Prescriptive analytics goes beyond forecasting what will happen to recommending what action to take in response — answering “given this prediction, what should we do?”
A predictive model might flag that an employee has a 78% attrition probability in the next quarter. A prescriptive layer would then recommend a specific intervention — a compensation adjustment, a lateral move, a manager conversation — based on what has historically worked for employees with similar profiles.
In HR practice: Most HR AI tools today operate at the predictive layer. Prescriptive capability requires more mature data infrastructure, a larger historical dataset, and a feedback loop that captures which interventions actually worked. It is the next frontier, not the current standard.
What Is Descriptive Analytics?
Descriptive analytics is the baseline layer of data analysis — summarizing what has already happened using historical data. Standard HR dashboards, headcount reports, time-to-fill averages, and turnover rate calculations are all descriptive analytics.
It is the foundation of the analytics maturity ladder: descriptive → diagnostic → predictive → prescriptive. Most HR teams are sophisticated at descriptive reporting. The strategic value gap opens when they attempt to move to diagnostic and predictive layers without the data infrastructure to support it.
The common mistake: Presenting descriptive analytics as strategic insight. A 14% annual turnover rate is a descriptor, not an insight. The insight is why that rate is 14%, which segments drive it, and what actions are likely to change it — which requires diagnostic and predictive capability.
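The gap between the two layers is easy to show with invented numbers. The aggregate rate is descriptive; segmenting the same data is the first diagnostic step:

```python
# Descriptive vs. diagnostic in miniature. All figures are invented.

# Descriptive: one aggregate number, summarizing what happened.
separations, avg_headcount = 84, 600
turnover_rate = separations / avg_headcount
print(f"Annual turnover: {turnover_rate:.0%}")  # 14%: a descriptor, not an insight

# Diagnostic: the same data segmented to start asking why. Here the
# aggregate 14% hides a 28% rate concentrated in one department.
by_department = {"Engineering": (14, 200), "Sales": (56, 200), "Operations": (14, 200)}
for dept, (seps, headcount) in by_department.items():
    print(f"{dept}: {seps / headcount:.0%}")
```

The segmented view does not yet explain the Sales spike, but it tells you where to look, which is precisely the job of the diagnostic layer.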
What Is Diagnostic Analytics?
Diagnostic analytics answers “why did this happen?” by drilling into historical data to identify the root causes of observed patterns — the connective layer between descriptive reporting and predictive modeling.
In HR, diagnostic analytics might reveal that turnover spikes in Q2 correlate with specific manager assignments, that time-to-fill increases are concentrated in a single job family, or that engagement score declines precede performance dips by approximately six weeks. These root-cause insights are what transform HR from a reporting function into a strategic advisory one.
What Is Algorithmic Bias?
Algorithmic bias is the systematic production of unfair outcomes for specific demographic groups by an AI model — not because of explicit discrimination in the code, but because the historical training data encoded historical inequities.
A hiring model trained on ten years of promotion data will learn to favor the profiles that were historically promoted. If that history underrepresented women in senior roles or excluded candidates from certain zip codes, the model will reproduce and scale those patterns — faster and at greater volume than any individual human decision-maker could.
SHRM and Forrester research both flag algorithmic bias as a primary HR AI governance risk. HR leaders are legally and ethically accountable for discriminatory hiring outcomes regardless of whether the decision was made by a human or a model.
Related: The satellite on stopping bias in workforce analytics covers audit frameworks and bias testing protocols in detail.
What Is a Training Dataset?
A training dataset is the historical data used to teach a machine learning model what patterns to recognize and what outcomes to predict. The model’s performance ceiling is determined almost entirely by the quality, volume, recency, and representativeness of its training data.
In HR vendor evaluation, training dataset questions are among the most important due-diligence questions available. A vendor whose attrition model was trained exclusively on Fortune 500 tech company data will likely underperform in a regional healthcare or mid-market manufacturing context. Always ask: Who was in the training data? How recent is it? How was it labeled?
What Is an Algorithm?
An algorithm is a defined sequence of rules or instructions that a computer system follows to complete a task or reach a decision. In the context of HR AI, algorithms are the decision logic that rank candidates, weight performance signals, calculate engagement risk scores, or recommend learning content.
Algorithms are not neutral. They encode the assumptions, priorities, and trade-offs of the humans who designed them. Understanding that algorithms have authors — and that those authors made explicit choices about what to optimize — is the first step toward meaningful AI governance in HR.
What Is Sentiment Analysis?
Sentiment analysis is an NLP application that classifies text — survey open responses, exit interview notes, performance narratives, internal communication samples — as positive, neutral, or negative based on language patterns.
In HR, sentiment analysis allows teams to process thousands of open-text employee feedback items at scale without manual review of each response. Deloitte research identifies real-time employee sentiment monitoring as a key capability of high-performing people analytics organizations.
Limitation to flag: Sentiment scores are aggregate indicators, not individual diagnostics. A team-level sentiment decline is a signal to investigate, not a conclusion about individual employees. Using sentiment scores in individual performance or promotion decisions without corroborating data creates both fairness and legal risk.
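To make the mechanism concrete, here is a toy lexicon-based classifier. The word lists are invented, and production sentiment models use learned language representations rather than keyword matching, but the input-to-label flow is the same:

```python
# Toy sentiment classifier: count matches against tiny hand-made word lists.
# Illustrative only; real NLP models learn these associations from data.
POSITIVE = {"great", "supportive", "growth", "clear", "helpful"}
NEGATIVE = {"burnout", "unclear", "overworked", "ignored", "frustrated"}

def classify(text: str) -> str:
    words = set(text.lower().replace(".", "").replace(",", "").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

responses = [
    "My manager is supportive and goals are clear.",
    "Constantly overworked, feedback gets ignored.",
    "Workload is fine.",
]
for r in responses:
    print(classify(r))  # positive / negative / neutral
```

Even this toy version shows why sentiment belongs at the aggregate level: a single misclassified response is noise, while a shift across thousands of responses is a trend worth investigating.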
Common Misconceptions About HR AI Terms
Misconception 1: “AI” and “automation” mean the same thing.
Automation executes defined, deterministic rules — if this, then that — without learning or adapting. AI learns from data and adjusts its outputs based on new information. A workflow that routes a new hire’s paperwork to the right approver is automation. A system that predicts which new hires are at risk of leaving in their first 90 days is AI. Both are valuable. They serve different functions and require different governance.
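The contrast can be sketched in a few lines. Both the routing rule and the risk score below are invented for illustration:

```python
# Automation: a deterministic rule. The same input always yields the
# same output, and the logic never changes unless a human edits it.
def route_paperwork(department: str) -> str:
    approvers = {"Engineering": "eng-ops@example.com", "Sales": "sales-ops@example.com"}
    return approvers.get(department, "hr-general@example.com")

# AI: a scoring function whose parameters came from historical data.
# (Hardcoded here for illustration; a real model re-learns them over time.)
def early_attrition_risk(tenure_days: int, engagement: float) -> float:
    return max(0.0, min(1.0, 0.8 - 0.005 * tenure_days - 0.1 * engagement))

print(route_paperwork("Sales"))       # always the same answer for "Sales"
print(early_attrition_risk(30, 2.0))  # a probabilistic estimate, not a rule
```

The governance implication follows directly: the routing rule can be audited by reading it, while the risk score can only be audited by examining the data and process that produced its parameters.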
Misconception 2: More data always means better AI.
More data only improves AI if the additional data is relevant, accurate, and representative. More of the same bad data produces a more confident bad model. Data quality and data relevance outrank data volume in every serious analytics implementation.
Misconception 3: If the model produced the decision, HR is not responsible for it.
This is legally and operationally false. HR leaders are accountable for every hiring, promotion, and compensation decision made using AI tools deployed in their organization — regardless of which vendor built the model. AI shifts the mechanism of decision-making; it does not transfer accountability.
Misconception 4: Predictive analytics is only for large enterprises.
Parseur’s manual data entry research and APQC benchmarking both document that mid-market HR teams with clean, structured data can generate actionable predictive signals from relatively modest datasets. The barrier is not company size — it is data structure and process discipline.
Why Vocabulary Precedes Strategy
The workforce planning glossary of AI and HR terms and this glossary share a common premise: you cannot govern what you cannot name. HR leaders who can precisely distinguish ML from AI, predictive from prescriptive, and algorithmic bias from data error are categorically more effective in vendor evaluations, implementation governance, and C-suite communication.
This vocabulary is also the prerequisite for the broader transformation covered in AI and ML in HR — the automation-first, AI-second sequence that prevents the most common and most expensive HR technology failure mode: applying sophisticated AI tools on top of unstructured, manually maintained data and expecting strategic results.
Lay the vocabulary. Build the data foundation. Apply AI at the judgment points where deterministic rules break down. That sequence works. The glossary you just read is step one.
Next step: Understand how these terms connect to measurable outcomes in the guide to measuring HR ROI with AI and people analytics.