Bias and Fairness Risk in Automated Decisions
The first time I sat with a hiring team that had a screening model, I asked them what “fair” meant for that model. The room went quiet. Three people gave three different answers. None of them were wrong, and none of them were compatible. That is the whole problem in one moment. Fairness is not a property you switch on; it is a choice you make with your eyes open, and the choice has costs.
Bias and fairness risk shows up in the places regulators look first: hiring, credit, healthcare triage, insurance pricing, benefits eligibility. These systems sit in the high-risk tier of the EU AI Act for a reason. When an automated decision says no, it says no at scale, and the people on the receiving end rarely get to argue the call.
Where bias actually enters the system
Most teams think bias is a model problem. It is occasionally a model problem. More often it is a problem upstream of the model that the model faithfully amplifies. Four entry points, in roughly the order I find them.
Training data. The historical record reflects who got hired, who got the loan, who got the diagnosis. If the record was biased, and it almost always was, the model learns that bias as ground truth. A resume screener trained on a decade of “successful hires” learns whatever the company actually hired for, which may not be what it claims to hire for.
Label definition. What does “good employee” mean for the label you trained on? “Stayed two years and got promoted once” already filters out anyone who left because the culture was hostile. The label is not neutral. It is a frozen opinion, and most teams pick the easiest one to extract.
Feature selection. Drop the protected characteristic and the model still finds it. Zip code maps to race in most US cities. First name maps to gender. School name maps to class. Removing the obvious column does not remove the signal; it makes the discrimination less legible. NIST calls these proxies, and proxies are why “we do not collect that data” is not a defence.
Deployment context. A credit model built on prime borrowers deployed in a near-prime channel rejects the new population at a different rate than it rejected the training population. The model did not change. The world it lives in did.
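The proxy point is easy to check for yourself. Here is a minimal sketch, assuming you hold the protected attribute for an audit sample: it asks how much better you can guess someone's group once you know a single feature. The function name and the toy data are hypothetical, and the score is measured in-sample, so treat it as a first screen rather than a verdict.

```python
from collections import Counter, defaultdict

def proxy_strength(feature_values, protected_values):
    """Return (baseline, accuracy) for guessing the protected attribute.

    baseline: accuracy of always guessing the most common group overall.
    accuracy: accuracy of guessing the most common group within each
              feature value (in-sample, so it is an optimistic estimate).
    """
    n = len(feature_values)
    baseline = Counter(protected_values).most_common(1)[0][1] / n
    by_value = defaultdict(Counter)
    for feature, group in zip(feature_values, protected_values):
        by_value[feature][group] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return baseline, correct / n

# Hypothetical audit sample: zip code against a two-value protected attribute.
zip_codes = ["10001", "10001", "10001", "10002", "10002", "10002", "10003", "10003"]
groups = ["a", "a", "a", "b", "b", "a", "b", "b"]
baseline, accuracy = proxy_strength(zip_codes, groups)
print(f"baseline {baseline:.3f}, with zip code {accuracy:.3f}")
```

A large gap between the two numbers means the feature carries the protected signal even though the protected column itself is gone.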
Three fairness definitions, and you cannot have all three
Fairness has multiple mathematical definitions and they are mutually exclusive in any realistic setting. Three that matter:
- Equal opportunity (false negative parity). The model rejects people who would have qualified at the same rate across groups. Good when the cost of a missed yes is high.
- Demographic parity (equal selection rate). The model selects each group in proportion to its share of the applicant pool. Good when historical exclusion has shrunk who applies in the first place.
- Calibration (predicted probability matches outcome). A score of 0.7 means a 70% chance of the outcome, equally across groups. Good when the score is read downstream by a human and needs to mean the same thing for everyone.
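The Kleinberg result, from a 2016 paper on risk scores written during the debate over scoring tools in the US justice system, proved that when base rates differ between groups and the model is not a perfect predictor, a score cannot be calibrated within each group and also balance its errors across groups. The three definitions above collide in a similar way: when base rates differ, a single score with a single cutoff generally cannot hold all three at once. There is no clever architecture that escapes this. It is a property of the problem, not a property of your model.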
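A few lines of arithmetic show the tension. The sketch below assumes two hypothetical groups whose error rates are identical (same true positive rate, same false positive rate) but whose base rates differ, then computes what a "yes" means in each group and how often each group is selected. All numbers are invented for illustration.

```python
def precision_and_selection(base_rate, tpr, fpr):
    """Return (precision, selection_rate) for one group.

    base_rate: share of the group that truly qualifies
    tpr:       share of qualified people the model selects
    fpr:       share of unqualified people the model selects
    """
    true_pos = tpr * base_rate
    false_pos = fpr * (1 - base_rate)
    selection_rate = true_pos + false_pos
    return true_pos / selection_rate, selection_rate

# Same error rates for both groups, different base rates (assumed values).
ppv_a, sel_a = precision_and_selection(base_rate=0.5, tpr=0.8, fpr=0.2)
ppv_b, sel_b = precision_and_selection(base_rate=0.2, tpr=0.8, fpr=0.2)

print(f"group A: precision {ppv_a:.2f}, selection rate {sel_a:.2f}")
print(f"group B: precision {ppv_b:.2f}, selection rate {sel_b:.2f}")
```

Error rates are equal by construction, so equal opportunity holds. Precision comes out at roughly 0.80 against 0.50, so a "yes" does not mean the same thing in both groups, and selection rates come out at roughly 0.50 against 0.32, so demographic parity fails too.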
So “fair” stops being a single target and becomes a choice that depends on the context. A hiring screener probably wants equal opportunity. A risk score used by a judge probably wants calibration. A scholarship allocation might want demographic parity. State which one you picked and why, in writing, before you ship.
How to measure, not just claim
The honest version of bias testing is unglamorous and mostly arithmetic.
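As a rough illustration of that arithmetic, here is a minimal sketch of a per-group report. It assumes an evaluation set of (group, actual outcome, model score) tuples and a single decision threshold; the function name, field names, and toy data are hypothetical. Comparing the mean score with the observed outcome rate is only a crude stand-in for a proper calibration check.

```python
from collections import defaultdict

def group_report(records, threshold=0.5):
    """records: iterable of (group, actual, score) with actual in {0, 1}."""
    acc = defaultdict(lambda: {"n": 0, "selected": 0, "qualified": 0,
                               "missed": 0, "score_sum": 0.0})
    for group, actual, score in records:
        c = acc[group]
        selected = 1 if score >= threshold else 0
        c["n"] += 1
        c["selected"] += selected
        c["score_sum"] += score
        if actual == 1:
            c["qualified"] += 1
            c["missed"] += 1 - selected  # qualified but rejected
    report = {}
    for group, c in acc.items():
        report[group] = {
            # demographic parity compares this across groups
            "selection_rate": c["selected"] / c["n"],
            # equal opportunity compares this across groups
            "false_negative_rate": (c["missed"] / c["qualified"]
                                    if c["qualified"] else None),
            # crude calibration check: these two should be close
            "mean_score": c["score_sum"] / c["n"],
            "outcome_rate": c["qualified"] / c["n"],
        }
    return report

# Hypothetical evaluation set.
records = [
    ("a", 1, 0.9), ("a", 1, 0.7), ("a", 0, 0.6), ("a", 0, 0.2),
    ("b", 1, 0.8), ("b", 1, 0.4), ("b", 0, 0.3), ("b", 0, 0.1),
]
report = group_report(records)
for group, metrics in report.items():
    print(group, metrics)
```

On real data the per-group sample sizes matter as much as the rates; a gap measured on a few dozen people is noise until shown otherwise.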
Two practical notes. The disparate impact ratio (the four-fifths rule the EEOC has applied since 1978) flags a selection rate for any group below 80% of the rate for the group with the highest rate. It is blunt, and it is a rule of thumb rather than a statutory test, but it is still the first screen US enforcement agencies apply to employment decisions. Below 0.8, you have a problem you need to explain.
You cannot test bias if you do not have the protected attribute on the evaluation set. “We deleted gender so we cannot measure gender bias” is not a defence; it is a recipe for shipping bias you refuse to look at. Many jurisdictions allow you to collect protected attributes specifically for fairness testing under a separate legal basis. Use it.
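The four-fifths check mentioned above reduces to one division per group. Here is a minimal sketch, assuming you already have selection rates per group; the function name and the example rates are hypothetical.

```python
def disparate_impact_ratios(selection_rates):
    """Divide each group's selection rate by the highest group's rate."""
    highest = max(selection_rates.values())
    if highest == 0:
        raise ValueError("no group was selected at all; ratio is undefined")
    return {group: rate / highest for group, rate in selection_rates.items()}

# Hypothetical selection rates from an evaluation set.
rates = {"group_a": 0.60, "group_b": 0.42}
ratios = disparate_impact_ratios(rates)

# Flag any group whose ratio falls below the four-fifths line.
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]
print(ratios, flagged)
```

Here group_b lands at about 0.7 of group_a's rate, which is under the 0.8 line and needs an explanation.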
Mitigation lives in three places, not one
When the numbers come back ugly, there are three places to intervene, and the right answer is usually a mix.
Pre-processing. Fix the data. Re-balance the training set, re-weight examples, audit the label. Cheapest if you catch it early, hardest to retrofit.
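One way to re-weight examples is the reweighing scheme described by Kamiran and Calders, which I am using here as an illustration, not as the only option. Each example gets the weight P(group) × P(label) / P(group, label), so that group and label look statistically independent in the weighted data. The function name and the toy data are hypothetical.

```python
from collections import Counter

def reweigh(groups, labels):
    """Return one weight per example: P(group) * P(label) / P(group, label)."""
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

# Hypothetical history: group "a" was labelled positive far more often.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweigh(groups, labels)

for g, y, w in zip(groups, labels, weights):
    print(g, y, round(w, 3))
```

Over-represented combinations (group a with a positive label) are weighted down and under-represented ones (group b with a positive label) are weighted up. The weights then go into whatever sample-weight argument your training routine accepts.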
In-processing. Add a fairness constraint to the loss function so training penalises the disparity you care about. Cleaner in theory, requires you to have committed to a definition first.
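To make "a fairness constraint in the loss" concrete, here is a minimal sketch of one possible penalty, assuming you have committed to demographic parity: ordinary log loss plus a term that grows with the gap in mean predicted score between groups. The function name, the `lam` weight, and the choice of a squared gap are all assumptions for illustration; a real training loop would need a differentiable version inside its own framework.

```python
import math

def penalised_loss(scores, labels, groups, lam=1.0):
    """Log loss plus lam * (largest gap in mean score between groups)^2."""
    eps = 1e-12  # guard against log(0)
    log_loss = -sum(
        y * math.log(max(s, eps)) + (1 - y) * math.log(max(1 - s, eps))
        for s, y in zip(scores, labels)
    ) / len(scores)

    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    means = [sum(v) / len(v) for v in by_group.values()]
    gap = max(means) - min(means)

    return log_loss + lam * gap ** 2

# Hypothetical predictions: accurate, but group "b" scores much lower.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]
groups = ["a", "a", "b", "b"]
plain = penalised_loss(scores, labels, groups, lam=0.0)
penalised = penalised_loss(scores, labels, groups, lam=1.0)
print(round(plain, 3), round(penalised, 3))
```

The `lam` value is where the trade-off lives: at zero you optimise accuracy alone, and as it grows you pay accuracy to shrink the disparity.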
Post-processing. Adjust the threshold per group so the deployed decision rate hits the parity target. The most common and the most contested, because the threshold is visibly different across groups. The discomfort is the point: you are making the trade-off explicit instead of hiding it inside a single global cutoff that produces a disparate outcome.
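Per-group thresholds can be sketched in a few lines. This version assumes the parity target is an equal selection rate and picks, for each group, the score of the last person who would be selected at that rate. The function name and the scores are hypothetical, and tied scores at the cutoff would select slightly more than the target.

```python
def thresholds_for_target_rate(scores_by_group, target_rate):
    """Return a per-group cutoff; select anyone with score >= cutoff."""
    thresholds = {}
    for group, scores in scores_by_group.items():
        ranked = sorted(scores, reverse=True)
        k = round(target_rate * len(ranked))  # how many to select
        if k <= 0:
            thresholds[group] = float("inf")  # select nobody
        else:
            thresholds[group] = ranked[min(k, len(ranked)) - 1]
    return thresholds

# Hypothetical scores: group "b" is scored lower across the board.
scores_by_group = {
    "a": [0.9, 0.8, 0.7, 0.6, 0.3],
    "b": [0.7, 0.5, 0.4, 0.3, 0.2],
}
thresholds = thresholds_for_target_rate(scores_by_group, target_rate=0.4)
print(thresholds)  # {'a': 0.8, 'b': 0.5}
```

A single global cutoff of 0.8 would have selected two people from group a and none from group b; the per-group cutoffs select two from each, and the difference between 0.8 and 0.5 is exactly the trade-off you now have to defend.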
None of these are one-time fixes. The model drifts, the population shifts, the labels age. Bias testing is a quarterly job, not a launch checklist.
What the EU AI Act actually asks for
Article 10 of the EU AI Act on data governance for high-risk systems is where the bias requirements live. The text asks for training, validation, and test sets that are “relevant, sufficiently representative” and as far as possible “free of errors and complete.” It asks for examination of “possible biases” likely to affect health, safety, or fundamental rights, and measures to “detect, prevent and mitigate” them. It explicitly permits processing special-category personal data where strictly necessary to detect and correct bias.
Translated to the governance documentation you have to produce: the fairness definition you picked, the groups you tested against, results per group, the mitigation choices and their trade-offs, the residual risk you accepted in writing, the cadence for re-testing. The Act does not tell you which definition to pick. It does tell you to pick one, defend it, and show your work.
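One way to keep that documentation honest is to make it a structured record that can be checked for gaps. The sketch below is a hypothetical schema built from the list above; the Act does not prescribe these field names or any particular format.

```python
from dataclasses import dataclass, field, fields

@dataclass
class FairnessRecord:
    """Hypothetical governance record; field names are illustrative only."""
    fairness_definition: str = ""          # e.g. "equal opportunity"
    definition_rationale: str = ""         # why this definition, in writing
    groups_tested: list = field(default_factory=list)
    results_per_group: dict = field(default_factory=dict)
    mitigations_and_tradeoffs: list = field(default_factory=list)
    residual_risk_accepted: str = ""       # what was accepted, and by whom
    retest_cadence: str = ""               # e.g. "quarterly"

    def missing(self):
        """Names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

# A record that is not yet ready to show a regulator.
record = FairnessRecord(
    fairness_definition="equal opportunity",
    retest_cadence="quarterly",
)
print(record.missing())
```

A release gate that refuses to ship while `missing()` returns anything is a cheap way to enforce "pick one, defend it, and show your work".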
The honest close
I have stopped describing fairness work as solving bias. It is closer to managing it, the same way an AI risk program manages model drift or third-party exposure. You will not get a clean number. You will get a defensible trade-off, documented and revisited on a cadence. The teams that get hurt are the ones who shipped without picking a definition. The teams that hold up under scrutiny can show a regulator the choice they made, the cost of that choice, and the proof they are still watching.


