Understanding AI Bias and Fairness · Introduction

The Machine That Decides

Algorithms already shape who gets hired, who gets a loan, and who gets flagged as a threat. This course asks whether they do so fairly.

When credit scoring arrived in the United States in the late 1950s (the Fair Isaac Corporation released its first scoring model in 1958), it promised to replace the loan officer's gut feeling with objective mathematics. The promise was partly kept: lending did become more consistent. But the scores encoded the same residential segregation and employment discrimination already embedded in the data they were trained on. By the 1970s, consumer advocates were documenting how ZIP codes in redlined neighborhoods systematically produced lower scores regardless of individual repayment behavior. Congress responded with the Equal Credit Opportunity Act of 1974 and the Fair Credit Reporting Act, the first major legislative attempts to govern an automated decision system. The lesson was clear: a number that looks neutral can carry old prejudices forward at industrial scale.

That pattern is repeating now, faster and less visibly. Between 2014 and 2017, Amazon built and then quietly discarded a machine-learning hiring tool because it systematically downgraded résumés that included the word "women's" (as in "women's chess club"). In 2016, ProPublica published an investigation into COMPAS, a recidivism-prediction algorithm used in courtrooms across the United States, showing it falsely flagged Black defendants as future criminals at roughly twice the rate of white defendants. In 2019, researchers at UC Berkeley found that online mortgage lenders using algorithmic pricing charged Black and Latino borrowers about 11 basis points more than equally qualified white borrowers. The tools changed. The outcomes did not.

This course is not an argument that AI is broken beyond repair, nor that these systems should be abolished. It is a structured examination of how bias enters algorithmic systems, how it can be measured, and what technical and institutional tools exist to reduce it. You will encounter real documented cases, real mathematical definitions, and real trade-offs, because fairness in machine learning turns out to be not one thing but several, and they cannot all be satisfied simultaneously. That tension is where the most important work is happening, and it is where this course begins.

Understanding AI Bias and Fairness · Lesson 1

What Is Algorithmic Bias?

Defining the problem before we can measure or fix it.
How does a mathematical model, built from data and optimized for accuracy, end up systematically harming particular groups of people?

In 2015, a software engineer named Jacky Alciné noticed that Google Photos had tagged photos of him and his friend, both Black, as "gorillas." The label came not from a programmer's prejudice but from a neural network trained on images that dramatically underrepresented dark-skinned faces. Google's response was to remove the gorilla category from the classifier entirely, a fix that still held in 2023 when journalists re-tested the product. The underlying problem, insufficient representation in training data, was never solved. It was patched.

The Google Photos incident is memorable because it was viscerally offensive and easily photographed. But it represents only the most visible end of a wide spectrum of algorithmic bias, a spectrum that also includes quiet disparities in loan approval rates, healthcare resource allocation scores, and predictive policing heat maps that determine where officers are deployed.

Defining the Term

Algorithmic bias refers to systematic and repeatable errors in a computer system that create unfair outcomes for certain groups relative to others. The word "systematic" is doing important work here. Every predictive model makes errors. Bias is not about individual mistakes; it is about patterns of mistakes that fall disproportionately on people defined by race, gender, age, disability status, or other protected characteristics.

Three distinct meanings of "bias" collide in this field, and keeping them separate matters:

Statistical bias: A formal property of an estimator: its expected value differs from the true value it is trying to estimate. A biased coin is predictably unfair. A biased estimator is systematically wrong in one direction.
Cognitive bias: Systematic errors in human reasoning, such as confirmation bias, the availability heuristic, and anchoring. These often enter AI systems when human annotators label training data.
Societal / allocative bias: Disproportionate harm to specific social groups produced by automated systems. This is what most people mean by "AI bias" in public discourse, and it is the primary focus of this course.

Where Bias Enters

Bias can enter an AI system at every stage of its development. The three most consequential entry points are data collection, problem framing, and feedback loops.

Data collection. Training data is not a neutral sample of reality. It reflects the world as it was recorded: by whom, with what instruments, for what purpose. ImageNet, the image dataset that powered the deep-learning revolution starting around 2009, was assembled largely from images tagged by English-speaking, US-based internet users. A 2019 study by Vinay Prabhu and Abeba Birhane found that ImageNet's person-category images dramatically overrepresented lighter-skinned, Western subjects. Models trained on it were correspondingly worse at tasks involving darker-skinned faces, not by design, but by data.

Problem framing. Before any data is collected, someone must decide what the algorithm is trying to predict. This choice embeds values. When Northpointe designed COMPAS in the 1990s, they defined "recidivism risk" as re-arrest within two years. Re-arrest and re-offending are not the same thing. Black defendants are arrested at higher rates for equivalent behavior due to differential policing. An algorithm trained on arrest data will therefore predict higher risk for Black defendants partly because policing patterns, not underlying behavior, produce more arrest records in those communities.

Feedback loops. When an algorithm's outputs influence future inputs, initial errors compound. Predictive policing tools deployed by the Santa Cruz, Chicago, and New Orleans police departments directed more officers to already over-policed areas, generating more arrests, which fed back into training data, which reinforced the model's belief that those areas required more policing. Santa Cruz became the first U.S. city to ban predictive policing software in June 2020, partly for this reason.
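
The feedback mechanism can be sketched in a few lines of Python. This is a toy model, not a description of any deployed product: the two districts, the starting counts, the 70/30 patrol split, and the function name `simulate_patrol_feedback` are all assumptions made for illustration. Both districts are given the same true offense rate, so any divergence in the records comes from the allocation rule alone.

```python
def simulate_patrol_feedback(rounds=20, true_rate=0.1, patrols=100, hot_share=0.7):
    """Return district A's share of recorded arrests after each round.

    Both districts have the SAME true offense rate; only the starting
    records differ slightly (an assumed historical skew).
    """
    recorded = {"A": 55.0, "B": 45.0}
    shares = []
    for _ in range(rounds):
        # The "model": whichever district has more recorded arrests is the hot spot.
        hot = max(recorded, key=recorded.get)
        new_arrests = {}
        for district in recorded:
            share = hot_share if district == hot else 1 - hot_share
            # Arrests scale with patrol presence, not with any difference in behavior.
            new_arrests[district] = patrols * share * true_rate
        for district, count in new_arrests.items():
            recorded[district] += count
        shares.append(recorded["A"] / (recorded["A"] + recorded["B"]))
    return shares


shares = simulate_patrol_feedback()
print(f"District A share of records: round 1 = {shares[0]:.3f}, round 20 = {shares[-1]:.3f}")
```

Under these assumptions district A's share of recorded arrests climbs every round, from 55% at the start toward the 70% patrol share, even though residents of both districts behave identically. A model retrained on those records would see growing "evidence" for a difference that the policy itself created.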

Documented Case

In 2019, a study published in Science by Ziad Obermeyer and colleagues found that a widely used healthcare algorithm, deployed by Optum and used to allocate care management resources for roughly 200 million people per year, systematically assigned lower risk scores to Black patients than to equally sick white patients. The algorithm used healthcare costs as a proxy for health need. Because Black patients had historically spent less on healthcare (due to access barriers, not lower need), the algorithm interpreted lower past costs as lower current need. The researchers estimated that correcting the bias would more than double the number of Black patients identified for extra care.
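
A small sketch can make the proxy-label mechanism concrete. Everything in it is invented for illustration: the two groups X and Y, the need scale of 1 to 10, and the `ACCESS_FACTOR` values are assumptions, not figures from the study. The point is only that ranking on cost and ranking on need select different people when one group spends less at the same level of need.

```python
# Hypothetical patients: both groups have the same spread of health need (1..10).
# ACCESS_FACTOR is an invented number standing in for access barriers that
# lower spending for group Y at the same level of need.
ACCESS_FACTOR = {"X": 1.0, "Y": 0.55}

patients = [
    {"group": group, "need": need, "cost": need * ACCESS_FACTOR[group]}
    for need in range(1, 11)
    for group in ("X", "Y")
]


def select_top(label, k=6):
    """Pick the k patients ranked highest on the given label."""
    return sorted(patients, key=lambda p: p[label], reverse=True)[:k]


def count_group(selected, group):
    """Count how many selected patients belong to the given group."""
    return sum(1 for p in selected if p["group"] == group)


by_cost = select_top("cost")
by_need = select_top("need")
print("Group Y selected when ranking by cost:", count_group(by_cost, "Y"))
print("Group Y selected when ranking by need:", count_group(by_need, "Y"))
```

With these made-up numbers, ranking by need fills half of the six slots with group Y patients, while ranking by cost admits only one. No group label is used in either ranking; the disparity comes entirely from the choice of target.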

Bias vs. Error: Why the Distinction Matters

A common defense of algorithmic systems is that they are merely "as biased as their data," implying the problem lies upstream with society, not with the system itself. This argument has some validity but misses the amplification effect. When a human loan officer holds a biased view, that view affects the people they personally review. When an algorithmic system encodes the same view, it affects every person processed through the system, at the speed of software, with no natural brake from fatigue or social accountability.

Scale changes the moral calculus. A biased algorithm deployed at national scale can cause more harm in a week than a biased human practitioner causes in a career. This is why researchers and regulators increasingly treat algorithmic bias as a distinct category of risk, not merely a technological reflection of social problems.

Key Takeaway

Algorithmic bias is systematic, repeatable, and scalable harm. It arises from data, design choices, and deployment context, not from malicious intent. The absence of intent does not reduce the harm. Lesson 2 will examine how researchers measure these disparities mathematically.

Lesson 1 Quiz

Four questions · Select the best answer · Immediate feedback
1. In 2015, Google Photos tagged photos of Black individuals as "gorillas." What was the primary technical cause of this error?
Correct. The model was trained on image data that skewed heavily toward lighter-skinned faces. The underrepresentation of darker-skinned individuals meant the model had poor feature representations for those inputs: a textbook case of representation bias in training data.
Not quite. The cause was a data problem, not intentional mislabeling or a model size issue. Google Photos' neural network learned from images that underrepresented dark-skinned people, leading to poor performance on those inputs.
2. The COMPAS recidivism tool defined its prediction target as "re-arrest within two years." Why is this definition itself a source of bias?
Correct. This is a problem-framing issue. Re-arrest is not a neutral proxy for re-offending because policing is not evenly distributed. Communities of color face more intensive policing, producing higher arrest rates for equivalent behavior. Training on arrest data encodes that disparity into the model's outputs.
The core issue is that re-arrest is a biased proxy for re-offending because policing intensity varies systematically by race and neighborhood. The model learns from arrest records, not from underlying behavior, so it inherits policing patterns.
3. Obermeyer et al. (2019) found that a healthcare algorithm systematically underestimated Black patients' health needs. The root cause was that the algorithm used past healthcare costs as a proxy for health need. This is an example of which type of bias entry point?
Correct. The decision to use healthcare cost as a stand-in for health need was a problem-framing choice. Because Black patients historically spent less on healthcare due to systemic access barriers, not lower need, the proxy was racially loaded before any data was collected.
This is a problem-framing issue. The designers chose healthcare cost as a proxy for health need. That choice embedded existing structural inequality into the model's objective before a single data point was processed.
4. Which statement best explains why scale changes the moral weight of algorithmic bias?
Correct. Scale is the key amplifier. A human with biased judgment affects the people they personally encounter. An algorithm encodes that same judgment and applies it uniformly, instantly, and repeatedly across entire populations, with no equivalent of social friction or individual reflection to slow the harm.
The critical point is about harm amplification, not detection ease or legal thresholds. An algorithm applies the same biased decision rule to every person it processes, simultaneously and at speed, which is fundamentally different from the localized impact of individual human bias.

Lab 1: Identifying Bias Entry Points

Conversational AI exercise · Minimum 3 exchanges to complete

Your Task

In this lab you will examine a realistic scenario (a company deploying an AI hiring tool) and practice identifying where bias could enter at the data, framing, and feedback stages. Discuss the scenario with the AI assistant below. There are no trick questions; the goal is to think carefully and articulate your reasoning.

Scenario: A mid-size tech firm trains a résumé screening model on ten years of historical hiring data. The model predicts "likelihood of successful hire" based on which candidates were actually hired and promoted in the past. The team reports 94% accuracy on a held-out test set and declares the model ready for deployment. Where might bias have entered, and what questions would you ask before approving this system?
AI Lab Assistant
Bias Entry Points
Welcome to Lab 1. The scenario describes a hiring model trained on historical data with 94% accuracy. Before you approve it, you need to think carefully about what that accuracy number actually tells you, and what it doesn't. Start anywhere: What's your first concern about using ten years of past hiring decisions as training data?
Understanding AI Bias and Fairness · Lesson 2

Measuring Fairness: The Mathematical Definitions

Fairness is not one number. It is a family of constraints, and they conflict.
Can a single algorithm simultaneously give equal false positive rates, equal false negative rates, and equal positive predictive values across groups? The mathematics says no.

In May 2016, ProPublica published "Machine Bias," documenting that COMPAS assigned higher risk scores to Black defendants who did not re-offend and lower scores to white defendants who did. Northpointe, COMPAS's developer, responded two weeks later with their own analysis, arguing the tool was fair because it was equally well calibrated for Black and white defendants: both groups had roughly the same probability of re-offending when assigned a given risk score. Both claims were true simultaneously. They measured different things. This was not a dispute about facts; it was a dispute about which mathematical definition of fairness should govern a tool with life-altering consequences. The argument has never been fully resolved, not because the math is ambiguous, but because the choice between fairness criteria is a value judgment that mathematics alone cannot make.

The Core Metrics

Most group fairness metrics are built from the confusion matrix, the four-cell table tracking true positives, false positives, true negatives, and false negatives. The disagreement between ProPublica and Northpointe reduces to which cells of that matrix you require to be equal across demographic groups.

Demographic Parity: The positive prediction rate is equal across groups. If the model approves 40% of Group A's applications, it must approve 40% of Group B's. Also called statistical parity. Requires no knowledge of actual outcomes.
Equal Opportunity: Among people who actually qualify (true positives + false negatives), the rate of being correctly identified as qualifying is equal across groups. Focuses only on the beneficial outcome. Introduced by Hardt, Price, and Srebro in 2016.
Equalized Odds: Both the true positive rate AND the false positive rate are equal across groups. Stronger than equal opportunity. ProPublica's central complaint about COMPAS was an equalized-odds violation: white and Black defendants had very different false positive rates.
Calibration / Predictive Parity: Among people assigned a given risk score, the actual outcome rate is the same across groups. Northpointe argued COMPAS satisfied this. A score of 7 meant roughly the same actual recidivism probability regardless of race.
Individual Fairness: Similar individuals should receive similar predictions. Formalized by Dwork et al. in 2012. Requires a task-specific similarity metric, which is often itself contested.
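
The group metrics above can be read directly off per-group confusion matrices. The following is a minimal sketch, assuming invented counts for two hypothetical groups; the `Confusion` class and `fairness_gaps` function are illustrative names, not part of any library. Predictive parity is approximated here by positive predictive value at a single threshold, which is a simplification of score-level calibration. Individual fairness is omitted because it needs a similarity metric rather than a confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one demographic group."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def positive_rate(self):
        # Share of the group that receives a positive prediction.
        return (self.tp + self.fp) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def tpr(self):
        # True positive rate: correctly flagged among actual positives.
        return self.tp / (self.tp + self.fn)

    @property
    def fpr(self):
        # False positive rate: wrongly flagged among actual negatives.
        return self.fp / (self.fp + self.tn)

    @property
    def ppv(self):
        # Positive predictive value: actual positives among those flagged.
        return self.tp / (self.tp + self.fp)


def fairness_gaps(a, b):
    """Absolute between-group gap for each criterion (0 means satisfied)."""
    return {
        "demographic_parity": abs(a.positive_rate - b.positive_rate),
        "equal_opportunity": abs(a.tpr - b.tpr),
        "equalized_odds": max(abs(a.tpr - b.tpr), abs(a.fpr - b.fpr)),
        "predictive_parity": abs(a.ppv - b.ppv),
    }


# Invented counts: 1,000 people per group, base rates of 50% and 30%.
group_a = Confusion(tp=300, fp=200, tn=300, fn=200)
group_b = Confusion(tp=150, fp=100, tn=600, fn=150)

gaps = fairness_gaps(group_a, group_b)
for name, gap in gaps.items():
    print(f"{name}: {gap:.3f}")
```

In this made-up example the two groups have identical positive predictive value, so the predictive parity gap is zero, yet the false positive rates differ substantially (0.40 versus roughly 0.14). The same classifier therefore looks fair under one definition and unfair under another, which is the shape of the COMPAS dispute.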

The Impossibility Results

In 2016, researchers Chouldechova and Kleinberg et al. independently proved that no algorithm can simultaneously satisfy calibration and equalized odds unless base rates (the actual prevalence of the outcome) are equal across groups or the predictions are perfect. Because recidivism rates differed between demographic groups in the data COMPAS was trained on, satisfying one definition mathematically required violating the other. ProPublica and Northpointe were both correct, but they were measuring different things, and no single tool could satisfy both simultaneously.

This is not a limitation of COMPAS specifically. It is a mathematical fact about any binary classifier. It means that deploying a predictive system requires a prior decision about which errors are most costly and to whom. That decision is fundamentally political and ethical, not statistical.

The Impossibility Theorem: Plain Statement

When base rates differ between groups, you cannot have all of: (1) equal false positive rates, (2) equal false negative rates, and (3) equal positive predictive value. Any two of these can be achieved, but achieving all three requires equal base rates, which the data often does not provide. The choice of which constraint to relax is a value judgment, not a technical one.
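
The constraint can be checked numerically. For any binary classifier, the confusion-matrix definitions imply FPR = p / (1 - p) × (1 - PPV) / PPV × (1 - FNR), where p is the group's base rate. The sketch below, with invented base rates and an illustrative function name `implied_fpr`, holds positive predictive value and false negative rate equal across two groups and shows that the false positive rates are then forced apart.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False positive rate implied by base rate, PPV, and FNR.

    Follows from the confusion-matrix definitions:
    FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Same PPV and FNR for both groups; only the (assumed) base rates differ.
PPV, FNR = 0.6, 0.4
fpr_a = implied_fpr(base_rate=0.5, ppv=PPV, fnr=FNR)
fpr_b = implied_fpr(base_rate=0.3, ppv=PPV, fnr=FNR)

print(f"Group A (base rate 0.5): FPR = {fpr_a:.3f}")
print(f"Group B (base rate 0.3): FPR = {fpr_b:.3f}")
```

With equal PPV and equal FNR, the group with the higher base rate ends up with the higher false positive rate (0.400 versus about 0.171 here). Setting both base rates to the same value makes the two results identical, which is the one case in which all three conditions can hold together for an imperfect classifier.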

Which Metric Should You Use?

The answer depends on the decision domain and which errors cause more harm. In criminal justice, a false positive (predicting high risk for someone who will not re-offend) results in harsher bail conditions, longer sentences, or denied parole for an innocent person. That asymmetry argues for prioritizing equal false positive rates (an equalized-odds requirement). In medical screening, a false negative (missing a disease in someone who has it) may be more costly than a false positive. Equal opportunity (equal true positive rates) may be the appropriate constraint.

The EU's AI Act, which entered into force in August 2024, requires high-risk AI systems to be tested for bias before deployment and to document which fairness metrics were used and why. This is an implicit acknowledgment that the choice of metric is a consequential design decision, not a technicality.

Key Takeaway

Fairness cannot be reduced to a single number. Demographic parity, equalized odds, calibration, and individual fairness are all legitimate definitions that capture different moral intuitions, and they are mathematically incompatible when base rates differ across groups. Choosing which fairness constraint to prioritize is an ethical and political decision that precedes any technical implementation.

Lesson 2 Quiz

Four questions · Select the best answer · Immediate feedback
1. ProPublica argued COMPAS was unfair because Black defendants had higher false positive rates. Northpointe countered it was fair because the tool was equally calibrated across races. Which statement about this dispute is accurate?
Correct. This is the central lesson of the COMPAS controversy. Both teams ran valid analyses. They disagreed about which definition of fairness should apply (equalized odds vs. calibration), and the impossibility theorem shows those two criteria cannot both be satisfied when group base rates differ.
Both organizations were measuring real, mathematically valid properties of the tool. The conflict arose because they were using different definitions of "fair" (equalized odds vs. calibration), and those two definitions are mathematically incompatible when recidivism base rates differ between groups.
2. "Equal opportunity" as defined by Hardt, Price, and Srebro (2016) requires which condition?
Correct. Equal opportunity focuses on qualified individuals (those who belong to the positive class) and requires that the model identifies them at the same rate regardless of group membership. It does not constrain what happens among unqualified individuals.
Equal opportunity requires that among truly qualified individuals, the model's true positive rate is equal across demographic groups. Option B describes equalized odds (stronger). Option D describes calibration. Option A describes demographic parity.
3. The Chouldechova impossibility theorem states that a binary classifier cannot simultaneously satisfy calibration and equalized odds UNLESS which condition holds?
Correct. When base rates are equal across groups, the mathematical tension between calibration and equalized odds disappears. But when groups have different actual outcome rates, as was true for recidivism in the COMPAS data, satisfying one definition forces a violation of the other.
The theorem is about base rates: the actual frequency of the outcome in each group. When those frequencies are equal, both calibration and equalized odds can be satisfied simultaneously. When they differ, one must be sacrificed. Dataset size and model type are irrelevant to this result.
4. A medical screening algorithm is being evaluated for use in detecting a serious disease. In this context, which fairness criterion is most likely to be prioritized and why?
Correct. In medical screening, failing to detect a disease (false negative) is typically far more costly than a false alarm that leads to additional testing. Equal opportunity ensures that sick individuals from all demographic groups have the same chance of being correctly identified, which is the core equity concern in healthcare screening.
In medical screening, the most severe error is missing someone who is actually sick. Equal opportunity focuses on true positive rates, ensuring all sick individuals, regardless of group, are correctly identified at the same rate. The choice of fairness metric should reflect which type of error causes more harm.

Lab 2: Fairness Criteria Trade-offs

Conversational AI exercise · Minimum 3 exchanges to complete

Your Task

You are advising a city government that wants to deploy a predictive algorithm to determine which residents qualify for a job-training subsidy program. The program has limited slots. Work through the fairness criteria with the assistant: decide which metric to prioritize and defend your reasoning.

Scenario: The algorithm must select 500 residents per month from 5,000 applicants. Historical data shows that Group A (lower-income residents) has a 60% completion rate for training programs, while Group B (higher-income residents) has a 75% completion rate. The city wants the algorithm to be "fair" but has not defined what that means. Which fairness criterion should govern selection, and what are the trade-offs?
AI Lab Assistant
Fairness Criteria
Good scenario to work through. Different base rates (60% vs. 75% completion) mean you face the impossibility constraint we studied. Before choosing a criterion, tell me: what is this program actually trying to achieve? Is the goal to maximize total completions, or to ensure equitable access, or something else? Your answer determines which fairness metric is appropriate.
Understanding AI Bias and Fairness · Lesson 3

Bias in the Wild: Documented Cases by Domain

From hiring algorithms to facial recognition to healthcare: how bias appears differently depending on context.
What does bias actually look like when it leaves the research paper and enters a courtroom, a hiring platform, or a hospital?

In January 2020, Robert Julian-Borchak Williams was arrested at his home in Detroit while his daughters watched from the doorway. He was handcuffed, placed in a police car, and held overnight for a crime he did not commit. The identification that led to his arrest was made by a facial recognition algorithm. Detroit police had run a surveillance photo through a commercial system (later reported to be DataWorks Plus technology using an NEC algorithm), which returned a match to Williams's driver's license photo. A human investigator confirmed the match without conducting further verification. Williams was innocent; the charges were eventually dropped. He became the first documented case of a wrongful arrest caused by facial recognition in the United States, and the ACLU filed a complaint on his behalf in 2021.

Facial Recognition

The technical evidence for demographic disparities in facial recognition systems accumulated steadily before it reached public consciousness. In 2018, MIT researcher Joy Buolamwini and Timnit Gebru published "Gender Shades," testing commercial facial analysis APIs from IBM, Microsoft, and Face++ on a curated dataset of 1,270 faces with known gender labels. The systems achieved error rates below 1% on lighter-skinned male faces. For darker-skinned female faces, error rates reached 34.7% (IBM), 20.8% (Microsoft), and 34.5% (Face++). The disparity was not subtle.

The NIST Face Recognition Vendor Test, published in December 2019, examined 189 commercial facial recognition algorithms across a dataset of 18 million images. It found that many algorithms produced false positive rates 10 to 100 times higher for Black and Asian faces than for white faces when used for one-to-one verification. The systems that performed most poorly on these demographic groups were predominantly developed in the United States and Europe. Algorithms developed in China showed smaller disparities on East Asian faces, consistent with the hypothesis that the training data's demographic distribution drives performance gaps.

By 2020, three major cities had banned government use of facial recognition technology entirely: San Francisco (May 2019), Oakland (July 2019), and Boston (June 2020). In June 2020, IBM, Amazon, and Microsoft each announced they would halt or pause sales of facial recognition tools to law enforcement, citing accuracy disparities and the need for federal regulation.

Hiring Algorithms

Amazon's secret AI recruiting tool, reported by Reuters in October 2018, was built between 2014 and 2017 to automate the initial screening of job applications. The system was trained on résumés submitted to Amazon over a ten-year period, a dataset overwhelmingly composed of male applicants, because the technology industry is male-dominated. The model learned to downgrade résumés that included the word "women's" (as in "women's chess club") and to penalize graduates of two all-women's colleges. Amazon disbanded the team and scrapped the tool in 2017 after discovering these patterns. The tool was never used for actual hiring decisions, but its existence illustrated how training data composition directly shapes model outputs.

A 2019 study by the University of Washington and Princeton found that simply posting identical résumés with stereotypically white names versus stereotypically Black names on a major job platform produced different callback rates. This discrimination pattern was first documented in audit studies of human reviewers in 2004 by Marianne Bertrand and Sendhil Mullainathan, and is now replicated in algorithmic systems that ostensibly removed human judgment from the process.

Documented Case: Healthcare

In 2021, researchers at Vanderbilt University Medical Center published findings that a commercial algorithm used to allocate dermatology clinic appointments systematically ranked Black patients as lower priority than white patients with equivalent clinical urgency. The algorithm used insurance type as one input, and because Black patients were more likely to hold Medicaid, which reimburses at lower rates, the tool effectively encoded a financial preference as a clinical judgment. The hospital discontinued use of the algorithm after the study was published.

Credit and Financial Services

Apple Card, launched in August 2019, attracted scrutiny in November of that year when David Heinemeier Hansson, the creator of Ruby on Rails, publicly reported that Apple Card's algorithm offered him a credit limit twenty times higher than his wife's, despite their filing taxes jointly and her having a higher credit score. Goldman Sachs, which issued the card, stated that gender was not used as an input. Investigators from the New York Department of Financial Services opened an inquiry. The case illustrated a persistent challenge: algorithms that do not explicitly use protected characteristics can still produce discriminatory outcomes through correlated variables. Gender was not in the model; proxies correlated with gender may have been.
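
The proxy mechanism can be shown with a deliberately tiny example. It does not describe the Apple Card model, whose inputs are not public in this text: the `primary_holder` feature, the counts, and the decision rule are all hypothetical. The rule never reads the gender field, yet its outcomes differ by gender because the feature it does read is unevenly distributed.

```python
# Hypothetical applicants: "primary_holder" is an invented feature that happens
# to be distributed differently across genders in this toy dataset.
applicants = (
    [{"gender": "M", "primary_holder": True}] * 8
    + [{"gender": "M", "primary_holder": False}] * 2
    + [{"gender": "F", "primary_holder": True}] * 3
    + [{"gender": "F", "primary_holder": False}] * 7
)


def high_limit(applicant):
    """Toy decision rule. It never reads applicant['gender']."""
    return applicant["primary_holder"]


def high_limit_rate(gender):
    """Share of one gender group that receives the high limit."""
    group = [a for a in applicants if a["gender"] == gender]
    return sum(high_limit(a) for a in group) / len(group)


print("High-limit rate, men:  ", high_limit_rate("M"))
print("High-limit rate, women:", high_limit_rate("F"))
```

In this toy data 80% of men and 30% of women receive the high limit. Removing the protected attribute from the inputs did nothing, because the disparity travels through a correlated variable. An audit therefore has to compare outcomes by group, not just inspect the feature list.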

The UC Berkeley study from 2019, examining 30 million mortgage records, found that algorithmic lenders (those using automated underwriting without human loan officers) charged Black and Latino borrowers approximately 11 basis points more than equally qualified white borrowers. This disparity translated to roughly $765 million in excess interest payments annually. The researchers concluded the disparity likely arose from algorithms trained on data reflecting existing wealth disparities, not from explicit race discrimination.

Key Takeaway

Algorithmic bias is not a single phenomenon. In facial recognition, it manifests as higher error rates for underrepresented groups. In hiring, it arises from historically skewed training populations. In healthcare, it appears through proxy variables that correlate financial factors with clinical priority. In credit, it surfaces through variables correlated with race even when race is excluded. Each domain requires domain-specific auditing methods and regulatory frameworks.

Lesson 3 Quiz

Four questions · Select the best answer · Immediate feedback
1. Joy Buolamwini and Timnit Gebru's "Gender Shades" study (2018) found that commercial facial analysis systems from IBM, Microsoft, and Face++ had dramatically higher error rates for which demographic group?
Correct. The "Gender Shades" study is a landmark piece of algorithmic auditing. The maximum error rate disparity was staggering: IBM's system hit 34.7% error for darker-skinned women versus less than 1% for lighter-skinned men. The cause was systematic underrepresentation of darker-skinned women in the training data.
The study found the highest error rates for darker-skinned women. Systems that worked nearly flawlessly on lighter-skinned male faces failed dramatically on darker-skinned female faces, a direct consequence of who was and wasn't well-represented in the training data these commercial systems used.
2. Amazon's AI recruiting tool was scrapped in 2017 after it was found to penalize résumés containing the word "women's." What was the root cause of this behavior?
Correct. This is a straightforward case of training data composition bias. The model learned patterns from a decade of actual Amazon hiring, which disproportionately resulted in male hires. It then generalized those patterns to new résumés, effectively learning that "female" signals were anti-predictive of what its training set defined as "success."
The cause was the training data, not intentional programming or a parsing failure. A decade of Amazon's actual hiring decisions reflected a male-dominated tech workforce. The model learned that male-associated résumé features correlated with the outcomes it was optimizing for: getting hired and promoted at Amazon.
3. The Apple Card credit limit controversy (2019) is significant partly because Goldman Sachs stated gender was not used as an input variable. What does this case illustrate about algorithmic discrimination?
Correct. This is the proxy variable problem, one of the most important concepts in AI fairness law. When a protected characteristic (gender) correlates with other variables in the model (spending patterns, account types, credit history), the model can reproduce discriminatory outcomes without ever explicitly "seeing" the protected variable.
The key lesson is about proxy variables. Removing gender from the model's inputs doesn't remove gender's influence if other included variables, like spending patterns, account types, or income sources, correlate with gender. The discrimination travels through the proxy, not through the explicit variable.
4. The 2019 NIST Face Recognition Vendor Test found that some algorithms produced false positive rates for Black and Asian faces that were how much higher than for white faces?
Correct. The NIST FRVT results were stark: not a modest gap but an order-of-magnitude disparity for many commercially deployed systems. A 100× higher false positive rate means a system that incorrectly matches 1 in 10,000 white faces incorrectly matches 1 in 100 Black faces. That is an enormous difference in accuracy, and the errors are consequential in law enforcement applications.
The NIST FRVT found disparities ranging from 10 to 100 times higher false positive rates for Black and Asian faces compared to white faces. These were not marginal differences: they represented fundamental failures of these systems for certain demographic groups at the scale of law enforcement deployment.
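
The arithmetic behind that comparison can be made concrete. The sketch below is illustrative only: the two false positive rates are the ones quoted in the feedback above, while the volume of one million comparisons is a hypothetical figure chosen for the example, not a number from the NIST report.

```python
from fractions import Fraction

# False positive rates quoted in the quiz feedback.
# Exact fractions are used so the ratio is not blurred by float rounding.
fpr_reference_group = Fraction(1, 10_000)
fpr_affected_group = Fraction(1, 100)

# Hypothetical assumption: non-matching comparisons run for each group.
comparisons = 1_000_000

# How many times more often the affected group is falsely matched.
disparity_ratio = fpr_affected_group / fpr_reference_group

# Expected number of false matches at the assumed volume.
false_matches_reference = fpr_reference_group * comparisons
false_matches_affected = fpr_affected_group * comparisons

print(f"Disparity ratio: {disparity_ratio}x")
print(f"Expected false matches, reference group: {false_matches_reference}")
print(f"Expected false matches, affected group: {false_matches_affected}")
```

At any fixed volume the same ratio carries through to the absolute number of people wrongly matched, which is why a rate gap matters more as deployment scales.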

Lab 3: Auditing a Real System

Conversational AI exercise · Minimum 3 exchanges to complete

Your Task

You are conducting a bias audit of a facial recognition system proposed for use in a regional airport for access control to secure areas. The vendor has provided an accuracy report showing 97.5% overall accuracy on their test set. Work through the audit with the assistant: what questions do you ask, what additional data do you demand, and what would cause you to reject the system?

The vendor's report shows: Overall accuracy: 97.5%. Test set: 50,000 images. Demographics of test set: not disclosed. The vendor states "Our system performs consistently across all user types." They do not break out performance by demographic group. What are your next steps as an auditor?
AI Lab Assistant
Bias Auditing
97.5% overall accuracy sounds impressive, but overall accuracy can hide severe disparities at the group level, especially when the test set is not demographically balanced. Before we go further: what is the application here, and what is the cost of a false positive versus a false negative in this airport access control scenario? That will determine what data you need to demand from the vendor.
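
To see how a headline figure can mask a group-level failure, consider a minimal sketch. Only the 50,000-image total and the 97.5% overall accuracy come from the vendor's report; the split into two groups and the per-group counts are hypothetical, chosen to show one composition among many that would reproduce the same headline number.

```python
# Hypothetical test-set composition: group -> (images, correctly handled).
# These counts are assumptions for illustration; the vendor disclosed none.
groups = {
    "group_a": (47_500, 47_025),
    "group_b": (2_500, 1_725),
}

total_images = sum(images for images, _ in groups.values())
total_correct = sum(correct for _, correct in groups.values())

# The single number a vendor report would show.
overall_accuracy = total_correct / total_images

# The breakdown an auditor should demand.
per_group_accuracy = {
    name: correct / images for name, (images, correct) in groups.items()
}

print(f"Test set size: {total_images}")
print(f"Overall accuracy: {overall_accuracy:.3f}")
for name, accuracy in per_group_accuracy.items():
    print(f"{name} accuracy: {accuracy:.3f}")
```

Because the smaller group makes up only 5% of this assumed test set, even a very high error rate for that group barely moves the overall figure. This is why the undisclosed demographics are the first thing to request.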
Understanding AI Bias and Fairness · Lesson 4

Mitigation Strategies and Regulatory Responses

What can actually be done: technically, organizationally, and legally.
If fairness criteria conflict and bias enters at multiple stages, what practical tools, and what governance structures, can actually reduce harm?

In October 2021, the U.S. Equal Employment Opportunity Commission launched an initiative specifically targeting algorithmic hiring tools, signaling that existing civil rights law (in particular, Title VII of the Civil Rights Act of 1964 and the concept of disparate impact) already applied to automated systems. The EEOC's 2022 technical assistance document on AI and disability discrimination stated plainly that employers cannot avoid liability by delegating a discriminatory decision to an algorithm. The defense that "the computer did it" had no legal standing. The regulatory landscape was catching up to the technology, even without new AI-specific legislation in the United States.

Technical Mitigation: Three Stages

Bias mitigation techniques are typically categorized by when in the machine learning pipeline they are applied: before training (pre-processing), during training (in-processing), or after training (post-processing).

Pre-processing. Modifying the training data before the model sees it. Techniques include resampling (over-sampling underrepresented groups or under-sampling overrepresented ones), reweighting (giving training examples from underrepresented groups more weight in the loss), and data augmentation. IBM's AI Fairness 360 toolkit, released in 2018, includes several pre-processing algorithms, among them "Reweighing" and "Optimized Preprocessing."
In-processing. Adding fairness constraints directly to the model's optimization objective during training. Instead of minimizing only prediction error, the model simultaneously minimizes error disparity across groups. The "Adversarial Debiasing" method, also available in AI Fairness 360, trains a classifier and an adversary simultaneously: the adversary tries to predict demographic group from the model's predictions, and the classifier is penalized when the adversary succeeds.
Post-processing. Adjusting the model's outputs after training to satisfy fairness constraints. The most common approach is threshold adjustment: using a different decision threshold for different demographic groups to equalize error rates. Hardt, Price, and Srebro's 2016 paper on equalized odds included a post-processing algorithm for achieving this. Critics note that group-specific thresholds may themselves violate anti-discrimination law in some jurisdictions.
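
As an illustration of the pre-processing stage, here is a minimal sketch of one common reweighing scheme, in which each (group, label) combination receives the weight P(group) × P(label) / P(group, label). This is a hand-written toy, not the AI Fairness 360 implementation; the function name `reweighing_weights` and the ten-row dataset are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Return a weight for each (group, label) combination.

    Each weight is P(group) * P(label) / P(group, label), so combinations
    that are rarer than independence would predict are weighted up.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for (g, y) in joint_counts
    }

def weighted_positive_rate(groups, labels, weights, group):
    """Share of a group's total weight that sits on positive labels."""
    rows = [(g, y) for g, y in zip(groups, labels) if g == group]
    total = sum(weights[row] for row in rows)
    positive = sum(weights[row] for row in rows if row[1] == 1)
    return positive / total

# Hypothetical data: group "a" has 4 of 6 positive labels, group "b" has 1 of 4.
groups = ["a"] * 6 + ["b"] * 4
labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")

print(weights)
print(rate_a, rate_b)
```

After weighting, the positive-label rate is the same in both groups of this toy dataset, which is the property the scheme is designed to produce. Whether a model trained on the weighted data then behaves fairly is a separate question that still has to be measured.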
Important Limit

All technical mitigation strategies face the impossibility constraints covered in Lesson 2. They can shift which fairness criterion is prioritized; they cannot satisfy all criteria simultaneously. Pre-processing methods also risk reducing overall model accuracy in exchange for more equitable error distribution. These trade-offs must be documented and justified; they are not automatically worth making.
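
A toy example of threshold adjustment shows what "shifting which criterion is prioritized" can look like in practice. The scores, labels, and thresholds below are hypothetical, and the sketch simply picks a lower threshold for one group by hand; it is not the optimization procedure from Hardt, Price, and Srebro.

```python
def rates(examples, threshold):
    """Return (true_positive_rate, false_positive_rate) for (score, label) pairs."""
    tp = sum(1 for score, label in examples if label == 1 and score >= threshold)
    fp = sum(1 for score, label in examples if label == 0 and score >= threshold)
    positives = sum(1 for _, label in examples if label == 1)
    negatives = len(examples) - positives
    return tp / positives, fp / negatives

# Hypothetical model scores with true labels (1 = positive outcome).
group_a = [(0.9, 1), (0.8, 1), (0.7, 1), (0.6, 0),
           (0.55, 1), (0.4, 0), (0.3, 0), (0.2, 0)]
group_b = [(0.7, 1), (0.6, 1), (0.45, 1), (0.42, 0),
           (0.38, 0), (0.35, 1), (0.2, 0), (0.1, 0)]

# One shared threshold: true positive rates differ between groups.
shared_a = rates(group_a, 0.5)
shared_b = rates(group_b, 0.5)

# Post-processing: lower the threshold for group_b until its true
# positive rate matches group_a's.
adjusted_b = rates(group_b, 0.33)

print("shared threshold   a:", shared_a, " b:", shared_b)
print("adjusted threshold a:", shared_a, " b:", adjusted_b)
```

In this toy data the adjustment equalizes true positive rates but leaves the two groups with different false positive rates, so equal opportunity holds while equalized odds does not. That is the kind of trade-off that has to be documented and justified rather than assumed away.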

Organizational Mitigation

Technical fixes applied to a poorly governed process tend to be undone by the next model update or data refresh. Organizational mitigation focuses on the institutional conditions that allow bias to accumulate and persist.

Algorithmic auditing. Independent third-party audits of AI systems have become a standard organizational practice in finance (where model risk management frameworks have existed since 2011 under OCC and Federal Reserve guidance) and are now required for high-risk AI systems under the EU AI Act. An audit examines training data composition, model performance across demographic segments, and whether documented fairness criteria match actual model behavior.

Diverse development teams. The "Gender Shades" finding that commercial facial analysis systems performed worst on darker-skinned faces is widely cited as evidence that team composition affects what developers notice and test. Google has published annual diversity reports since 2014 and publicly tracks progress on hiring. Whether team diversity alone is sufficient without structural changes to testing protocols is contested.

Impact assessments. Canada's Directive on Automated Decision-Making, which came into force in April 2019, requires federal government departments to complete an Algorithmic Impact Assessment before deploying any automated system. The assessment scores systems on risk level and mandates proportional oversight, including human review for high-impact decisions. The EU AI Act's risk classification system is built on a similar logic.

The Regulatory Landscape

The most comprehensive AI fairness regulation currently in force is the EU AI Act, which entered into force in August 2024 with obligations phasing in over the following years. It classifies AI systems by risk tier. Systems used in employment decisions, credit scoring, law enforcement, and social benefits allocation are classified as "high risk" and must comply with requirements including bias testing on representative datasets, logging of system outputs for accountability, and human oversight mechanisms. Real-time facial recognition in public spaces is largely prohibited for law enforcement use, with narrow exceptions.

In the United States, no equivalent federal AI fairness law exists as of 2024, but the patchwork of existing civil rights law (Title VII, the Fair Housing Act, the Equal Credit Opportunity Act) applies to algorithmic systems under the disparate impact doctrine established in Griggs v. Duke Power Co. (1971). New York City's Local Law 144 (enforced since July 2023) requires employers using automated employment decision tools to commission independent bias audits and publish the results. Illinois, California, and Colorado have passed similar but narrower requirements for specific sectors.
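
A disparate impact screen is often summarized as an impact ratio: each group's selection rate divided by the highest group's selection rate. The sketch below assumes that framing and uses the four-fifths (0.8) rule of thumb from EEOC guidance as a flagging threshold. Both the threshold and the applicant counts are assumptions introduced for illustration, not figures or legal tests stated in this lesson.

```python
def impact_ratios(outcomes):
    """Map each group to its selection rate divided by the highest selection rate.

    `outcomes` maps group name -> (number selected, number of applicants).
    """
    selection_rates = {
        group: selected / applicants
        for group, (selected, applicants) in outcomes.items()
    }
    highest = max(selection_rates.values())
    return {group: rate / highest for group, rate in selection_rates.items()}

# Hypothetical screening outcomes from an automated hiring tool.
outcomes = {
    "group_a": (100, 200),  # 50% selected
    "group_b": (45, 150),   # 30% selected
}

# Assumed rule-of-thumb threshold; a flag is a prompt for review, not a verdict.
FOUR_FIFTHS = 0.8

ratios = impact_ratios(outcomes)
flagged = [group for group, ratio in ratios.items() if ratio < FOUR_FIFTHS]

print(ratios)
print("Flagged for review:", flagged)
```

A ratio below the threshold does not by itself establish unlawful discrimination, and a ratio above it does not rule discrimination out. It is a first-pass statistic that tells an auditor where to look more closely.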

Key Takeaway

Bias mitigation requires action at the technical, organizational, and regulatory levels simultaneously. Technical methods can shift trade-offs but cannot eliminate them. Organizational practices (audits, diverse teams, impact assessments) create the conditions under which technical work has lasting effect. Regulation sets the floor. None of these is sufficient alone, and each requires explicit choices about which harms to prioritize reducing.

Lesson 4 Quiz

Four questions · Select the best answer · Immediate feedback
1. "Adversarial Debiasing" is an in-processing bias mitigation technique. How does it work?
Correct. Adversarial Debiasing uses a two-network setup: the main model learns to make accurate predictions while simultaneously making its outputs as uninformative as possible about demographic group membership. This in-processing approach incorporates the fairness constraint into the training objective itself, rather than fixing the data before or the outputs after.
Adversarial Debiasing is an in-processing technique. Options A and D describe pre-processing methods. Option B describes post-processing (threshold adjustment). The adversarial approach adds a second network that tries to infer demographic group from the main model's predictions; the main model is then trained to defeat the adversary, reducing demographic information in its outputs.
2. New York City's Local Law 144, enforced since July 2023, applies to which category of AI systems?
Correct. Local Law 144 specifically targets automated employment decision tools: software used to screen, rank, or select job applicants or assess current employees for promotion in New York City. Covered employers must commission independent annual bias audits and post the results publicly. It is the most specific AI fairness regulation at the municipal level in the United States.
Local Law 144 specifically covers automated employment decision tools, meaning AI used in hiring and promotion decisions affecting NYC workers. It does not cover facial recognition, all large-employer AI, or government AI broadly. Its requirements are audit-and-disclose: the employer must have the tool audited by an independent party and post results publicly.
3. The EU AI Act classifies AI systems used in employment decisions, credit scoring, and law enforcement as "high risk." What does this classification require?
Correct. The EU AI Act's high-risk category imposes substantive obligations: pre-deployment testing (including on demographic subgroups), technical documentation, logging requirements so decisions can be reviewed, and human oversight so no fully automated decision is made without a human check. It does not require zero bias (an impossible standard) but requires demonstrated effort to identify and reduce it.
The EU AI Act takes a risk-proportionate approach. High-risk systems must meet specific technical and governance requirements: bias testing, output logging, and human oversight. Zero bias is not a legal standard (it is mathematically unachievable in most settings). The Act does not require open-sourcing model weights.
4. Canada's Directive on Automated Decision-Making (2019) introduced which mechanism for government AI systems?
Correct. Canada's Directive is notable for its risk-tiered approach: the higher a system's potential impact on individuals, the more oversight it requires. At the highest tier, human review of automated decisions is mandatory before they take effect. The EU AI Act's risk classification system follows a similar logic.
Canada's Directive on Automated Decision-Making requires departments to complete an Algorithmic Impact Assessment before deploying automated decision systems. The assessment generates a risk tier, and higher-tier systems require more oversight, including mandatory human review at the highest tier. It is a mandatory, not voluntary, framework.

Lab 4: Designing a Mitigation Plan

Conversational AI exercise · Minimum 3 exchanges to complete

Your Task

You are the AI ethics officer at a regional bank. The bank's loan underwriting algorithm has been found, through an internal audit, to approve white applicants at a rate 12 percentage points higher than equally creditworthy Black applicants. The CEO wants a mitigation plan within two weeks. Work through a structured response with the assistant below, covering technical, organizational, and regulatory dimensions.

The audit found: 12pp approval rate disparity between white and Black applicants with equivalent FICO scores and debt-to-income ratios. The model uses 47 input variables. Race is not an explicit variable but ZIP code, school attended, and employer type are included. The bank operates in three states with different AI fairness requirements. What is your mitigation plan?
AI Lab Assistant
Mitigation Planning
A 12 percentage-point disparity on equally creditworthy applicants is a serious finding, legally and ethically. Before we build the plan, I want to understand your constraints: Is this a two-week plan to present a mitigation roadmap, or does the CEO expect the disparity to be eliminated in two weeks? Those are very different problems, and being clear about that distinction matters for what you promise. What does the CEO actually need to see?
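
One technical step in a plan like this is checking which of the 47 inputs act as proxies for race. A rough first-pass check, sketched below under stated assumptions, asks how much better you can guess an applicant's group once you know a single feature such as ZIP code. The function `proxy_strength`, the ZIP labels, and the twenty records are all hypothetical; a real audit would use the bank's data and more robust statistical tests.

```python
from collections import Counter, defaultdict

def proxy_strength(feature_values, group_labels):
    """Return (baseline_accuracy, feature_accuracy) for guessing group membership.

    baseline_accuracy: always guess the overall majority group.
    feature_accuracy: guess the majority group within each feature value.
    """
    n = len(group_labels)
    baseline = max(Counter(group_labels).values()) / n

    counts_by_value = defaultdict(Counter)
    for value, group in zip(feature_values, group_labels):
        counts_by_value[value][group] += 1

    correct = sum(max(counts.values()) for counts in counts_by_value.values())
    return baseline, correct / n

# Hypothetical applicant records: two ZIP codes with different racial composition.
zip_codes = ["ZIP_1"] * 10 + ["ZIP_2"] * 10
races = ["white"] * 8 + ["Black"] * 2 + ["white"] * 1 + ["Black"] * 9

baseline, with_zip = proxy_strength(zip_codes, races)

print(f"Guessing race without ZIP code: {baseline:.2f}")
print(f"Guessing race from ZIP code:    {with_zip:.2f}")
```

A large gap between the two numbers suggests the feature carries substantial information about the protected characteristic. That alone does not settle whether to drop it, since the feature may also carry legitimate predictive signal, but it identifies the variables whose role the mitigation plan has to justify.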

Module Test: What Is Algorithmic Bias

15 questions · 80% to pass · Covers all four lessons
1. Which of the following best defines "algorithmic bias" in the context of AI fairness?
Correct. The word "systematic" distinguishes bias from random error: biased systems produce patterns of mistakes that fall disproportionately on groups defined by protected characteristics.
Algorithmic bias is defined by its systematic, group-level pattern, not by individual errors, overfitting, or intent. The key distinguishing feature is that errors fall disproportionately on particular demographic groups.
2. The Google Photos "gorilla" labeling incident (2015) was primarily caused by which factor?
Correct. The model lacked sufficient representation of dark-skinned faces in training data, causing poor feature learning for that demographic group.
The cause was representation bias in the training data: not a bug, not intent, and not simply a size problem. Underrepresentation of darker-skinned individuals meant the model had poor learned representations for those faces.
3. Predictive policing tools that direct officers to already over-policed areas, generating more arrests that feed back into training data, illustrate which bias entry point?
Correct. When model outputs influence future training inputs, initial errors compound. This is a feedback loop: the algorithm's predictions change the world in ways that then validate those same predictions.
This is a feedback loop. The model's outputs (directing police deployment) alter the data-generating process (where arrests occur), which becomes future training data. The loop reinforces initial errors rather than correcting them.
4. Demographic parity requires which condition?
Correct. Demographic parity (also called statistical parity) requires that the same proportion of each group receives a positive prediction, without regard to whether those predictions are correct.
Demographic parity requires equal positive prediction rates across groups. Option A is equal opportunity. Option C is equalized odds. Option D is calibration. These are four distinct fairness criteria.
5. The Chouldechova impossibility theorem establishes that calibration and equalized odds cannot both be satisfied simultaneously unless which condition holds?
Correct. Setting aside the degenerate case of a perfect predictor, equal base rates are required for calibration and equalized odds to hold simultaneously. When groups have different actual outcome rates, these two criteria are mathematically incompatible.
The impossibility is a mathematical property that holds regardless of dataset size or model type. It disappears only when base rates are equal across groups, a condition that real-world data often do not meet.
6. ProPublica's 2016 investigation found that COMPAS had higher false positive rates for Black defendants. Northpointe countered that the tool was equally calibrated across races. Which conclusion is correct?
Correct. This is the core lesson of the COMPAS controversy. The impossibility theorem explains why both analyses could be simultaneously correct: ProPublica measured equalized odds (violated), Northpointe measured calibration (satisfied). With unequal base rates, you cannot have both.
Both analyses were valid; they measured different things. The impossibility theorem is the mathematical key: when recidivism base rates differ between racial groups (as they did in Florida's data), calibration and equalized odds are mutually exclusive. One organization cannot be "right" and the other "wrong" about a mathematical impossibility.
7. Joy Buolamwini and Timnit Gebru's "Gender Shades" study (2018) found the largest error rate disparities in commercial facial analysis APIs for which demographic intersection?
Correct. The error rate gap was not subtle: in some systems, over a third of darker-skinned women were misclassified, while lighter-skinned men were classified correctly more than 99% of the time. The cause was systematic underrepresentation in training data.
"Gender Shades" found the largest disparities for darker-skinned women, a double disadvantage from both the gender and skin-tone axes. IBM's system reached 34.7% error on this group; Microsoft's reached 20.8%. For lighter-skinned men, both systems were below 1% error.
8. The Obermeyer et al. (2019, Science) healthcare algorithm study found that a widely used system underestimated Black patients' health needs because it used healthcare costs as a proxy for health need. Why did this proxy produce racially biased results?
Correct. The proxy variable (cost) was structurally biased before the algorithm processed any individual. Existing healthcare access disparities caused Black patients to spend less for equivalent levels of illness, so the model learned to read "low past cost" as "low current need," compounding existing inequity.
The problem was a biased proxy variable. Healthcare cost reflects access and utilization patterns, not just underlying health status. Because systemic barriers reduced Black patients' healthcare utilization, their costs were lower, and the model mistook lower utilization for lower need. Race was not an explicit variable; cost was the proxy that carried racial disparity.
9. The NIST Face Recognition Vendor Test (December 2019) examined 189 algorithms and found that many produced false positive rates for Black and Asian faces that were how much higher than for white faces?
Correct. The NIST FRVT produced some of the most stark quantitative evidence of facial recognition bias: not a modest performance gap but an order-of-magnitude difference in false positive rates for many commercially deployed systems.
The NIST FRVT found 10 to 100 times higher false positive rates: not a slight or moderate difference, but a massive one. A system with a 1-in-10,000 false positive rate for white faces could have a 1-in-100 false positive rate for Black faces, with enormous consequences in law enforcement applications.
10. Amazon's AI recruiting tool, scrapped in 2017, penalized résumés containing the word "women's." What category of bias entry point does this represent?
Correct. The model learned patterns from Amazon's actual historical hiring, which disproportionately resulted in male candidates being hired. It then generalized those patterns, treating female-associated résumé features as predictive of failure to be hired.
This is training data composition bias. The model was trained on what Amazon's historical hiring data defined as "success." Because that history was male-dominated, the model learned to devalue anything correlated with female identity. The problem originated in the data, not in the objective function or post-processing.
11. "Post-processing" bias mitigation refers to which technique?
Correct. Post-processing leaves the trained model unchanged and instead modifies how its outputs are translated into decisions, for example by applying a lower decision threshold for a disadvantaged group to equalize their true positive rate with that of the advantaged group.
Post-processing operates on model outputs after training is complete. Option A describes pre-processing (reweighting). Option B describes in-processing (constrained optimization). Option D describes data augmentation (pre-processing). Post-processing typically means adjusting decision thresholds.
12. Which city became the first in the United States to ban government use of facial recognition technology?
Correct. San Francisco's Board of Supervisors voted in May 2019 to prohibit city agencies, including police, from using facial recognition technology. Oakland and Boston followed later, in 2019 and 2020 respectively.
San Francisco was first, in May 2019. Oakland followed in July 2019 and Boston in June 2020. Detroit, notably, was the site of Robert Williams's wrongful arrest due to facial recognition, but did not ban the technology in 2020.
13. Canada's Directive on Automated Decision-Making (2019) introduced which governance mechanism for federal AI systems?
Correct. Canada's Directive is risk-tiered: lower-risk systems face lighter requirements, while the highest-risk tier requires human review of every automated decision before it takes effect. This framework influenced the EU AI Act's similar risk-classification approach.
The Directive requires a mandatory Algorithmic Impact Assessment: not voluntary, not a prohibition, not an open-source requirement. The assessment tiers systems by risk level and triggers proportional oversight requirements, including mandatory human review at the highest tier.
14. The Apple Card credit limit controversy (2019) illustrated which important concept in AI fairness?
Correct. Goldman Sachs confirmed gender was not a model input, but that does not prevent discrimination if included variables (spending patterns, account types, income sources) correlate with gender. This is the proxy variable problem: the protected characteristic's influence travels through correlated variables.
The key lesson is about proxy variables. Excluding a protected characteristic from explicit model inputs does not remove its influence when other included variables are correlated with it. The discrimination travels through the proxy, not through the variable itself.
15. The EU AI Act (August 2024) takes which approach to facial recognition in public spaces for government use?
Correct. The EU AI Act treats real-time biometric identification in public spaces as among the highest-risk AI applications and largely prohibits it for law enforcement use, with narrow exceptions (imminent terrorist threat, searching for missing children, etc.). This represents one of the most restrictive government AI provisions in any major jurisdiction.
The EU AI Act largely prohibits real-time facial recognition in public spaces for government use. This is a near-ban, not a regulation-with-audits approach. Narrow exceptions exist for specific defined security scenarios, but routine or widespread government facial recognition in public is prohibited.