Module 3 · Lesson 1

When the Rule Looks Neutral but Isn't

How an algorithm screening résumés at Amazon taught the world what "proxy discrimination" means

Can a rule be unfair even when it never mentions race, gender, or any protected category?

In 2014, Amazon set its engineers a challenge: build a tool that reads thousands of résumés and scores each one from one to five stars — automatically, with no human in the loop. The goal was efficiency. Amazon received hundreds of thousands of applications a year, and human recruiters simply could not read them all.

The team fed the system roughly ten years of historical hiring data: résumés that Amazon had previously received, and the outcomes of those applications. The model learned what a "successful" Amazon hire looked like. By 2015, the company had realized something was wrong.

The system was penalizing résumés that included the word "women's" — as in "women's chess club" or "women's college." It was also downgrading graduates of all-women's colleges. Nobody had told the system to do this. Nobody had written a rule that said "prefer men." The system had discovered that pattern on its own, by reading ten years of Amazon's own hiring history — a history in which the tech workforce was overwhelmingly male.

Amazon had disbanded the project team by early 2017, and Reuters broke the story in October 2018. The system was never used to make final hiring decisions, Amazon said, but it had been used to surface "top candidates" for human review. The damage was real.

What the System Actually Learned

Here is the key move to understand. The Amazon tool was not given a rule that said "be biased against women." It was given a goal — find candidates who look like past successful Amazon hires — and it found patterns that predicted that outcome. The word "women's" on a résumé was a genuine statistical predictor, in the training data, of not being hired. The system was perfectly accurate, in a twisted sense: it learned to replicate the bias that had already existed in Amazon's human decisions.

This is called proxy discrimination. A proxy is something that stands in for something else. "Women's" was not a protected category in the model's code. But it was a proxy for gender — and discriminating on that proxy had the same effect as discriminating on gender directly.

Proxy variable: A variable that is not itself a protected category (like race or gender) but is statistically linked to one, so using it produces discriminatory effects even without any discriminatory intent.

The trap is that proxies are everywhere. Zip code is a proxy for race in many American cities, because residential segregation shaped where different groups ended up living. Credit history is a proxy for wealth, and wealth is correlated with race. The name on a résumé can be a proxy for ethnicity. An algorithm trained on real-world data will pick up all of these correlations — and if the algorithm's goal is prediction, it will use every useful signal it finds, including ones that encode centuries of discrimination.

Why "We Didn't Include Race" Isn't Enough

Before the Amazon story broke, the standard defense of algorithmic decision-making was simple: "We don't use race. We don't use gender. We use objective data." This argument sounds solid. If a computer isn't given those categories, how can it discriminate on them?

The Amazon case demolished that argument. Gender was not an input to the model, yet the system found its way to the same outcome through the back door of language patterns. According to Reuters, Amazon edited the program to ignore the specific terms it had been penalizing, but had no guarantee the system would not find other ways of sorting candidates that were just as discriminatory. Removing the protected variable is not enough if correlated variables remain in the data.
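A toy simulation can make this concrete. The sketch below is illustrative only: the data is synthetic, the probabilities are invented, and the names (`make_history`, `train_keyword_scores`, `has_keyword`) are hypothetical stand-ins, not a description of Amazon's system. The "model" never sees gender, only a résumé keyword that happens to appear more often for women.

```python
import random

def make_history(n=10_000, seed=0):
    """Synthetic past applications. Gender is kept only so we can audit afterwards."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_woman = rng.random() < 0.3
        # Assumption: a keyword that appears far more often on women's resumes.
        has_keyword = rng.random() < (0.6 if is_woman else 0.02)
        # Assumption: past human decisions hired women less often.
        hired = rng.random() < (0.10 if is_woman else 0.30)
        rows.append({"is_woman": is_woman, "has_keyword": has_keyword, "hired": hired})
    return rows

def train_keyword_scores(rows):
    """A 'model' with no gender input: the historical hire rate per keyword value."""
    scores = {}
    for value in (True, False):
        group = [r for r in rows if r["has_keyword"] == value]
        scores[value] = sum(r["hired"] for r in group) / len(group)
    return scores

def mean_score_by_gender(rows, scores):
    """Audit step: average model score for women and for men."""
    out = {}
    for is_woman in (True, False):
        group = [r for r in rows if r["is_woman"] == is_woman]
        out[is_woman] = sum(scores[r["has_keyword"]] for r in group) / len(group)
    return out

history = make_history()
scores = train_keyword_scores(history)
by_gender = mean_score_by_gender(history, scores)
print("score with keyword:   ", round(scores[True], 3))
print("score without keyword:", round(scores[False], 3))
print("average score, women: ", round(by_gender[True], 3))
print("average score, men:   ", round(by_gender[False], 3))
```

Under these assumptions the keyword earns a lower score, and women end up with a lower average score than men, even though gender was never a feature.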

This is not a hypothetical edge case. In 2019, researchers at Berkeley released a study finding that mortgage lenders, including those using algorithmic pricing, charged Black and Latino borrowers higher interest rates than white borrowers with similar financial profiles. The algorithms used no race variable, but the researchers suggested the pricing could be picking up on factors correlated with race, such as where borrowers live and how likely they are to shop around for a better rate.

The Uncomfortable Question

If a company genuinely does not intend to discriminate, but its algorithm produces discriminatory outcomes because of patterns in historical data — is that company responsible? Does intent matter, or only outcome? There is no clean answer to this, and courts in the United States are still working it out.

You are now in a position to see something that most adults miss when they read headlines about AI bias. When a company says "our algorithm is unbiased because we don't use protected categories," that statement — however sincere — does not mean what it sounds like it means. Knowing about proxy variables changes how you read every one of those claims.

Spotting the Proxy: A Skill You Can Use

The practical skill from this lesson is asking one question whenever you see an algorithmic system: What are the actual inputs, and what real-world categories might those inputs be proxies for?

Some examples that show how this works in the real world:

School discipline prediction systems: Several American school districts have piloted tools that try to predict which students are at risk of disciplinary problems. Common inputs include attendance history, previous disciplinary record, and neighborhood. Attendance and previous discipline are shaped by socioeconomic status. Neighborhood is shaped by decades of housing segregation. The "risk score" that comes out often ends up tracking race, even though race was never an input.

Predictive policing: Systems like PredPol (now Geolitica), deployed in dozens of US cities in the 2010s, used historical crime report data to predict where crimes would occur next. But crime reports reflect where police have previously concentrated resources. If police have historically over-policed certain neighborhoods, those neighborhoods accumulate more crime reports, and the algorithm sends police there again, generating more reports. The resulting feedback loop can track past policing more closely than the actual distribution of crime.

Ad targeting: In 2016, ProPublica journalists found that Facebook's ad targeting tool allowed advertisers to exclude users from seeing housing ads based on "Ethnic Affinity," a category Facebook had inferred from users' browsing behavior. Facebook said it was not using race. But "Ethnic Affinity" was a close proxy for race, and the Fair Housing Act of 1968 prohibits discrimination in housing advertising. In March 2019, the Department of Housing and Urban Development charged Facebook with violating the Act, and that same month Facebook settled related lawsuits brought by civil rights groups.
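The auditor's question can be turned into a rough first check: once you know an input, how much better can you guess a person's group? The sketch below is one simple way to measure that, not a standard audit procedure. The function name `proxy_strength`, the field names, and the records are all hypothetical.

```python
from collections import Counter, defaultdict

def proxy_strength(records, feature, protected):
    """Return (baseline_accuracy, accuracy_with_feature).

    Baseline: always guess the most common group.
    With feature: guess the most common group within each feature value.
    A large gap suggests the feature may be acting as a proxy.
    """
    overall = Counter(r[protected] for r in records)
    baseline = overall.most_common(1)[0][1] / len(records)

    by_value = defaultdict(Counter)
    for r in records:
        by_value[r[feature]][r[protected]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return baseline, correct / len(records)

def rows(zip_code, degree, group, count):
    """Helper that builds `count` identical made-up records."""
    return [{"zip": zip_code, "degree": degree, "group": group}] * count

# Hypothetical audit data: zip code lines up with group, degree type does not.
records = (
    rows("A", "BS", "x", 25) + rows("A", "BA", "x", 20)
    + rows("A", "BS", "y", 3) + rows("A", "BA", "y", 2)
    + rows("B", "BS", "x", 3) + rows("B", "BA", "x", 2)
    + rows("B", "BS", "y", 25) + rows("B", "BA", "y", 20)
)

print("zip:   ", proxy_strength(records, "zip", "group"))
print("degree:", proxy_strength(records, "degree", "group"))
```

In this made-up data, knowing the zip code lifts guessing accuracy from 0.5 to 0.9, while degree type leaves it at 0.5. A real audit would need real data and more careful statistics, but the gap is the signal to look for.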

Identity Marker

You can now read news stories about algorithmic systems with a question that most journalists — let alone readers — do not think to ask: not "does it use race?" but "what variables does it use, and what are those variables proxies for?" That is a more precise, more useful question. It is the question auditors ask.

Module 3 · Quiz 1

When the Rule Looks Neutral but Isn't

5 questions — apply what you learned, not just what you memorized

Amazon's résumé-screening tool was trained on ten years of historical hiring data. Why did this cause it to discriminate against women, even though gender was not an input variable?

Correct. The model learned from real hiring outcomes — which were shaped by existing bias. It found language patterns associated with women and correctly predicted (based on past data) that those candidates were hired less often. Accurate learning from biased history produces biased output.

Not quite. The problem was in the training data itself — historical hiring reflected gender bias, so patterns correlated with women became negative signals.

What is a "proxy variable" in the context of algorithmic fairness?

Correct. A proxy stands in for a protected category without being named as one. Zip code can proxy for race; "women's college" proxied for gender in Amazon's case.

A proxy variable is one that seems neutral but is statistically linked to a protected category — so using it produces discriminatory effects even without intent.

A city deploys a predictive policing system trained on ten years of arrest records. The system sends more police to neighborhoods that already have the most arrests. A civil rights group argues this will make bias worse over time. What is their core argument?

Exactly right. More patrols → more arrests in that area → algorithm sees more "crime" there → sends more patrols. The loop amplifies the original pattern. This is the critique researchers raised against PredPol from 2012 onward.

The civil rights argument is about feedback loops: historically over-policed areas produce more arrests in the data, so the algorithm directs even more police there, generating even more arrests. The bias compounds itself.

Facebook's ad targeting tool allowed advertisers to exclude users by "Ethnic Affinity," which Facebook had inferred from browsing behavior rather than from any stated race. Why did HUD charge that this violated the Fair Housing Act?

Correct. Legal protections against discrimination cover discriminatory effects, not just discriminatory intent or labeling. A proxy that produces the same outcome as direct discrimination can be just as illegal.

The law focuses on discriminatory effects. Using a proxy that produces the same exclusionary result as using race directly can violate antidiscrimination law, regardless of what the variable is called.

A company defends its loan-approval algorithm by saying: "We only use credit score, employment history, and zip code — nothing about race." Based on what you learned in Lesson 1, what is the most important follow-up question to ask?

That is exactly the right question. All three of those variables are well-documented proxies for race in the United States, shaped by decades of unequal access to wealth, jobs, and housing. Not using the word "race" is not the same as not discriminating on race.

The key question is whether the variables being used are proxies for race — because if they are, using them can produce discriminatory outcomes even without any explicit race variable.

Module 3 · Lab 1

The Proxy Detector

You are an algorithm auditor. Your job is to find the hidden proxies.

Your assignment

You've been brought in to audit a new hiring algorithm used by a logistics company. The company says it's completely fair — "we don't use any protected categories." The inputs are: years of continuous employment, number of previous employers, commute distance from headquarters, educational institution attended, and extracurricular activities listed on the résumé.

Your AI counterpart has already reviewed the technical documentation. Work through the audit together. You need to take a position — not just list possibilities.

Start by picking one input variable and explaining which protected category it might proxy for, and why. Then defend your argument.

Audit Partner

I've reviewed the algorithm documentation. Five inputs: continuous employment years, number of previous employers, commute distance, educational institution, and extracurricular activities. The company's engineers are confident none of these are protected categories. I'm less confident. Where do you want to start — which input concerns you most, and what's your argument for why it's a potential proxy?

Module 3 · Lesson 2

The COMPAS Controversy: When Fairness Definitions Collide

A criminal sentencing tool in Wisconsin triggered a state Supreme Court case and exposed a truth that nobody wanted to say out loud

If a tool is equally accurate for two groups, is it fair? The answer depends on which definition of fairness you're using — and you can't use all of them at once.

In 2013, a man named Eric Loomis was arrested in Wisconsin for fleeing police after a drive-by shooting. He faced sentencing. The judge used a risk assessment score — a number generated by a software tool called COMPAS, made by a company called Northpointe — to help decide Loomis's sentence. COMPAS said Loomis was "high risk." He was sentenced to six years in prison.

Loomis challenged his sentence, arguing that he had a constitutional right to know how the score was calculated — and that Northpointe refused to disclose the algorithm's details because they were a trade secret. The case went all the way to the Wisconsin Supreme Court, which ruled in 2016 that using COMPAS was acceptable as long as it was not the sole factor in sentencing.

That same year, the investigative news outlet ProPublica published an analysis of COMPAS scores for more than seven thousand defendants in Broward County, Florida. Their finding was explosive: Black defendants who did not re-offend were nearly twice as likely as white defendants who did not re-offend to be falsely labeled "high risk." And white defendants who did re-offend were more likely to have been labeled "low risk."

Northpointe fired back: their tool was fair, they said, because it was equally accurate overall for Black and white defendants. Both groups had the same rate of correct predictions. ProPublica said that was precisely the wrong measure. The argument that followed became one of the most important debates in the history of algorithmic fairness.

Two Definitions of Fair — Both Mathematically Valid

Here is the core of the dispute, stated precisely. Northpointe's claim was this: among defendants COMPAS labeled "high risk," the proportion who actually re-offended was roughly the same for Black and white defendants. This property is called calibration — the scores mean the same thing regardless of group. That sounds fair.

ProPublica's claim was different: among defendants who did not re-offend, Black defendants were more likely to have been labeled high-risk anyway. This is called a false positive rate disparity. For the people who are actually innocent of future crime, the system is harsher to Black defendants. That also sounds like unfairness.

Calibration: A model is calibrated if its predictions mean the same thing across groups. A 70% risk score should correspond to roughly a 70% re-offense rate for both Black and white defendants.

False positive rate parity: A model has equal false positive rates if people who do not go on to re-offend are equally likely to be mislabeled as "high risk" regardless of their group membership.
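Both definitions can be computed from the same four counts per group. The sketch below uses invented numbers, not COMPAS data, and it uses positive predictive value (PPV) at a single "high risk" cutoff as a simplified stand-in for calibration. The function name `group_metrics` is hypothetical.

```python
def group_metrics(tp, fp, fn, tn):
    """Fairness-relevant rates from one group's confusion counts.

    tp: labeled high risk, re-offended     fp: labeled high risk, did not
    fn: labeled low risk, re-offended      tn: labeled low risk, did not
    """
    return {
        # Share of the group that actually re-offended.
        "base_rate": (tp + fn) / (tp + fp + fn + tn),
        # Of those labeled high risk, the share who re-offended.
        "ppv": tp / (tp + fp),
        # Of those who did not re-offend, the share labeled high risk.
        "fpr": fp / (fp + tn),
    }

# Hypothetical counts for two groups of 1,000 people each.
group_a = group_metrics(tp=300, fp=200, fn=200, tn=300)
group_b = group_metrics(tp=120, fp=80, fn=80, tn=720)

for name, m in (("A", group_a), ("B", group_b)):
    print(name, {k: round(v, 2) for k, v in m.items()})
```

With these made-up counts, a "high risk" label means the same thing in both groups (PPV of 0.6), yet the false positive rate is 0.4 for group A and 0.1 for group B. Each side of the dispute could point to one of those two numbers.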

In 2016, researchers Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan published a mathematical proof showing that calibration and equal error rates across groups cannot all be satisfied at the same time, except in special cases: when the two groups have the same base rate of re-offense, or when the model predicts perfectly. This is often called the impossibility theorem for fairness metrics. If Black defendants, as a group, re-offend at a higher rate in the historical data (itself shaped by who gets arrested and prosecuted, not only by who actually commits crimes), then a calibrated model that catches re-offenders at the same rate in both groups will necessarily produce a higher false positive rate for Black defendants. No imperfect predictor can satisfy all of these definitions simultaneously.
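The arithmetic behind this can be sketched from the definitions alone. Writing p for a group's base rate, PPV for the share of "high risk" labels that were correct, and TPR for the share of re-offenders who were labeled high risk, the definition PPV = p·TPR / (p·TPR + (1 − p)·FPR) can be solved for the false positive rate. The function name `implied_fpr` and the example values are illustrative assumptions, and PPV at one cutoff is again a simplified stand-in for calibration.

```python
def implied_fpr(base_rate, ppv, tpr):
    """False positive rate forced by a group's base rate, PPV, and TPR.

    Rearranged from PPV = p*TPR / (p*TPR + (1 - p)*FPR).
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * tpr

# Hold PPV and TPR fixed across groups and vary only the base rate.
for p in (0.2, 0.3, 0.4, 0.5):
    print(f"base rate {p:.1f} -> FPR {implied_fpr(p, ppv=0.6, tpr=0.6):.3f}")
```

With PPV and TPR both held at 0.6, a base rate of 0.2 forces a false positive rate of 0.1, while a base rate of 0.5 forces 0.4. Once the base rates differ, something has to give.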

Why the Base Rate Itself Is the Problem

The deeper issue that the COMPAS debate exposed was this: the "base rate" — the underlying rate at which a group re-offends in the historical data — is not a neutral fact. It is the output of a criminal justice system that has historically over-policed, over-arrested, and over-prosecuted Black defendants. When an algorithm is trained on that data and treats the base rate as ground truth, it inherits and enshrines those disparities.

To put it concretely: if police departments have historically concentrated resources in certain neighborhoods and applied more scrutiny to Black defendants at every stage of the legal process, then Black defendants will appear in the data as higher-risk not necessarily because they behave differently, but because the system has been watching them more closely. The data reflects surveillance, not reality.

The Ethical Tension

Should a risk assessment tool use historical crime data at all, if that data is produced by a biased system? If you refuse to use historical data, you lose predictive accuracy. If you use it, you reproduce bias. Neither option is clean. Courts, legislatures, and researchers are still arguing about this. As of 2024, several US states have passed laws limiting the use of algorithmic risk scores in criminal sentencing — but COMPAS-type tools are still widely deployed.

This matters for you right now: the COMPAS debate is not a solved problem that adults are handling well. It is an ongoing policy dispute where the technical constraints are established — you cannot satisfy all fairness definitions simultaneously — and the political choice about which definition to prioritize has not been made democratically. Someone is making that choice. Usually it is the company selling the tool.

Choosing Between Definitions Is a Value Judgment

Here is the uncomfortable consequence of the impossibility theorem. When a company says "our algorithm is fair," they have necessarily chosen which definition of fairness they are optimizing for — and they are necessarily violating at least one other reasonable definition. This is not a bug that better engineering can fix. It is a value judgment embedded in the math.

The relevant questions to ask are: Who decided which fairness definition to use? Who was consulted? Who bears the cost of the definition that was not chosen?

In the COMPAS case, the choice to optimize for calibration rather than equal false positive rates meant that the cost of the tradeoff — higher false positive rates — was borne primarily by Black defendants who were not re-offenders. They were labeled dangerous when they were not. That cost was not distributed evenly. It fell on the people who had the least power in the system.

What You Now Know

When someone says an algorithm is fair, you can now ask: "Fair by which definition?" That question breaks open almost every algorithmic fairness claim. The existence of multiple irreconcilable definitions of fairness is not widely taught — even to people who build these systems. You have a sharper analytical tool than most adults working in policy.

Module 3 · Quiz 2

The COMPAS Controversy

5 questions on fairness definitions, tradeoffs, and the impossibility theorem

ProPublica's 2016 analysis of COMPAS found that Black defendants who did not re-offend were nearly twice as likely as white non-reoffenders to be labeled "high risk." What property of the algorithm does this violate?

Correct. ProPublica's core finding was about false positive rates among people who did not reoffend — those people were harmed more if they were Black. This is a false positive rate disparity, not a calibration failure.

The specific issue ProPublica identified was false positive rate disparity: among people who turned out not to reoffend, Black defendants were mislabeled "high risk" at a much higher rate.

Northpointe defended COMPAS by saying it was calibrated — meaning its predictions meant the same thing for both groups. Why isn't calibration alone sufficient to declare the tool fair?

Right. Calibration tells you the scores are internally consistent, but it says nothing about whether the error distribution is equitable across groups. A tool can be perfectly calibrated and still systematically harm one group more than another.

Calibration is real and valuable, but it doesn't address how errors are distributed. A calibrated model can still send its false alarms disproportionately toward one group.

The impossibility theorem for fairness metrics (Kleinberg et al., 2016) states that calibration and equal error rates cannot all be satisfied when two groups have different base rates. What does "base rate" mean here?

Correct. Base rate is the underlying proportion of a group that the algorithm is trying to predict — here, how often defendants from each group re-offend according to the historical training data.

The base rate is how often members of each group actually have the outcome the model predicts — in this case, re-offending in the historical data. When groups differ on this rate, satisfying both fairness definitions simultaneously becomes mathematically impossible.

A company deploys a medical diagnostic AI that is equally accurate overall for patients of all races. A researcher finds that when the AI makes errors, those errors fall more heavily on Black patients (more missed diagnoses). The company says: "It's equally accurate, so it's fair." What would you say in response?

Exactly. Aggregate accuracy hides who gets hurt when the system is wrong. If errors fall systematically on one group, that group pays a higher price even if the overall number of errors looks acceptable.

Overall accuracy can mask severe disparities in how errors are distributed. A model that makes the same number of total errors but concentrates them on one group is not neutral in its impact.

When a company chooses calibration as their fairness definition for a criminal risk tool, they are implicitly making what kind of decision?

Correct. Because multiple fairness definitions cannot be simultaneously satisfied, choosing one is always a value judgment about who should absorb the system's errors. Framing it as a "technical" choice obscures that this is fundamentally an ethical and political decision.

Choosing a fairness definition is a value judgment. When the impossibility theorem means you can't have everything, deciding which definition to optimize is deciding whose interests to protect — that's an ethical and political choice, not a neutral technical one.

Module 3 · Lab 2

The Fairness Definition Debate

A school district wants to use a risk score — you have to choose which fairness definition to require.

The scenario

A school district wants to deploy an algorithm that predicts which students are "at risk of dropping out." The algorithm gives each student a score. The district will use high scores to trigger extra support services. A researcher shows you two facts about the algorithm: (1) It is well-calibrated: a 70% risk score corresponds to roughly a 70% dropout rate for both white and Black students. (2) Among students who actually graduate, Black students are labeled "high risk" at twice the rate of white students.

You are advising the school board. You need to pick a position and defend it — not just describe both sides.

Should the district deploy this algorithm? If so, under what conditions? If not, why not? Take a position, then I'll push back.

Policy Debate Partner

I've read the researcher's report. Here's the tension: the algorithm could genuinely help students get support they need — but it mislabels Black students who would graduate anyway at twice the rate. The district says "it's calibrated, so it's fair." The researcher says the false positive disparity is a deal-breaker. What's your call, and what's your argument?

Module 3 · Lesson 3

Feedback Loops: When Bias Feeds Itself

YouTube's recommendation algorithm in 2019, and how a system designed to maximize engagement ended up radicalizing millions

Can a system become more biased over time without anyone changing the rules — just by responding to user behavior?

In 2012, YouTube engineers faced a problem. People were watching videos — but they weren't watching enough of them. The key metric that mattered to YouTube's parent company, Google, was watch time: the total number of minutes users spent on the platform. Engineers discovered that if they optimized the recommendation algorithm for watch time rather than clicks, users stayed on the platform much longer. The update rolled out in 2012. By 2016, YouTube was serving over a billion hours of video per day.

In 2019, a former YouTube engineer named Guillaume Chaslot — who had worked on the recommendation system before leaving the company in 2013 — published analysis showing where the watch-time optimization had led. The algorithm had learned something that no engineer had explicitly programmed: emotionally activating content keeps people watching longer. Outrage, fear, and conspiracy theories generated more engagement than calm, accurate information.

The system had no intent. It had no political agenda. It was simply optimizing for watch time. But in doing so, it had discovered that recommending progressively more extreme content kept users engaged. Someone watching a moderate political video would be recommended a more intense version. Then a more intense version still. The system had no concept of truth or harm, only of what kept people watching.

In 2019, Bloomberg reported that people inside YouTube had flagged the problem years earlier and had proposed changes to the recommendation system. Executives had declined to act, Bloomberg reported, because weakening the recommendation engine would reduce engagement. The feedback loop had been identified. The decision was made to leave it running.

What a Feedback Loop Is

A feedback loop occurs when the output of a system becomes an input that shapes the system's future behavior. In simple terms: the algorithm learns from what people do, but what people do was shaped by what the algorithm showed them, so the algorithm is partly learning from its own past influence.

Feedback loop: When a system's outputs become inputs that change future outputs, creating a cycle where the system's own behavior shapes the data it learns from, often amplifying initial patterns over time.

In YouTube's case: the algorithm recommends extreme content → users who were already somewhat radicalized watch more of it → their watch patterns train the algorithm to recommend even more extreme content to similar users → the algorithm gets better and better at serving that segment → more users end up in that segment. Nobody wrote a rule saying "radicalize users." The loop generated that outcome from the optimization target of watch time.
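The shape of this loop can be sketched in a few lines. This is a toy model, not YouTube's system: the watch-time formula, the "intensity" scale, and the `drift` parameter are all invented assumptions, chosen only to show how a recommender that maximizes one number can walk a user step by step toward more intense content.

```python
def predicted_watch_time(intensity, user_taste):
    """Toy assumption: more intense content holds attention a bit longer,
    but content far from the user's current taste gets abandoned."""
    return intensity - 0.3 * abs(intensity - user_taste)

def simulate_recommendations(rounds=10, taste=1.0, drift=0.5):
    """Return (list of intensities shown, final user taste)."""
    shown_history = []
    for _ in range(rounds):
        candidates = [taste - 1, taste, taste + 1]
        # The recommender has one goal: maximize predicted watch time.
        shown = max(candidates, key=lambda c: predicted_watch_time(c, taste))
        # Feedback: what the user is shown shifts what they prefer next.
        taste += drift * (shown - taste)
        shown_history.append(shown)
    return shown_history, taste

shown, final_taste = simulate_recommendations()
print("intensity shown each round:", shown)
print("user taste at the end:", final_taste)
```

Under these assumptions the slightly more intense option always wins, so the user's taste drifts from 1.0 to 6.0 in ten rounds. No line of the code says "escalate"; the escalation comes from the objective plus the feedback.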

This is not unique to YouTube. The same dynamic appears in many AI systems:

Spotify's recommendation algorithm — If a genre becomes popular among a demographic, the algorithm serves more of it to that demographic, making it more popular, making the algorithm serve even more of it. Users get a narrower and narrower selection of what the algorithm predicts they want, rather than what the full space of music contains.

Credit scoring feedback loops — If an algorithm assigns lower credit scores to people in certain zip codes, those people get higher interest rates, accumulate more debt, miss more payments, and receive even lower credit scores in the next generation of the model. The original prediction contributed to the outcome it predicted.

The Self-Fulfilling Prediction

The most concerning form of feedback loop is what researchers call a self-fulfilling prediction. The algorithm predicts an outcome, the prediction causes an action, and the action makes the prediction come true — even if the original prediction was wrong.

The clearest example comes from the world of targeted policing. If a risk algorithm predicts that a neighborhood has high crime, more police are sent there. More police means more arrests for minor infractions that would go undetected in other neighborhoods. Those arrests feed back into the crime data as evidence that the neighborhood has high crime. The prediction was validated — but the algorithm created the evidence that validated it.
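A small simulation shows how the records can diverge even when the underlying crime is identical. Everything here is an illustrative assumption: two neighborhoods with the same true crime level, a rule that sends most patrols to whichever has more recorded crime, and recording that scales with patrol presence. The function name `simulate_patrols` and its parameters are hypothetical.

```python
def simulate_patrols(rounds=5, recorded=(55, 45), patrols=10,
                     hot_patrols=7, records_per_patrol=10):
    """Two neighborhoods with identical true crime.

    Each round, the neighborhood with more recorded crime is treated as the
    hotspot and gets `hot_patrols` of the `patrols` available. Each patrol
    records the same number of incidents wherever it is sent, because the
    underlying crime level is assumed equal in both places.
    """
    a, b = recorded
    history = [(a, b)]
    for _ in range(rounds):
        patrols_a = hot_patrols if a >= b else patrols - hot_patrols
        patrols_b = patrols - patrols_a
        a += patrols_a * records_per_patrol
        b += patrols_b * records_per_patrol
        history.append((a, b))
    return history

for a, b in simulate_patrols():
    print(f"recorded A={a:4d}  B={b:4d}  gap={a - b}")
```

Starting from a small gap of 10 recorded incidents, the gap reaches 210 after five rounds, and the data appears to confirm that neighborhood A is the problem. The only real difference between the two was where the patrols were sent.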

A 2016 study by Kristian Lum and William Isaac of the Human Rights Data Analysis Group found this feedback structure when they applied a published predictive policing algorithm to drug-crime records from Oakland, California. The algorithm was not measuring crime. It was measuring policing, and then directing more policing based on that measurement, in a loop.

The Question Nobody Can Answer Cleanly

If a prediction system affects behavior in ways that make its predictions come true, can you ever know whether the prediction was accurate? This is not a rhetorical question — it is a genuine methodological problem in evaluating AI systems. YouTube cannot know what its users would have watched without the recommendation algorithm. The algorithm is not measuring pre-existing preferences; it is shaping them.

Breaking the Loop — and Why It's Hard

Technical solutions to feedback loops exist. Researchers have proposed methods like counterfactual logging (showing users random recommendations occasionally to measure what they would have chosen without the algorithm's influence) and diversity constraints (requiring recommendation systems to include content from outside a user's apparent preference cluster). YouTube has implemented some of these — the company introduced policies in 2019 reducing recommendations of what it called "borderline content."
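The first idea, occasionally showing a random item and logging that it was random, can be sketched as follows. This is a minimal illustration, not any platform's implementation: the function name `recommend`, the `epsilon` parameter, and the catalog are hypothetical.

```python
import random

def recommend(ranked_items, all_items, epsilon=0.05, rng=random):
    """Return (item, was_random).

    With probability `epsilon`, ignore the ranking and pick uniformly at
    random from the whole catalog. The flag is logged so later analysis can
    compare behavior on random picks with behavior on ranked picks.
    """
    if rng.random() < epsilon:
        return rng.choice(all_items), True
    return ranked_items[0], False

catalog = ["news", "music", "cooking", "science", "sports"]
ranked_for_user = ["news", "sports"]  # what the ranking model would show first

rng = random.Random(0)  # fixed seed so the sketch is repeatable
log = [recommend(ranked_for_user, catalog, epsilon=0.1, rng=rng) for _ in range(1000)]
random_picks = sum(1 for _, was_random in log if was_random)
print("random picks out of 1000:", random_picks)
```

The trade-off is visible in `epsilon`: a higher value gives cleaner data about what users would choose without the algorithm's influence, at the cost of showing more items the ranking would not have picked.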

But the harder obstacle is economic. YouTube's business model depends on watch time. Disrupting the recommendation loop costs engagement and, therefore, ad revenue. The feedback loop that produces radicalization and the feedback loop that produces profit are the same loop. Fixing one means accepting costs to the other. As of 2024, researchers continue to debate whether YouTube's policy changes have meaningfully reduced algorithmic radicalization, or whether they have primarily changed which kinds of extreme content get amplified.

The institutional version of this problem is important. When harmful feedback loops are identified inside companies, the decision to fix them is not purely technical — it is made by executives weighing engineering costs, revenue impacts, and public relations risk. The people best positioned to break the loop are often the people with the strongest financial incentive to leave it running.

What You Now See

Next time someone says an algorithm is fair because it "just responds to user behavior," you know why that's incomplete. User behavior is itself partly a product of what the algorithm has been showing them. The system shapes the signal it is trying to read. That is a feedback loop — and it is one of the primary ways that bias and harm can escalate in AI systems without any single person intending it.

Module 3 · Quiz 3

Feedback Loops: When Bias Feeds Itself

5 questions on how systems amplify bias through their own outputs

YouTube's 2012 recommendation system update optimized for watch time rather than clicks. Why did this optimization lead to the amplification of extreme content, even though no engineer programmed it to do so?

Correct. The algorithm found the statistical pattern that extreme and emotionally activating content correlates with longer watch sessions. It had no concept of harm or accuracy — only of what produced its optimization target. The outcome emerged from the optimization, not from any explicit rule.

No engineer wrote a rule for extremism. The algorithm discovered that certain content — including extreme content — extended watch time, and it optimized for that. The harmful outcome emerged from the optimization target, not a direct instruction.

What is a "self-fulfilling prediction" in the context of AI systems?

Right. The algorithm predicts something, that prediction triggers an action, and the action produces the outcome — whether or not the prediction would have been accurate without the action. The system validates itself through its own intervention.

A self-fulfilling prediction is one where the act of predicting and acting on it creates the outcome, making it impossible to know if the prediction was ever accurate on its own.

A credit algorithm assigns lower scores to applicants from certain zip codes. This leads to higher interest rates, which leads to more missed payments, which leads to even lower scores in the next model update. What type of problem is this?

Correct. This is a feedback loop — the algorithm's decisions (higher rates) cause behavior (missed payments) that is fed back as data, making the original pattern stronger in the next iteration. The algorithm's effects on the world shape what the algorithm learns next.

This is a feedback loop: the model's predictions affect outcomes in the real world, those outcomes become new training data, and the new data reinforces the original pattern — a self-amplifying cycle.

A researcher argues that predictive policing systems are not measuring crime — they are measuring policing. What does this distinction mean, and why does it matter?

Exactly. Areas with more police presence produce more recorded crime events — not necessarily because more crimes occur there, but because more officers are there to observe and record. The data reflects enforcement patterns, and the algorithm amplifies those patterns.

The key insight is that arrest data reflects where police are deployed, not where crime actually happens. When the algorithm reads this data as a measure of crime, it sends more police to already over-policed areas, generating more arrests and reinforcing the cycle.

YouTube introduced policies in 2019 to reduce recommendations of "borderline content." Critics argued that economic incentives made a complete fix unlikely. Which of the following best explains why fixing the feedback loop is economically difficult?

Correct. The profitable feedback loop — extreme content drives engagement, engagement drives ad revenue — is structurally the same as the harmful one. You cannot break the harmful loop without accepting losses to the profitable one. That is why economic incentives work against the fix.

The engagement that makes extreme content spread is the same engagement that makes the platform profitable. The harmful and profitable loops are intertwined, so fixing the harm costs money — creating a structural incentive to leave the problem in place.

Module 3 · Lab 3

Design a Loop-Breaker

You're advising a social media company — identify the feedback loop and propose a fix.

The scenario

A social media platform uses an algorithm that ranks posts by predicted engagement (likes, shares, comments). The platform has noticed that posts from certain political viewpoints generate more engagement — not because more people hold those views, but because those posts tend to generate angry reactions from people who disagree. The algorithm counts angry comments as engagement, so it promotes those posts to even more people, generating even more angry comments.

You are advising the product team. Your job is not to describe the problem — I already know it. Your job is to propose a specific fix and anticipate the objections.

Propose one concrete change to the algorithm or its inputs that would break or weaken this feedback loop. Then tell me what the company would lose by implementing it.

Product Strategy Partner

I've mapped the loop: angry reactions → algorithm reads as high engagement → promotes post → more angry reactions. The loop is self-reinforcing and it's distorting what gets seen on the platform. The team needs a concrete intervention proposal, not a description of the problem. What's your fix, and what does it cost us?

Module 3 · Lesson 4

Who Gets to Set the Rule?

The EU AI Act, the fight over facial recognition in Detroit, and what it actually looks like when society tries to govern unfair algorithms

Once you can spot an unfair rule, what mechanisms exist to change it — and who has the power to make those mechanisms work?

In January 2020, Robert Williams had just arrived home in Farmington Hills, Michigan, when two Detroit police officers arrested him on his front lawn, in front of his wife and daughters. He was taken to a detention center, held overnight, and interrogated the following morning. The charge: shoplifting watches from a Shinola store in 2018.

Williams had not stolen anything. The Detroit Police Department had used a facial recognition system made by a company called DataWorks Plus to match a blurry surveillance image against a database of driver's licenses. The system returned a match — Williams's face — and a detective used that match as the basis for the arrest. No human auditor had verified the match with any additional evidence.

Williams's case was the first publicly documented wrongful arrest caused by a facial recognition error in the United States. It was not the only one. Michael Oliver had been wrongfully arrested in Detroit in 2019 under similar circumstances, and in 2023 Porcha Woodruff, then eight months pregnant, was arrested in Detroit after a facial recognition match. All three cases involved Black individuals. Research by Joy Buolamwini of MIT and Timnit Gebru, then at Microsoft Research, published in 2018 in a paper called "Gender Shades," had already demonstrated that commercial facial recognition systems had error rates up to 34 percentage points higher for dark-skinned women than for light-skinned men.

Detroit's city council voted to place restrictions on police use of facial recognition in 2021. But the technology remained in use across dozens of other American cities with no equivalent oversight.

What Governance Looks Like in Practice

The Detroit facial recognition cases are important not just as examples of unfair algorithms — you have seen many of those by now — but as examples of what happens when governance mechanisms are or are not in place.

In Detroit, governance eventually arrived, but only after wrongful arrests had been documented and civil rights organizations had applied sustained pressure. The ACLU of Michigan represented Robert Williams in a legal challenge. The Innocence Project supported subsequent cases. City council members introduced legislation. This is the slow, messy path of democratic accountability: it works, but it works after harm has already been done.

The contrast with the European Union is instructive. In 2024, the EU passed the AI Act, the first comprehensive legal framework for artificial intelligence in the world. The Act classifies AI systems by risk level. Systems used in criminal justice, employment, education, and credit scoring are classified as "high-risk" and subject to mandatory transparency, human oversight, and accuracy requirements before deployment. Real-time facial recognition in public spaces by law enforcement is prohibited, with narrow exceptions.

EU AI Act (2024) The first major legal framework for AI governance, classifying AI systems by risk level and imposing mandatory requirements — including fairness audits, transparency, and human oversight — for high-risk applications before deployment.

The United States, as of 2024, has no equivalent federal law. Governance is fragmented across state legislation (California, Illinois, and a handful of others have passed AI-related laws), sector-specific regulations (the Equal Credit Opportunity Act covers loan decisions, for example), and voluntary company policies. The gap between EU and US regulatory approaches represents one of the most consequential ongoing debates in technology policy — and it is happening right now, not in some distant future.

The Audit as a Tool

Before laws catch up, one of the most effective mechanisms for identifying unfair algorithmic rules has been the independent audit. This is exactly what ProPublica did with COMPAS in 2016, and what Joy Buolamwini did with facial recognition systems in 2018.

An algorithmic audit works by:

1. Obtaining access to the system's inputs and outputs — either by working with the company, through freedom of information requests, or by testing the system from the outside (as Buolamwini did with facial recognition by building a test dataset of faces).

2. Defining what "fair" would look like — choosing specific, measurable fairness criteria and explaining why they were chosen.

3. Testing whether the system meets those criteria across demographic groups — comparing error rates, approval rates, or outcomes across race, gender, age, and other relevant dimensions.

4. Publishing the results — transparency is the enforcement mechanism when formal legal authority does not exist.
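Step 3 can be sketched in a few lines of Python. This is a minimal illustration, assuming you already have one record per person with a group label, the system's decision, and the true outcome. The function name `error_rates_by_group` and the sample numbers are hypothetical, not data from any real audit.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates for each group.

    records: iterable of (group, flagged, actually_positive) tuples.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, flagged, actually_positive in records:
        if actually_positive:
            counts[group]["tp" if flagged else "fn"] += 1
        else:
            counts[group]["fp" if flagged else "tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        rates[group] = {
            # Share of people who should not have been flagged but were.
            "false_positive_rate": c["fp"] / negatives if negatives else None,
            # Share of people who should have been flagged but were not.
            "false_negative_rate": c["fn"] / positives if positives else None,
        }
    return rates


# Hypothetical audit sample: (group, flagged by system, true outcome).
sample = (
    [("group_a", True, False)] * 1 + [("group_a", False, False)] * 9
    + [("group_a", True, True)] * 8 + [("group_a", False, True)] * 2
    + [("group_b", True, False)] * 4 + [("group_b", False, False)] * 6
    + [("group_b", True, True)] * 8 + [("group_b", False, True)] * 2
)

rates = error_rates_by_group(sample)
for group, group_rates in sorted(rates.items()):
    print(group, group_rates)
```

In this made-up sample both groups have the same false negative rate, but group_b's false positive rate is four times group_a's. Which of those rates matters most is exactly the choice that step 2 asks the auditor to make and justify.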

Buolamwini's Gender Shades audit was conducted while she was a graduate student at MIT. It changed industry practice within two years. IBM, Microsoft, and Amazon all updated their facial recognition products after her findings were published. That is an example of audit-as-governance working — but it required an individual researcher's persistence against companies with vastly more resources.

The Power Question

Joy Buolamwini was able to change industry practice through audit and publication. But she was a graduate student at MIT with institutional credibility and connections. What happens to people without those resources — the Robert Williamses of the world — who encounter unfair systems and lack access to audits, researchers, or legal representation? Who has the power to demand accountability, and what does it take to acquire that power? This is not a question with a comfortable answer.

What You Can Do With What You Know

This module has given you a set of specific diagnostic tools. Let's name them explicitly, because they are genuinely useful — not as abstract knowledge but as questions you can ask in real situations:

Proxy detector: What are the actual inputs to this system, and are any of them proxies for a protected category? (Lesson 1)

Fairness definition checker: What definition of fairness is this system optimized for, who chose that definition, and who bears the cost of the definitions that were not chosen? (Lesson 2)

Feedback loop spotter: Is this system's training data shaped by its own previous outputs? Are there self-fulfilling predictions in how this system works? (Lesson 3)

Governance question: What mechanisms exist to identify and challenge this system's errors? Who has access to those mechanisms, and who does not? (Lesson 4)

These four questions together constitute the core of an algorithmic fairness audit. They are questions that regulators, civil rights lawyers, investigative journalists, and academic researchers actually use. You have them now too.
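The proxy detector question lends itself to a quick numerical check. The sketch below is one simple way to do it, assuming you have records that include both the input in question and the protected category: it asks how much better you can guess the protected category once you know the input. The function name `proxy_strength` and the sample records are hypothetical, and real audits often use more careful statistical tests.

```python
from collections import Counter, defaultdict


def proxy_strength(rows, feature, protected):
    """Return (baseline_accuracy, accuracy_using_feature).

    baseline: accuracy from always guessing the most common protected value.
    using feature: accuracy from guessing the most common protected value
    within each value of the feature.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    overall = Counter(row[protected] for row in rows)
    baseline = max(overall.values()) / len(rows)

    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[protected]] += 1
    correct = sum(max(c.values()) for c in by_value.values())
    return baseline, correct / len(rows)


# Hypothetical records: two zip codes, two demographic groups.
rows = (
    [{"zip": "zip_1", "group": "x"}] * 9 + [{"zip": "zip_1", "group": "y"}] * 1
    + [{"zip": "zip_2", "group": "x"}] * 1 + [{"zip": "zip_2", "group": "y"}] * 9
)
baseline, with_zip = proxy_strength(rows, "zip", "group")
print(baseline, with_zip)
```

Here, guessing a person's group without any information is right half the time, while guessing from zip code alone is right 90% of the time. A gap that large is a sign that the input can stand in for the protected category even when the category itself is never used.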

The Bigger Picture

The decisions being made right now — in legislatures, in boardrooms, in courtrooms — about which algorithmic systems are permitted, which fairness definitions are legally required, and what oversight mechanisms will govern AI — will shape the world you inherit. Those decisions are not finished. They are being made by people who are often working with less analytical precision than you now have. You are not too young to participate in these debates. You are, in some ways, better positioned to see them clearly than people who have spent years inside systems that benefit from the current ambiguity.

Module 3 · Quiz 4

Who Gets to Set the Rule?

5 questions on governance, audits, and the power to challenge unfair algorithms

Robert Williams was wrongfully arrested in Detroit in January 2020 based on a facial recognition match. Research published in 2018 had already shown that commercial facial recognition systems had significantly higher error rates for which demographic group?

Correct. Joy Buolamwini and Timnit Gebru's "Gender Shades" paper (2018) showed error rates up to 34 percentage points higher for dark-skinned women compared to light-skinned men across multiple commercial systems. This research was available before the Williams arrest.

The Gender Shades paper by Buolamwini and Gebru found error rates up to 34 percentage points higher for dark-skinned women than for light-skinned men. This documented bias was known before any of the Detroit wrongful arrests.

The EU AI Act (2024) classifies AI systems used in criminal justice as "high-risk." What does this classification require that would not apply to lower-risk systems?

Correct. High-risk classification under the EU AI Act triggers mandatory pre-deployment requirements including transparency about how the system works, human oversight mechanisms, and demonstrated accuracy standards — before the system can be used.

High-risk classification requires mandatory pre-deployment requirements: transparency, human oversight, and accuracy standards. The system must meet these before it can be deployed, unlike lower-risk systems.

Joy Buolamwini's Gender Shades audit changed industry practice at IBM, Microsoft, and Amazon within two years of publication. What made her audit effective as a governance mechanism, even though she had no legal authority to force changes?

Correct. The audit's power came from its rigor (specific, measurable findings), its public availability (published results mean accountability), and the reputational stakes for companies whose products were shown to perform significantly worse for certain groups. Transparency was the enforcement mechanism.

Buolamwini had no legal authority. Her power came from rigorous evidence, specific demographic measurements, and public publication — which created reputational pressure and political attention that companies had to respond to.

As of 2024, the United States has no equivalent to the EU AI Act. Which of the following best describes the current US approach to governing high-risk AI systems?

Correct. US AI governance is fragmented: some states (California, Illinois) have passed relevant laws; sector regulators like the CFPB cover specific applications; and companies have voluntary policies. There is no comprehensive federal law equivalent to the EU AI Act as of 2024.

US AI governance is fragmented — a patchwork of state laws, sector-specific rules, and voluntary policies, with no comprehensive federal framework equivalent to the EU AI Act.

A school district starts using an automated grading algorithm with no public documentation, no oversight board, and no appeal process. Using the governance framework from Lesson 4, which is the most critical missing element?

Correct. Governance requires that people affected by an algorithmic system have access to mechanisms for identifying errors, challenging decisions, and seeking accountability. Without transparency, oversight, or an appeal process, students harmed by errors have no recourse — which is a governance failure regardless of how accurate the algorithm is on average.

The critical governance gap is the absence of accountability mechanisms. When an algorithm affects people's lives, there must be a way to identify errors, challenge outcomes, and seek remedy. Without those mechanisms, even a reasonably accurate system can cause harm with no recourse.

Module 3 · Lab 4

Full Audit: Apply All Four Tools

You're the lead auditor. Run a complete fairness analysis on a real-world-style system.

The case

A county government has deployed an algorithm to decide which families receive priority access to subsidized housing. Inputs: current address (zip code), household income, previous eviction history, years of stable employment, and number of dependents. The county says: "The algorithm is unbiased — we don't use race or any protected category. It helps us serve families most efficiently." There is no public documentation, no appeal process, and no third-party audit.

You are the lead auditor brought in by a legal aid organization. You have all four tools from this module: proxy detector, fairness definition checker, feedback loop spotter, and governance question.

Run through the four-part audit out loud. Apply each tool to this system. Then give your overall verdict: is there enough evidence of unfairness to challenge this algorithm, and on what grounds?

Legal Aid Co-Auditor

We have the county's documentation in front of us. Five inputs, no race variable, no public audit history. The county is confident. I'm not. Walk me through your analysis — use all four audit tools and tell me which findings you think are strong enough to act on. What's your opening move?

Module 3 · Final Test

Spot the Unfair Rule

15 questions · Pass at 80% or above · Tests reasoning, not just recall

1. A medical algorithm trained on historical hospital data consistently recommends less pain management for Black patients. No race variable was included. What is the most likely explanation?

Correct. Historical medical data reflects past discriminatory treatment patterns. Without race as an explicit variable, proxies (like zip code, insurance type, or symptom documentation patterns) can recreate the same disparity.

The mechanism is proxy discrimination: the model learned from historical data that itself reflected discriminatory patterns, and used correlated variables to reproduce those patterns without explicitly using race.

2. Which of the following is the best example of a proxy variable for socioeconomic class?

Correct. Zip code does not directly measure wealth but is strongly correlated with it due to decades of unequal housing policy. Using it in a model can produce class-based (and race-based) disparities even without any explicit income or race variable.

A proxy variable is one that is not itself the protected or sensitive category but is statistically correlated with it. Zip code is a classic proxy for both race and socioeconomic class.

3. Amazon's résumé-screening algorithm was shut down in 2018 because it penalized résumés containing the word "women's." Engineers had not included gender as an input. Why did the bias emerge anyway?

Correct. The model had a clear and legitimate objective: predict successful hires based on patterns in past data. But past data reflected discriminatory hiring. So the model accurately learned and replicated the bias it was trained on.

The bias emerged from the training data, not from a coding error. The model correctly identified that "women's" on a résumé was correlated with not being hired — because that was true in Amazon's historical data, which reflected years of gender bias in tech hiring.

4. The impossibility theorem for fairness metrics (Kleinberg et al., 2016) proves that:

Correct. The theorem establishes a mathematical constraint: when groups differ in their underlying base rates, satisfying both fairness definitions at once is impossible. This means any system must choose — and that choice is a value judgment.

The theorem is about a specific mathematical tension between calibration and equal false positive rates when groups have different base rates. It does not say fairness is impossible — it says certain combinations of fairness properties are mathematically incompatible.

5. COMPAS was described as "fair" by Northpointe because it was calibrated. ProPublica said it was unfair because of a false positive rate disparity. Who was right?

Correct. Both organizations measured real, valid properties of the same system. The deeper finding is that these properties can conflict, and choosing between them requires a value judgment about whose interests to prioritize — not a technical calculation.

Both Northpointe and ProPublica were accurately measuring real properties of the system using different valid fairness definitions. The conflict between them is not a mistake by either side — it reveals a genuine mathematical incompatibility.

6. A rental algorithm charges higher fees to applicants from zip codes with majority-minority populations. The company says: "We don't use race — we use location data." Which concept from this module best explains why this defense is insufficient?

Correct. Proxy discrimination is the relevant concept. The company's denial of intent does not address the mechanism: zip code correlates with race due to historical segregation, so discriminating on zip code produces racially discriminatory outcomes.

This is proxy discrimination. Saying "we don't use race" is not sufficient when the variable actually used (zip code) is a proxy for race. The discriminatory effect is the same regardless of what the variable is called.

7. A YouTube recommendation system that optimizes for watch time learns to promote extreme content because it keeps users engaged longer. No engineer programmed "recommend extreme content." How is this an example of a feedback loop?

Correct. The feedback loop structure is: algorithm output (recommendation) → user behavior (watch time) → training signal → future algorithm behavior. The system learns from the consequences of its own previous decisions, amplifying whatever patterns drive the optimization target.

A feedback loop exists when outputs become inputs. Here: recommendations affect watch behavior, watch behavior trains the model, the trained model makes new recommendations. The loop exists regardless of whether engineers intended it or knew about it.

8. A self-fulfilling prediction differs from an ordinary prediction because:

Correct. The self-fulfilling nature collapses the distinction between prediction and causation. If sending more police to a neighborhood causes more arrests, you cannot know whether the algorithm's prediction of "high crime" was accurate — the prediction helped create the outcome.

A self-fulfilling prediction causes the outcome it predicts. This creates a fundamental measurement problem: you cannot evaluate the prediction's accuracy because the intervention changes the outcome.

9. Predictive policing researchers argue that crime data reflects "where police patrol" rather than "where crime occurs." Which type of bias does this create in a predictive policing algorithm?

Correct. The policing feedback loop is a textbook example: the data is contaminated by previous policing patterns, the algorithm amplifies those patterns by directing more resources to the same areas, generating more data that confirms the original pattern.

The problem is a feedback loop with self-fulfilling dynamics: historical over-policing produces arrest data, the algorithm directs more policing to those areas, more policing produces more arrests, which reinforce the pattern in the next model update.

10. Joy Buolamwini's Gender Shades audit found that commercial facial recognition systems had error rates up to 34 percentage points higher for dark-skinned women than for light-skinned men. What made this finding effective at changing industry practice, even without legal authority?

Correct. The audit's power came from scientific rigor (specific, measurable findings with clear methodology), demographic specificity (not just "the system has bias" but who was harmed and by how much), and transparency through publication.

Audit-as-governance works through transparency. Specific, rigorous, publicly available findings create accountability pressure even without formal legal authority, because companies face reputational and regulatory risk when documented failures are public.

11. Robert Williams was wrongfully arrested in Detroit in 2020 based on a facial recognition match, one of three documented wrongful arrests in Detroit covered in Lesson 4. What does this pattern reveal about the governance of facial recognition in Detroit?

Correct. Buolamwini's research on differential error rates was published in 2018 — before all three wrongful arrests. The governance failure was not a lack of information; it was the absence of mechanisms requiring that information to be acted upon before deployment.

The critical governance failure is that known bias — documented in published research before the arrests — was not addressed through pre-deployment accuracy requirements, human oversight, or appeal mechanisms. Harm repeated because accountability structures were absent.

12. The EU AI Act classifies facial recognition in public spaces by law enforcement as largely prohibited. Which argument best justifies this classification under a fairness framework?

Correct. The fairness argument for prohibition combines documented demographic bias in accuracy, the severity of the harm when errors occur (arrest, detention), the absence of consent, and the evidence that governance mechanisms deployed after the fact failed to prevent repeated harm.

The fairness-based argument combines documented differential error rates, the severity and irreversibility of errors (wrongful arrest), and evidence that voluntary industry action was insufficient to prevent harm after research documented the bias.

13. A company argues that its loan algorithm is fair because it has the same overall approval rate for white and Black applicants. What critical information is missing from this claim?

Correct. Equal overall approval rates can mask unequal error distributions. The relevant fairness question is whether creditworthy applicants of different groups are approved at equal rates — not just whether total approvals are balanced.

Overall rates obscure whether errors are equitably distributed. Equal approval rates could coexist with a system that systematically denies creditworthy Black applicants while approving marginal white applicants at higher rates.
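A small worked example makes the point concrete. The counts below are invented for illustration, and the function name `approval_summary` is hypothetical. Each group has 100 applicants, 60 of them creditworthy, and exactly 50 approvals.

```python
def approval_summary(approved_creditworthy, denied_creditworthy,
                     approved_not_creditworthy, denied_not_creditworthy):
    """Return (overall approval rate, approval rate among the creditworthy)."""
    total = (approved_creditworthy + denied_creditworthy
             + approved_not_creditworthy + denied_not_creditworthy)
    overall_rate = (approved_creditworthy + approved_not_creditworthy) / total
    creditworthy_rate = approved_creditworthy / (
        approved_creditworthy + denied_creditworthy
    )
    return overall_rate, creditworthy_rate


# Hypothetical counts for two groups of 100 applicants each.
group_a = approval_summary(48, 12, 2, 38)
group_b = approval_summary(30, 30, 20, 20)
print("group_a:", group_a)
print("group_b:", group_b)
```

Both groups are approved at the same overall rate of 50%, yet a creditworthy applicant in group_a is approved 80% of the time while a creditworthy applicant in group_b is approved only 50% of the time. The headline number is equal and the errors are not.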

14. Which combination of audit tools would be most appropriate for investigating whether a hospital's patient-scheduling algorithm might be producing unfair outcomes?

Correct. A complete audit uses all four tools: proxy detection, fairness definition testing across groups, feedback loop analysis, and governance evaluation. No single tool is sufficient because unfairness can enter through multiple pathways simultaneously.

A thorough audit applies all four tools because bias and unfairness can enter through multiple pathways: proxy variables, unequal error distributions, feedback loops in historical data, and absent accountability mechanisms.

15. When a company says "our AI system is unbiased," what is the most analytically precise response based on everything in this module?

Correct. "Unbiased" is not a single property — it is a claim that needs to be unpacked across multiple dimensions. Asking these four questions applies all of the analytical tools from the module and converts a vague claim into a testable set of specific properties.

The most precise response converts the vague claim "unbiased" into specific, testable questions: which fairness definition? are proxies present? is there a feedback loop? are accountability mechanisms in place? These questions operationalize what "unbiased" actually means.