In 2014, Amazon set its engineers a challenge: build a tool that reads thousands of résumés and scores each one from one to five stars, automatically, with no human in the loop. The goal was efficiency. Amazon received hundreds of thousands of applications a year, and human recruiters simply could not read them all.
The team fed the system roughly ten years of historical hiring data: résumés that Amazon had previously received, and the outcomes of those applications. The model learned what a "successful" Amazon hire looked like. By 2015, the engineers had noticed something wrong.
The system was penalizing résumés that included the word "women's," as in "women's chess club" or "women's college." It was also downgrading graduates of all-women's colleges. Nobody had told the system to do this. Nobody had written a rule that said "prefer men." The system had discovered that pattern on its own, by reading ten years of Amazon's own hiring history, a history in which the tech workforce was overwhelmingly male.
Amazon disbanded the team behind the tool by early 2017. Reuters broke the story in October 2018. The system was never used to make final hiring decisions, Amazon said, but it had been used to surface "top candidates" for human review. The damage was real.
Here is the key move to understand. The Amazon tool was not given a rule that said "be biased against women." It was given a goal (find candidates who look like past successful Amazon hires) and it found patterns that predicted that outcome. The word "women's" on a résumé was a genuine statistical predictor, in the training data, of not being hired. The system was perfectly accurate, in a twisted sense: it learned to replicate the bias that had already existed in Amazon's human decisions.
This is called proxy discrimination. A proxy is something that stands in for something else. "Women's" was not a protected category in the model's code. But it was a proxy for gender, and discriminating on that proxy had the same effect as discriminating on gender directly.
The trap is that proxies are everywhere. Zip code is a proxy for race in many American cities, because residential segregation shaped where different groups ended up living. Credit history is a proxy for wealth, and wealth is correlated with race. The name on a résumé can be a proxy for ethnicity. An algorithm trained on real-world data will pick up all of these correlations, and if the algorithm's goal is prediction, it will use every useful signal it finds, including ones that encode centuries of discrimination.
Before the Amazon story broke, the standard defense of algorithmic decision-making was simple: "We don't use race. We don't use gender. We use objective data." This argument sounds solid. If a computer isn't given those categories, how can it discriminate on them?
The Amazon case demolished that argument. The engineers had deliberately removed gender from the training features. They knew that using gender directly would be wrong. But the system found its way back to the same outcome through the back door of language patterns. Removing the protected variable is not enough if correlated variables remain in the data.
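To see the back door in miniature, here is a small Python sketch. Everything in it is invented for illustration: the record counts, the single keyword feature, and the `train` and `average_score_by_gender` helpers are assumptions, not Amazon's data or method. The toy model never sees gender, yet its scores still split along gender lines because the keyword stands in for it.

```python
from collections import Counter

# Hypothetical historical records: (gender, has_womens_keyword, was_hired).
# Gender is kept only so we can audit afterwards; the "model" never reads it.
HISTORY = (
    [("M", False, True)] * 60
    + [("M", False, False)] * 40
    + [("F", True, True)] * 6
    + [("F", True, False)] * 24
    + [("F", False, True)] * 4
    + [("F", False, False)] * 16
)


def train(records):
    """Learn the historical hire rate for each value of the keyword feature."""
    hired = Counter()
    total = Counter()
    for _gender, keyword, was_hired in records:
        total[keyword] += 1
        hired[keyword] += int(was_hired)
    return {keyword: hired[keyword] / total[keyword] for keyword in total}


def average_score_by_gender(records, model):
    """Audit step: average the model's score within each gender."""
    score_sum = Counter()
    count = Counter()
    for gender, keyword, _was_hired in records:
        score_sum[gender] += model[keyword]
        count[gender] += 1
    return {gender: score_sum[gender] / count[gender] for gender in count}


model = train(HISTORY)
print("score by keyword:", model)
print("average score by gender:", average_score_by_gender(HISTORY, model))
```

In this made-up data the keyword appears only on women's résumés, so a lower score for the keyword becomes a lower average score for women, even though gender was never an input.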
This is not a hypothetical edge case. In 2019, researchers at Berkeley published a study finding that mortgage lenders, including those using algorithms, charged Black and Latino borrowers higher interest rates than white borrowers with similar financial profiles. The algorithms used no race variable, but they used zip code, employment type, and loan purpose, all of which were correlated with race in the historical data the models were trained on.
If a company genuinely does not intend to discriminate, but its algorithm produces discriminatory outcomes because of patterns in historical data, is that company responsible? Does intent matter, or only outcome? There is no clean answer to this, and courts in the United States are still working it out.
You are now in a position to see something that most adults miss when they read headlines about AI bias. When a company says "our algorithm is unbiased because we don't use protected categories," that statement, however sincere, does not mean what it sounds like it means. Knowing about proxy variables changes how you read every one of those claims.
The practical skill from this lesson is asking one question whenever you see an algorithmic system: What are the actual inputs, and what real-world categories might those inputs be proxies for?
Some examples that show how this works in the real world:
School discipline prediction systems: Several American school districts have piloted tools that try to predict which students are at risk of disciplinary problems. Common inputs include attendance history, previous disciplinary record, and neighborhood. Attendance and previous discipline are shaped by socioeconomic status. Neighborhood is shaped by decades of housing segregation. The "risk score" that comes out often ends up tracking race, even though race was never an input.
Predictive policing: Systems like PredPol (now Geolitica), deployed in dozens of US cities in the 2010s, used historical crime report data to predict where crimes would occur next. But crime reports reflect where police have previously concentrated resources. If police have historically over-policed certain neighborhoods, those neighborhoods accumulate more crime reports, and the algorithm sends police there again, generating more reports, in a feedback loop that has nothing to do with the actual distribution of crime.
Ad targeting: In 2016, ProPublica journalists found that Facebook's ad targeting tool allowed advertisers to exclude users from seeing housing ads based on "Ethnic Affinity," a category Facebook had inferred from users' behavior. Facebook said it was not using race. But "Ethnic Affinity" was a direct proxy for race, and the Fair Housing Act of 1968 prohibits discrimination in housing advertising. The Department of Housing and Urban Development charged Facebook with violating the Act in 2019, and the company settled with the Department of Justice in 2022.
You can now read news stories about algorithmic systems with a question that most journalists, let alone readers, do not think to ask: not "does it use race?" but "what variables does it use, and what are those variables proxies for?" That is a more precise, more useful question. It is the question auditors ask.
You've been brought in to audit a new hiring algorithm used by a logistics company. The company says it's completely fair: "we don't use any protected categories." The inputs are: years of continuous employment, number of previous employers, commute distance from headquarters, educational institution attended, and extracurricular activities listed on the résumé.
Your AI counterpart has already reviewed the technical documentation. Work through the audit together. You need to take a position, not just list possibilities.
In 2013, a man named Eric Loomis was arrested in Wisconsin for fleeing police after a drive-by shooting. He faced sentencing. The judge used a risk assessment score (a number generated by a software tool called COMPAS, made by a company called Northpointe) to help decide Loomis's sentence. COMPAS said Loomis was "high risk." He was sentenced to six years in prison.
Loomis challenged his sentence, arguing that he had a constitutional right to know how the score was calculated, and that Northpointe's refusal to disclose the algorithm's details as a trade secret denied him that right. The case went all the way to the Wisconsin Supreme Court, which ruled in 2016 that using COMPAS was acceptable as long as it was not the sole factor in sentencing.
That same year, the investigative news outlet ProPublica published an analysis of COMPAS scores for more than seven thousand defendants in Broward County, Florida. Their finding was explosive: Black defendants who did not re-offend were nearly twice as likely as white defendants who did not re-offend to be falsely labeled "high risk." And white defendants who did re-offend were more likely to have been labeled "low risk."
Northpointe fired back: their tool was fair, they said, because it was equally accurate overall for Black and white defendants. Both groups had the same rate of correct predictions. ProPublica said that was precisely the wrong measure. The argument that followed became one of the most important debates in the history of algorithmic fairness.
Here is the core of the dispute, stated precisely. Northpointe's claim was this: among defendants COMPAS labeled "high risk," the proportion who actually re-offended was roughly the same for Black and white defendants. This property is called calibration: the scores mean the same thing regardless of group. That sounds fair.
ProPublica's claim was different: among defendants who did not re-offend, Black defendants were more likely to have been labeled high-risk anyway. This is called a false positive rate disparity. For the people who are actually innocent of future crime, the system is harsher to Black defendants. That also sounds like unfairness.
In 2016, computer scientists Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan published a mathematical proof showing that these two definitions of fairness cannot both be satisfied at the same time, as long as the two groups have different base rates of re-offense and the predictions are less than perfect. This is called the impossibility theorem for fairness metrics. If Black defendants, as a group, re-offend at a higher rate in the historical data (itself a product of who gets arrested and prosecuted, not who actually commits crimes), then a calibrated model will necessarily produce higher false positive rates for Black defendants. Short of perfect prediction, no algorithm can satisfy both definitions simultaneously.
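The arithmetic behind that result can be checked with a small worked example. The Python sketch below uses invented counts for two hypothetical groups of 100 defendants each; `GROUP_A`, `GROUP_B`, and the helper functions are assumptions for illustration, not COMPAS data. In both groups, 60% of people labeled "high" re-offend and 20% of people labeled "low" re-offend, so the labels are calibrated. Only the base rates differ.

```python
# Each group maps a risk label to (number who re-offended, number who did not).
# Invented counts: the labels mean the same thing in both groups (calibration),
# but group A has a higher base rate of re-offense than group B.
GROUP_A = {"high": (36, 24), "low": (8, 32)}
GROUP_B = {"high": (12, 8), "low": (16, 64)}


def base_rate(group):
    """Share of the whole group that re-offended."""
    reoffended = sum(yes for yes, _no in group.values())
    total = sum(yes + no for yes, no in group.values())
    return reoffended / total


def high_risk_reoffense_rate(group):
    """Calibration check: of those labeled 'high', what share re-offended?"""
    yes, no = group["high"]
    return yes / (yes + no)


def false_positive_rate(group):
    """Of those who did NOT re-offend, what share were labeled 'high'?"""
    high_no = group["high"][1]
    all_no = sum(no for _yes, no in group.values())
    return high_no / all_no


for name, group in (("A", GROUP_A), ("B", GROUP_B)):
    print(
        name,
        "base rate:", round(base_rate(group), 3),
        "| 'high' re-offense rate:", round(high_risk_reoffense_rate(group), 3),
        "| false positive rate:", round(false_positive_rate(group), 3),
    )
```

With these numbers, "high" means a 60% re-offense rate in both groups, yet 24 of 56 non-re-offenders in group A are labeled high against 8 of 72 in group B. The labels are calibrated and the false positive rates are unequal at the same time, which is the pattern the two sides of the COMPAS dispute were each pointing at.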
The deeper issue that the COMPAS debate exposed was this: the "base rate" (the underlying rate at which a group re-offends in the historical data) is not a neutral fact. It is the output of a criminal justice system that has historically over-policed, over-arrested, and over-prosecuted Black defendants. When an algorithm is trained on that data and treats the base rate as ground truth, it inherits and enshrines those disparities.
To put it concretely: if police departments have historically concentrated resources in certain neighborhoods and applied more scrutiny to Black defendants at every stage of the legal process, then Black defendants will appear in the data as higher-risk not necessarily because they behave differently, but because the system has been watching them more closely. The data reflects surveillance, not reality.
Should a risk assessment tool use historical crime data at all, if that data is produced by a biased system? If you refuse to use historical data, you lose predictive accuracy. If you use it, you reproduce bias. Neither option is clean. Courts, legislatures, and researchers are still arguing about this. As of 2024, several US states have passed laws limiting the use of algorithmic risk scores in criminal sentencing, but COMPAS-type tools are still widely deployed.
This matters for you right now: the COMPAS debate is not a solved problem that adults are handling well. It is an ongoing policy dispute where the technical constraints are established (you cannot satisfy all fairness definitions simultaneously) and the political choice about which definition to prioritize has not been made democratically. Someone is making that choice. Usually it is the company selling the tool.
Here is the uncomfortable consequence of the impossibility theorem. When a company says "our algorithm is fair," they have necessarily chosen which definition of fairness they are optimizing for, and they are necessarily violating at least one other reasonable definition. This is not a bug that better engineering can fix. It is a value judgment embedded in the math.
The relevant questions to ask are: Who decided which fairness definition to use? Who was consulted? Who bears the cost of the definition that was not chosen?
In the COMPAS case, the choice to optimize for calibration rather than equal false positive rates meant that the cost of the tradeoff, higher false positive rates, was borne primarily by Black defendants who were not re-offenders. They were labeled dangerous when they were not. That cost was not distributed evenly. It fell on the people who had the least power in the system.
When someone says an algorithm is fair, you can now ask: "Fair by which definition?" That question breaks open almost every algorithmic fairness claim. The existence of multiple irreconcilable definitions of fairness is not widely taught, even to people who build these systems. You have a sharper analytical tool than most adults working in policy.
A school district wants to deploy an algorithm that predicts which students are "at risk of dropping out." The algorithm gives each student a score. The district will use high scores to trigger extra support services. A researcher shows you two facts about the algorithm: (1) It is well-calibrated: a 70% risk score corresponds to roughly a 70% dropout rate for both white and Black students. (2) Among students who actually graduate, Black students are labeled "high risk" at twice the rate of white students.
You are advising the school board. You need to pick a position and defend it, not just describe both sides.
In 2012, YouTube engineers faced a problem. People were watching videos, but they weren't watching enough of them. The key metric that mattered to YouTube's parent company, Google, was watch time: the total number of minutes users spent on the platform. Engineers discovered that if they optimized the recommendation algorithm for watch time rather than clicks, users stayed on the platform much longer. The update rolled out in 2012. By 2016, YouTube was serving over a billion hours of video per day.
In 2019, a former YouTube engineer named Guillaume Chaslot, who had worked on the recommendation system before leaving the company in 2013, published analysis showing where the watch-time optimization had led. The algorithm had learned something that no engineer had explicitly programmed: emotionally activating content keeps people watching longer. Outrage, fear, and conspiracy theories generated more engagement than calm, accurate information.
The system had no intent. It had no political agenda. It was simply optimizing for watch time. But in doing so, it had discovered that recommending progressively more extreme content kept users engaged. Someone watching a moderate political video would be recommended a more intense version. Then a more intense version still. The system had no concept of truth or harm, only of what produced the next click.
By 2019, reporting by Bloomberg, drawing on accounts from inside the company, showed that YouTube's own engineers had identified the radicalization problem as early as 2016. Executives had declined to act, Bloomberg reported, because weakening the recommendation engine would reduce watch time. The feedback loop had been identified. The decision was made to leave it running.
A feedback loop occurs when the output of a system becomes an input that shapes the system's future behavior. In simple terms: the algorithm learns from what people do, those actions were shaped by what the algorithm showed them, which means the algorithm is partially learning from its own past influence.
In YouTube's case: the algorithm recommends extreme content → users who were already somewhat radicalized watch more of it → their watch patterns train the algorithm to recommend even more extreme content to similar users → the algorithm gets better and better at serving that segment → more users end up in that segment. Nobody wrote a rule saying "radicalize users." The loop generated that outcome from the optimization target of watch time.
This is not unique to YouTube. The same dynamic appears in many AI systems:
Spotify's recommendation algorithm: If a genre becomes popular among a demographic, the algorithm serves more of it to that demographic, making it more popular, making the algorithm serve even more of it. Users get a narrower and narrower selection of what the algorithm predicts they want, rather than what the full space of music contains.
Credit scoring feedback loops: If an algorithm assigns lower credit scores to people in certain zip codes, those people get higher interest rates, accumulate more debt, miss more payments, and receive even lower credit scores in the next generation of the model. The original prediction contributed to the outcome it predicted.
The most concerning form of feedback loop is what researchers call a self-fulfilling prediction. The algorithm predicts an outcome, the prediction causes an action, and the action makes the prediction come true, even if the original prediction was wrong.
The clearest example comes from the world of targeted policing. If a risk algorithm predicts that a neighborhood has high crime, more police are sent there. More police means more arrests for minor infractions that would go undetected in other neighborhoods. Those arrests feed back into the crime data as evidence that the neighborhood has high crime. The prediction was validated, but the algorithm created the evidence that validated it.
A 2016 study by Kristian Lum and William Isaac of the Human Rights Data Analysis Group found exactly this feedback structure when they ran a published predictive policing algorithm on drug arrest data from Oakland, California. The algorithm was not measuring crime. It was measuring policing, and then directing more policing based on that measurement, in a loop.
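The loop is easy to reproduce in a toy simulation. The Python sketch below is an illustration under stated assumptions, not a model of any real system: two hypothetical neighborhoods, "A" and "B", have exactly the same true crime rate; the starting record counts, the `true_rate` and `hot_share` parameters, and the `simulate_patrols` function are all invented. Each day most patrols go wherever past records are highest, and incidents are only recorded in proportion to patrol presence.

```python
def simulate_patrols(recorded, days, true_rate=10.0, hot_share=0.8):
    """Simulate patrol allocation driven by recorded (not actual) crime.

    recorded:  dict of neighborhood name -> incidents recorded so far
               (needs at least two neighborhoods)
    true_rate: actual incidents per day, identical in every neighborhood
    hot_share: fraction of patrols sent to the current "hotspot"
    """
    recorded = dict(recorded)  # copy so the caller's data is not modified
    others = len(recorded) - 1
    for _ in range(days):
        # The "prediction": the neighborhood with the most past records.
        hotspot = max(recorded, key=recorded.get)
        for name in recorded:
            share = hot_share if name == hotspot else (1 - hot_share) / others
            # Incidents are recorded only where officers are present to see them.
            recorded[name] += true_rate * share
    return recorded


start = {"A": 60.0, "B": 40.0}
end = simulate_patrols(start, days=100)
share_a_start = start["A"] / sum(start.values())
share_a_end = end["A"] / sum(end.values())
print(f"A's share of recorded crime: {share_a_start:.0%} at start, {share_a_end:.0%} after 100 days")
```

Both neighborhoods have the same underlying crime in this sketch, yet A's share of recorded crime climbs from 60% toward the 80% patrol share. The data ends up describing where the patrols went.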
If a prediction system affects behavior in ways that make its predictions come true, can you ever know whether the prediction was accurate? This is not a rhetorical question; it is a genuine methodological problem in evaluating AI systems. YouTube cannot know what its users would have watched without the recommendation algorithm. The algorithm is not measuring pre-existing preferences; it is shaping them.
Technical solutions to feedback loops exist. Researchers have proposed methods like counterfactual logging (showing users random recommendations occasionally to measure what they would have chosen without the algorithm's influence) and diversity constraints (requiring recommendation systems to include content from outside a user's apparent preference cluster). YouTube has taken some steps in this direction: in 2019 the company introduced policies reducing recommendations of what it called "borderline content."
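A minimal sketch of the counterfactual logging idea, in Python, might look like this. The `recommend` function, the `epsilon` parameter, and the log format are hypothetical and not taken from any real platform; the point is only that a small, labeled share of random recommendations gives you data the ranking model did not choose for itself.

```python
import random


def recommend(ranked_items, catalog, epsilon, rng, log):
    """Return one recommendation and record how it was chosen.

    ranked_items: items ordered by the model's predicted engagement
    catalog:      every item that could be shown
    epsilon:      probability of showing a uniformly random item instead
    rng:          a random.Random instance (passed in so runs are repeatable)
    log:          list that receives one entry per recommendation
    """
    if rng.random() < epsilon:
        item = rng.choice(catalog)  # exploration: ignore the model entirely
        explored = True
    else:
        item = ranked_items[0]  # exploitation: the model's top pick
        explored = False
    # The flag lets analysts later separate model-driven views from random ones.
    log.append({"item": item, "explored": explored})
    return item


rng = random.Random(42)
log = []
catalog = ["news", "music", "cooking", "politics", "sports"]
ranked = ["politics", "news", "sports", "music", "cooking"]
for _ in range(1000):
    recommend(ranked, catalog, epsilon=0.05, rng=rng, log=log)
print("random recommendations shown:", sum(entry["explored"] for entry in log))
```

Only the entries flagged as explored say anything about what users do when the algorithm is not steering them, which is why the flag is stored alongside each item.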
But the harder obstacle is economic. YouTube's business model depends on watch time. Disrupting the recommendation loop costs engagement and, therefore, ad revenue. The feedback loop that produces radicalization and the feedback loop that produces profit are the same loop. Fixing one means accepting costs to the other. As of 2024, researchers continue to debate whether YouTube's policy changes have meaningfully reduced algorithmic radicalization, or whether they have primarily changed which kinds of extreme content get amplified.
The institutional version of this problem is important. When harmful feedback loops are identified inside companies, the decision to fix them is not purely technical; it is made by executives weighing engineering costs, revenue impacts, and public relations risk. The people best positioned to break the loop are often the people with the strongest financial incentive to leave it running.
Next time someone says an algorithm is fair because it "just responds to user behavior," you know why that's incomplete. User behavior is itself partly a product of what the algorithm has been showing them. The system shapes the signal it is trying to read. That is a feedback loop, and it is one of the primary ways that bias and harm can escalate in AI systems without any single person intending it.
A social media platform uses an algorithm that ranks posts by predicted engagement (likes, shares, comments). The platform has noticed that posts from certain political viewpoints generate more engagement, not because more people hold those views, but because those posts tend to generate angry reactions from people who disagree. The algorithm counts angry comments as engagement, so it promotes those posts to even more people, generating even more angry comments.
You are advising the product team. Your job is not to describe the problem; I already know it. Your job is to propose a specific fix and anticipate the objections.
In January 2020, a man named Robert Williams was driving home in Farmington Hills, Michigan when two Detroit police officers arrested him on his front lawn, in front of his wife and daughters. He was taken to a detention center, held overnight, and interrogated the following morning. The charge: shoplifting watches from a Shinola store in 2018.
Williams had not stolen anything. The Detroit Police Department had used a facial recognition system made by a company called DataWorks Plus to match a blurry surveillance image against a database of driver's licenses. The system returned a match (Williams's face) and a detective used that match as the basis for the arrest. No human auditor had verified the match with any additional evidence.
Williams was the first publicly documented case of a wrongful arrest caused by facial recognition error in the United States. He was not the only one. Michael Oliver had been wrongfully arrested in Detroit in 2019 under similar circumstances, a case that came to light later. In 2023, Porcha Woodruff, eight months pregnant, was arrested in Detroit after a false facial recognition match. All three cases involved Black individuals. Research by MIT's Joy Buolamwini and Timnit Gebru, then at Microsoft Research, published in 2018 in a paper called "Gender Shades," had already demonstrated that commercial facial analysis systems had error rates of up to 34.7% for darker-skinned women, compared with under 1% for lighter-skinned men.
Detroit's city council voted to place restrictions on police use of facial recognition in 2021. But the technology remained in use across dozens of other American cities with no equivalent oversight.
The Detroit facial recognition cases are important not just as examples of unfair algorithms (you have seen many of those by now) but as examples of what happens when governance mechanisms are or are not in place.
In Detroit, governance eventually arrived, but only after three documented wrongful arrests and sustained pressure from civil rights organizations. The ACLU of Michigan represented Robert Williams in a legal challenge. The Innocence Project supported subsequent cases. City council members introduced legislation. This is the slow, messy path of democratic accountability: it works, but it works after harm has already been done.
The contrast with the European Union is instructive. In 2024, the EU passed the AI Act, the first comprehensive legal framework for artificial intelligence in the world. The Act classifies AI systems by risk level. Systems used in criminal justice, employment, education, and credit scoring are classified as "high-risk" and subject to mandatory transparency, human oversight, and accuracy requirements before deployment. Real-time facial recognition in public spaces by law enforcement is prohibited except in narrowly defined circumstances.
The United States, as of 2024, has no equivalent federal law. Governance is fragmented across state legislation (California, Illinois, and a handful of others have passed AI-related laws), sector-specific regulations (the Equal Credit Opportunity Act covers loan decisions, for example), and voluntary company policies. The gap between EU and US regulatory approaches represents one of the most consequential ongoing debates in technology policy, and it is happening right now, not in some distant future.
Before laws catch up, one of the most effective mechanisms for identifying unfair algorithmic rules has been the independent audit. This is exactly what ProPublica did with COMPAS in 2016, and what Joy Buolamwini did with facial recognition systems in 2018.
An algorithmic audit works by:
1. Obtaining access to the system's inputs and outputs, either by working with the company, through freedom of information requests, or by testing the system from the outside (as Buolamwini did with facial recognition by building a test dataset of faces).
2. Defining what "fair" would look like: choosing specific, measurable fairness criteria and explaining why they were chosen.
3. Testing whether the system meets those criteria across demographic groups: comparing error rates, approval rates, or outcomes across race, gender, age, and other relevant dimensions.
4. Publishing the results: transparency is the enforcement mechanism when formal legal authority does not exist.
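Step 3 is the most mechanical part of an audit, and it can be sketched in a few lines of Python. The function name, the record format, and the sample data below are invented for illustration; a real audit would work from the system's actual decisions and the criteria chosen in step 2.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compare a system's errors across groups.

    records: iterable of (group, predicted_positive, actually_positive),
             where the last two fields are booleans.
    Returns, per group, the false positive rate, the false negative rate,
    and the share of the group that was flagged at all.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        if predicted:
            counts[group]["tp" if actual else "fp"] += 1
        else:
            counts[group]["fn" if actual else "tn"] += 1

    report = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people for whom the outcome did not occur
        positives = c["tp"] + c["fn"]  # people for whom the outcome did occur
        report[group] = {
            "false_positive_rate": c["fp"] / negatives if negatives else None,
            "false_negative_rate": c["fn"] / positives if positives else None,
            "flag_rate": (c["tp"] + c["fp"]) / (negatives + positives),
        }
    return report


# Hypothetical audit sample: (group, flagged by the system, outcome occurred).
sample = [
    ("group_1", True, False), ("group_1", True, True),
    ("group_1", False, False), ("group_1", True, False),
    ("group_2", False, False), ("group_2", True, True),
    ("group_2", False, False), ("group_2", False, True),
]
for group, rates in error_rates_by_group(sample).items():
    print(group, rates)
```

Which of these numbers counts as evidence of unfairness depends on the definition chosen in step 2; the code only makes the comparison visible.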
Buolamwini's Gender Shades audit was conducted while she was a graduate student at MIT. It changed industry practice within two years. IBM and Microsoft updated their systems after her findings were published, and by 2020 IBM, Microsoft, and Amazon had each halted or paused sales of facial recognition to police. That is an example of audit-as-governance working, but it required an individual researcher's persistence against companies with vastly more resources.
Joy Buolamwini was able to change industry practice through audit and publication. But she was a graduate student at MIT with institutional credibility and connections. What happens to people without those resources (the Robert Williamses of the world) who encounter unfair systems and lack access to audits, researchers, or legal representation? Who has the power to demand accountability, and what does it take to acquire that power? This is not a question with a comfortable answer.
This module has given you a set of specific diagnostic tools. Let's name them explicitly, because they are genuinely useful, not as abstract knowledge but as questions you can ask in real situations:
Proxy detector: What are the actual inputs to this system, and are any of them proxies for a protected category? (Lesson 1)
Fairness definition checker: What definition of fairness is this system optimized for, who chose that definition, and who bears the cost of the definitions that were not chosen? (Lesson 2)
Feedback loop spotter: Is this system's training data shaped by its own previous outputs? Are there self-fulfilling predictions in how this system works? (Lesson 3)
Governance question: What mechanisms exist to identify and challenge this system's errors? Who has access to those mechanisms, and who does not? (Lesson 4)
These four questions together constitute the core of an algorithmic fairness audit. They are questions that regulators, civil rights lawyers, investigative journalists, and academic researchers actually use. You have them now too.
The decisions being made right now in legislatures, boardrooms, and courtrooms (about which algorithmic systems are permitted, which fairness definitions are legally required, and what oversight mechanisms will govern AI) will shape the world you inherit. Those decisions are not finished. They are being made by people who are often working with less analytical precision than you now have. You are not too young to participate in these debates. You are, in some ways, better positioned to see them clearly than people who have spent years inside systems that benefit from the current ambiguity.
A county government has deployed an algorithm to decide which families receive priority access to subsidized housing. Inputs: current address (zip code), household income, previous eviction history, years of stable employment, and number of dependents. The county says: "The algorithm is unbiased: we don't use race or any protected category. It helps us serve families most efficiently." There is no public documentation, no appeal process, and no third-party audit.
You are the lead auditor brought in by a legal aid organization. You have all four tools from this module: proxy detector, fairness definition checker, feedback loop spotter, and governance questioner.