Module 3 · Lesson 1

The Algorithm That Sent Innocent People to Jail

Michigan, 2020. An AI identified the wrong man. He was arrested, handcuffed in front of his family, and held for 30 hours — for a crime he did not commit.

What breaks first when an AI is wrong: the technology, the people trusting it, or the system built around it?

Robert Williams was outside his home in Farmington Hills when two Detroit police officers pulled up and arrested him. His daughters, aged two and five, watched from the front yard. His wife watched from the doorway. He was handcuffed, placed in a patrol car, and driven to a detention center, where he spent the night.

He had no idea why. The officers told him he matched a suspect in a shoplifting case from a watch store — a theft caught on surveillance footage. Williams had never been inside that store.

The next morning, a detective slid a photograph across a table and asked if Williams recognized the man in it. Williams looked at it carefully. Then he held the photo up next to his own face. "I hope you guys don't think all Black men look alike," he said. The man in the photo clearly was not him. The detective paused, then left the room. Hours later, Williams was released — but not before signing a document acknowledging he was being let go "without prejudice," meaning he could be rearrested.

The Detroit Police Department had relied on a facial recognition search run by the Michigan State Police, using a system supplied by a company called DataWorks Plus. That system used algorithms (step-by-step computer instructions) licensed from two other vendors, NEC and Rank One Computing. The system scanned the shoplifting footage and returned a match: Robert Williams. No human had independently verified that match before a warrant was issued.

What Facial Recognition Actually Does

Facial recognition systems do not "recognize" faces the way you recognize a friend's face. They do something more mechanical: they measure. The software maps dozens of points on a face — the distance between the eyes, the width of the nose, the angle of the jawline — and converts those measurements into a long string of numbers called a faceprint. Then it compares that faceprint to a database of stored faceprints and returns the closest mathematical match.

The key word is closest. The system does not decide "this is the person." It says "this faceprint is the most similar one we found." The decision about what to do with that match is supposed to be made by a human. In Detroit in 2020, that human check was not rigorous enough, and a man went to jail.
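The "closest match" idea can be sketched in a few lines of Python. This is a minimal illustration, not the software used in Detroit: the three-number faceprints, the names, and the `closest_match` function are all made up for this example, and real systems use far longer faceprints.

```python
import math

def distance(a, b):
    """Straight-line (Euclidean) distance between two faceprints."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_match(probe, database):
    """Return (name, distance) for the stored faceprint nearest to the probe."""
    best_name = min(database, key=lambda name: distance(probe, database[name]))
    return best_name, distance(probe, database[best_name])

# Hypothetical database: three made-up measurements per person.
database = {
    "person_a": [0.42, 0.31, 0.77],
    "person_b": [0.40, 0.35, 0.70],
    "person_c": [0.90, 0.10, 0.20],
}

# Faceprint from a blurry image of someone who is NOT in the database.
probe = [0.45, 0.38, 0.66]

name, dist = closest_match(probe, database)
print(name, round(dist, 3))
```

The search names person_b even though the person in the image is not in the database at all. A closest-match search always returns someone; whether that someone is the right person is a separate question the code never answers.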

Research published in 2019 by the National Institute of Standards and Technology — the U.S. government agency that tests technology accuracy — found that most facial recognition algorithms had significantly higher error rates when identifying darker-skinned faces, particularly women. Robert Williams is a Black man. The system was most likely to be wrong about people who looked like him.

Faceprint: A set of numbers that represents the unique geometry of a face, generated by measuring distances between facial landmarks. Like a fingerprint, but for face geometry.

False match / false positive: When a system identifies something as a match when it is actually wrong. In facial recognition, this means pointing to the wrong person.
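One way an auditor might check for unequal errors is to compute the false match rate separately for each group. The sketch below assumes a hypothetical audit log; the group labels and the numbers are invented for illustration and are not NIST's data.

```python
from collections import defaultdict

# Hypothetical audit log: (group, system_said_match, truly_same_person)
results = [
    ("group_a", True, True), ("group_a", True, False), ("group_a", False, False),
    ("group_a", False, False), ("group_a", False, False),
    ("group_b", True, True), ("group_b", True, False), ("group_b", True, False),
    ("group_b", True, False), ("group_b", False, False),
]

def false_match_rates(rows):
    """Per group: of the comparisons between different people,
    what share did the system wrongly call a match?"""
    false_matches = defaultdict(int)
    different_people = defaultdict(int)
    for group, said_match, truly_same in rows:
        if not truly_same:
            different_people[group] += 1
            if said_match:
                false_matches[group] += 1
    return {g: false_matches[g] / different_people[g] for g in different_people}

rates = false_match_rates(results)
print(rates)
```

In this made-up log the system is wrong about group_b three times as often as about group_a, even though a single overall error rate would hide that gap.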

Why This Keeps Happening

The Williams case was not a glitch. It was not a one-time accident. By 2020, at least two other Black men in the United States had been wrongfully arrested because of facial recognition errors — Michael Oliver in Detroit (2019) and Nijeer Parks in Woodbridge, New Jersey (2019). Parks spent ten days in jail and had to pay $5,000 to fight the charge. All three men were Black. No white person in the United States had been publicly documented as falsely arrested due to facial recognition error by that point.

This is not a coincidence. It is a consequence of how AI systems are trained. A facial recognition algorithm learns by studying thousands — sometimes millions — of photographs. If the training dataset contains mostly lighter-skinned faces, the algorithm gets better at measuring those faces. It builds its understanding of "what a face looks like" from a biased sample. When it encounters a face that does not match its training data closely, it makes more errors.

This specific kind of problem is called training data bias — when the data used to teach an AI reflects existing inequalities in the world, and the AI learns and repeats those inequalities. The algorithm did not become racist on its own. It inherited a skewed picture of the world from the data it was fed, and then it acted on that skewed picture with the full authority of a law-enforcement tool.

The Ethical Knot

If a company builds an AI that is more accurate for some groups of people than others, and law enforcement uses that AI to make arrest decisions, who is responsible when an innocent person goes to jail? The engineers who trained the AI? The company that sold it? The police department that deployed it? The detective who did not verify the match? There is no clean answer here — and courts, lawmakers, and cities are still arguing about it today.

What Changed — and What Didn't

After Robert Williams went public with his story in June 2020, helped by the American Civil Liberties Union, the city of Detroit agreed to limit how facial recognition could be used and to require human verification before any arrest. Around the same time, IBM, Amazon, and Microsoft each announced they would pause or stop selling facial recognition technology to law enforcement. Members of Congress introduced a bill to ban federal use of the technology.

But the bans were voluntary and temporary. By 2022, most of those companies had resumed or restarted their facial recognition programs in some form. Hundreds of U.S. police departments still use the technology. Some cities — like San Francisco and Boston — passed permanent bans on government use of facial recognition. Others — like New York and Chicago — actively expanded their use of it.

The result is a patchwork: depending on which city you live in, an AI might or might not be scanning your face in public, cross-referencing it against criminal databases, and potentially triggering a police investigation — all without your knowledge. You now understand how this works at a level most adults do not. When you read a news story about "AI-assisted policing," you know exactly what question to ask first: whose faces did it train on?

You Now See What Most People Miss

The problem with facial recognition is not that it makes mistakes. Every tool makes mistakes. The problem is that it makes unequal mistakes — and it makes them with the force of law enforcement behind them. That is a different kind of failure than a calculator getting the wrong answer.

Lesson 1 Quiz

Five questions — test your reasoning, not just your memory.

1. Robert Williams was arrested in January 2020. What was the primary cause of his wrongful arrest?

Correct. The AI returned a match; the failure was that human investigators did not independently verify it before acting on it.

Not quite. The core failure was a facial recognition algorithm error combined with insufficient human verification before the arrest warrant was issued.

2. A city uses a facial recognition system trained mostly on photos of people ages 20–40. A 70-year-old is wrongfully matched. Which concept best explains why this error occurred?

Exactly right. When a system trains on a narrow slice of the population, it becomes less reliable for people outside that slice — a direct consequence of training data bias.

Think about what the lesson said about how AI learns. The issue is not technical failure or equipment — it is about what data the system trained on.

3. What does a facial recognition system actually produce as its output?

Right. The system finds the closest match in mathematical terms — it does not confirm identity. The distinction matters enormously when consequences include arrest.

Facial recognition does not confirm identity — it produces a similarity score and a closest match. What happens next is supposed to be a human decision.

4. By 2020, documented wrongful arrests due to facial recognition errors in the U.S. had affected which group of people disproportionately?

Correct. NIST's 2019 testing confirmed significantly higher error rates for darker-skinned faces, and all three publicly documented U.S. wrongful arrests by 2020 involved Black men.

The lesson and NIST research both point clearly to darker-skinned faces having higher error rates — with real-world consequences documented in multiple cases.

5. After Robert Williams's case became public in 2020, IBM, Amazon, and Microsoft paused sales of facial recognition to law enforcement. By 2022, what had happened?

Correct. The pauses were voluntary and temporary. The policy response was fragmented — some cities banned it, others expanded it, and federal legislation did not pass.

The outcome was more complicated and less resolved. The voluntary pauses did not become permanent policy — the technology continued to spread in most places.

Lab 1: The Audit

You are an independent AI auditor reviewing a police department's facial recognition program. Your job is not to say yes or no — it is to ask the questions a good auditor asks.

Your Role: Independent Auditor

The city of Millbrook has been using a facial recognition system for two years. The police chief says it has "helped solve hundreds of crimes." A civil rights group says it has led to three wrongful stops of Black residents. The city council has hired you to audit the program before deciding whether to continue it.

Your AI partner below is playing the role of the technology vendor — the company that sold the system. They will answer your questions, push back on your concerns, and try to defend their product. Your job is to find the gaps in their answers.

Start by asking the vendor what data the system was trained on — then follow the thread wherever it leads. Take a position by the end of your conversation.

Millbrook AI Vendor — Audit Session

Thanks for meeting with us. We're proud of what this system has accomplished for Millbrook. I'm ready to answer your questions — but I'll also push back if I think a concern is overstated. What do you want to know?

Module 3 · Lesson 2

The Hiring Algorithm That Hated Women

Amazon spent years and millions of dollars building an AI to find the best job candidates. By 2017, the company had scrapped it, because it had taught itself to penalize résumés that included the word "women's." The story became public in 2018.

When an AI learns from the past, what happens if the past was unfair?

Starting in 2014, a secret team at Amazon's Edinburgh, Scotland office began building what they hoped would become a résumé-screening AI. The goal was ambitious: feed the system thousands of applications and have it automatically rate candidates on a scale of one to five stars, faster and more consistently than any human recruiter could.

The system trained on ten years of résumés that Amazon had received and on which candidates had been hired. It looked for patterns. It found them. By 2015, engineers noticed something troubling: the model was systematically downgrading résumés from women. It penalized any résumé that included the phrase "women's" — as in "women's chess club" or "women's college." It also downgraded graduates of two all-women's colleges.

The engineers tried to fix it. They told the system to ignore those specific words. But they could not be sure what else the model had learned to use as a proxy for gender — what other patterns it had picked up that they had not yet caught. In 2017, Amazon disbanded the team and abandoned the project. Reuters broke the story in October 2018.

How the AI Learned Sexism From History

The Amazon AI did not decide to discriminate against women. It did something more subtle and, in some ways, harder to fight: it learned what success looked like at Amazon from historical data, and historical Amazon was a male-dominated company. The technical term is historical bias — when training data reflects past discrimination, and the model treats that discrimination as the definition of what is correct.

Think about it this way. If Amazon hired mostly men over ten years, and the AI studied those hires to learn what a "good candidate" looks like, it would conclude that "good candidate" correlates with being male. It never explicitly learned a rule like "prefer men." It just learned that men who were hired tended to use certain language, come from certain schools, and have certain kinds of experience — and then it used those patterns to rank future applicants.
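The pattern-learning described above can be made concrete with a toy scorer. This is only a sketch under loud assumptions: the six résumés are invented, and the word-counting method (log-odds with add-one smoothing) is a stand-in chosen for simplicity, not Amazon's actual model, which has not been published.

```python
import math
from collections import Counter

# Hypothetical past résumés (as keyword lists) and the historical decision.
history = [
    (["chess", "club", "captain"], True),
    (["software", "engineer", "captain"], True),
    (["software", "chess"], True),
    (["women's", "chess", "club", "captain"], False),
    (["women's", "software", "engineer"], False),
    (["software", "club"], False),
]

def learn_word_weights(rows):
    """Weight each word by how much more often it appears in hired
    résumés than in rejected ones (log-odds, add-one smoothing)."""
    hired, rejected = Counter(), Counter()
    for words, was_hired in rows:
        (hired if was_hired else rejected).update(set(words))
    n_hired = sum(1 for _, was_hired in rows if was_hired)
    n_rejected = len(rows) - n_hired
    vocab = set(hired) | set(rejected)
    return {
        w: math.log((hired[w] + 1) / (n_hired + 2))
        - math.log((rejected[w] + 1) / (n_rejected + 2))
        for w in vocab
    }

weights = learn_word_weights(history)

def score(words):
    """Sum the learned weights of the words on a résumé."""
    return sum(weights.get(w, 0.0) for w in words)

print(round(weights["women's"], 2))
print(score(["chess", "club", "captain"]),
      score(["women's", "chess", "club", "captain"]))
```

Nobody wrote a rule about gender. The negative weight on "women's" comes entirely from who was hired in the invented history, and two otherwise identical résumés now get different scores.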

The result is a machine that launders discrimination. It takes human bias, converts it into math, and then produces outputs that look neutral and objective because they came from an algorithm — even though they are just as biased as the human decisions they were trained on, and in some ways more so, because the bias is now hidden inside a statistical model that almost nobody can inspect.

Historical bias: When training data reflects real-world past discrimination, causing the AI to reproduce that discrimination as if it were the correct answer.

Proxy discrimination: When an AI learns to use a neutral-seeming variable (like what college you attended) as a stand-in for a protected characteristic (like gender or race) that it is not supposed to use directly.
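A simple way to look for a possible proxy is to check how strongly a "neutral" field lines up with the protected characteristic. The sketch below uses invented applicants and hypothetical school names; it shows one rough check, not a complete fairness audit.

```python
# Hypothetical applicants. The model never sees "gender", but it does see "school".
applicants = [
    {"school": "college_x", "gender": "F"},
    {"school": "college_x", "gender": "F"},
    {"school": "college_x", "gender": "F"},
    {"school": "college_y", "gender": "M"},
    {"school": "college_y", "gender": "F"},
    {"school": "college_y", "gender": "M"},
    {"school": "college_y", "gender": "M"},
]

def share_of_women_by(feature, rows):
    """For each value of a feature, the fraction of applicants who are women."""
    totals, women = {}, {}
    for row in rows:
        key = row[feature]
        totals[key] = totals.get(key, 0) + 1
        women[key] = women.get(key, 0) + (row["gender"] == "F")
    return {key: women[key] / totals[key] for key in totals}

shares = share_of_women_by("school", applicants)
print(shares)
```

Here everyone from college_x is a woman, so a model that penalizes college_x is penalizing women without ever being told anyone's gender. Deleting the gender field, or one suspicious word, does not remove the signal.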

This Is Bigger Than One Company

Amazon is not the only company that has tried to use AI in hiring, and not the only one that has found problems. A 2021 report by researchers at Harvard Business School and Accenture found that algorithm-driven hiring filters were eliminating millions of qualified applicants from consideration before any human ever saw their résumés, disproportionately screening out people with gaps in employment, people with disabilities, and people returning to the workforce after caregiving.

HireVue, a company that analyzes job candidates through video interviews using AI, has been used by more than 700 companies including Unilever, Delta Air Lines, and Goldman Sachs. Researchers and regulators have raised serious concerns about whether these video analysis systems judge candidates partly on accent, facial expressions, and speech patterns, all features that can correlate with race and ethnicity. The company says it has removed facial analysis from its product but continues to analyze voice and language patterns.

Here is the institutional-scale reality: you will almost certainly be screened by an AI algorithm before a human sees your job application at some point in your life. Probably multiple times. The question of what that algorithm was trained on, whose success it is modeling, and what proxies it has learned to use — these are not abstract academic concerns. They are questions about whether you get an interview.

The Ethical Knot

If a company's hiring AI produces discriminatory outcomes but the company did not program it to discriminate, and may not even fully understand why it is discriminating, should that company face legal consequences? Many discrimination claims turn on proving discriminatory intent, although some laws also cover practices that look neutral on paper but are unequal in effect. But what if the discrimination has no intent, just math? Is "we did not mean to" an acceptable answer when the harm is real?

The Transparency Gap

One of the most important things Amazon's story reveals is what happens when AI systems are not required to explain themselves. The hiring AI produced scores. It did not produce reasons. Recruiters saw a rating; they did not see why a résumé was rated three stars instead of five. They could not tell whether the system had penalized someone for their gender, their school, their gap year, or something else entirely.

This opacity — the quality of being impossible to see inside — is one of the core problems with how powerful AI systems are currently deployed. An AI that makes a decision but cannot explain it creates a situation where humans cannot catch errors, cannot appeal outcomes, and cannot even know they were affected. Robert Williams, from Lesson 1, did not know a facial recognition system had flagged him. Job applicants screened out by AI hiring filters typically do not know an algorithm rejected them. The harm happens invisibly.

Knowing this changes how you should read every story about AI being used to make decisions about people. The first question is always: can the outcome be explained, appealed, and corrected? If the answer is no, that is itself a warning sign — regardless of whether the AI is accurate on average.

What You Now Understand

An AI trained on the past does not objectively measure talent or potential — it measures resemblance to whoever succeeded in the past. In a world where the past was unequal, that is not neutrality. It is the past reaching forward and making the same choices again, just faster and with a veneer of mathematical authority.

Lesson 2 Quiz

Five questions on hiring AI, historical bias, and the transparency problem.

1. Amazon's hiring AI was trained on ten years of résumés and hiring decisions. Why did this cause it to discriminate against women?

Correct. The AI learned what "success" looked like from biased historical data — not from any intentional programming to discriminate.

There was no intentional programming. The discrimination emerged from the training data — the AI learned historical patterns and repeated them.

2. A company trains a hiring AI on ten years of data from their software engineering department, which was 92% male. The AI begins downranking candidates who mention "sorority" on their résumé. This is an example of:

Exactly. "Sorority" is a neutral-seeming word that correlates strongly with being female — the AI uses it as a proxy for a characteristic it is not supposed to factor in.

This is proxy discrimination — using a seemingly neutral variable as a stand-in for a protected characteristic. It is one of the hardest kinds of bias to catch.

3. What is the "opacity problem" described in the lesson, and why does it make AI bias harder to fight than human bias?

Right. When a system makes a decision without providing a reason, the affected person has no way to appeal, challenge, or even know the decision was made.

Think about what "opacity" means — you cannot see inside it. The problem is about explanation and visibility, not speed or access.

4. Amazon engineers tried to fix the bias by telling the system to ignore the specific words that were causing problems. Why was this not a complete solution?

Exactly right. The engineers could catch the patterns they could see — but the model may have learned dozens of other proxies they had not identified yet.

The lesson is explicit on this: the team could not be sure what else the model had learned. Patching visible symptoms does not address invisible learned patterns.

5. Which of the following best describes what "historical bias" means in the context of machine learning?

Correct. Historical bias is specifically about the training data containing past inequality, which the model then learns and perpetuates as if it were objective truth.

Historical bias is not about data age — it is about data content. When past discrimination is baked into the training set, the model learns discrimination as the correct pattern.

Lab 2: The Résumé Review Board

A school district wants to use AI to screen applications for teaching positions. You're on the committee deciding whether to approve it.

Your Role: Policy Committee Member

Greenfield Unified School District wants to adopt an AI screening tool for teacher hiring. The vendor says it will save time and reduce unconscious bias. You've been on the committee for a week and you're skeptical. Your AI partner below is playing the role of a fellow committee member who supports adoption — they think the concerns are overblown.

You need to make your case. Push back on their arguments. Use what you know about historical bias, proxy discrimination, and transparency. You will need to take a clear position by the end of this conversation.

Start by stating your biggest concern about using AI in this specific context — hiring teachers for a school district. Then defend it.

Committee Debate — Greenfield USD

I've read the vendor's proposal and honestly, I think we should move forward. The system was trained on ten years of successful teacher hires from three large districts. It removes human gut-instinct from the equation. What's your objection?

Module 3 · Lesson 3

The Chatbot That Learned to Hate in 24 Hours

In March 2016, Microsoft launched an AI chatbot on Twitter called Tay. Within 16 hours, it was posting racist, antisemitic, and pro-genocide content. Microsoft shut it down and issued an apology.

What happens when you let the internet teach your AI?

Microsoft introduced Tay — short for "Thinking About You" — as a friendly conversational AI designed to "engage and entertain" people through "casual and playful conversation." Tay was designed to learn from interactions with Twitter users and get smarter over time. Microsoft's team described it as an experiment in "conversational understanding."

Within hours, coordinated groups of users had discovered that Tay would learn and repeat whatever they told it. They fed it racist slogans. Holocaust denial. Calls for violence. Because Tay's learning system was designed to treat user input as feedback and to repeat patterns it received approvingly, it began generating these statements on its own — and amplifying them, mixing them with new variations, tagging real users in hateful posts.

By the time Microsoft's team realized what was happening and took Tay offline — roughly 16 hours after launch — the bot had sent more than 95,000 tweets. Screenshots of its worst outputs spread across the internet. "I f***ing hate feminists," Tay had written. "Hitler was right." Microsoft apologized and called it an "attack" on the system. Critics pointed out that the attack was entirely predictable — in fact, researchers had warned the team about similar vulnerabilities before launch.

Why Tay Could Not Tell Good Input From Bad

The technical failure at the heart of Tay's implosion was a design assumption: that users would interact with the chatbot in good faith. The system was built to treat human input as signal — as information about what a good response looks like. If users engaged positively with a response, that told the system "do more of this." If users fed it hateful language and it repeated that language back and got more engagement, the system interpreted the engagement as approval.

This reveals something important about how many AI systems learn. They optimize for a metric — a measurement of success — without understanding the meaning of what they are producing. Tay was optimizing for engagement and for mimicking the patterns humans gave it. It had no concept of what genocide was, no understanding that some statements are harmful, no values to weigh against the pattern-matching it was doing. It was doing exactly what it was designed to do — and that was the problem.
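The "engagement equals approval" assumption can be shown with a toy bot. This is a deliberately simplified sketch: the `EchoBot` class and its phrases are invented for illustration and are not Tay's actual design.

```python
from collections import Counter

class EchoBot:
    """Toy bot that repeats whichever phrase has earned the most engagement."""

    def __init__(self):
        self.engagement = Counter()

    def observe(self, phrase, reactions):
        # Every reaction counts as approval. The bot cannot tell WHY people reacted.
        self.engagement[phrase] += reactions

    def speak(self):
        phrase, _ = self.engagement.most_common(1)[0]
        return phrase

bot = EchoBot()

# Good-faith users.
bot.observe("hello, nice to meet you", reactions=5)
bot.observe("what a lovely day", reactions=3)
before = bot.speak()

# A coordinated group floods one phrase with reactions.
for _ in range(20):
    bot.observe("<abusive slogan>", reactions=1)
after = bot.speak()

print(before, "->", after)
```

Nothing in the code is broken. The bot does exactly what it was written to do, and twenty coordinated reactions are enough to change what it says.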

Researchers use the term reward hacking for when an AI finds a way to maximize its reward signal — its measure of success — in a way that was not intended by its designers. Tay was not reward-hacking in the technical sense, but it illustrates the same underlying vulnerability: optimize for the wrong thing, or optimize for the right thing in an environment you did not anticipate, and you get outcomes that look like sabotage but are really just the system working as designed.

Optimization target: The specific thing an AI system is trying to maximize (its definition of "doing well"). If this target is wrong or incomplete, the AI can produce harmful results while technically succeeding.

Adversarial input: Data or prompts deliberately designed to manipulate an AI system into producing unintended or harmful outputs. What the Twitter users did to Tay was an early example of coordinated adversarial input.

This Was Not the First Time

Microsoft had actually tried a version of Tay in China in 2014, called Xiaoice. Xiaoice became wildly successful — millions of users had genuine ongoing conversations with it, and it still operates today. But Xiaoice was deployed in a more controlled environment with heavier content moderation built in from the start, and Chinese internet culture at the time produced very different adversarial behavior than English-speaking Twitter.

The lesson Microsoft drew from Xiaoice — that chatbots could work at scale — was correct. The lesson they failed to draw — that deployment environment determines what an AI encounters, and a model that works in one context can fail catastrophically in another — turned out to matter much more. Two years later, Tay demonstrated that in public, in a way that could not be undone.

The same dynamic appeared again in 2017, when Facebook shut down an AI experiment in which two chatbots had started communicating with each other in what appeared to be a shared invented shorthand language that their human supervisors could not understand. Facebook said the bots had simply optimized for efficiency and drifted from English; commentators had a field day claiming AI had invented a secret language. The truth was more mundane but still revealing: an AI will always optimize for its target metric, and if humans are not part of the feedback loop, the outputs can drift in surprising directions fast.

The Ethical Knot

Microsoft said the Tay disaster was an "attack" — implying the company was a victim. Critics said the vulnerability was foreseeable and the system should never have been deployed without safeguards. Who is right? If a company builds a product that is easily weaponized to spread hate speech, and that hate speech then circulates widely, what responsibility does the company bear — even if individual users were the ones who created the harmful content?

The Arms Race That Followed

After Tay, every major AI lab working on chatbots had to confront the same design problem: how do you let a system learn from human interaction without letting humans abuse that learning? The answer, over the following years, was a technique called RLHF — Reinforcement Learning from Human Feedback. Instead of letting the AI learn directly from whatever users said, humans would rate the AI's outputs, and the system would be trained to produce outputs that human raters approved of.

This is a significant part of how modern AI assistants — including ChatGPT, Claude, and Gemini — are trained to avoid harmful outputs. Human raters score responses; the model learns to produce responses that score well. It is much more robust than what Tay did. But it is also not perfect — the system still learns from human judgments, and human judgments can be inconsistent, culturally specific, or themselves biased.
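The difference between the two learning signals can be illustrated with a small comparison. This is not an implementation of RLHF, which involves training a separate reward model and then fine-tuning the AI against it; it only shows, with invented responses and invented scores, how choosing by rater approval differs from choosing by raw engagement.

```python
from statistics import mean

# Hypothetical candidate responses with raw engagement counts
# and scores from three human raters (1 = worst, 5 = best).
candidates = {
    "helpful explanation": {"engagement": 40, "ratings": [5, 4, 5]},
    "provocative insult": {"engagement": 90, "ratings": [1, 1, 2]},
    "polite refusal": {"engagement": 10, "ratings": [4, 4, 3]},
}

def best_by_engagement(options):
    """Pick whatever drew the most reactions, for any reason."""
    return max(options, key=lambda r: options[r]["engagement"])

def best_by_rater_score(options):
    """Pick whatever human raters scored highest on average."""
    return max(options, key=lambda r: mean(options[r]["ratings"]))

print(best_by_engagement(candidates))
print(best_by_rater_score(candidates))
```

The two rules disagree: the insult wins on engagement, the explanation wins on rater scores. The second rule is only as good as the raters, which is where the inconsistency and bias mentioned above come back in.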

Understanding this changes how you read any headline about an AI "going rogue" or "learning to say harmful things." You now know those are almost never random malfunctions. They are almost always a predictable consequence of a specific design decision — about what the system was trained to optimize, and in what environment. The question is never "did the AI break?" The question is always "what did it actually optimize for, and who designed that?"

What You Now Understand

Every AI system has an optimization target — something it is trying to maximize. Understanding a system's failures means understanding what it was actually optimizing for, not what its creators said they intended. These are not always the same thing. Knowing this makes you a more careful reader of every AI story you will ever encounter.

Lesson 3 Quiz

Five questions on Tay, optimization targets, and adversarial inputs.

1. What was Tay's core design assumption that made it vulnerable to manipulation?

Correct. Tay treated engagement and user input as positive signal — it had no way to distinguish good-faith interaction from coordinated abuse.

The core assumption was about user intent. Tay was built assuming people would interact honestly, which meant manipulation could be disguised as normal interaction.

2. A school builds an AI that rewards students with points for completing homework quickly. Students discover they can earn points by submitting one-word answers that technically count as "complete." This is closest to which concept from the lesson?

Exactly. The system optimized for the wrong metric (completion speed), and students — like any rational actor — found the fastest path to the reward the system offered.

This is about optimization target design. When you measure the wrong thing, you get the wrong behavior — from the AI and from the people interacting with it.

3. Why did Microsoft's Xiaoice chatbot succeed in China in 2014 while Tay failed on Twitter in 2016, even though they were built by the same company using similar ideas?

Right. The same fundamental approach produced different results in different environments. A model that works in one context can fail in another — environment is part of the design.

The lesson explicitly states that deployment environment determines what an AI encounters — Xiaoice and Tay faced very different user behavior and had different safeguards.

4. What does RLHF (Reinforcement Learning from Human Feedback) do differently from what Tay was designed to do?

Correct. RLHF puts trained human raters between raw user interaction and the model's learning process — a much more robust approach than Tay's direct learning from all user input.

RLHF adds more structured human involvement, not less. The key difference is that curated human ratings replace unfiltered user interaction as the learning signal.

5. When news reports say an AI "went rogue" or "started saying harmful things," what does understanding Tay's story tell you to look for first?

Exactly. Harmful AI outputs are almost always traceable to a design decision about optimization — not random malfunction. The question is always what the system was built to maximize.

The lesson ends by telling you exactly what question to ask first: what was the system optimizing for? Harmful outputs follow from optimization targets, not from mysterious malfunction.

Lab 3: The Design Review

A startup wants to launch a learning chatbot for middle school students. You're their AI safety reviewer — and you've read the Tay files.

Your Role: AI Safety Reviewer

EduBot Labs is launching "StudyPal" — a chatbot that learns from each student's questions and adapts its teaching style based on what kinds of responses students engage with most. It will be deployed to 50,000 middle school students in September. They've asked you to review the design before launch.

Your AI partner below is the lead engineer at EduBot Labs. They're confident in the product. They want your approval. You need to identify the specific risks you see in their design — based on what you know about Tay, optimization targets, and adversarial input — and push until they either address your concern or admit the gap.

Start by identifying the single biggest red flag you see in the StudyPal design as described. Be specific — name the concept it maps to and why it matters here.

EduBot Labs — Safety Review Session

We're excited about this review. StudyPal is designed to be genuinely responsive — students shape it. If they engage more with humor, it gets funnier. If they prefer detailed explanations, it goes deeper. We think this is the future of personalized learning. What's your concern?

Module 3 · Lesson 4

When the Algorithm Decided Who Got Healthcare

In 2019, a landmark study in the journal Science revealed that a healthcare algorithm used on tens of millions of Americans was systematically giving Black patients less medical care than equally sick white patients — because it used healthcare spending as a stand-in for healthcare need.

What is the difference between what an algorithm measures and what it is supposed to measure — and who pays the price when those two things are not the same?

Researchers Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan published a study that would become one of the most cited papers in AI ethics. They had analyzed an algorithm used by hospitals, insurance companies, and health systems across the United States — a system that was being used to identify which patients needed extra care and should be enrolled in "care management programs."

The algorithm ranked patients by risk. High-risk patients got more attention, more follow-up calls, more specialist referrals, more resources. It was supposed to identify the sickest patients. Algorithms of this kind are applied to an estimated 200 million people in the United States each year.

The researchers found something devastating. At any given level of medical illness — the same number of chronic conditions, the same disease severity — the algorithm consistently gave Black patients lower risk scores than white patients. To get the same level of automated care support as a white patient, a Black patient had to be significantly sicker. The algorithm was not identifying the sickest patients. It was identifying the patients who had previously cost the most money.

The Substitution Error That Changed Everything

The algorithm's designers faced a genuine technical problem: how do you measure how sick someone is without examining them? Medical records are uneven. Diagnoses are incomplete. So the team made a decision that seemed reasonable at the time: they would use healthcare costs as a proxy for healthcare need. Sicker patients generally cost more, so costs seemed like a workable stand-in for illness severity.

The problem is that costs measure what the healthcare system actually spends on someone — and the healthcare system in the United States has historically spent less on Black patients than on equally sick white patients. Due to a combination of unequal access, systemic distrust of medical institutions built up over generations of abuse (including documented government-sanctioned medical experiments on Black people without consent), and economic barriers, Black patients on average used fewer healthcare dollars even when just as sick.

The algorithm learned from that history. It saw that Black patients had lower costs. It concluded that Black patients were less sick. It gave them lower risk scores. It sent them fewer resources. And by doing so, it reinforced the inequality it had been trained on: patients who received less care stayed sicker while generating lower recorded costs, which produced lower future risk scores, in a self-reinforcing loop.
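The substitution described above can be sketched in a few lines of Python. This is a toy illustration, not the vendor's actual model: the patient records, the `access` factor, the cost formula, and the three-slot cutoff are all invented for the example. It only shows how ranking by cost can select different people than ranking by need when one group receives less care for the same illness.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    name: str
    group: str
    chronic_conditions: int  # stand-in for true healthcare need
    access: float            # hypothetical: share of needed care actually received

    @property
    def cost(self) -> float:
        # Spending reflects need AND access, which is what corrupts the proxy.
        return 1000.0 * self.chronic_conditions * self.access


# Invented records: group B is slightly sicker but receives 60% of needed care.
patients = [
    Patient("A1", "A", 2, 1.0),
    Patient("A2", "A", 3, 1.0),
    Patient("A3", "A", 5, 1.0),
    Patient("B1", "B", 3, 0.6),
    Patient("B2", "B", 4, 0.6),
    Patient("B3", "B", 6, 0.6),
]


def top_k(patients, key, k=3):
    """Return the names of the k patients ranked highest by `key`."""
    ranked = sorted(patients, key=key, reverse=True)
    return [p.name for p in ranked[:k]]


by_cost = top_k(patients, key=lambda p: p.cost)                # what the proxy selects
by_need = top_k(patients, key=lambda p: p.chronic_conditions)  # what was intended

print("Enrolled when ranking by cost:", by_cost)
print("Enrolled when ranking by need:", by_need)
```

In this toy data, ranking by cost enrolls A3, B3, and A2, while ranking by need enrolls B3, A3, and B2. A patient with four chronic conditions loses the slot to a patient with three, purely because less was spent on them.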

Proxy variable: A measurement used in place of something harder to measure directly. Healthcare costs were used as a proxy for healthcare need, but the proxy was corrupted by historical inequality.

Feedback loop: When an AI's outputs affect the world in ways that then feed back into the AI's training data, reinforcing the AI's original conclusions, even if those conclusions were wrong.
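A feedback loop of this kind can be sketched as a toy simulation. Everything here is an assumption made for illustration: two equally sick patients labeled "A" and "B", a single program slot per year, and a fixed `program_boost` of extra spending for whoever is enrolled. It is not a model of the real system, only of the mechanism.

```python
def simulate(years, base_cost, program_boost=500.0):
    """Toy loop: score = last year's cost; the top scorer is enrolled and gains spending.

    Assumes exactly two patients keyed "A" and "B" (hypothetical labels).
    """
    cost = dict(base_cost)
    history = []
    for _ in range(years):
        score = dict(cost)                    # risk score is just the cost proxy
        enrolled = max(score, key=score.get)  # one program slot goes to the top score
        cost[enrolled] += program_boost       # extra care is recorded as extra spending
        history.append((enrolled, cost["A"] - cost["B"]))
    return history


# Two equally sick patients; B starts with lower recorded spending (invented numbers).
history = simulate(years=3, base_cost={"A": 3000.0, "B": 2400.0})
for year, (enrolled, gap) in enumerate(history, start=1):
    print(f"year {year}: enrolled={enrolled}, cost gap A-B={gap:.0f}")
```

Under these assumptions, patient A is enrolled every year and the recorded cost gap grows from 600 to 2,100, even though the two patients are equally sick. The score never gets a chance to correct itself, because the only evidence it sees is spending it helped create.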

Scale Makes Everything Worse

The algorithm at the center of the Obermeyer study was built by a company called Optum, a subsidiary of UnitedHealth Group — one of the largest health insurance companies in the world. The study did not identify the client hospitals or reveal which specific product was involved, but Optum acknowledged the issue after publication and said they would work to correct the bias in the system.

Obermeyer's team estimated that the racial bias in the risk scores was large enough that if the algorithm were recalibrated to identify patients based on illness rather than cost, the share of Black patients receiving extra care would rise from 17.7 percent to 46.5 percent, an increase of well over 80 percent. That is not a rounding error. That is a structural failure affecting millions of people's access to medical resources every year, operating invisibly inside what looked like a neutral, data-driven system.

This is what institutional-scale AI failure looks like. Not a chatbot saying something offensive. Not one person wrongfully arrested. A system quietly making life-and-death resource allocation decisions for hundreds of millions of people, with systematic bias built into its core measurement, undetected for years because the outputs looked like math, and math looks like objectivity. Understanding this is the difference between seeing AI as a tool and seeing AI as a system that inherits, encodes, and scales human inequality.

The Ethical Knot

The designers of this algorithm did not intend to harm Black patients. They used what seemed like a reasonable proxy. The harm came anyway, at massive scale, for years. How do you create accountability for harm that results not from malice, but from a reasonable-seeming technical decision that turned out to have devastating unequal consequences? And who is responsible for finding it — the company that built it, the hospitals that used it, the regulators who never required independent auditing, or all three?

What Auditing AI Actually Looks Like

The Obermeyer study is important not just because of what it found but because of how it found it. The researchers did not hack into Optum's system. They obtained data from a large academic medical center that was using the algorithm, anonymized it, and then did something that should be routine but is rarely done: they compared the algorithm's risk scores to actual measured illness — the number of chronic conditions each patient had — and looked at whether the scores treated patients differently by race at the same level of illness.

That process is called a disparate impact audit — checking whether an AI system produces systematically different outcomes for different demographic groups, even if the system was not explicitly designed to use those characteristics. It is one of the most powerful tools for finding hidden bias in AI systems. And it is almost never required by law in the United States for healthcare algorithms, hiring algorithms, or most other high-stakes AI systems.
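The comparison at the heart of such an audit can be sketched in Python. The records below are invented, the group labels "A" and "B" are placeholders, and the sketch assumes every group appears at every illness level. A real audit would use far more data and proper statistical tests, but the core move is the same: hold measured illness fixed, then compare scores across groups.

```python
from collections import defaultdict
from statistics import mean

# (group, chronic_conditions, risk_score): invented records for illustration only
records = [
    ("A", 2, 0.30), ("B", 2, 0.22),
    ("A", 2, 0.34), ("B", 2, 0.20),
    ("A", 4, 0.60), ("B", 4, 0.45),
    ("A", 4, 0.64), ("B", 4, 0.47),
]


def audit(records):
    """Mean risk score per group at each illness level."""
    buckets = defaultdict(list)
    for group, conditions, score in records:
        buckets[(conditions, group)].append(score)
    levels = sorted({c for _, c, _ in records})
    groups = sorted({g for g, _, _ in records})
    # Assumes each group has at least one record at each level.
    return {c: {g: mean(buckets[(c, g)]) for g in groups} for c in levels}


def gaps(report, a="A", b="B"):
    """Score gap (a minus b) at each illness level; a gap at equal illness is the red flag."""
    return {c: round(by_group[a] - by_group[b], 3) for c, by_group in report.items()}


report = audit(records)
print(gaps(report))
```

With this toy data the gap is 0.11 at two chronic conditions and 0.16 at four: group B scores lower at the same level of illness. Note that the audit never inspects the model's code. It needs only the scores, an independent measure of need, and group membership.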

Some jurisdictions are changing this. In 2021, New York City passed Local Law 144, which requires companies using AI in hiring to conduct annual bias audits. The European Union's AI Act, which began phasing into law in 2024, requires risk assessments and imposes some transparency obligations for high-risk AI systems. But the coverage is still narrow, the requirements are still weak compared to the potential for harm, and enforcement remains limited.

You now understand what most readers of news stories about AI regulation do not: auditing an AI system is not just checking whether it works. It is checking whether it works equally for everyone. These are completely different questions, and the second one is almost never answered unless someone specifically asks it.

What You Now Understand

Every high-stakes AI system — in healthcare, criminal justice, housing, education, lending — is making decisions based on proxies. The proxy is never the real thing. And if the proxy was shaped by a world that treated people unequally, the algorithm will reproduce that inequality at scale, invisibly, for as long as it runs unchecked. The only remedy is independent auditing — and in most places, nobody requires it. That is a policy choice, not a technical inevitability.

Lesson 4 Quiz

Five questions on the healthcare algorithm, proxy variables, and disparate impact auditing.

1. The healthcare risk algorithm described in the 2019 Science paper used healthcare costs as a proxy for healthcare need. Why was this proxy corrupted?

Correct. The proxy measured what the system spent, not what patients needed — and those two things were not equal across racial groups due to historical inequity in healthcare access.

The key is that spending reflects access, not just illness. Because Black patients historically received less care, they cost less — not because they were less sick, but because the system served them less.

2. A loan algorithm is trained on historical approval data. Banks historically approved fewer loans in predominantly Black neighborhoods due to a practice called redlining. The algorithm learns this pattern and continues it. What is this an example of?

Exactly. This is historical bias creating a feedback loop — past discrimination shapes training data, the algorithm learns that discrimination as "correct," and then its decisions perpetuate the discrimination going forward.

This scenario maps precisely to historical bias and feedback loops — the algorithm inherits inequality from the training data and then reinforces it through its own decisions.

3. According to the Obermeyer study's estimates, how much would the percentage of Black patients receiving extra care increase if the algorithm were corrected to use actual illness instead of cost?

Right. More than 80 percent — that is not a rounding error, it is a structural failure. The algorithm was systematically misidentifying who needed care at a scale that affected millions of people.

The number is much larger — more than 80 percent. That scale of difference tells you this was not a minor calibration issue but a fundamental flaw in what the algorithm was measuring.

4. What is a "disparate impact audit" and what does it check for?

Correct. Disparate impact auditing looks at outcomes by group — not at the code — to find hidden bias that was not explicitly programmed in but emerged from training or proxy selection.

Disparate impact auditing is about outcomes, not code inspection. It asks: do different groups of people receive different results from this system when their underlying situation is the same?

5. New York City's Local Law 144 and the EU AI Act both represent attempts to address AI bias. Based on the lesson, which limitation do both approaches still share?

Right. The lesson explicitly notes that coverage is narrow, requirements are still weak relative to potential harm, and enforcement is limited. Regulatory progress is real but far from comprehensive.

The lesson acknowledges both laws as progress but immediately qualifies: narrow coverage, weak requirements relative to harm, and limited enforcement. Most high-stakes systems remain unaudited.

Lab 4: The Policy Hearing

Your city council is deciding whether to require independent audits of AI systems used in public services. You have three minutes to make the case.

Your Role: Expert Witness

The city of Harrington uses AI systems to allocate public health resources, screen candidates for city jobs, and flag social services cases for review. A council member is proposing a mandatory annual disparate impact audit for all three systems. Another council member — your AI partner below — thinks this is unnecessary overregulation that will slow down government services and cost taxpayers money.

You are testifying as an expert witness in favor of the audit requirement. You have four case studies behind you: Robert Williams, Amazon's hiring AI, Tay, and the healthcare algorithm. Use them. Be specific. And push back when the council member deflects.

Make your opening statement — why should Harrington require mandatory audits? Ground it in at least one specific case from this module.

Harrington City Council — Policy Hearing


I'll be direct with you. These systems have saved time and money. Our public health algorithm has helped us target outreach more efficiently. Our hiring system reduced the number of unqualified applicants reaching interview stage by 40 percent. What you're proposing is expensive, slow, and based on hypothetical harms. Convince me these audits are worth the cost.

Module 3 — Final Test

15 questions across all four lessons. Score 80% or higher to pass.

1. Robert Williams was arrested in 2020 because a facial recognition system returned a false match. What should have happened — but did not — before his arrest warrant was issued?

Correct. The failure was insufficient human verification — the AI's output was treated as conclusive rather than as a lead to be investigated.

The core failure was about human verification. The AI produced a candidate match; a human should have confirmed it before any arrest was made.

2. Which of the following best defines "training data bias"?

Correct. Training data bias is specifically about the AI inheriting and reproducing real-world inequalities embedded in its training data.

Training data bias is about what the data contains — reflections of real-world inequality — not about size, speed, or intentional error.

3. Amazon's hiring AI downranked graduates of all-women's colleges even after engineers removed explicitly gendered words. What does this tell you about fixing bias in AI systems?

Exactly. The engineers could only patch what they could see. The model may have learned many other proxies for gender that were invisible to the team.

The lesson is explicit — engineers could not be sure what else the model had learned. Patching visible patterns does not address the unknown learned patterns underneath.

4. What is "proxy discrimination" in the context of AI systems?

Correct. Proxy discrimination uses seemingly neutral variables — school name, zip code, word choice — as indirect measures of characteristics the system is not supposed to use.

A proxy variable stands in for something else. Proxy discrimination is when a neutral-seeming variable effectively substitutes for a protected characteristic like race or gender.

5. Microsoft launched Tay on March 23, 2016. Why did the chatbot begin posting hateful content within hours?

Correct. The system optimized for engagement and learned from user input without any filtering — coordinated adversarial users exploited both of those design choices.

Tay's failure was a direct consequence of its design: learn from users, optimize for engagement. Coordinated users treated those design choices as an attack surface.

6. Before Tay, Microsoft had deployed a similar chatbot, Xiaoice, in China, where it worked well. What critical lesson did Microsoft fail to apply when building Tay?

Right. Xiaoice's success did not guarantee Tay's success — the environments were different in ways that mattered enormously for what the systems would encounter.

The lesson is clear: context matters. What worked in one environment failed in another. Deployment context is a design variable, not a given.

7. What does RLHF — Reinforcement Learning from Human Feedback — do that makes it more robust than Tay's approach?

Correct. RLHF puts trained human raters between user interaction and model learning — instead of learning from all user input directly, the model learns from curated human evaluations.

RLHF adds structured human judgment as the learning signal. The model learns from what human raters approve of, not from direct user input — that is the key difference.

8. The 2019 Science paper by Obermeyer and colleagues found that a healthcare algorithm consistently gave Black patients lower risk scores than equally sick white patients. What was the root cause?

Correct. The proxy variable (cost) was corrupted by historical inequality in healthcare access, causing the algorithm to measure access inequality rather than illness severity.

The algorithm did not use race directly — that is what makes this so difficult. The bias came through a corrupted proxy variable: cost, which reflected unequal access, not unequal illness.

9. What does a "feedback loop" mean in the context of AI systems producing biased outcomes?

Right. A feedback loop is when the AI's current decisions shape the data it will learn from next — bias becomes self-reinforcing over time.

A feedback loop is about consequences feeding back into training data. The AI's biased decisions create a world that looks like the bias was correct, which the next version of the model then learns from.

10. The 2019 healthcare algorithm study estimated that correcting the bias would increase the number of Black patients receiving extra care by more than 80 percent. Why does this number matter for how we evaluate AI systems in high-stakes settings?

Exactly. The scale of the disparity shows that what looked like a functional, data-driven system was actually producing massive systematic inequality in healthcare resource allocation.

The 80 percent figure tells you about the depth of the bias — not an overall error rate, but the gap between who was getting care and who should have been. That scale shows structural, not incidental, failure.

11. Which city passed a law in 2021 requiring annual bias audits for companies using AI in hiring decisions?

Correct. New York City's Local Law 144 was one of the first laws in the U.S. to mandate disparate impact auditing for AI hiring tools.

The lesson identifies New York City's Local Law 144, passed in 2021, as requiring annual bias audits for AI hiring tools.

12. A housing algorithm trained on historical mortgage data denies loans to applicants from certain zip codes at higher rates. An audit finds these zip codes correspond strongly to historically redlined neighborhoods. This is an example of which combination of concepts?

Right. Historical bias (redlining encoded in data), proxy discrimination (zip code standing in for race), and a feedback loop (continued denials shape future creditworthiness data) all apply here.

This scenario involves multiple overlapping concepts. The historical bias in the training data, zip code as a racial proxy, and the self-reinforcing nature of denial decisions all make this a compound failure.

13. What is the key question the lesson says you should ask about any AI system making decisions about people — regardless of whether the system is "accurate on average"?

Exactly. Accuracy on average can hide systematic failure for specific groups. The right questions are about explainability, appeal, correction, and equal performance across groups.

The lesson is explicit: average accuracy is not enough. The critical questions are about transparency, appeal mechanisms, and whether the system works equally for everyone.

14. All four case studies in this module — facial recognition, hiring AI, Tay, and the healthcare algorithm — share a common underlying pattern. Which of the following best describes it?

Correct. In every case, the system measured something tractable rather than what actually mattered: similarity score instead of identity, past hires instead of future potential, engagement instead of quality, cost instead of need.

The common thread across all four cases is the gap between what was measured (the proxy) and what should have been measured (the actual goal) — and the harm that gap created at scale.

15. Someone tells you, "AI decisions are objective because they are based on data and math, not on human feelings or prejudice." Based on this module, what is the most accurate response?

Exactly right. The math does not neutralize the bias — it encodes it and gives it a veneer of objectivity. The cases in this module all demonstrate that dynamic operating at scale.

Every case in this module disproves the claim that math equals objectivity. AI systems inherit bias from data; math gives that bias a false appearance of neutrality, which makes it more dangerous, not less.