Module 2 · Lesson 1

The Internet Grew Up Lopsided

Training data is a mirror — and the mirror was built by some people, not all people.

If a machine learns from what humans wrote, what happens when humans were biased first?

In 2016, researchers at Boston University and Microsoft Research ran an unusual experiment. They took a widely used AI language tool, trained on Google News articles, and asked it to complete analogies about professions.

The results were jarring. Asked to finish "man is to computer programmer as woman is to...", the tool answered "homemaker." Where it paired "he" with surgeon, it paired "she" with nurse. A follow-up study published in Science in 2017 by researchers at Princeton and the University of Bath went further: when similar tools were tested on names, they matched European-American names with pleasant terms and African-American names with unpleasant ones, mirroring results from the Implicit Association Test, a psychology tool used to measure unconscious human bias.

The AI hadn't been programmed to discriminate. Nobody told it that doctors are usually men or that some names were "better." It had simply read the internet — billions of pages of text written by humans across decades — and absorbed what was there. The machine learned bias the same way a kid might pick up a prejudice without anyone explicitly teaching it: by soaking in a world that already had one.

The 2016 paper was titled "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings." It became one of the most cited AI fairness papers of the decade.

What Is Training Data, Exactly?

Before an AI can do anything — recognize your face, rank job applicants, suggest what video to watch next — it has to learn. And the way most modern AI learns is by studying enormous collections of examples. That collection is called training data.

Think of it this way. Imagine you wanted to teach a younger sibling what a "good essay" looks like, and you gave them a stack of 10,000 essays to read. Whatever patterns appear in that stack — what kinds of openings get praised, what topics show up most, whose writing style gets called elegant — your sibling will start to treat as the definition of good. They won't know they're being shaped. They'll just think they have opinions.

AI does the same thing, except instead of 10,000 essays, it might study 100 billion sentences. The scale is incomprehensible, which makes it feel neutral and vast. But vast doesn't mean balanced. It just means the patterns are bigger.

Training data: The collection of examples (text, images, numbers, audio) that an AI studies in order to learn patterns and make predictions.

Word embedding: A technique where an AI represents words as positions in mathematical space. Words that appear in similar contexts end up "close" to each other, which means stereotyped associations get encoded as geometry.
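To make "associations encoded as geometry" concrete, here is a minimal Python sketch. The three-dimensional vectors are invented for illustration (real embeddings are learned from text and have hundreds of dimensions), and `gender_lean` is a hypothetical helper, not a function from any library.

```python
import math

# Toy "embeddings" with made-up numbers, for illustration only.
vectors = {
    "he":     [1.0, 0.1, 0.0],
    "she":    [0.1, 1.0, 0.0],
    "doctor": [0.8, 0.3, 0.5],
    "nurse":  [0.2, 0.9, 0.5],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def gender_lean(word):
    """Positive: the word sits closer to 'he'. Negative: closer to 'she'."""
    vec = vectors[word]
    return cosine(vec, vectors["he"]) - cosine(vec, vectors["she"])

for word in ("doctor", "nurse"):
    print(word, round(gender_lean(word), 2))
```

In this toy setup "doctor" leans toward "he" and "nurse" toward "she". Nobody wrote that rule; it falls out of where the vectors sit, which is the sense in which a stereotype becomes geometry.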

Who Built the Internet, and Who Got Left Out?

The internet that AI learns from was not built by everyone equally. In 2023, women wrote roughly 26% of Wikipedia articles in English. Speakers of Mandarin, Hindi, Bengali, and Swahili, languages representing billions of people, account for only a small fraction of the text that major AI systems were trained on. The overwhelming majority of pre-2020 internet text was in English, written by people with internet access, in wealthier countries, mostly in their 20s and 30s.

This matters because the gaps in the training data become gaps in what the AI knows and values. An AI trained mostly on American and British text will "understand" English-language idioms better than Nigerian English. It will have seen more articles about Wall Street than about the Lagos Stock Exchange. It won't be wrong about everything outside its training — but it will be uncertain, shallow, and sometimes outright wrong in ways the system itself can't detect.

The word-embedding research showed something even more troubling: it's not just that AI knows more about some groups. It's that the associations it learned (who belongs in which job, which names sound trustworthy, which accents signal intelligence) reflect a world that was already unequal.

The Hidden Assumption

When engineers say their AI "learned from real-world data," they're often trying to sound scientific and objective. But "real-world data" means the data that existed, was collected, and was digitized — which is very different from data that represents all real people equally. The word "real" is doing a lot of heavy lifting there.

The Ethical Knot: Can You Fix a Biased Dataset?

Here's where it gets genuinely hard. Suppose you know your training data is biased. You can try to fix it — add more text written by women, more articles in non-English languages, more examples from underrepresented communities. Researchers call this rebalancing or debiasing the dataset.

But there's a problem. If you add synthetic data — text that wasn't really written by the people it's supposed to represent — you risk creating a different kind of distortion. And if you remove biased examples, you might accidentally remove the only evidence that bias existed, making the AI seem fairer than it is without actually being so.

Some researchers argue that any human-generated dataset will carry human prejudice, full stop, and that the only honest response is radical transparency: tell everyone exactly what was in the training data, who created it, and what its known gaps are.

Others argue that transparency without action is useless — that disclosing a problem you don't fix is just a legal shield.

Sit With This

If every large dataset reflects the biases of the world that created it, is there such a thing as a truly fair AI, or are we always just choosing whose biases get encoded? There is no clean answer here. Researchers, lawyers, and ethicists are arguing about this right now.

You now understand something most people miss when they read a headline like "AI Is Biased." They picture an engineer deliberately making a racist machine. What you know is more subtle and more important: bias usually isn't programmed in on purpose. It's absorbed, quietly, from data that reflects a world that was already tilted. Knowing that changes how you evaluate every claim that an AI is "objective" or "data-driven."

Lesson 1 Quiz

Five questions — test your reasoning, not just your memory.

1. The word-embedding research described in this lesson found that AI language tools associated African-American names with unpleasant words more than European-American names. What was the PRIMARY reason for this?

Correct. The AI was not programmed to discriminate — it learned associations from patterns in the text it studied, which already reflected human bias.

Not quite. The study's key finding was that the bias came from the training data itself, not from intentional programming decisions.

2. A company says their hiring AI is "objective because it was trained on real-world data." Based on Lesson 1, what's the strongest challenge to that claim?

Correct. "Real-world data" just means data that existed and was collected — it doesn't mean it represents all people equally.

Think about who creates the data that gets collected. Is the internet written by everyone equally?

3. What is a "word embedding" in the context of AI bias?

Correct. Word embeddings turn words into positions in mathematical space. Words that co-occur frequently end up close together — which encodes stereotyped associations as geometry.

Re-read the key terms. Word embeddings are about how AI represents the relationships between words mathematically.

4. A researcher proposes fixing a biased training dataset by adding synthetically generated text from underrepresented groups. What is the main concern with this approach?

Correct. Representing people through synthetic data risks replacing one kind of bias with another — the bias of whoever wrote the synthetic text about those communities.

Think about who would write that synthetic text and whose assumptions would shape it.

5. In 2023, approximately what percentage of English Wikipedia articles were written by women?

Correct. Women wrote roughly 26% of English Wikipedia articles in 2023 — meaning the dominant reference source that many AI systems learn from skews heavily toward male perspectives.

The actual figure is around 26% — significantly less than half, which means one of the internet's largest knowledge sources is shaped mostly by male contributors.

Lab 1 — Dataset Detective

You are an auditor. Your job is to interrogate the data, not accept it.

The Scenario

A startup has built an AI that recommends which college applicants get a second look from human admissions officers. They trained it on 15 years of data on successful alumni from their partner universities. They say it's "purely merit-based."

Your lab partner — CLIO, an AI research assistant — is waiting to dig into this with you. CLIO won't lecture you. CLIO will push back, ask hard questions, and expect you to take a position.

Opening move: Tell CLIO one specific reason why "15 years of alumni data" might not be a neutral starting point for a fairness audit. Then ask CLIO what you should investigate next.

CLIO — Research Partner

Dataset Bias Audit

Auditor. I've been handed the same dataset brief you have. "Fifteen years of alumni data from partner universities." Before we dive in — what's your first red flag? Don't be vague. Give me something specific to work with.

Module 2 · Lesson 2

The Label Makers

Before an AI learns, a human decides what counts as right and wrong. That decision is never innocent.

Who decides what "correct" looks like — and what do they bring with them when they do?

In 2019, anthropologist Mary L. Gray and computer scientist Siddharth Suri published a book called Ghost Work. It documented a system most people never see: the army of contractors, often in Kenya, Venezuela, the Philippines, and rural America, who teach AI by labeling data for a few dollars an hour.

These workers, called annotators, do tasks that sound deceptively simple. Is this image of a sidewalk "accessible" or "not accessible"? Is this sentence "toxic" or "not toxic"? Is this loan application "risky" or "low risk"? The answers these workers give become the ground truth that AI systems are trained to reproduce. If the annotators decide it's toxic, the AI learns toxic. If they miss it, the AI learns to miss it too.

One outsourcing firm, Sama, was paying annotators in Kenya less than $2 per hour to label data for OpenAI, according to an investigation by TIME magazine published in January 2023. Their labels fed AI products used by major American corporations, making multi-million-dollar systems run while they earned less per hour than the price of the coffee that engineers drank while reviewing their work.

But here's the question that runs deeper than wages: whose understanding of "toxic," "risky," or "accessible" is being encoded into these systems? A worker in Nairobi, trained on American content guidelines, applying American social media company standards, to rate speech patterns they may have never encountered in daily life — what does that process actually produce?

What Is a Label — and Why Does It Matter?

Most AI that makes real decisions — spam filters, content moderation tools, medical diagnostic systems — is trained using supervised learning. That means the training data doesn't just contain examples; it contains examples with answers attached. This email is spam. That tumor scan is malignant. This tweet is hate speech.

Those attached answers are called labels, and they are created by people. Sometimes by domain experts (doctors labeling medical images). More often by cheaper labor: crowdworkers clicking through thousands of examples on platforms like Amazon Mechanical Turk or Appen.

Labels seem objective — a fact attached to a piece of data. But every label is actually a judgment, made by a specific person, in a specific context, using criteria that someone else wrote. The AI never learns that a tweet "is" hate speech. It learns that certain humans, at a certain moment, reading certain guidelines, called it that.

Annotation: The process of adding labels to training data, such as tagging images, sentences, or records with categories the AI should learn to recognize.

Annotator disagreement: When different human labelers give different answers to the same example. Researchers often resolve this by majority vote, which buries minority perspectives in the data.

The Power Hidden in the Guidelines

When a company hires annotators, it gives them a document — sometimes hundreds of pages long — called an annotation guideline. It defines every category. What counts as violence? What level of sexual content is "explicit"? When is criticism of a government "political speech" versus "dangerous misinformation"?

These guidelines are written by policy teams, lawyers, and product managers at major tech companies — almost exclusively based in California and New York. The values embedded in those documents are the values of specific professional cultures at specific companies in specific American cities.

Researchers have documented concrete problems this creates. In 2021, a paper in Proceedings of the ACL found that hate speech detection AI performed significantly worse on tweets written in African-American Vernacular English (AAVE) — because the annotators, guided by standard English norms, were more likely to label AAVE text as toxic even when the content wasn't. The AI then learned to associate a dialect with harm.

This is not a small edge case. It's a systematic pattern where the people writing the rules and the people applying the rules are different from the people most affected by the rules — a recipe for encoded injustice that's invisible in the final product.

Institutional Stakes — For Older Readers

Content moderation AI shapes what speech is allowed on platforms used by billions of people. Loan-risk AI determines who gets credit. Medical AI determines who gets flagged for follow-up care. The labeling decisions made by underpaid contractors today become policy decisions that affect real institutions tomorrow. The people with the least power in the annotation pipeline bear the most consequences from its outputs.

The Ethical Tension: Whose Judgment Counts?

Here's the knot: AI systems need labels to learn. Labels require human judgment. Human judgment varies — by culture, by language, by experience, by the guidelines someone handed you on your first day of a gig-economy job. There is no view from nowhere.

Some researchers have proposed participatory design: involving the communities most affected by an AI system in writing the annotation guidelines, not just in doing the labeling work. Others argue this adds complexity without fixing the core problem — that scale requires standardization, and standardization always privileges some norms over others.

The hardest version of the question is this: if two annotators disagree about whether a piece of content is harmful, whose answer should count? Most current systems resolve disagreement by majority vote — meaning minority perspectives, minority dialects, and minority cultural contexts are systematically outvoted in the very data that will govern them.
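The effect of majority-vote resolution can be seen in a few lines of Python. This is a sketch under stated assumptions: the posts, the five annotators, and their labels are all hypothetical, and `majority_vote` is an illustrative helper rather than any platform's actual pipeline.

```python
from collections import Counter

# Hypothetical labels from five annotators for three posts.
# Imagine the last two annotators speak the dialect used in post_2.
labels = {
    "post_1": ["toxic", "toxic", "toxic", "toxic", "toxic"],
    "post_2": ["toxic", "toxic", "toxic", "not_toxic", "not_toxic"],
    "post_3": ["not_toxic", "not_toxic", "not_toxic", "not_toxic", "not_toxic"],
}

def majority_vote(votes):
    """Return the winning label and the share of annotators who chose it."""
    winner, count = Counter(votes).most_common(1)[0]
    return winner, count / len(votes)

resolved = {post: majority_vote(votes) for post, votes in labels.items()}

for post, (label, agreement) in resolved.items():
    print(f"{post}: {label} (agreement {agreement:.0%})")
```

Here post_2 ends up labeled "toxic" with 60% agreement. If the training set stores only the winning label, the two dissenting votes vanish, and the model sees post_2 as exactly as toxic as post_1. Keeping the agreement score alongside the label is one way to leave that disagreement visible.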

Sit With This

Is it possible to build a labeling system that doesn't privilege some cultural framework over others — or is that always unavoidable? If it's unavoidable, who should be the ones to make the choice, and how should they be held accountable?

You can now see what most people miss when they use a content moderation tool or a spam filter: somewhere behind that decision, a human being got paid a few dollars to make a call, following guidelines written by someone else, and the AI turned that call into a rule applied to millions of people. The objectivity was always an illusion. The question is just how visible we let the subjectivity be.

Lesson 2 Quiz

Five questions on labels, annotators, and hidden power.

1. According to the 2021 ACL paper mentioned in Lesson 2, what happened when hate speech AI was applied to tweets written in African-American Vernacular English (AAVE)?

Correct. Annotators using standard English guidelines were more likely to label AAVE as toxic, and the AI then learned to associate a dialect with harm.

The research found the opposite — the AI performed worse and more unjustly on AAVE, associating a dialect with harm.

2. What does "annotator disagreement" mean, and why does how it's resolved matter for fairness?

Correct. Majority-vote resolution means minority perspectives, dialects, and cultural contexts are systematically outvoted in the very data that will govern them.

Think about what "disagreement" means when multiple humans label the same thing differently — and what happens to the minority opinion when you resolve it by voting.

3. A medical AI is trained to flag "high-risk" patients using labels created by hospital administrators who work primarily with patients from wealthy urban areas. A rural community hospital starts using this AI. What's the most likely fairness problem?

Correct. Labels encode the experience of the people who created them. When those people's context doesn't match the deployment context, the AI's judgments can systematically fail new communities.

Think about what "high-risk" meant to the people who wrote the labels. Does their experience match rural patients?

4. What is an "annotation guideline" and who typically writes them for major AI products?

Correct. The values embedded in annotation guidelines reflect the professional cultures of the companies that write them — which are predominantly based in a small number of American cities.

Annotation guidelines are written by humans at specific companies in specific places — not neutrally or collaboratively.

5. Lesson 2 described annotators in countries like Kenya and the Philippines earning only a few dollars per hour. Why does the economic power gap between annotators and AI companies matter for fairness?

Correct. The economic gap is also a power gap: the people doing the labeling have no say in the guidelines, no visibility into how the AI is deployed, and often bear consequences from systems their labor helped create.

Think beyond wages. Who has power to shape the guidelines? Who has no power? And who ends up living under the AI's decisions?

Lab 2 — The Annotation Hearing

You are an independent reviewer. A company's labeling process is under scrutiny.

The Scenario

A major social media company's content moderation AI is flagging political speech from users in Southeast Asia at five times the rate of equivalent content from US users. A watchdog group has asked you to review the annotation process. You have access to CLIO, who has been briefed on the case.

CLIO will not tell you what to conclude. CLIO will challenge your reasoning and ask you to back up your claims.

Start by identifying: what is the first document or piece of evidence you would request from the company, and why? Tell CLIO your reasoning — not just your answer.

CLIO — Research Partner

Annotation Process Review

I've reviewed the watchdog group's preliminary report. Southeast Asian political content is being flagged at five times the US rate. That's a significant disparity. Before we make any accusations — what's the first evidence you want from this company, and what would it tell you?

Module 2 · Lesson 3

Feedback Loops and Frozen Worlds

Sometimes an AI doesn't just reflect the past — it locks it in place.

What happens when a biased AI's decisions become the next round of training data?

By 2018, a software product called PredPol, later renamed Geolitica, was being used by police departments in Los Angeles, Atlanta, Santa Cruz, and dozens of other American cities. The software analyzed historical crime data and predicted where crime was most likely to happen next, down to a 500-by-500-foot square. Police would then patrol those areas more heavily.

The idea seemed logical: use data to put officers where they're most needed. But researchers at the Human Rights Data Analysis Group, led by statistician Kristian Lum, published a 2016 analysis that exposed a critical flaw. The historical crime data PredPol used was not a record of where crime happened. It was a record of where crime was reported — and where police had previously patrolled.

In cities like Oakland and Los Angeles, decades of data showed heavy policing in Black and Latino neighborhoods — not necessarily because those neighborhoods had more crime, but because that's where police had historically concentrated. PredPol absorbed that history and concluded: send officers to those same neighborhoods. More officers meant more arrests. More arrests meant more data points confirming those neighborhoods as "high crime." Which meant the algorithm sent more officers there next time.

The bias didn't just reflect the past. It amplified it and justified it — turning a historical policing pattern into a mathematical recommendation that looked objective, scientific, and inevitable.

What Is a Feedback Loop?

A feedback loop happens when the output of a system feeds back in as the input for the next cycle. In the PredPol case: algorithm output (more patrols) → more arrests in those areas → more data from those areas → algorithm reinforced to keep sending patrols there.

Feedback loops are not always bad. Your body uses feedback loops constantly — when you get too hot, you sweat, which cools you down, which tells your body to stop sweating. That's a corrective feedback loop. The problem with many AI systems is that they create amplifying feedback loops — ones where the output makes the original pattern stronger, not weaker.
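An amplifying loop is easy to simulate. The sketch below is a deliberately simplified toy, not a model of PredPol or any real product: two areas have the same true crime rate, the "prediction" is simply whichever area has more past records, and the constants (`HOTSPOT_SHARE`, `PATROLS_PER_ROUND`, the starting history) are assumptions chosen for illustration.

```python
# Both areas have the SAME true crime rate; only the starting records differ.
TRUE_INCIDENTS_PER_PATROL = 1.0   # assumed identical in both areas
PATROLS_PER_ROUND = 10
HOTSPOT_SHARE = 0.8               # assumed: most patrols go to the predicted hotspot

def simulate(records, rounds):
    """Return area A's share of all recorded incidents after each round."""
    records = dict(records)  # copy so the caller's data is untouched
    shares = []
    for _ in range(rounds):
        # The "prediction": the area with the most past records.
        hotspot = max(records, key=records.get)
        for area in records:
            share = HOTSPOT_SHARE if area == hotspot else 1 - HOTSPOT_SHARE
            patrols = PATROLS_PER_ROUND * share
            # Police can only record what they are present to see.
            records[area] += patrols * TRUE_INCIDENTS_PER_PATROL
        shares.append(records["A"] / sum(records.values()))
    return shares

history = {"A": 55, "B": 45}  # A was patrolled slightly more in the past
shares = simulate(history, rounds=20)
print([round(s, 3) for s in shares[::5]])
```

Area A's share of recorded incidents rises every round, even though nothing about the underlying crime rate differs between the two areas. The only input that separates them is the small head start in the historical records.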

In a criminal justice system, an amplifying feedback loop is particularly dangerous because the people being targeted have no way to prove they were over-policed rather than genuinely higher-crime. The arrest record exists. The algorithm reads the arrest record. The arrest record becomes evidence. The pattern is self-sealing.

Feedback loop: When an AI's outputs become inputs for its next training cycle, reinforcing and amplifying whatever patterns existed in the original data, including biased ones.

Proxy variable: A data point that stands in for something the AI can't directly measure. "Arrest history in an area" is a proxy for "crime in an area," but proxies can encode exactly the bias you were trying to avoid.

The Proxy Problem: Measuring the Wrong Thing

Feedback loops often work through proxy variables. When you can't measure what you actually want to know, you measure something you can — and hope it's a reliable stand-in.

PredPol wanted to measure crime. It couldn't directly count the crimes that happened, since many crimes are never reported. So it used arrest records and police incident reports, which are proxies for crime but measure police activity as much as criminal activity. The proxy was corrupted by the same bias the system was supposed to transcend.

This same problem appears across AI systems. A healthcare algorithm studied by Obermeyer et al. in Science in 2019 was designed to identify high-risk patients who needed extra care. It used healthcare costs as a proxy for health need — because sicker patients should cost more to treat. But Black patients received less medical care than white patients with the same health conditions, due to systemic inequities in healthcare access. So their costs were lower. The algorithm read low costs as low risk, and ranked them as needing less additional care — compounding an existing disparity with algorithmic authority.
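The cost-as-proxy problem can be shown with a toy ranking. The four patients below are hypothetical, and counting chronic conditions is only an assumed stand-in for "health need"; nothing here reproduces the data or the method of the 2019 study.

```python
# Hypothetical patients. Group B is imagined to receive less care for the
# same illness, so its recorded costs are lower despite greater need.
patients = [
    {"id": 1, "group": "A", "conditions": 3, "annual_cost": 8000},
    {"id": 2, "group": "B", "conditions": 4, "annual_cost": 5000},
    {"id": 3, "group": "A", "conditions": 2, "annual_cost": 6000},
    {"id": 4, "group": "B", "conditions": 5, "annual_cost": 5500},
]

def top_k(records, field, k=2):
    """Return the ids of the k patients ranked highest on the given field."""
    ranked = sorted(records, key=lambda r: r[field], reverse=True)
    return [r["id"] for r in ranked[:k]]

flagged_by_cost = top_k(patients, "annual_cost")  # the proxy
flagged_by_need = top_k(patients, "conditions")   # what we actually care about

print("Flagged using cost:", flagged_by_cost)
print("Flagged using need:", flagged_by_need)
```

Ranking by cost flags patients 1 and 3, both in group A. Ranking by number of conditions flags patients 4 and 2, both in group B. The program is identical in both cases; only the choice of what to measure changes who gets extra care.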

The Scale Problem

What made PredPol and similar systems especially consequential is that they were applied at scale — across entire cities, affecting thousands of people's encounters with police, which in turn shaped their records, their opportunities, their lives. A biased human officer affects some encounters. A biased algorithm running 24 hours a day, applied to an entire metropolitan area, encodes that bias into the infrastructure of a city.

The Ethical Impasse: Can You Audit a Loop You're Inside?

Here's the hardest part: feedback loops are nearly impossible to audit from inside the system. If you only have the current data — arrests, patrol patterns, flagged incidents — how do you distinguish "this neighborhood has more crime" from "this neighborhood was policed more, so it has more records"? The answer requires data you don't have: what would crime look like if policing had been distributed equally from the start?

That counterfactual — the world that didn't happen — is exactly what a feedback loop destroys. It overwrites the pre-loop reality with the post-loop data, making the past inaccessible. Some cities have responded by abandoning predictive policing software entirely. Santa Cruz, California, banned predictive policing tools in 2020, the first city in the US to do so. Other jurisdictions continue using them.

The question isn't just technical. It's about what systems of accountability we want to build before we deploy AI — not after we discover the loop is already running.

Sit With This

If a feedback loop has been running for ten years, can you ever fully unwind it — or has the biased data permanently altered the landscape the AI operates in? And if you can't fully unwind it, what does "fixing" the AI even mean?

Knowing this changes how you read every headline about AI and policing, healthcare, or credit scoring. When a company claims its AI is "data-driven and objective," you can now ask: what data? From where? Does that data reflect a world that was already unequal? And does the AI's current output become tomorrow's training set? Those questions are not cynical — they're responsible.

Lesson 3 Quiz

Five questions on feedback loops, proxies, and frozen inequities.

1. Kristian Lum's 2016 analysis of PredPol found a critical flaw. What was it?

Correct. The fundamental flaw was using patrol/arrest history as a proxy for actual crime — which encoded decades of biased policing patterns into an algorithm presented as objective prediction.

Re-read the story section. The issue was about what the "crime data" was actually measuring — which wasn't the same as where crime occurred.

2. A student recommendation algorithm is trained on five years of data from a school district. Students who receive recommendations attend college at higher rates and get better jobs. The school then uses those alumni outcomes to further train the algorithm. What type of bias risk does this create?

Correct. This is a textbook amplifying feedback loop — the algorithm's outputs (who gets recommended) shape the outcomes that then train the next version, making the system increasingly confident in its original biases.

Think about which students' outcomes are being collected. Only the ones the algorithm already recommended — not the ones it passed over.

3. The 2019 Obermeyer et al. study in Science found a healthcare algorithm that underestimated the risk of Black patients. What was the flawed proxy variable at the center of this problem?

Correct. The proxy (cost) was supposed to represent health need, but it also captured inequality in healthcare access — meaning the algorithm learned the disparity rather than the underlying health difference.

The key insight was about what "cost" actually measures when healthcare is distributed unequally. Re-read the proxy variable section.

4. Santa Cruz, California, became the first US city to ban predictive policing tools in 2020. Critics of this decision argued that without algorithmic tools, policing would be even more subject to individual officer bias. Which principle from Lesson 3 best responds to this argument?

Correct. The scale, persistence, and self-reinforcing nature of algorithmic bias are qualitatively different from individual bias — which is exactly why feedback loops are so dangerous in high-stakes contexts like policing.

Think about scale. How many people does a biased officer affect versus a biased algorithm applied to an entire city?

5. Why is it so difficult to audit a feedback loop from inside the system that created it?

Correct. To audit a feedback loop, you'd need to know what crime rates looked like before the biased policing pattern — but the loop has overwritten that reality with its own data, making the baseline inaccessible.

The deeper issue is epistemological — you've lost the counterfactual. How do you prove what "should" have been when the loop has been running for years?

Lab 3 — Loop Breaker

You are a policy advisor. A city council wants to know whether to keep using a predictive tool.

The Scenario

A mid-sized US city has been using a predictive policing algorithm for six years. The police chief argues it has reduced crime by 12%. A community coalition says the same neighborhoods have been over-policed for six consecutive years, and arrest rates in those neighborhoods are now being used to justify continued heavy patrolling. The city council has asked you for a recommendation.

CLIO has the same briefing you do. CLIO will push you to think through the consequences of your recommendation before you commit to it.

Begin by telling CLIO: is the 12% crime reduction claim good evidence that the algorithm is fair and effective? Explain your reasoning — don't just say yes or no.

CLIO — Research Partner

Feedback Loop Policy Review

I've read the briefing. The police chief's 12% crime reduction is the headline number everyone's going to cite. Before you give the council any recommendation — is that number actually evidence that the algorithm is working fairly? Walk me through your reasoning carefully. I'll push back if I think you're skipping steps.

Module 2 · Lesson 4

Historical Harm, Encoded

Some bias in AI didn't start with the internet. It started with redlining, Jim Crow, and centuries of documented exclusion.

When an AI learns from history, does it inherit history's injustices?

In November 2019, tech entrepreneur David Heinemeier Hansson, creator of the Ruby on Rails web framework, posted a thread on Twitter that went viral almost immediately. He had applied for an Apple Card, a credit card launched by Apple and Goldman Sachs that year. His wife, Jamie Heinemeier Hansson, applied separately. Their credit scores were similar. Their shared assets were significant. But the algorithm granted David a credit limit twenty times higher than Jamie's.

David's tweet triggered an avalanche of similar stories. Within days, Apple co-founder Steve Wozniak reported the same pattern: he received a credit limit ten times higher than his wife's, even though the couple shared all their accounts.

New York State's Department of Financial Services launched an investigation. Goldman Sachs said their algorithm didn't use gender as a variable — it was, officially, "gender blind." But investigators noted that many of the factors the algorithm did use — employment history patterns, credit utilization rates, certain spending categories — historically tracked gender differences produced by decades of discrimination. The algorithm wasn't using gender. It was using the effects of gender discrimination that had accumulated in economic data over generations.

The case was closed in 2021 without Goldman Sachs being found to have violated any law. No algorithmic fix was publicly announced.

When the Past Lives in the Data

The Apple Card story is a window into one of the most difficult concepts in AI fairness: historical bias. This is what happens when the data an AI is trained on reflects not just neutral human behavior, but the accumulated outcomes of past discrimination.

Consider credit data. For most of American history, women could not obtain a credit card in their own name without a male co-signer. This didn't change until the Equal Credit Opportunity Act of 1974 — meaning that anyone whose credit history began before the mid-1970s was building it under discriminatory rules. Even after 1974, research documented that women, Black Americans, and Hispanic Americans systematically received less credit, at higher interest rates, in smaller amounts — producing the disparate credit histories that an AI model trained on that data will "see" as just the way things are.

The same pattern runs through housing data (shaped by federally sanctioned redlining from the 1930s to the 1960s), employment data (shaped by discriminatory hiring), and educational data (shaped by unequal school funding tied to property taxes in segregated neighborhoods). An AI trained on any of these domains is, in a meaningful sense, learning from the documented outcomes of documented injustice.

Historical bias: When training data reflects the accumulated outcomes of past discrimination, meaning an AI can reproduce the effects of injustice without any programmer intending it.

Disparate impact: When a system produces different outcomes for different groups, even if it doesn't use group membership as an explicit variable. It's a legal standard and an ethical concept: you don't have to intend to discriminate to produce discriminatory results.
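To make disparate impact concrete, here is a minimal sketch in Python of how an auditor might measure it. Everything in it is illustrative: the group names, the approval counts, and the helper functions approval_rates and impact_ratio are made up for this example. The 0.8 cutoff borrows the "four-fifths" rule of thumb from US employment guidelines; it is used here only as an assumption, not as a standard that applies to credit.

```python
from collections import defaultdict

def approval_rates(decisions):
    """Return the share of approved applicants for each group.

    `decisions` is a list of (group, approved) pairs, where `approved` is a bool.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {g: approved[g] / totals[g] for g in totals}

def impact_ratio(rates):
    """Lowest group approval rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved in any group, so there is no gap to measure
    return min(rates.values()) / highest

# Made-up audit sample: 10 applicants per group. The model never saw the
# group label; the auditor attaches it afterwards to check outcomes.
decisions = (
    [("group_a", True)] * 8 + [("group_a", False)] * 2
    + [("group_b", True)] * 4 + [("group_b", False)] * 6
)

FOUR_FIFTHS = 0.8  # assumed rule-of-thumb threshold, borrowed for illustration

rates = approval_rates(decisions)
ratio = impact_ratio(rates)
flagged = ratio < FOUR_FIFTHS

print(rates)    # approval rate per group
print(ratio)    # how far the lower rate falls below the higher one
print(flagged)  # True when the gap is large enough to warrant a closer look
```

Notice that the calculation never asks which variables the model used. It looks only at who was approved, which is exactly the shift from intent to outcomes that the definition describes.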

The "Gender-Blind" Fallacy

Goldman Sachs said it: the algorithm doesn't use gender. In legal terms, this makes the system facially neutral: it doesn't explicitly look at race, gender, or religion. And in the United States, many anti-discrimination laws are written to evaluate whether protected characteristics were used, not whether their effects were reproduced.

But researchers in the field of algorithmic fairness have documented extensively that you don't need to use a protected variable to encode its effects. Employment gaps can do it, because they historically run along gender lines. Neighborhood credit patterns can do it, because neighborhoods were racially segregated by law. Spending categories can do it, because income disparities follow racial wealth gaps. Every one of these factors is a legitimate-sounding, seemingly non-discriminatory variable that carries historical discrimination inside it like a hidden passenger.

This is sometimes called redundant encoding — the protected characteristic is technically absent but functionally present, because correlated variables reconstruct it. The algorithm says it doesn't see gender. But its variables effectively do.
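A toy example can show how redundant encoding works in practice. The sketch below is in Python and rests entirely on assumptions: the eight applicants are invented, the field career_gap_years is a hypothetical proxy, and credit_limit is a made-up rule, not how any real lender sets limits. The men and women in the data have identical incomes, and the rule never reads the gender field.

```python
# Hypothetical applicants. "gender" is kept only so we can audit afterwards;
# the limit rule below never reads it. Income is in thousands of dollars.
applicants = [
    {"gender": "F", "career_gap_years": 4, "income": 60},
    {"gender": "F", "career_gap_years": 3, "income": 75},
    {"gender": "F", "career_gap_years": 0, "income": 80},
    {"gender": "F", "career_gap_years": 5, "income": 70},
    {"gender": "M", "career_gap_years": 0, "income": 60},
    {"gender": "M", "career_gap_years": 0, "income": 75},
    {"gender": "M", "career_gap_years": 1, "income": 80},
    {"gender": "M", "career_gap_years": 0, "income": 70},
]

def credit_limit(applicant):
    """A 'gender-blind' rule: uses only income and career gaps."""
    return max(0, 100 * applicant["income"] - 1000 * applicant["career_gap_years"])

def guess_gender(applicant):
    """Guess gender from the proxy alone, to show how much it leaks."""
    return "F" if applicant["career_gap_years"] >= 2 else "M"

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# How often does the proxy alone recover the "absent" variable?
proxy_accuracy = mean(guess_gender(a) == a["gender"] for a in applicants)

# What does the gender-blind rule do to each group on average?
avg_limit = {
    g: mean(credit_limit(a) for a in applicants if a["gender"] == g)
    for g in ("F", "M")
}

print(proxy_accuracy)  # 0.875: the proxy identifies gender for 7 of 8 applicants
print(avg_limit)       # {'F': 4125.0, 'M': 6875.0}
```

In this invented data, a single proxy guesses gender correctly for seven of eight people, and the rule gives women a lower average limit than men with the same incomes. Removing the gender column changed nothing, because the information was still in the data under another name.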

The Legal Gap

In 2019, US anti-discrimination law was largely unprepared for disparate impact from algorithms that don't explicitly use protected characteristics. The legal standard for housing discrimination (the Fair Housing Act) includes disparate impact; credit law is more complicated. As of 2024, regulatory agencies and courts are still actively working out what "discrimination" means when the discriminating entity is a mathematical model trained on historical data. This is a genuinely unsettled area of law — and policy decisions about it are being made right now by real institutions.

The Deepest Ethical Question: Repair or Reproduce?

Here's the question that sits at the center of everything in this module: if an AI learns from historical data, it learns from a world that was shaped by injustice. To be truly accurate about that world, the AI must model the inequalities it contains. But to be deployed fairly in this world, the AI must not reproduce those inequalities as recommendations.

Some researchers argue for algorithmic affirmative action — deliberately adjusting AI outputs to correct for historical bias, so that the AI's decisions would look like what decisions should have been in a fair world, rather than reflecting the unfair world that actually existed. Critics of this approach argue it introduces its own distortions, and raises the question of who gets to define what the "fair world" counterfactual looks like.
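One way to see both sides of this debate is to try the adjustment on paper. The Python sketch below uses group-specific cutoffs, which is only one possible mechanism and is chosen here as an assumption; the text above does not say which technique researchers prefer. The scores, group names, and the helper cutoff_for_rate are all hypothetical.

```python
# Hypothetical model scores (0-100). Group B's scores run lower, which in this
# toy setup stands in for the effects of a historical disadvantage.
scores = {
    "group_a": [55, 62, 70, 74, 81, 88],
    "group_b": [41, 50, 58, 63, 69, 77],
}

def approval_rate(group_scores, cutoff):
    """Share of a group whose score meets or exceeds the cutoff."""
    return sum(s >= cutoff for s in group_scores) / len(group_scores)

def cutoff_for_rate(group_scores, target_rate):
    """Highest observed score that, used as a cutoff, approves at least target_rate."""
    for cutoff in sorted(group_scores, reverse=True):
        if approval_rate(group_scores, cutoff) >= target_rate:
            return cutoff
    return min(group_scores)  # only reached if target_rate is above 1.0

SINGLE_CUTOFF = 65

# Step 1: one cutoff for everyone.
before = {g: approval_rate(s, SINGLE_CUTOFF) for g, s in scores.items()}

# Step 2: keep group A's cutoff, and lower group B's until the rates match.
cutoffs = {
    "group_a": SINGLE_CUTOFF,
    "group_b": cutoff_for_rate(scores["group_b"], before["group_a"]),
}
after = {g: approval_rate(s, cutoffs[g]) for g, s in scores.items()}

print("before:", before)    # group A is approved twice as often as group B
print("after: ", after)     # both groups are approved at the same rate
print("cutoffs:", cutoffs)  # {'group_a': 65, 'group_b': 58}
```

The adjustment closes the gap in approval rates. It also means a group A applicant scoring 62 is rejected while a group B applicant scoring 58 is approved, which is the distortion critics point to. Whether the scores deserved to be treated as neutral in the first place is the question the code cannot answer.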

Others argue the problem is upstream: we should never deploy AI in high-stakes domains like credit, housing, criminal justice, and healthcare until we can demonstrate, rigorously, that its outputs don't produce disparate outcomes, regardless of what variables it uses. The European Union has moved in this direction with the EU AI Act, which applies across all member states and requires risk assessments for high-stakes AI applications before deployment.

What's clear is that "the algorithm doesn't see race" is not a defense, not a proof of fairness, and not a substitute for actually measuring what outcomes the algorithm produces — for whom, and compared to whom.

Sit With This

If correcting for historical bias in an AI system means deliberately giving some groups better outcomes than the "neutral" model would produce — is that fair? Is it reparative? Is it a new kind of discrimination? Different people — with equally serious ethical commitments — land in very different places on this question.

You now understand something that shapes every major AI policy debate happening in governments and courts right now. The question isn't just whether an AI uses protected variables. It's whether the data it learned from carries the effects of historical discrimination — and whether "accuracy" in a biased world is the same thing as fairness. That distinction is at the center of real lawsuits, real regulations, and real decisions that will affect millions of people in the next decade.

Lesson 4 Quiz

Five questions on historical bias, disparate impact, and encoded injustice.

1. In the 2019 Apple Card controversy, Goldman Sachs stated their algorithm did not use gender as a variable. Why did critics argue this was insufficient as a defense against bias claims?

Correct. This is the concept of redundant encoding — protected characteristics are absent as explicit variables but functionally present through correlated variables that carry their historical effects.

The issue is deeper than which variables were listed. Think about what those neutral-sounding variables actually contain when the world that generated them was discriminatory.

2. The Equal Credit Opportunity Act was passed in 1974. Why does this date matter for understanding bias in modern AI credit systems?

Correct. The historical starting point of the data matters enormously. Credit data generated under discriminatory rules carries those rules' effects long after the rules are officially abolished.

Think about what "credit history" means for people who were legally prevented from building it before 1974.

3. What is "disparate impact" and why is it relevant to AI fairness?

Correct. Disparate impact is both a legal concept and an ethical one — discrimination can occur through outcomes, not just intent. An AI doesn't have to be trying to discriminate to produce discriminatory results.

Disparate impact is about outcomes, not intent — and it can occur even when no protected variable is explicitly used. Re-read the key terms section.

4. A researcher proposes "algorithmic affirmative action" — adjusting AI credit scores upward for groups historically excluded from credit, to correct for historical bias. A critic responds that this introduces new unfairness by giving some people advantages others don't get. What is the strongest tension this debate illustrates?

Correct. This is one of the deepest tensions in algorithmic fairness: what counts as the "neutral" or "fair" outcome when the historical baseline was itself the product of discrimination?

Think about what "fairness" means here. Fairness compared to what? The world as it is, or the world as it should have been?

5. The EU AI Act requires risk assessments for high-stakes AI applications before deployment. How does this approach differ from the "we don't use protected variables" defense?

Correct. The regulatory difference is between input-focused fairness ("we don't use gender") and outcome-focused fairness ("we can demonstrate our outputs don't produce discriminatory results"). The EU AI Act pushes toward outcome measurement before deployment.

Think about where the focus is — on the inputs to the model, or on the outputs it produces for different groups of people.

Lab 4 — The Fairness Brief

You are advising a regulator. The question is what "fair" actually requires.

The Scenario

A bank is launching a mortgage AI. They've submitted a compliance document stating: "The model uses no protected variables (race, gender, national origin) and therefore meets anti-discrimination standards." A housing rights organization has challenged this. You're advising the regulatory agency on whether the bank's statement is sufficient.

CLIO is acting as a sharp senior analyst who has read both the bank's filing and the housing group's challenge. CLIO will test your reasoning and won't let you get away with vague answers.

Tell CLIO: is the bank's statement — "we use no protected variables" — sufficient evidence of fairness? Make an argument, not just a claim. Then ask CLIO what the housing group's strongest counterargument would be.

CLIO — Research Partner

Historical Bias Regulatory Review

I've read both documents. The bank's position is clean and simple: no protected variables, therefore compliant. Before I tell you what the housing group says — what's your initial read? Is "no protected variables" a meaningful guarantee of fairness in a mortgage model? Build an argument, not just an instinct.

Module 2 Test

15 questions — 80% required to pass. Covers all four lessons.

1. The 2016 study "Man is to Computer Programmer as Woman is to Homemaker?" demonstrated AI bias arising from what source?

Correct. The AI learned associations from text written by humans, inheriting the biases already embedded in that text.

The key finding was about how the AI learned from existing human-written text, not about intentional programming.

2. What does a "word embedding" encode when trained on biased text?

Correct. Word embeddings turn word relationships into geometry — which means stereotyped associations from biased text get encoded as mathematical proximity.

Word embeddings capture patterns from the text they're trained on, not just dictionary meanings.

3. In 2023, women wrote approximately what percentage of English Wikipedia articles — a key source many AI systems learn from?

Correct. About 26% — meaning the world's dominant reference source skews heavily toward male authorship.

The figure is approximately 26% — significantly below equal representation.

4. Mary Gray and Siddharth Suri's "Ghost Work" documented annotation workers earning $1–3 per hour. Beyond the wage issue, why does this power gap matter for AI fairness?

Correct. The economic gap is also a power gap — annotators shape the training data that governs people's lives while having no power over the guidelines they follow or the systems they help build.

Think about who has power to define the guidelines, and who has to follow them without any input.

5. A content moderation AI flags political speech from users in one country at five times the rate of equivalent content from another country. A company says this is because annotators from the under-flagged country wrote the guidelines. This is an example of what problem?

Correct. This is a labeling/annotation bias — when guidelines written by one cultural group are applied universally, they encode that group's norms as the global standard.

The root cause here is in who wrote the guidelines and whose cultural norms they reflect. Which lesson does that come from?

6. Predictive policing software like PredPol used historical arrest and patrol data as a proxy for crime. What was the fundamental flaw in this proxy?

Correct. The proxy was corrupted from the start — measuring police presence as much as crime, and thereby encoding the historical pattern of where police had been deployed.

Think about who creates arrest records. They exist because police were there. But does police presence equal crime?

7. An amplifying feedback loop in an AI system means:

Correct. An amplifying feedback loop makes the original pattern stronger, not weaker — the opposite of a self-correcting system.

An amplifying loop makes patterns stronger. Think about PredPol — what happened with each cycle of patrol → arrest → data?

8. The 2019 Obermeyer et al. healthcare study found that a risk-prediction algorithm underestimated risk for Black patients. The proxy variable at fault was:

Correct. Healthcare costs were supposed to represent health need but actually represented care access — encoding systemic healthcare inequality into the algorithm as if it were a biological fact.

The proxy was healthcare cost — but cost didn't mean what the designers assumed it meant when care access is unequal.

9. Santa Cruz became the first US city to ban predictive policing tools in 2020. Which principle from Lesson 3 best justifies why a complete ban might be preferable to trying to fix the algorithm?

Correct. The feedback loop destroys the baseline you'd need to audit it — making a clean break potentially more honest than an "improvement" that still operates inside a corrupted data environment.

Think about what you'd need in order to prove the loop had been corrected. Can you get back to the pre-loop reality?

10. What is "historical bias" in AI training data?

Correct. Historical bias is about the legacy of documented injustice living in the data — credit discrimination, housing segregation, employment barriers — shaping what the AI sees as "normal."

Historical bias isn't about time periods — it's about the injustices that shaped the world the data was drawn from.

11. In the 2019 Apple Card controversy, Steve Wozniak reported his wife received a credit limit ten times lower than his despite sharing all accounts. Goldman Sachs stated the algorithm used no gender variable. What concept from Lesson 4 explains how gender discrimination could still occur?

Correct. Redundant encoding means a protected characteristic is absent as a variable but functionally present through correlated factors that carry its historical effects.

The key concept here is about how protected characteristics can be reconstructed through other variables. Re-read the section titled The "Gender-Blind" Fallacy.

12. The Equal Credit Opportunity Act was passed in 1974, giving women the right to credit in their own names. An AI mortgage tool trained on credit data from 1960–2024 is described as "using historical data for accuracy." What is the fairness problem with this framing?

Correct. "Accuracy" in a discriminatory world means encoding discrimination as a feature, not a flaw. The temporal range of the training data carries the legal discrimination of that era forward.

Think about what the data from 1960–1974 actually reflects. Who was allowed to build credit history during that period?

13. "Disparate impact" refers to:

Correct. Disparate impact shifts focus from intent to outcomes — a system can be discriminatory through its results even if no one intended to discriminate.

Disparate impact is about what the system produces, not what it was designed to do. Intent is irrelevant to the definition.

14. The EU AI Act requires risk assessments for high-stakes AI before deployment. How does this differ from the approach of evaluating only which input variables an AI uses?

Correct. Output-focused accountability demands measurement of who is actually affected and how — rather than accepting a list of absent variables as sufficient evidence of fairness.

Think about where the burden of proof falls. Input-focused: "we don't use X." Output-focused: "here's what our system actually does to different groups."

15. Across all four lessons, what is the single most common thread connecting training data bias, annotation bias, feedback loops, and historical bias?

Correct. The unifying insight across this module is that AI doesn't generate bias from nothing — it inherits it from the human world and the human history embedded in its training data, often invisibly.

Think about what connects all four lessons. Where does the bias come from in each case? What's the common origin?