In 2016, researchers at Boston University and Microsoft Research ran an unusual experiment. They pointed a set of AI language tools at ordinary job listings and asked the tools to complete phrases about professions.
The results were jarring. When the AI completed the phrase "a doctor walked into the room and he..." it continued smoothly, confidently. When the same prompt used "she," the AI reached for nurses, receptionists, and assistants instead. When asked to associate names with pleasant or unpleasant words, the tools matched European-American names with pleasant terms and African-American names with unpleasant ones, at rates strikingly similar to results from the Implicit Association Test, a psychology tool used to measure unconscious human bias.
The AI hadn't been programmed to discriminate. Nobody told it that doctors are usually men or that some names were "better." It had simply read the internet (billions of pages of text written by humans across decades) and absorbed what was there. The machine learned bias the same way a kid might pick up a prejudice without anyone explicitly teaching it: by soaking in a world that already had one.
The researchers called their paper "Man is to Computer Programmer as Woman is to Homemaker?" It became one of the most cited AI fairness papers of the decade.
Before an AI can do anything (recognize your face, rank job applicants, suggest what video to watch next), it has to learn. And the way most modern AI learns is by studying enormous collections of examples. That collection is called training data.
Think of it this way. Imagine you wanted to teach a younger sibling what a "good essay" looks like, and you gave them a stack of 10,000 essays to read. Whatever patterns appear in that stack (what kinds of openings get praised, what topics show up most, whose writing style gets called elegant) your sibling will start to treat as the definition of good. They won't know they're being shaped. They'll just think they have opinions.
AI does the same thing, except instead of 10,000 essays, it might study 100 billion sentences. The scale is incomprehensible, which makes it feel neutral and vast. But vast doesn't mean balanced. It just means the patterns are bigger.
The internet that AI learns from was not built by everyone equally. In 2023, women wrote roughly 26% of Wikipedia articles in English. Speakers of Mandarin, Hindi, Bengali, and Swahili, languages representing billions of people, account for a fraction of the text that major AI systems were trained on. The overwhelming majority of pre-2020 internet text was in English, written by people with internet access, in wealthier countries, mostly in their 20s and 30s.
This matters because the gaps in the training data become gaps in what the AI knows and values. An AI trained mostly on American and British text will "understand" American and British idioms better than Nigerian English ones. It will have seen more articles about Wall Street than about the Lagos Stock Exchange. It won't be wrong about everything outside its training, but it will be uncertain, shallow, and sometimes outright wrong in ways the system itself can't detect.
The 2016 study showed something even more troubling: it's not just that AI knows more about some groups. It's that the associations it learned (who belongs in which job, which names sound trustworthy, which accents signal intelligence) reflect a world that was already unequal.
When engineers say their AI "learned from real-world data," they're often trying to sound scientific and objective. But "real-world data" means the data that existed, was collected, and was digitized, which is very different from data that represents all real people equally. The word "real" is doing a lot of heavy lifting there.
Here's where it gets genuinely hard. Suppose you know your training data is biased. You can try to fix it: add more text written by women, more articles in non-English languages, more examples from underrepresented communities. Researchers call this rebalancing or debiasing the dataset.
But there's a problem. If you add synthetic data, text that wasn't really written by the people it's supposed to represent, you risk creating a different kind of distortion. And if you remove biased examples, you might accidentally remove the only evidence that bias existed, making the AI seem fairer than it is without actually being so.
Some researchers argue that any human-generated dataset will carry human prejudice, full stop, and that the only honest response is radical transparency: tell everyone exactly what was in the training data, who created it, and what its known gaps are.
Others argue that transparency without action is useless: disclosing a problem you don't fix is just a legal shield.
If every large dataset reflects the biases of the world that created it, is there such a thing as a truly fair AI, or are we always just choosing whose biases get encoded? There is no clean answer here. Researchers, lawyers, and ethicists are arguing about this right now.
You now understand something most people miss when they read a headline like "AI Is Biased." They picture an engineer deliberately making a racist machine. What you know is more subtle and more important: bias usually isn't programmed in on purpose. It's absorbed, quietly, from data that reflects a world that was already tilted. Knowing that changes how you evaluate every claim that an AI is "objective" or "data-driven."
A startup has built an AI that recommends which college applicants get a second look from human admissions officers. They trained it on 15 years of successful alumni data from their partner universities. They say it's "purely merit-based."
Your lab partner, CLIO, an AI research assistant, is waiting to dig into this with you. CLIO won't lecture you. CLIO will push back, ask hard questions, and expect you to take a position.
In 2019, anthropologist Mary L. Gray and computer scientist Siddharth Suri published a book called Ghost Work. It documented a system most people never see: the army of contractors (often in Kenya, Venezuela, the Philippines, and rural America) who teach AI by labeling data for a few dollars an hour.
These workers, called annotators, do tasks that sound deceptively simple. Is this image of a sidewalk "accessible" or "not accessible"? Is this sentence "toxic" or "not toxic"? Is this loan application "risky" or "low risk"? The answers these workers give become the ground truth that AI systems are trained to reproduce. If the annotators decide it's toxic, the AI learns toxic. If they miss it, the AI learns to miss it too.
One outsourcing company, Sama, was paying data labelers in Kenya less than $2 per hour, according to an investigation by TIME magazine published in January 2023. These workers were labeling data for OpenAI, helping a multi-million-dollar system run while earning less per hour than the coffee that engineers drank while reviewing their work.
But here's the question that runs deeper than wages: whose understanding of "toxic," "risky," or "accessible" is being encoded into these systems? Picture a worker in Nairobi, trained on American content guidelines and applying an American social media company's standards to rate speech patterns they may never have encountered in daily life. What does that process actually produce?
Most AI that makes real decisions (spam filters, content moderation tools, medical diagnostic systems) is trained using supervised learning. That means the training data doesn't just contain examples; it contains examples with answers attached. This email is spam. That tumor scan is malignant. This tweet is hate speech.
Those attached answers are called labels, and they are created by people. Sometimes by domain experts (doctors labeling medical images). More often by cheaper labor: crowdworkers clicking through thousands of examples on platforms like Amazon Mechanical Turk or Appen.
Labels seem objective: a fact attached to a piece of data. But every label is actually a judgment, made by a specific person, in a specific context, using criteria that someone else wrote. The AI never learns that a tweet "is" hate speech. It learns that certain humans, at a certain moment, reading certain guidelines, called it that.
When a company hires annotators, it gives them a document, sometimes hundreds of pages long, called an annotation guideline. It defines every category. What counts as violence? What level of sexual content is "explicit"? When is criticism of a government "political speech" versus "dangerous misinformation"?
These guidelines are written by policy teams, lawyers, and product managers at major tech companies, almost exclusively based in California and New York. The values embedded in those documents are the values of specific professional cultures at specific companies in specific American cities.
Researchers have documented concrete problems this creates. In 2019, a paper in the Proceedings of the ACL found that hate speech detection AI performed significantly worse on tweets written in African-American Vernacular English (AAVE), because the annotators, guided by standard English norms, were more likely to label AAVE text as toxic even when the content wasn't. The AI then learned to associate a dialect with harm.
This is not a small edge case. It's a systematic pattern where the people writing the rules and the people applying the rules are different from the people most affected by the rules, a recipe for encoded injustice that's invisible in the final product.
Content moderation AI shapes what speech is allowed on platforms used by billions of people. Loan-risk AI determines who gets credit. Medical AI determines who gets flagged for follow-up care. The labeling decisions made by underpaid contractors today become policy decisions that affect real institutions tomorrow. The people with the least power in the annotation pipeline bear the most consequences from its outputs.
Here's the knot: AI systems need labels to learn. Labels require human judgment. Human judgment varies: by culture, by language, by experience, by the guidelines someone handed you on your first day of a gig-economy job. There is no view from nowhere.
Some researchers have proposed participatory design: involving the communities most affected by an AI system in writing the annotation guidelines, not just in doing the labeling work. Others argue this adds complexity without fixing the core problem: scale requires standardization, and standardization always privileges some norms over others.
The hardest version of the question is this: if two annotators disagree about whether a piece of content is harmful, whose answer should count? Most current systems resolve disagreement by majority vote, meaning minority perspectives, minority dialects, and minority cultural contexts are systematically outvoted in the very data that will govern them.
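To make the majority-vote problem concrete, here is a minimal Python sketch. The posts, the votes, and the helper name `aggregate` are all hypothetical, invented for illustration; real annotation pipelines differ in their details.

```python
from collections import Counter


def aggregate(votes):
    """Collapse one item's annotator votes into a single majority label.

    Returns the winning label and the share of annotators who were outvoted.
    """
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    outvoted_share = (len(votes) - top_count) / len(votes)
    return label, outvoted_share


# Hypothetical votes from five annotators per post (illustrative only).
votes_by_post = {
    "post_a": ["toxic", "toxic", "toxic", "not_toxic", "not_toxic"],
    "post_b": ["not_toxic", "not_toxic", "not_toxic", "not_toxic", "toxic"],
}

results = {post: aggregate(votes) for post, votes in votes_by_post.items()}

for post, (label, outvoted) in results.items():
    print(f"{post}: label={label}, outvoted share={outvoted:.0%}")
```

Two of the five annotators disagreed about `post_a`, yet the training data would record only the single label "toxic." Keeping the outvoted share alongside the label is one way to leave that disagreement visible instead of erasing it.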
Is it possible to build a labeling system that doesn't privilege some cultural framework over others, or is that always unavoidable? If it's unavoidable, who should be the ones to make the choice, and how should they be held accountable?
You can now see what most people miss when they use a content moderation tool or a spam filter: somewhere behind that decision, a human being got paid a few dollars to make a call, following guidelines written by someone else, and the AI turned that call into a rule applied to millions of people. The objectivity was always an illusion. The question is just how visible we let the subjectivity be.
A major social media company's content moderation AI is flagging political speech from users in Southeast Asia at five times the rate of equivalent content from US users. A watchdog group has asked you to review the annotation process. You have access to CLIO, who has been briefed on the case.
CLIO will not tell you what to conclude. CLIO will challenge your reasoning and ask you to back up your claims.
By 2018, a software product called PredPol, later renamed Geolitica, was being used by police departments in Los Angeles, Atlanta, Santa Cruz, and dozens of other American cities. The software analyzed historical crime data and predicted where crime was most likely to happen next, down to a 500-by-500-foot area. Police would then patrol those areas more heavily.
The idea seemed logical: use data to put officers where they're most needed. But researchers at the Human Rights Data Analysis Group, led by statistician Kristian Lum, published a 2016 analysis that exposed a critical flaw. The historical crime data PredPol used was not a record of where crime happened. It was a record of where crime was reported, and where police had previously patrolled.
In cities like Oakland and Los Angeles, decades of data showed heavy policing in Black and Latino neighborhoods, not necessarily because those neighborhoods had more crime, but because that's where police had historically concentrated. PredPol absorbed that history and concluded: send officers to those same neighborhoods. More officers meant more arrests. More arrests meant more data points confirming those neighborhoods as "high crime." Which meant the algorithm sent more officers there next time.
The bias didn't just reflect the past. It amplified it and justified it, turning a historical policing pattern into a mathematical recommendation that looked objective, scientific, and inevitable.
A feedback loop happens when the output of a system feeds back in as the input for the next cycle. In the PredPol case: algorithm output (more patrols) → more arrests in those areas → more data from those areas → algorithm reinforced to keep sending patrols there.
Feedback loops are not always bad. Your body uses feedback loops constantly: when you get too hot, you sweat, which cools you down, which tells your body to stop sweating. That's a corrective feedback loop. The problem with many AI systems is that they create amplifying feedback loops, ones where the output makes the original pattern stronger, not weaker.
In a criminal justice system, an amplifying feedback loop is particularly dangerous because the people being targeted have no way to prove they were over-policed rather than genuinely higher-crime. The arrest record exists. The algorithm reads the arrest record. The arrest record becomes evidence. The pattern is self-sealing.
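A toy simulation can show how such a loop amplifies a starting imbalance. The sketch below is not PredPol's actual algorithm; it assumes a simplified hot-spot rule in which the area with the most recorded incidents receives most of the patrols. The function name, area names, and every number are hypothetical. Note that both areas are given the same true crime rate.

```python
def simulate_patrol_loop(initial_records, true_crime_rate, rounds,
                         patrols_per_round=100, hot_spot_share=0.8):
    """Track each area's share of recorded incidents over repeated rounds.

    Each round, the area with the most recorded incidents is treated as the
    hot spot and receives `hot_spot_share` of all patrols. Patrols record
    incidents in proportion to the area's true crime rate.
    """
    records = dict(initial_records)
    share_history = []
    for _ in range(rounds):
        hot_spot = max(records, key=records.get)
        other_areas = len(records) - 1
        for area in records:
            if area == hot_spot:
                patrols = patrols_per_round * hot_spot_share
            else:
                patrols = patrols_per_round * (1 - hot_spot_share) / other_areas
            # More patrols mean more recorded incidents, even at equal crime rates.
            records[area] += patrols * true_crime_rate[area]
        total = sum(records.values())
        share_history.append({area: records[area] / total for area in records})
    return share_history


# Hypothetical starting point: identical crime rates, unequal historical records.
history = simulate_patrol_loop(
    initial_records={"north": 60, "south": 40},
    true_crime_rate={"north": 0.5, "south": 0.5},
    rounds=5,
)

for round_number, shares in enumerate(history, start=1):
    print(round_number, f"north share of records: {shares['north']:.2f}")
```

Under these assumptions, the north area's share of recorded incidents rises every round even though the two areas are equally likely to have crime. The records alone cannot tell you that; they only show where the patrols went.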
Feedback loops often work through proxy variables. When you can't measure what you actually want to know, you measure something you can, and hope it's a reliable stand-in.
PredPol wanted to measure crime. It couldn't directly count crimes that happened (since most crime is never reported). So it used arrest records and police incident reports, which are proxies for crime but actually measure police activity as much as criminal activity. The proxy was corrupted by the same bias the system was supposed to transcend.
This same problem appears across AI systems. A healthcare algorithm studied by Obermeyer et al. in Science in 2019 was designed to identify high-risk patients who needed extra care. It used healthcare costs as a proxy for health need, because sicker patients should cost more to treat. But Black patients received less medical care than white patients with the same health conditions, due to systemic inequities in healthcare access. So their costs were lower. The algorithm read low costs as low risk, and ranked them as needing less additional care, compounding an existing disparity with algorithmic authority.
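The cost-as-proxy problem can be sketched with a handful of made-up patients. The data below is entirely hypothetical and is not drawn from the Obermeyer study; `chronic_conditions` stands in for health need and `annual_cost` for the proxy, and both field names are assumptions for illustration.

```python
# Hypothetical patients. Group B patients have lower costs at the same
# level of need, mimicking unequal access to care.
patients = [
    {"id": "p1", "group": "A", "chronic_conditions": 4, "annual_cost": 9000},
    {"id": "p2", "group": "B", "chronic_conditions": 4, "annual_cost": 6000},
    {"id": "p3", "group": "A", "chronic_conditions": 2, "annual_cost": 7000},
    {"id": "p4", "group": "B", "chronic_conditions": 5, "annual_cost": 6500},
    {"id": "p5", "group": "A", "chronic_conditions": 1, "annual_cost": 3000},
    {"id": "p6", "group": "B", "chronic_conditions": 3, "annual_cost": 4000},
]


def select_top(patients, key, k):
    """Return the ids of the k patients ranked highest on the given field."""
    ranked = sorted(patients, key=lambda p: p[key], reverse=True)
    return {p["id"] for p in ranked[:k]}


# Three slots in an extra-care program, filled two different ways.
selected_by_cost = select_top(patients, "annual_cost", 3)
selected_by_need = select_top(patients, "chronic_conditions", 3)

# Patients who qualify on need but are dropped when cost is the yardstick.
missed_by_proxy = selected_by_need - selected_by_cost

print("chosen by cost:", sorted(selected_by_cost))
print("chosen by need:", sorted(selected_by_need))
print("missed by the proxy:", sorted(missed_by_proxy))
```

In this toy data, `p2` has the same number of chronic conditions as `p1` but is left out when patients are ranked by cost, while `p3`, who has fewer conditions, gets a slot. The proxy looks reasonable on paper and still ranks the wrong people.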
What made PredPol and similar systems especially consequential is that they were applied at scale: across entire cities, affecting thousands of people's encounters with police, which in turn shaped their records, their opportunities, their lives. A biased human officer affects some encounters. A biased algorithm running 24 hours a day, applied to an entire metropolitan area, encodes that bias into the infrastructure of a city.
Here's the hardest part: feedback loops are nearly impossible to audit from inside the system. If you only have the current data (arrests, patrol patterns, flagged incidents), how do you distinguish "this neighborhood has more crime" from "this neighborhood was policed more, so it has more records"? The answer requires data you don't have: what would crime look like if policing had been distributed equally from the start?
That counterfactual, the world that didn't happen, is exactly what a feedback loop destroys. It overwrites the pre-loop reality with the post-loop data, making the past inaccessible. Some cities have responded by abandoning predictive policing software entirely. Santa Cruz, California, banned predictive policing tools in 2020, the first city in the US to do so. Other jurisdictions continue using them.
The question isn't just technical. It's about what systems of accountability we want to build before we deploy AI, not after we discover the loop is already running.
If a feedback loop has been running for ten years, can you ever fully unwind it, or has the biased data permanently altered the landscape the AI operates in? And if you can't fully unwind it, what does "fixing" the AI even mean?
Knowing this changes how you read every headline about AI and policing, healthcare, or credit scoring. When a company claims its AI is "data-driven and objective," you can now ask: what data? From where? Does that data reflect a world that was already unequal? And does the AI's current output become tomorrow's training set? Those questions are not cynical; they're responsible.
A mid-sized US city has been using a predictive policing algorithm for six years. The police chief argues it has reduced crime by 12%. A community coalition says the same neighborhoods have been over-policed for six consecutive years, and arrest rates in those neighborhoods are now being used to justify continued heavy patrolling. The city council has asked you for a recommendation.
CLIO has the same briefing you do. CLIO will push you to think through the consequences of your recommendation before you commit to it.
In November 2019, tech entrepreneur David Heinemeier Hansson, creator of the Ruby on Rails web framework, posted a thread on Twitter that went viral almost immediately. He had applied for an Apple Card, a credit card launched by Apple and Goldman Sachs that year. His wife, Jamie Heinemeier Hansson, applied separately. Their credit scores were similar. Their shared assets were significant. But the algorithm granted David a credit limit twenty times higher than Jamie's.
David's tweet triggered an avalanche of similar stories. Within days, Apple co-founder Steve Wozniak reported the same pattern: his wife received a credit limit ten times lower than his, despite sharing all accounts jointly.
New York State's Department of Financial Services launched an investigation. Goldman Sachs said their algorithm didn't use gender as a variable; it was, officially, "gender blind." But investigators noted that many of the factors the algorithm did use (employment history patterns, credit utilization rates, certain spending categories) historically tracked gender differences produced by decades of discrimination. The algorithm wasn't using gender. It was using the effects of gender discrimination that had accumulated in economic data over generations.
The case was closed in 2021 without Goldman Sachs being found to have violated any law. No algorithmic fix was publicly announced.
The Apple Card story is a window into one of the most difficult concepts in AI fairness: historical bias. This is what happens when the data an AI is trained on reflects not just neutral human behavior, but the accumulated outcomes of past discrimination.
Consider credit data. For most of American history, women often could not obtain credit in their own name without a male co-signer. This didn't change until the Equal Credit Opportunity Act of 1974, meaning that anyone whose credit history began before the mid-1970s was building it under discriminatory rules. Even after 1974, research documented that women, Black Americans, and Hispanic Americans systematically received less credit, at higher interest rates, in smaller amounts, producing the disparate credit histories that an AI model trained on that data will "see" as just the way things are.
The same pattern runs through housing data (shaped by federally-sanctioned redlining from the 1930s to the 1960s), employment data (shaped by discriminatory hiring), and educational data (shaped by unequal school funding tied to property taxes in segregated neighborhoods). An AI trained on any of these domains is, in a meaningful sense, learning from the documented outcomes of documented injustice.
Goldman Sachs said it: the algorithm doesn't use gender. This is what technologists call being facially neutral: the system doesn't explicitly look at race, gender, or religion. And in the United States, many anti-discrimination laws are written to evaluate whether protected characteristics were used, not whether their effects were reproduced.
But researchers in the field of algorithmic fairness have documented extensively that you don't need to use a protected variable to encode its effects. Employment gaps can stand in for gender, because they historically run along gender lines. Neighborhood credit patterns can stand in for race, because neighborhoods were racially segregated by law. Spending categories can do the same, because income disparities follow racial wealth gaps. Every one of these factors is a legitimate-sounding, non-discriminatory variable that carries historical discrimination inside it like a hidden passenger.
This is sometimes called redundant encoding: the protected characteristic is technically absent but functionally present, because correlated variables reconstruct it. The algorithm says it doesn't see gender. But its variables effectively do.
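A small sketch shows how a withheld attribute can be guessed from a correlated one. The applicants, the `career_gap_years` field, and the one-line guessing rule are all hypothetical, chosen only to illustrate the idea. A real model would pick up this kind of correlation from many variables at once, without anyone writing the rule by hand.

```python
# Hypothetical applicants. The "gender" field is never shown to the guesser;
# it is kept only so we can check how often the guess is right.
applicants = [
    {"gender": "F", "career_gap_years": 2},
    {"gender": "F", "career_gap_years": 1},
    {"gender": "F", "career_gap_years": 0},
    {"gender": "F", "career_gap_years": 3},
    {"gender": "F", "career_gap_years": 1},
    {"gender": "M", "career_gap_years": 0},
    {"gender": "M", "career_gap_years": 0},
    {"gender": "M", "career_gap_years": 1},
    {"gender": "M", "career_gap_years": 0},
    {"gender": "M", "career_gap_years": 0},
]


def guess_gender(visible_features):
    """Guess the withheld attribute from a correlated feature alone."""
    return "F" if visible_features["career_gap_years"] >= 1 else "M"


def reconstruction_accuracy(applicants):
    """Share of applicants whose withheld gender is guessed correctly."""
    hits = 0
    for applicant in applicants:
        # Strip the protected field before guessing, as a "blind" model would.
        visible = {k: v for k, v in applicant.items() if k != "gender"}
        if guess_gender(visible) == applicant["gender"]:
            hits += 1
    return hits / len(applicants)


accuracy = reconstruction_accuracy(applicants)
print(f"gender recovered from career gaps alone: {accuracy:.0%}")
```

In this made-up data, a single "neutral" field recovers the hidden attribute for eight of the ten applicants. Removing the gender column did not remove the information.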
In 2019, US anti-discrimination law was largely unprepared for disparate impact from algorithms that don't explicitly use protected characteristics. The legal standard for housing discrimination (the Fair Housing Act) includes disparate impact; credit law is more complicated. As of 2024, regulatory agencies and courts are still actively working out what "discrimination" means when the discriminating entity is a mathematical model trained on historical data. This is a genuinely unsettled area of law, and policy decisions about it are being made right now by real institutions.
Here's the question that sits at the center of everything in this module: if an AI learns from historical data, it learns from a world that was shaped by injustice. To be truly accurate about that world, the AI must model the inequalities it contains. But to be deployed fairly in this world, the AI must not reproduce those inequalities as recommendations.
Some researchers argue for algorithmic affirmative action: deliberately adjusting AI outputs to correct for historical bias, so that the AI's decisions would look like what decisions should have been in a fair world, rather than reflecting the unfair world that actually existed. Critics of this approach argue it introduces its own distortions, and raises the question of who gets to define what the "fair world" counterfactual looks like.
Others argue the problem is upstream: we should never deploy AI in high-stakes domains like credit, housing, criminal justice, and healthcare until we can demonstrate, rigorously, that its outputs don't produce disparate outcomes, regardless of what variables it uses. This is broadly the approach the European Union has taken under the EU AI Act, which requires risk assessments for high-risk AI applications before deployment.
What's clear is that "the algorithm doesn't see race" is not a defense, not a proof of fairness, and not a substitute for actually measuring what outcomes the algorithm produces: for whom, and compared to whom.
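Measuring outcomes can start with something as simple as comparing approval rates across groups. The sketch below uses made-up decisions and hypothetical names (`approval_rates`, `group_a`, `group_b`). The group labels are used only to audit the results afterward, not as inputs to any model, and the ratio it computes is one possible summary, not a legal test.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Compute the approval rate per group from (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        if was_approved:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


# Hypothetical audit log: 10 decisions per group (illustrative numbers only).
decisions = (
    [("group_a", True)] * 8 + [("group_a", False)] * 2
    + [("group_b", True)] * 5 + [("group_b", False)] * 5
)

rates = approval_rates(decisions)

# Ratio of the lowest approval rate to the highest; 1.0 would mean parity.
impact_ratio = min(rates.values()) / max(rates.values())

for group, rate in rates.items():
    print(f"{group}: approved {rate:.0%}")
print(f"impact ratio: {impact_ratio:.3f}")
```

An audit like this needs the protected attribute in order to work. A system that never records group membership cannot be checked this way, which is one reason "we don't collect that data" is not the same as "we don't discriminate."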
If correcting for historical bias in an AI system means deliberately giving some groups better outcomes than the "neutral" model would produce, is that fair? Is it reparative? Is it a new kind of discrimination? Different people, with equally serious ethical commitments, land in very different places on this question.
You now understand something that shapes every major AI policy debate happening in governments and courts right now. The question isn't just whether an AI uses protected variables. It's whether the data it learned from carries the effects of historical discrimination, and whether "accuracy" in a biased world is the same thing as fairness. That distinction is at the center of real lawsuits, real regulations, and real decisions that will affect millions of people in the next decade.
A bank is launching a mortgage AI. They've submitted a compliance document stating: "The model uses no protected variables (race, gender, national origin) and therefore meets anti-discrimination standards." A housing rights organization has challenged this. You're advising the regulatory agency on whether the bank's statement is sufficient.
CLIO is acting as a sharp senior analyst who has read both the bank's filing and the housing group's challenge. CLIO will test your reasoning and won't let you get away with vague answers.