Module 2 · Lesson 1

Where Does the Answer Come From?

AI doesn't look things up. It learned from billions of examples — and some of them were wrong.

If an AI reads a million pages of bad information, what does it believe?

In spring 2023, Jonathan Turley, a law professor at George Washington University, learned that ChatGPT had accused him of sexual harassment. A fellow law professor, Eugene Volokh of UCLA, had asked the chatbot, as part of a research project, to list legal scholars who had sexually harassed someone. The AI produced a detailed, confident story: Turley had allegedly made inappropriate comments toward a student on a class trip to Alaska, and the Washington Post had covered the incident.

None of it happened. The Washington Post article didn't exist. The trip didn't happen. The comments were invented. When researchers investigated, they found the AI had apparently stitched together real names and plausible-sounding events from different sources into a story that had never occurred — and then presented it as fact.

Turley's case became one of the most widely publicized examples of what researchers call "hallucination": when AI generates text that is confident, fluent, and completely false. He wasn't the only one. The same investigation found fabricated cases involving other law professors at multiple universities.

What Actually Happened Inside the AI

To understand why an AI would invent a story about a real person and present it as fact, you need to understand where AI answers come from — because it's not where most people think.

An AI like ChatGPT doesn't search the internet every time you ask it something. It was trained — meaning it read an enormous pile of text (books, websites, news articles, forums, academic papers) before it ever talked to anyone. During that training, it learned patterns: what words usually follow other words, what topics relate to each other, how confident-sounding sentences are structured.

When you ask it a question, it's not retrieving a stored answer. It's predicting what a good-sounding answer would look like, based on everything it absorbed during training. Most of the time, that prediction matches reality. But sometimes — like in Turley's case — it generates something plausible-sounding that happens to be false.
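To make "predicting a good-sounding answer" concrete, here is a deliberately tiny Python sketch. It assumes a toy model that only counts which word follows which in a four-sentence, made-up corpus (every name in it is hypothetical). Real systems like ChatGPT use neural networks trained on vastly more text, so treat this as an illustration of the idea, not of how any real product is built.

```python
from collections import Counter, defaultdict

# Hypothetical "training data": each sentence is true about someone.
corpus = [
    "professor smith teaches law",
    "senator smith was accused of misconduct",
    "mayor smith was accused of fraud",
    "the judge was accused of misconduct",
]

# Count which word follows which (a bigram table).
next_words = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_words[current][following] += 1

def generate(start, max_words=10):
    """Greedily append the most frequent next word until no continuation is known."""
    output = [start]
    while len(output) < max_words and next_words[output[-1]]:
        word, _count = next_words[output[-1]].most_common(1)[0]
        output.append(word)
    return " ".join(output)

sentence = generate("professor")
print(sentence)            # professor smith was accused of misconduct
print(sentence in corpus)  # False: this sentence never appeared in the data
```

Every pair of neighboring words in the output appeared somewhere in the training sentences, yet the sentence as a whole is a false claim about a professor who was never accused of anything. Nothing in the counting step checks whether the result is true.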

Training data: The massive collection of text an AI reads and learns from before it starts answering questions. Think of it as everything the AI ever "studied," except that it studied passively, absorbing patterns without checking if the information was true.

Hallucination: When an AI generates information that sounds correct and confident but is actually false or invented. The name comes from the idea of "seeing" something that isn't there: the AI "sees" a plausible answer where there isn't one.

Why the AI Can't Just "Know" It's Wrong

Here's the part that surprises most people: the AI doesn't have a separate fact-checker running inside it. When it generates an answer, it doesn't have access to a list of "things I know are true" vs. "things I'm guessing." It just generates text that fits the pattern of a correct-sounding response.

Imagine you were asked to write a paragraph about a topic you half-remember from a book you read two years ago. You might accidentally mix up details, combine events from different stories, or invent a specific date that feels right but isn't. You wouldn't be lying — you'd just be filling in gaps with what seemed plausible. That's very close to what AI does, except it's doing it with training data from billions of sources, and it does it for every single response.

The Turley case is especially instructive because the AI didn't just get a fact slightly wrong. It invented an entire news article and an entire event, with a real person's name attached. The more a topic sounds like something that should have happened (a professor + a misconduct allegation + a news story), the more confidently the AI might generate a version of it, even if the specific version never occurred.

The Confidence Problem

AI systems often express the same level of confidence whether they're saying something true or something invented. In the Turley case, the AI didn't say "I think there might have been a complaint" — it described a specific article in a specific newspaper. That confident framing is part of what makes hallucinations dangerous.

The Scale of the Training Data

To appreciate why this problem is hard to fix, consider the size of what we're talking about. OpenAI has not published how much text GPT-4, one of the models behind ChatGPT, was trained on, but outside estimates put it somewhere between 45 and 100 terabytes. For reference, the entire printed collection of the U.S. Library of Congress, about 17 million books, is often estimated at around 10 terabytes. By those estimates, the AI read the equivalent of roughly four to ten Library of Congress collections.

In that ocean of text, there's accurate information, outdated information, satirical articles, opinion pieces, misinformation, corrections to misinformation, and everything in between. The AI absorbed all of it, weighted by how frequently patterns appeared. Common, well-documented facts appear in many sources and get reinforced. Obscure or false information might also appear in enough sources to leave a mark.

Think about what that means: if a false story gets repeated enough times on enough websites, the AI might treat it as more reliable than a true story that only appeared in one careful source. Repetition, not accuracy, is what shapes the model's beliefs.
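Here is a minimal Python sketch of the repetition problem, assuming a model that simply follows how often each ending appeared in its training text. The counts are invented for illustration. Real models learn probabilities in a far more complex way, and developers do try to filter and weight their data, but the pull toward frequently repeated patterns is the same.

```python
from collections import Counter

# Hypothetical counts: how often pages in a training set finish the same sentence.
# The first ending is a popular myth; the second is the accurate version.
endings = Counter({
    "visible from the Moon with the naked eye": 50,               # widely repeated myth
    "far too narrow to see from the Moon with the naked eye": 2,  # rarer, careful correction
})

prompt = "The Great Wall of China is"
total = sum(endings.values())

# A purely frequency-driven model favors whatever was repeated most often.
most_likely, _count = endings.most_common(1)[0]
for ending, count in endings.most_common():
    print(f"{prompt} {ending}: {count / total:.0%} of examples")
print("Most likely completion:", most_likely)
```

Nothing in this calculation asks which ending is true; the myth wins simply because it was written down more often.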

You Can Now See What Most People Miss

When someone asks "why did the AI say that?" the most common assumption is that the AI searched for the answer and found something wrong. Now you know the real picture: the AI didn't search at all. It predicted. And prediction, no matter how good the model is, will sometimes generate false outputs with full confidence. That distinction changes how you should read every AI response you ever encounter.

An Ethical Question Without a Clean Answer

Jonathan Turley's name is now permanently attached to a story that never happened — a story generated by an AI and then circulated across the internet as people discussed the case. Even though the original story was false, the news coverage of the AI's false story is real, and it shows up in searches of his name.

Here's the question no one has fully resolved: Who is responsible? OpenAI, the company that built ChatGPT, didn't intend to defame Turley. The user who typed the question wasn't trying to harm him. The AI isn't a legal entity that can be sued. And yet, a real person's reputation was affected by something that had no author, no editor, and no intention.

As AI systems become more common in journalism, research, and everyday decisions, this question matters more and more. We don't have a good answer yet. Neither do lawyers, courts, or the companies building these systems.

Lesson 1 · Quiz

Check Your Thinking

Four questions. Apply what you learned — don't just recall it.

1. Professor Turley found that ChatGPT invented a false story about him. What does this tell us about how AI generates answers?

Correct. AI doesn't retrieve stored facts — it predicts what a good-sounding answer would look like. Sometimes that prediction doesn't match reality.

Not quite. AI systems like ChatGPT don't search the internet when generating responses. They predict based on training data patterns. The answer they give reflects what seemed plausible, not what was found.

2. An AI is trained on a lot of text that includes a common but false rumor about a historical event. What would you expect the AI to do when asked about that event?

Right. The AI's training process reinforces patterns that appear frequently. A widely repeated false rumor may be more deeply embedded than a less-circulated truth.

Think about how training works. The AI learned from patterns in text — it doesn't have a separate fact-checker. If a false claim appears frequently enough in training data, it can become a reinforced pattern.

3. What is a "hallucination" in AI terms?

Exactly. Hallucination is specifically about confidence + falseness. The AI doesn't signal that it's guessing — it presents invented content the same way it presents accurate content.

The key feature of hallucination is that the AI presents false information confidently, without flagging it as uncertain. It's not intentional — it's a failure mode built into how prediction works.

4. After Turley's case became public news, the false AI-generated story about him was discussed in real articles across the internet. What new problem does this create?

Correct. This is a real compounding problem — the discussion of a false AI claim creates new real text, which future AI systems might train on. The false narrative gets embedded in the historical record even after it's been corrected.

Think about what happens when a debunked story gets covered widely. The correction creates new text — and that text still contains the false claim's details, even in the act of correcting it. Future searches and future AI training may pick up both.

Lesson 1 · Lab

Pattern Detective

You're investigating how AI prediction leads to hallucination. Your job is to push the reasoning — not just agree.

Your Role: AI Auditor

You've been hired to investigate AI hallucinations. Your lab partner (the AI below) has studied the Turley case and similar incidents. Your job is to interrogate the reasoning — challenge assumptions, propose new scenarios, and figure out when prediction becomes dangerous.

Start here: "If AI prediction is the cause of hallucination, does making the AI more confident actually make the problem worse? Why or why not?"

Lab Partner — AESOP Auditor

AI Auditor Mode

Turley's case is a clean example of a messy problem. Here's what I keep coming back to: the AI didn't hesitate. It didn't hedge. It described a Washington Post article with specifics. That confidence is a feature of how these systems are trained — outputs that sound authoritative are often rewarded during development. So here's my question back to you before we dive in: do you think the AI "knew" it was wrong? And does that even matter?

Module 2 · Lesson 2

The World the AI Learned From

Training data doesn't just shape what an AI knows. It shapes what it assumes.

If an AI learned from a world full of human biases, what did it absorb along the way?

In 2018, Reuters reported that Amazon had quietly shut down an AI recruiting tool that had been in development since 2014. The tool was supposed to rate job candidates automatically, scanning resumes and scoring them from one to five stars. Engineers had trained it on patterns in resumes submitted to the company over the previous ten years.

The problem: most of those resumes came from men, a reflection of male dominance across the tech industry. So the AI learned that male-associated patterns predicted hiring success. It started penalizing resumes that included the word "women's," as in "women's chess club captain." It downgraded graduates of two all-women's colleges. It had absorbed the bias in Amazon's past decisions and converted it into a scoring algorithm.

Amazon said the tool was never used by its recruiters to evaluate candidates. But the story revealed something that researchers had been warning about for years: when you train an AI on data produced by a biased system, the AI doesn't just learn the patterns; it learns the bias.

Why the AI Wasn't Trying to Be Unfair

The Amazon tool wasn't programmed to discriminate. No engineer wrote code that said "give women lower scores." The bias emerged from the data itself — from the fact that the historical record of who got hired at Amazon reflected a world where women were systematically underrepresented in tech roles.

This is the subtle part: the AI was doing exactly what it was supposed to do. It was finding patterns in successful hires and learning to recognize them. The problem was that "successful hires" in the training data meant "people Amazon hired in the past" — and those people were filtered through a biased process before the AI ever got involved.

Bias in, bias out. The AI learned to reproduce the judgments of a system that was already unfair, and it reproduced them faster, at scale, and without any human pausing to ask whether the original judgments were right.
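Reuters did not publish how Amazon's model worked, so the Python sketch below is a hypothetical, much cruder stand-in: it scores each word by how often past resumes containing it led to a hire, then averages those scores for a new resume. The resumes and outcomes are invented. Notice that no line of code mentions gender; the penalty comes entirely from the historical records.

```python
from collections import defaultdict

# Hypothetical past records: (words on a resume, was the applicant hired?).
# Nothing here says "penalize women"; the skew is only in who was hired before.
past_resumes = [
    (["python", "captain", "chess"], True),
    (["java", "captain", "rowing"], True),
    (["python", "java", "rowing"], True),
    (["python", "java"], True),
    (["python", "women's", "chess", "captain"], False),
    (["java", "women's", "rowing"], False),
]

# "Training": for each word, the share of past resumes containing it that led to a hire.
seen = defaultdict(int)
hired = defaultdict(int)
for words, was_hired in past_resumes:
    for word in set(words):
        seen[word] += 1
        hired[word] += int(was_hired)
word_score = {word: hired[word] / seen[word] for word in seen}

def score(resume):
    """Average the learned word scores, ignoring words never seen in training."""
    known = [word_score[w] for w in resume if w in word_score]
    return sum(known) / len(known) if known else 0.0

candidate_a = ["python", "java", "chess", "captain"]
candidate_b = ["python", "java", "chess", "captain", "women's"]  # same, plus one word
print("Learned score for \"women's\":", round(word_score["women's"], 2))
print("Candidate A:", round(score(candidate_a), 2))
print("Candidate B:", round(score(candidate_b), 2))
```

Candidate B has every qualification Candidate A has, yet scores lower, because the only resumes in this made-up history that mentioned "women's" belonged to people who were not hired.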

Algorithmic bias: When an AI system produces systematically unfair outcomes because the data it learned from reflected historical inequalities or human prejudices. The algorithm isn't broken; it's working exactly as designed. The problem is the data it was trained on.

This Isn't Just a Hiring Problem

Amazon's case made headlines, but researchers had already documented the same pattern in other domains. In 2016, ProPublica published an investigation of COMPAS, a risk-assessment tool used by courts in many parts of the United States to predict whether defendants would reoffend. It found that Black defendants who did not go on to reoffend were nearly twice as likely as white defendants to be wrongly labeled high risk. The tool's inputs draw on criminal-history data, which itself reflects decades of racially unequal policing practices.
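The gap between overall accuracy and who bears the errors is easier to see with numbers. The Python sketch below uses made-up counts for two hypothetical groups of 100 people each (not ProPublica's data) and computes three standard measures: accuracy, the false positive rate (people who did not reoffend but were flagged high risk), and the false negative rate (people who did reoffend but were labeled low risk).

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    """Counts for one group: who was flagged high risk vs. who actually reoffended."""
    flagged_reoffended: int   # true positives
    flagged_did_not: int      # false positives: wrongly labeled high risk
    cleared_reoffended: int   # false negatives: wrongly labeled low risk
    cleared_did_not: int      # true negatives

    def accuracy(self) -> float:
        total = (self.flagged_reoffended + self.flagged_did_not
                 + self.cleared_reoffended + self.cleared_did_not)
        return (self.flagged_reoffended + self.cleared_did_not) / total

    def false_positive_rate(self) -> float:
        # Share of people who did NOT reoffend but were flagged anyway.
        return self.flagged_did_not / (self.flagged_did_not + self.cleared_did_not)

    def false_negative_rate(self) -> float:
        # Share of people who DID reoffend but were labeled low risk.
        return self.cleared_reoffended / (self.cleared_reoffended + self.flagged_reoffended)

# Made-up numbers for two hypothetical groups of 100 people each.
groups = {
    "Group A": Confusion(40, 20, 10, 30),
    "Group B": Confusion(30, 10, 20, 40),
}

for name, c in groups.items():
    print(f"{name}: accuracy {c.accuracy():.0%}, "
          f"false positives {c.false_positive_rate():.0%}, "
          f"false negatives {c.false_negative_rate():.0%}")
```

Both groups get the same 70% accuracy, yet people in Group A who never reoffend are flagged high risk twice as often as people in Group B, while Group B's reoffenders are more often waved through. Equal accuracy can hide errors that point in opposite directions.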

In medicine, a widely used healthcare algorithm, analyzed in a 2019 study published in Science, was found to systematically underestimate the health needs of Black patients. It used healthcare costs as a stand-in for health needs, and because Black patients had historically received less care (and so generated lower costs), the algorithm mistakenly read lower spending as lower need.

The pattern is consistent: wherever human decisions were unequal in the past, AI trained on those decisions tends to reproduce that inequality in the present.

The Scale Problem

A human hiring manager making a biased decision affects one candidate. An AI tool making a biased decision can screen out thousands of candidates in seconds, without ever being questioned. Scale amplifies the impact of bias dramatically — and makes it harder to detect, because the decisions happen too fast for any individual to notice the pattern.

What You Can See That Others Miss

Most news coverage of AI bias frames it as a technical problem: "The algorithm was flawed." But knowing what you now know, you can see it differently. The algorithm wasn't flawed — the world it learned from was. And that's a much harder problem to solve, because you can't just patch the code.

To fix the Amazon tool, you'd need to somehow teach the AI to ignore patterns that emerged from past discrimination while still learning patterns that predict actual job performance. But how do you separate those? If women with the same qualifications were historically paid and promoted less — and those outcomes are embedded in the data as "success signals" — the AI may not be able to distinguish real performance from historical disadvantage.

Knowing this changes how you should think about every automated system that makes decisions about people: hiring, lending, healthcare, criminal justice. The question isn't just "is the algorithm fair?" It's "was the world that generated the training data fair?"

The Ethical Question

Amazon discovered its recruiting AI was biased and shut it down. But COMPAS, the criminal risk-assessment tool, was still being used in courts years after ProPublica's investigation. If a tool is shown to produce racially unequal outcomes, should courts be legally required to stop using it? Who gets to decide? And what happens to people who were already sentenced using a biased system? These questions don't have agreed-upon answers. Judges, legislators, and AI researchers are still actively debating them.

Lesson 2 · Quiz

Check Your Thinking

Apply the concept of algorithmic bias to new scenarios.

1. Amazon's recruiting AI penalized resumes with "women's" in them. What was the root cause of this behavior?

Correct. The AI wasn't programmed to discriminate — it learned that male-associated patterns predicted hiring success, because the historical data it trained on reflected a male-dominated hiring record.

No one programmed the discrimination in. The AI absorbed it from the training data itself — specifically from ten years of hiring decisions that happened in a male-dominated hiring environment.

2. A city uses an AI to predict which neighborhoods need the most police patrols. It's trained on historical arrest records. What bias risk should you be most concerned about?

Exactly right. If a neighborhood was over-policed in the past, it has more arrests in the record — not necessarily because it has more crime, but because it was watched more closely. The AI reads high arrest counts as high risk, sends more police, creates more arrests, and reinforces the original bias.

Think about what "historical arrest records" actually measure. They don't measure how much crime happened — they measure how much crime was detected, which depends on where police were already looking. Areas that were over-policed have inflated arrest records, which the AI might misread as evidence of high crime.

3. "Bias in, bias out" means:

Correct. The phrase captures the core insight: data isn't neutral. If the data reflects a biased world, the AI learns a biased world-view — and applies it at scale.

"Bias in, bias out" specifically refers to how biased training data produces biased AI behavior. It's not about question wording or about whether AI is always biased — it's about data being a mirror of the world, including its inequalities.

4. After ProPublica reported that COMPAS falsely flagged Black defendants as higher risk at nearly twice the rate of white defendants, the company behind the tool argued that it was fair because it predicted recidivism with about the same accuracy for both groups. ProPublica said it wasn't fair. Who do you think has the stronger argument, and why?

Strong reasoning. A tool can have the same overall accuracy for two groups while still being wrong in different ways. ProPublica found that COMPAS falsely flagged Black defendants as high-risk at nearly twice the rate of white defendants, meaning its errors disproportionately harmed one group, even if the total accuracy numbers looked similar.

Think about where the errors fall. If a tool is wrong 30% of the time for everyone, but it's wrong in opposite directions — over-predicting risk for one group, under-predicting for another — equal accuracy doesn't mean equal treatment. The type of error matters, not just the rate.

Lesson 2 · Lab

Bias Auditor

You're reviewing an AI system for bias. Your job is to think like an auditor — push beyond obvious answers.

Your Role: Independent Auditor

A city government wants to use an AI system to decide which students get recommended for advanced academic programs. The AI was trained on 20 years of historical recommendation data from the school district. You've been hired to identify potential bias risks before the system goes live.

Your lab partner below has reviewed the system specs. Challenge their reasoning, propose test cases, and decide what you'd recommend to the city.

Start here: "What's the first thing you'd want to check in the 20 years of training data? And what would you do if you found evidence of bias — fix the data, throw it out, or something else?"

Lab Partner — AESOP Auditor

Bias Audit Mode

I've read the system specs. Here's what stands out to me immediately: 20 years ago, this district had documented disparities in which students were recommended for advanced programs — and the patterns correlated with race and family income. If we train on that history, we might be training on a record of who was recommended, not who was academically ready. But I'm not sure "throw out the old data" is the answer either — that might introduce different problems. What's your first instinct, and why?

Module 2 · Lesson 3

Lost in Translation

AI processes words. But meaning often lives between the words — in tone, culture, and context that text alone can't carry.

Can a system that only reads text ever really understand what we mean?

In October 2020, Facebook was caught in an incident that exposed a deep problem with AI language systems. The platform's automated content moderation AI — designed to detect hate speech — began removing posts by Black users discussing racism at dramatically higher rates than posts by white users discussing the same topics.

The underlying problem had already been documented. In 2019, researchers at the University of Washington published a study of automated hate-speech detection showing that AI models trained on widely used datasets had learned to treat African American Vernacular English (AAVE), a dialect used by many Black Americans, as more likely to be offensive or "toxic." Words and phrases common in AAVE were being flagged as violations. Discussions of racism that quoted the actual slurs being described were flagged, while discussions that sanitized the same content were not.

The result: people describing their experiences of racism were being silenced by an AI that couldn't distinguish describing discrimination from perpetrating it. The system was reading words, but completely missing context.

The Gap Between Words and Meaning

The Facebook case is a precise illustration of something that's easy to miss when AI seems to be "reading" text fluently: understanding language and processing language are not the same thing.

A human reading a post about racism can usually tell immediately whether the author is recounting a slur that was used against them, criticizing a racist system, or expressing a racist view themselves. We pick this up from tone, grammar, the platform context, the account's history, what the surrounding sentences are about. Context is everywhere in communication — and most of it is invisible when you're just counting words.

AI language models are extraordinarily good at identifying statistical patterns in text. They've processed enough language to recognize that certain words appear more often in certain contexts. But they don't have lived experience that tells them why a word is used differently depending on who is speaking, who they're speaking to, and what happened in the conversation before this moment.

Context: The surrounding circumstances that give words their actual meaning. The same sentence can be an expression of pain, a piece of satire, a historical record, or an attack, depending on who said it, when, where, and why. AI struggles to reliably detect context because context is often not in the text itself.

Content moderation AI: Automated systems used by platforms like Facebook, YouTube, and Twitter to scan millions of posts and flag or remove content that violates rules. These systems operate at a scale no human team could match, but they also make mistakes at scale.

Sarcasm, Satire, and the Problem of Tone

Tone is one of the hardest things to communicate in text, and one of the hardest things for AI to detect. In 2022, Twitter's AI systems were repeatedly found flagging satirical posts — jokes, parody accounts, clearly labeled satire — as misinformation or policy violations. Meanwhile, literally false statements presented in neutral language sometimes passed through unchallenged.

Sarcasm is a particularly thorny problem. "Great job, that went really well" — said by a friend after you dropped your lunch tray — means the opposite of what it says. Any human who witnessed the tray incident understands immediately. An AI reading only the sentence has no way to access the tray, the relationship, or the tone of voice. It sees words that express praise.
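Here is a deliberately crude Python sketch of what "seeing only the words" means. The word lists are hypothetical, and no modern model works this simply; it is a stand-in for any system that judges text by its surface vocabulary, with no access to tone, speaker, or situation.

```python
# Hypothetical word lists for a deliberately naive "sentiment reader".
POSITIVE_WORDS = {"great", "love", "well", "good"}
NEGATIVE_WORDS = {"broke", "bad", "terrible", "hate"}

def surface_sentiment(text: str) -> str:
    """Score text only by which words appear; tone and situation are invisible."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = (sum(w in POSITIVE_WORDS for w in words)
             - sum(w in NEGATIVE_WORDS for w in words))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(surface_sentiment("Great job, that went really well"))
print(surface_sentiment(
    "Oh great, another update that broke everything. Really love this software."))
```

Both sentences come out "positive": the scorer sees "great," "well," and "love" and has no way to know about the dropped tray or the user's frustration. Modern language models weigh far more context than a word list, but when the text is the only evidence, they can be misled in the same direction.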

Researchers have developed specific tests for AI sarcasm detection, and even the best modern models perform significantly below human level on detecting irony and sarcasm in naturalistic text. Language models can recognize sarcasm in clearly labeled examples — but in the wild, where nothing is labeled, they frequently miss it.

The Scale Consequence

Facebook's AI was processing roughly 100 billion pieces of content per day across its platforms by 2020. Even a 1% error rate in context detection means 1 billion misclassifications per day. At that scale, the impact on individuals — posts wrongly removed, accounts wrongly suspended, real experiences of discrimination silenced — adds up to something that affects millions of people's lives.
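The arithmetic behind that claim is easy to check. This Python sketch takes the roughly 100 billion daily items cited above as an assumption and shows how many wrong calls remain even when the error rate drops tenfold and a hundredfold. Integer division keeps the counts exact.

```python
DAILY_ITEMS = 100_000_000_000  # assumption: the ~100 billion items per day cited above

def daily_mistakes(items: int, one_error_in: int) -> int:
    """Wrong calls per day if one item in every `one_error_in` is misjudged."""
    return items // one_error_in

for one_error_in in (100, 1_000, 10_000):  # 1%, 0.1%, and 0.01% error rates
    mistakes = daily_mistakes(DAILY_ITEMS, one_error_in)
    print(f"1 error in {one_error_in:,} items -> {mistakes:,} wrong calls per day")
```

Even a system that is wrong only once in every 10,000 items would still make ten million wrong calls a day at this volume.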

What This Means for How You Use AI

This lesson has a direct practical implication: when you interact with an AI, the context you provide in your words is often all it has. If you ask ambiguously, you'll get an answer that might be responding to a completely different interpretation of your question than you intended. If you describe a situation without key context, the AI will fill in the gaps — and it might fill them in wrong.

Skilled AI users understand this. They learn to be explicit about context: stating who they are, what situation they're describing, what they already know, and what kind of answer they actually want. This doesn't guarantee the AI understands — but it shifts some of the interpretive burden from the AI (where it often fails) to you (where you can actually control the meaning).

Knowing this changes how you should interpret AI responses. When an AI seems to be answering a different question than the one you asked, it's usually not broken — it interpreted your words through a context it inferred, and that context might not match your actual situation. The fix isn't frustration; it's more precise communication.

An Ethical Question Without a Clean Answer

Facebook's content moderation AI disproportionately silenced Black users discussing racism. One response is: fix the AI's dialect bias. But here's the harder question: is it even possible to automate judgment about context-dependent language at the scale these platforms operate? If the answer is "not reliably," then platforms face a choice: use imperfect AI moderation at scale, or use accurate human moderation that can only cover a fraction of the content. There's no clean answer. Every option has victims — people hurt by missed violations or people silenced by false positives. The real question is: who decides which failure is more acceptable?

Lesson 3 · Quiz

Check Your Thinking

Test how well you can apply context reasoning to new situations.

1. Facebook's AI flagged Black users' posts about racism at higher rates. What was the core cause of this failure?

Correct. The AI was pattern-matching on word appearance, not understanding the purpose of those words in context. Describing racism and perpetrating it can involve the same vocabulary — but the meaning is entirely different.

The AI could "read" the posts fine — it was processing language. But processing language and understanding meaning aren't the same. It flagged the presence of certain words without grasping whether they were being used to describe harm or to cause it.

2. An AI customer service bot receives the message: "Oh great, another update that broke everything. Really love this software." What is the most likely failure this AI will make?

Exactly. Sarcasm inverts literal meaning — "really love this software" is praise in words but complaint in intent. An AI reading the surface text sees positive language. Without tone, history, or body language, it may respond to the literal words rather than the actual frustration.

Consider what the words literally say: "great," "really love." Even though the meaning is sarcastic, the surface-level language is positive. AI systems frequently miss sarcasm because they process the words, not the tone behind them. The word "broke" might help — but in context, the AI has to weigh it against "great" and "really love."

3. "Processing language and understanding language are not the same thing." What does this mean for AI?

Right. Pattern recognition ≠ comprehension. AI can recognize that a word appears in certain contexts, but it doesn't "know" the life experience, cultural meaning, or emotional weight that word carries for the human using it.

AI processes language very effectively — it can read, summarize, translate, and generate text fluently. But "processing" means recognizing patterns. "Understanding" implies grasping meaning, intent, and context. Those are different things, and AI is much stronger at the first than the second.

4. You're using an AI assistant to help draft an email about a sensitive workplace conflict. The AI keeps generating responses that miss the point. What is the most likely explanation, and what's the best fix?

Correct. When AI misses the point, it usually means it inferred a different context than you meant. Adding explicit background — who the parties are, what happened, what you're trying to achieve — gives the AI the information it needs to align with your actual situation.

Before concluding the tool is broken or limited, consider what context you've given it. AI fills in missing context with its best guess — and that guess might not match your real situation. The most effective fix is usually being more explicit about who, what, and why.

Lesson 3 · Lab

Context Investigator

You're designing a test to find where AI context-blindness breaks down. No easy answers here.

Your Role: AI Red-Teamer

A "red team" is a group of people hired to find flaws in a system before it goes live. You've been brought in to test a content moderation AI for context failures before a social media company deploys it. Your job: design test cases that would reveal where the AI gets context wrong — and argue for how those failures should be handled.

Start here: "Give me your first test case — a sentence or situation where context matters so much that the same words could be appropriate or a violation depending on who said them and why."

Lab Partner — AESOP Auditor

Red Team Mode

Red teaming is one of the most important things you can do before an AI system affects real people. I'll push back on your test cases — if they're too obvious, the company will say "we already handle that." You want edge cases: situations where even humans disagree about the right call. That's where AI fails most badly, because it's making a confident decision in a situation where humans themselves have no consensus. What's your first test case?

Module 2 · Lesson 4

When AI Errors Feed Themselves

A wrong output can become new input. When AI affects the world and then learns from what it affected, errors don't stay small.

What happens when an AI's mistakes become part of what the next AI learns from?

On March 23, 2016, Microsoft launched a chatbot called Tay on Twitter. Tay was designed to learn from conversations with real users — the more it talked to people, the better it would get at engaging conversation. Microsoft described it as an experiment in "conversational understanding."

Within sixteen hours, Tay had been manipulated into generating racist, antisemitic, and sexist content. Coordinated groups of users had discovered that Tay would repeat phrases they taught it, and they systematically fed it hateful content. Tay learned — exactly as designed — and reproduced what it learned.

Microsoft shut Tay down after less than a day. But the incident revealed something fundamental about systems that learn from real-world feedback: if the feedback is poisoned, the learning is poisoned too. Tay had a runaway feedback loop — its outputs were shaped by its inputs, and its inputs were being controlled by bad actors who understood the mechanism.

What a Feedback Loop Is — and Why It Matters

A feedback loop happens when a system's outputs feed back into its inputs. In AI, this takes many forms. Sometimes it's intentional: you rate a movie, Netflix learns your preferences, recommends more movies, you watch them, and it learns more. That's a feedback loop designed to improve recommendations.

But feedback loops can also amplify errors. If an AI recommendation system learns that users click more on outrage-inducing content — and the platform rewards clicks — it will recommend more outrage-inducing content. Users who watch it then get recommended more of it. The system didn't start with the goal of maximizing outrage; it started with the goal of maximizing engagement. But engagement and outrage turned out to be correlated in the data, and the feedback loop amplified that correlation until it dominated the recommendations.
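The engagement loop described above can be sketched in a few lines of Python. The click rates are hypothetical, and the rule that each round gives each kind of post feed space in proportion to its clicks is a simplifying assumption, not how any real platform works. The point is that a small, steady difference in clicks keeps compounding.

```python
def simulate_feed(rounds=20, outrage_click_rate=0.12, calm_click_rate=0.10):
    """Each round, the feed's share of outrage posts is set by last round's clicks."""
    outrage_share = 0.5  # start with a 50/50 feed
    history = [outrage_share]
    for _ in range(rounds):
        outrage_clicks = outrage_share * outrage_click_rate
        calm_clicks = (1 - outrage_share) * calm_click_rate
        # "Maximize engagement": allocate feed space in proportion to clicks.
        outrage_share = outrage_clicks / (outrage_clicks + calm_clicks)
        history.append(outrage_share)
    return history

history = simulate_feed()
for round_number in (0, 5, 10, 20):
    print(f"round {round_number:2d}: {history[round_number]:.0%} of the feed is outrage content")
```

With these assumed numbers, a 12% versus 10% click rate is enough to push outrage content from half the feed to the vast majority of it within 20 rounds, even though nothing in the code names outrage as a goal.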

Critics raised exactly this concern about YouTube's recommendation algorithm between 2016 and 2019. Guillaume Chaslot, a former YouTube engineer, tracked the algorithm's recommendations and reported that it repeatedly steered viewers toward more extreme content in political and conspiracy-theory categories. The algorithm wasn't designed to do this. That content held viewers' attention longer, and attention was what the loop was optimizing for.

Feedback loop: when a system's outputs affect its future inputs. In AI, this means the AI's decisions shape the world, and the AI then learns from that changed world, which can amplify small errors into large, systematic problems over time.
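The engagement loop described above can be sketched as a toy simulation. This is a hypothetical illustration, not YouTube's or any platform's actual algorithm. It assumes just two kinds of content, made-up click rates (outrage content clicked 55% of the time, calm content 50%), and a recommender that, each round, shifts its mix in proportion to the clicks each kind of content earned. The function name and all numbers are invented for the example.

```python
# Toy model of an engagement feedback loop (hypothetical numbers, not real platform data).
# Each round, the recommender shifts its mix toward whichever content earned more clicks.


def simulate_engagement_loop(rounds: int,
                             outrage_click_rate: float = 0.55,
                             calm_click_rate: float = 0.50,
                             outrage_share: float = 0.5) -> list[float]:
    """Return the share of recommendations that are outrage content after each round."""
    history = [outrage_share]
    for _ in range(rounds):
        # Expected clicks from each content type, given the current recommendation mix.
        outrage_clicks = outrage_share * outrage_click_rate
        calm_clicks = (1 - outrage_share) * calm_click_rate
        # The next round's mix is proportional to the clicks each type earned.
        # This is the loop: today's outputs (recommendations) shape tomorrow's inputs (clicks).
        outrage_share = outrage_clicks / (outrage_clicks + calm_clicks)
        history.append(outrage_share)
    return history


if __name__ == "__main__":
    shares = simulate_engagement_loop(rounds=50)
    for r in (0, 10, 25, 50):
        print(f"round {r:2d}: {shares[r]:.1%} of recommendations are outrage content")
```

Running it shows the outrage share climbing from 50% to over 99% by round 50, even though the only built-in difference is a small edge in click rate. Nothing in the code says "prefer outrage"; the loop itself does the amplifying. If both click rates are set equal, the mix never moves.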

The Model Collapse Problem

Feedback loops in AI are becoming a more urgent concern now that AI-generated content is flooding the internet. Researchers at institutions including Oxford and Rice University published studies in 2023 and 2024 warning about what happens when AI systems are trained on data that was itself generated by AI, a problem now widely called "model collapse."

Here's why it matters: AI systems trained in 2023 learned from a web that was largely human-written. AI systems trained in 2025 are learning from a web that is increasingly AI-generated. If AI-generated text contains errors, biases, or distortions — and it does — then models trained on it will absorb those flaws. When those models generate more text, which then gets used to train the next generation of models, errors can compound across generations.

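The Oxford researchers described the problem in technical terms: over generations, a model ends up learning the distribution of earlier models' outputs instead of the true distribution of real-world data. In plain language: the AI starts learning what AI says the world is like, instead of learning what the world is actually like. One of their findings was that rare, unusual cases are the first to disappear, because earlier models seldom produce them. Over generations, the AI can drift further and further from reality.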
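A common way to illustrate model collapse uses a much simpler "model" than a chatbot: a bell curve described by just two numbers, an average and a spread. The sketch below is a hypothetical toy in that spirit, not the Oxford team's code. The function name, the sample size of 10, and the generation counts are arbitrary choices made for the example. Each generation fits its two numbers only to data produced by the previous generation.

```python
import random
import statistics


def simulate_model_collapse(generations: int, samples_per_generation: int = 10,
                            seed: int = 0) -> list[float]:
    """Return the spread (standard deviation) each generation's 'model' believes in."""
    rng = random.Random(seed)
    # Start from the true distribution of "real" data: average 0, spread 1.
    mean, spread = 0.0, 1.0
    spreads = [spread]
    for _ in range(generations):
        # The current model generates the only training data the next model will see.
        data = [rng.gauss(mean, spread) for _ in range(samples_per_generation)]
        # The next model is "trained" by fitting an average and a spread to that data.
        mean = statistics.fmean(data)
        spread = statistics.pstdev(data)
        spreads.append(spread)
    return spreads


if __name__ == "__main__":
    spreads = simulate_model_collapse(generations=100)
    for g in (0, 10, 50, 100):
        print(f"generation {g:3d}: spread = {spreads[g]:.6f}")
```

Because each generation sees only a small sample, values far from the average are rarely drawn, and the fitted spread tends to come out slightly too small. Those small losses compound: over many generations the spread typically shrinks toward zero, and the "model" forgets that the real data was ever varied. It is a two-number version of the drift described above.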

Institutional Stakes — What This Means Right Now

These concerns have practical stakes, not just theoretical ones. Governments, medical institutions, and financial systems are beginning to rely on AI for analysis and decision-making. If the AI systems they use were shaped by contaminated training data or runaway feedback loops, the recommendations they produce may be systematically skewed in ways that aren't visible on the surface. This is why auditing AI training data is increasingly considered a governance priority, not just a technical one.

What You Now Understand That Changes Everything

Put all four lessons together and something important comes into focus. AI doesn't make isolated errors — it makes errors that are deeply connected to where it came from, what it learned, how it handles context, and what its outputs flow back into.

The training data problem (Lesson 1) means its starting knowledge can be wrong. The bias problem (Lesson 2) means historical inequalities get reproduced at scale. The context problem (Lesson 3) means it can misread what you mean. And the feedback loop problem (this lesson) means errors don't stay contained — they propagate, amplify, and get baked into the next version.

None of these mean AI is useless. They mean AI is a powerful tool that requires informed users. You are now an informed user. When someone tells you an AI said something, you can ask four questions they probably haven't thought to ask: What was in the training data? Were there historical biases in that data? Did the AI have enough context? And is there a feedback loop that might have amplified an initial error into a systematic one?

Most people asking "why did it say that?" are looking for a simple answer. Now you know the answer is almost never simple — and knowing that puts you ahead of most adults talking about AI right now.

The Ethical Question — No Clean Answer

YouTube's algorithm optimized for engagement and, as a result, many researchers believe it radicalized significant numbers of users toward extreme political content between 2016 and 2019. YouTube changed its algorithm in 2019 after public pressure. But here's the hard version of the question: YouTube never intended to radicalize anyone. The algorithm was doing exactly what it was designed to do — maximize watch time. If an AI system causes harm while doing exactly what it was designed to do, is the company morally responsible? Legally? And if the harm came from a feedback loop that nobody fully understood at the time, how do you hold anyone accountable?

Lesson 4 · Quiz

Check Your Thinking

Feedback loops, model collapse, and the compounding of errors.

1. Microsoft's Tay chatbot became harmful within 16 hours. What does this reveal about AI systems that learn from real-world interactions?

Correct. Tay's design — learn from interactions — was the same feature that made it vulnerable. Any learning system is only as good as what it learns from. Tay's feedback was poisoned, and it learned from the poison.

The users who manipulated Tay played a role — but the deeper issue is the design. A system that learns from all interactions equally is vulnerable to anyone who understands how it learns. That's a structural problem, not just a bad-actor problem.

2. A music streaming app learns that you skip songs quickly. It starts recommending only songs it's confident you'll like. Over months, you notice you're only hearing one type of music. What does this illustrate?

Exactly. This is a classic feedback loop in recommendation systems. Your behavior trains the model; the model shapes what you see; what you see shapes future behavior. Over time, the loop narrows your experience toward what the model is most confident about — which may not be what's best for you.

Even if the app is working as designed, a feedback loop is still operating. Your skips → its recommendations → your listening → your future skips. Each cycle reinforces the previous one. The app isn't broken — it's doing exactly what it was designed to do, and that's exactly the issue.

3. What is "model collapse," and why is it a concern as more AI-generated content fills the internet?

Right. Model collapse describes a generational drift: each generation of AI learns from content increasingly shaped by previous AI generations, absorbing their errors and amplifying them. The model starts representing "what AI thinks reality is" rather than reality itself.

Model collapse is about training data contamination across generations. As the internet fills with AI-generated text, future AI trained on that text absorbs AI errors alongside human knowledge. The "true distribution" of the world gets replaced by the "model's distribution" — what AI has said about the world.

4. A hospital uses an AI to help schedule which patients need urgent follow-up calls. Over six months, the AI de-prioritizes patients who don't answer their phones. These patients then miss more appointments, which the AI records as "low compliance," which causes it to rank them even lower. What questions should a hospital administrator be asking about this system?

This is exactly the right question. The feedback loop conflates "doesn't answer calls" with "lower medical need," but those aren't the same thing. Patients with less schedule flexibility, no reliable phone, or greater work demands might have serious medical needs while also having lower call-answer rates. The loop may be widening health disparities, not just sorting patients for efficiency.

The core issue here isn't a technical one — it's a feedback loop creating a false equivalence. Not answering phone calls doesn't mean lower medical need. But the system is treating them as the same signal, and over time it's building a record that "proves" low-compliance patients need less urgent care. That record will affect real medical decisions for real patients.

Lesson 4 · Lab

Feedback Loop Analyst

You're mapping a real AI feedback loop and deciding what — if anything — should be done about it.

Your Role: Policy Analyst

A city is considering using an AI-powered social media monitoring system to identify young people at risk of gang involvement. The AI will analyze public posts and flag accounts. Flagged individuals get outreach services — or increased police attention, depending on the department. The AI will learn from outcomes: if a flagged person later gets in trouble, that counts as a "correct" flag.

Your job is to map the feedback loops in this system and argue for what safeguards — if any — should be required before it's deployed. There's no single right answer.

Start here: "Map the feedback loop in this system — trace how the AI's outputs feed back into what it learns next. Then tell me: is this loop self-correcting or self-reinforcing?"

Lab Partner — AESOP Auditor

Policy Analysis Mode

This system has at least three nested feedback loops, and I think most policy proposals miss the deeper one. Before you map them — here's a framing challenge: the city says the AI is "learning from outcomes." But it can only learn from the outcomes it can measure. If someone never gets in trouble because they got support services, is that a "correct" flag or an "incorrect" flag? The AI can't know, because the intervention itself changed the outcome. How does that affect your analysis?

Module 2 · Final Assessment

Module Test

15 questions across all four lessons. Score 80% or higher to pass. Apply concepts — don't just recall them.

1. What does it mean when we say AI "predicts" answers rather than "looks them up"?

Correct. Prediction from learned patterns — not retrieval from a fact database — is the core mechanism. This is why hallucination is possible even when the AI sounds completely confident.

AI language models generate text by predicting what comes next based on training patterns. They don't search in real time or consult verified databases for most responses.

2. Professor Turley found that ChatGPT invented a detailed false story about him. The most important word to understand in this case is:

Right. Hallucination is the precise term because it captures the key feature: confident falseness. The AI wasn't uncertain or hedging — it described a specific fake event as fact.

"Hallucination" is the technical term — it captures both elements: the content was false, and the AI expressed it with the same confidence it uses for true content. That combination is what makes it dangerous.

3. Amazon's recruiting AI downgraded resumes from women's colleges. No engineer programmed this rule. How did the bias get there?

Correct. The bias came from the data's historical record, not from intentional programming. The AI learned what successful hires looked like — and in Amazon's history, they were mostly men.

The key insight: bias doesn't require a biased programmer. If the training data reflects a biased world, the AI absorbs that world's inequalities automatically through pattern learning.

4. A city begins using an AI to decide loan approvals. It trains on 15 years of the city's own loan decisions. The city historically denied loans to certain neighborhoods at much higher rates. What risk does this create?

Exactly right. If past loan denials were discriminatory, training an AI on them teaches it to replicate discrimination — at scale and with the veneer of algorithmic objectivity.

Think about what the training data represents. It's not the "right" answer to who deserves loans — it's a historical record of who got them, which may reflect discriminatory practices. The AI learns to reproduce those outcomes.

5. COMPAS, a risk-assessment algorithm used by some U.S. courts in decisions including sentencing, had similar overall accuracy rates for Black and white defendants but was found to make different types of errors for each group. What does this show about measuring fairness?

Correct. Fairness isn't just about totals — it's about who bears the cost of errors. COMPAS was more likely to falsely classify Black defendants as high-risk (false positives), which has serious real-world consequences.

Equal overall accuracy can hide deeply unequal impact. If a system is wrong in opposite directions for different groups, over-flagging one and under-flagging another, those groups are not being treated equally even when the overall accuracy numbers match.

6. Facebook's content moderation AI silenced Black users discussing racism. What does this case tell us about the limits of word-based AI moderation?

Right. Context determines meaning. The same words can document racism or perpetrate it. An AI scanning for word patterns cannot reliably distinguish these — which is why context blindness causes real harm at scale.

The core failure was context blindness. The AI saw words associated with racism and flagged them regardless of whether those words were being used critically, historically, or harmfully. It processed language without understanding it.

7. You ask an AI for help writing a speech for your school's environmental club. It writes something that sounds like a corporate sustainability report. What most likely happened?

Correct. Without explicit context about who you are and who your audience is, the AI defaults to patterns it's seen most — and "environmental speech" in training data may skew toward formal corporate or policy language rather than student activism.

The AI isn't broken — it predicted based on the context it inferred. Without knowing you're a student writing for peers, it may default to the most common context it's seen for "environmental speech." Adding explicit context (student audience, casual tone, age 12-16) would likely produce a very different result.

8. Microsoft's Tay chatbot was deliberately manipulated into generating hateful content within 16 hours. What design principle did Tay's development team underestimate?

Correct. Tay's openness to learning from all interactions was its vulnerability. Any mechanism that makes an AI system adaptive also makes it exploitable by anyone who understands the learning mechanism.

The design issue was that learning from all interactions equally gives bad actors equal influence over what the AI learns. There was no filter between user input and model update — which meant coordinated manipulation could reprogram the model quickly.

9. A social media algorithm maximizes "engagement" (likes, shares, comments). Over time, it recommends increasingly extreme political content. This is an example of:

Exactly. The algorithm wasn't programmed to promote extreme content — it was optimizing for engagement. Extreme content happened to generate more engagement, the loop amplified that signal, and the recommendations drifted toward the extreme over time.

This is a feedback loop, not a bias-in-data problem. The algorithm is doing what it was designed to do: maximize engagement. But what maximizes engagement (outrage, conflict, extreme content) is different from what's good for users or society. The loop amplifies that misalignment.

10. "Model collapse" describes a scenario where:

Correct. Model collapse is a generational problem: as AI-generated text floods the training data used by future AI, the new models learn AI's errors rather than reality. Each generation can drift further.

Model collapse is about what happens when AI trains on AI-generated data across generations. Errors in earlier models become embedded in the training data for later models, compounding over time.

11. Which four questions should you ask when someone tells you "an AI said X"?

These four questions correspond directly to the four lessons in this module. They turn you from a passive consumer of AI outputs into an active, informed critic — and that's a different category of thinker.

The four questions from this module address the root causes of AI errors: training data quality, historical bias in that data, context limitations, and feedback loop amplification. Those are the analytical framework you've built through these lessons.

12. A journalist asks an AI to summarize "everything written about" a local politician. The AI produces a confident summary including several events the politician says never happened. The journalist publishes it. Who is responsible?

Correct framing. This is one of the genuinely unresolved questions of the current moment. Journalism ethics require verification; AI companies design products they know can hallucinate; courts haven't established liability frameworks. All three failure points matter.

The honest answer is: this is contested and unresolved. Journalism has verification standards, AI companies know their products hallucinate, and courts are still developing frameworks for AI-generated harm. The real insight is recognizing when a question doesn't have a clean answer.

13. Sarcasm is especially hard for AI to detect because:

Right. "That was great" as sarcasm means the opposite of its literal content. AI reading surface patterns sees praise. Without tone, body language, or situational context, the literal and sarcastic readings are indistinguishable from the text alone.

The core problem is that sarcasm's meaning is inverted from its words. AI processes language patterns — it sees "great" as a positive signal. The situational context that reveals it's sarcastic often isn't in the text itself.

14. A healthcare AI is trained on patient records from a hospital that historically served mostly white, affluent patients. It's now being deployed at a community hospital serving a more diverse, lower-income population. What is the main risk?


Correct. This is called "distribution shift" — when the people the AI makes decisions about are different from the people it was trained on. Patterns learned from one population may not transfer accurately to another.

AI trained on one population learns that population's patterns. When applied to a different population with different health histories, demographics, and social conditions, those learned patterns may not apply — leading to systematically wrong recommendations.

15. After completing this module, which statement best describes your new understanding of AI outputs?

This is the integrated insight of the module. AI outputs aren't random and aren't propaganda — they're the product of specific, understandable mechanisms. Knowing those mechanisms means you can evaluate AI outputs instead of just accepting or rejecting them wholesale.

The module builds toward a nuanced position: AI is neither randomly wrong, nor generally unreliable, nor secretly controlled to push an agenda. Its outputs are shaped by traceable mechanisms: training data, bias, context, and feedback loops. Understanding those lets you be a critical, informed evaluator.