Lesson 1 · Module 2

The Lawyer Who Cited Cases That Never Existed

When an AI speaks with total confidence about things it invented

Why does an AI sometimes make things up — and sound completely sure while doing it?

In early 2023, a lawyer named Steven Schwartz prepared a legal brief for a federal court in Manhattan. The brief was impressive: it cited more than half a dozen court cases, complete with judges' names, dates, and detailed summaries of what each ruling said. The opposing lawyers read it and quickly knew something was wrong.

The cases didn't exist. Not one of them. Varghese v. China Southern Airlines. Martinez v. Delta Air Lines. Shaboon v. Egypt Air. Each case had a convincing name, a plausible date, a real-sounding citation. And each one was entirely made up.

Schwartz had used ChatGPT to help research his brief. When he asked the AI if the cases were real, it said yes, and it even generated fake quotes from the fake rulings when pressed. A federal judge fined Schwartz, a colleague, and their law firm $5,000 and publicly reprimanded the lawyers involved. Schwartz told the court he "did not know" that AI could produce false information stated as fact.

He wasn't lying. He genuinely didn't know. Most people still don't.

What Just Happened There?

What Schwartz experienced has a name. Researchers and engineers call it hallucination — when an AI produces information that is factually wrong, but delivers it in confident, fluent, authoritative language. The word is a bit misleading. The AI isn't confused or dreaming. It's doing exactly what it was built to do. That's what makes this so strange.

Remember from Module 1: a language model learns by processing enormous amounts of text and finding patterns — patterns in how words follow other words, how ideas connect, how sentences are structured. It doesn't store a database of facts the way your phone's calendar stores your appointments. It learned the shape of information. The shape of how a court case citation looks. The shape of how a legal argument sounds. And when you ask it for court cases, it produces text that has exactly the right shape — even if the specific cases don't exist.

Think of it this way: imagine you read every cookbook ever written but never cooked or tasted a single dish. You'd learn the pattern so well you could write convincing-sounding recipes, but some of the ingredient combinations might be inedible or impossible. The AI isn't lying. It doesn't have a concept of lying. It's pattern-completing, and sometimes the patterns lead somewhere that isn't real.

Hallucination: When an AI generates text that is factually wrong but sounds correct and confident. The AI isn't trying to deceive; it's producing plausible-sounding output based on patterns, not verified facts.
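The pattern-completion idea can be sketched in a few lines of Python. This is a deliberately tiny toy, not how ChatGPT works internally: it only learns which word follows which in three example citations, then generates new text with the same shape. Every case name, airline, and year below is invented for illustration, as are the function names.

```python
import random
from collections import defaultdict

# Invented training examples: none of these cases or airlines are real.
TRAINING_CITATIONS = [
    "Rivera v. Northwind Airlines (2011)",
    "Okafor v. Bluebird Airways (2015)",
    "Lindqvist v. Northwind Airways (2018)",
]

def learn_patterns(lines):
    """Record which word was seen following each word (a bigram table)."""
    follows = defaultdict(list)
    for line in lines:
        words = ["<start>"] + line.split() + ["<end>"]
        for current, nxt in zip(words, words[1:]):
            follows[current].append(nxt)
    return follows

def generate(follows, rng):
    """Walk the table from <start> to <end>, picking a seen follower each step."""
    word, output = "<start>", []
    while True:
        word = rng.choice(follows[word])
        if word == "<end>":
            return " ".join(output)
        output.append(word)

follows = learn_patterns(TRAINING_CITATIONS)
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [generate(follows, rng) for _ in range(5)]

# Any sample that is not in the training list was never "learned" as a fact.
invented = [s for s in samples if s not in TRAINING_CITATIONS]
for s in samples:
    print(s, "(invented)" if s in invented else "(seen in training)")
```

Every output has the shape of a citation because shape is all the table stores. The toy can produce 15 distinct citations from three examples, so most of what it can say was never in its training data, and nothing in the program separates the two kinds.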

Why Confidence Makes It Worse

Here's the part that trips people up, including smart adults with law degrees: the AI sounds certain. Not uncertain. Not tentative. It doesn't say "I think maybe there might be a case called..." It says "In Varghese v. China Southern Airlines (2019), the court ruled that..." — in the same tone it would use to state that water boils at 100 degrees Celsius.

Why no hesitation? Because the model wasn't trained to track what it knows versus what it's guessing. It was trained to produce fluent, helpful-sounding text. Uncertainty doesn't show up in the output unless engineers specifically train or instruct the model to express it. Early versions of ChatGPT, including the one Schwartz used, did not reliably do this.

This matters enormously for anyone using AI as a research tool. The model can sound authoritative about history, medicine, law, science, geography — and be wrong. Not always. Not even usually. But often enough, and confidently enough, that a reader who doesn't already know the subject has no easy way to tell the difference.

Ethical Question — No Clean Answer

Steven Schwartz was punished for filing false information in court — even though he didn't know the AI could make things up. Is that fair? He trusted a tool he didn't fully understand. But the cases he filed could have affected real people's legal outcomes. Who bears responsibility when an AI error causes real harm — the user, the company that built it, or both?

You might think: just tell the AI to be more careful. Engineers have tried. Modern AI systems are much better at hedging — saying things like "I'm not certain, but..." But they still hallucinate. Reducing it is an active research problem. No one has fully solved it yet, and some researchers argue it may never be fully eliminable given how these models work.

When the Pattern Is Confident But Wrong

Hallucination isn't random. It's more likely in certain situations, and recognizing them gives you a real advantage over most AI users.

Rare or specific facts are high risk. The more niche the topic — a specific local politician, a paper published in 2004 in a small journal, a legal case from a regional court — the less training data the model saw. Less data means the pattern-completion engine has less to work with, and it fills gaps with plausible invention.

Recent events are also risky, because the model has a knowledge cutoff. Events after that cutoff simply don't exist in its training. But the model will sometimes produce confident-sounding guesses anyway.

Numbers and statistics hallucinate frequently. The model learned how to write a sentence that contains a statistic. It didn't learn which statistics are accurate. "Studies show that 73% of..." is a pattern it can generate — the number might be real or entirely invented.

You Can Now See What Most People Miss

When you read an AI-generated answer that contains a specific name, date, case number, statistic, or quote, you now know that any of those specific details could be hallucinated — even if the surrounding explanation is accurate. The explanation and the specific fact are produced by the same pattern engine. One can be right while the other is invented. Verifying specific claims separately from understanding general concepts is a skill most adults haven't developed yet. You now have it.

What Can You Actually Do With This?

The practical answer isn't "never trust AI." It's "know what to verify." Use AI for understanding concepts, getting the shape of a topic, drafting ideas. Then verify the specific, checkable claims — names, dates, citations, statistics — using a source that actually stores facts: a library database, a news archive, an official website.

Think of it like this: if a knowledgeable friend explained how the legal system works, you'd trust the explanation. But if that same friend gave you a specific case citation to use in court, you'd double-check it before filing. The AI is the knowledgeable friend. The filing is still your responsibility.
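The "know what to verify" habit can be roughed out as a checklist generator. The sketch below assumes that checkable claims can be spotted with a few simple text patterns, which is only partly true: the pattern names and regular expressions are hypothetical, they will miss many claims, and they say nothing about whether a claim is accurate. They only flag what a careful reader should look up.

```python
import re

# Hypothetical patterns for illustration; a human reader is still required.
CHECKABLE_PATTERNS = {
    "statistic": r"\b\d+(?:\.\d+)?%",
    "year": r"\b(?:1[0-9]|20)\d{2}\b",
    "named source": r"\b(?:Dr|Prof)\. [A-Z][a-z]+(?: [A-Z][a-z]+)+",
    "case citation": r"\b[A-Z][a-z]+ v\. [A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*",
}

def find_checkable_claims(text):
    """Return (label, matched text) pairs for details worth verifying."""
    found = []
    for label, pattern in CHECKABLE_PATTERNS.items():
        for match in re.finditer(pattern, text):
            found.append((label, match.group()))
    return found

ai_answer = (
    "According to historian Dr. Elena Vasquez, approximately 82% of "
    "medieval European cities had formal market regulations by 1300."
)
for label, detail in find_checkable_claims(ai_answer):
    print(f"Verify this {label}: {detail}")
```

Run on the sample sentence, the function flags the percentage, the year, and the named historian. The surrounding explanation is left alone, which mirrors the split described above: understand the concept from the AI, then check the specifics elsewhere.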

Steven Schwartz's story didn't end with the fine. The sanctions did not stop him from practicing law, but the case is now cited in legal ethics guidelines across the country. His mistake helped write the rules that lawyers are now trained to follow. That's how field-wide learning sometimes happens: one person's very public error becomes everyone else's education.

Lesson 1 Quiz

The Lawyer Who Cited Cases That Never Existed · 5 questions

1. In 2023, a legal brief prepared by lawyer Steven Schwartz cited fake court cases. Why did the AI generate these fake cases?

Correct. The AI doesn't store facts — it generates text that matches the pattern of how court citations look. That pattern-completion produced convincing-but-fake cases.

Not quite. The AI wasn't programmed with deception or malfunction in mind. It was doing exactly what it was built to do: generate fluent, plausible text. The problem is that plausible and true are not the same thing.

2. What does "hallucination" mean when applied to an AI language model?

Exactly right. Hallucination is specific: confident, fluent, wrong. The word is a bit dramatic, but the phenomenon is real and important.

Hallucination in AI is a specific term. It means the model generates false information but delivers it in confident, authoritative language — like inventing court cases and presenting them as real.

3. A friend uses an AI tool to research a history essay and finds this sentence: "According to historian Dr. Elena Vasquez, approximately 82% of medieval European cities had formal market regulations by 1300." What should your friend do before including this in their essay?

Right. Specific names and statistics are among the most hallucination-prone elements in AI output. The historian's name, the percentage, and the date all need independent verification.

Remember: specificity does not equal accuracy. Statistics, names, and dates are among the things AI hallucinates most readily. Asking the AI again just gets you another pattern-completion, not a fact-check.

4. Which type of AI output carries the LOWEST hallucination risk?

Correct. General explanations of widely documented concepts (like photosynthesis) appear across millions of training documents. The pattern is solid. Specific names, numbers, and obscure details are the risky territory.

Hallucination risk rises with specificity and obscurity. A general explanation of a well-documented scientific concept is much safer than a specific number, quote, or name from a narrow source.

5. After the fake-case scandal, Steven Schwartz was fined $5,000 and reprimanded. He argued he didn't know AI could fabricate information. What does this reveal about how AI tools were being deployed in 2023?

Exactly. This is one of the central tensions in AI deployment: products reached millions of users while the public understanding of their limitations was still near zero. That gap between capability and comprehension is still closing.

The case actually illustrates a broader pattern: powerful AI tools were deployed widely before users — including educated professionals — had been given the knowledge to use them safely. The problem wasn't Schwartz alone.

Lab 1: The Fact Auditor

You are not the student here. You are the investigator.

Your Role: Hallucination Investigator

An AI assistant has just produced a research summary. Your job is to interrogate it — not accept it. The AI in this lab plays a knowledgeable peer, not a teacher. It will push back, ask what you think, and challenge you to explain your reasoning.

You're not trying to "win." You're trying to practice the skill of spotting what's checkable, what's risky, and what a careful reader would verify before using.

Start by telling the AI what kind of claim you think is most dangerous to take from AI at face value — and why. Then we'll work through some examples together and you can try to catch me being wrong.

Hallucination Investigator Lab

AI PEER

Alright, investigator. I've been generating research summaries all day and I'm pretty sure some of what I've produced is wrong — but I can't tell which parts. You apparently know something about how AI hallucination works. What kind of claim should I never be trusted on? Give me your best theory, and I'll test it.

Lesson 2 · Module 2

The Map Made from Old Photographs

What an AI knows is frozen in time — and it doesn't always know that

If an AI's knowledge stopped updating at a specific date, what does that mean every time you ask it about the world?

When OpenAI released ChatGPT to the public in November 2022, it became the fastest-growing consumer application in internet history at the time, reaching one million users in five days. But the model powering it, GPT-3.5, had a training data cutoff in 2021. It knew nothing that had happened after that point.

Within weeks, users discovered a peculiar problem. Ask it about recent news, recent scientific papers, new laws, new products, current prices — and it would answer confidently, drawing on its last known state of the world. Sometimes it described things that had since changed dramatically. A company that had gone bankrupt was described as thriving. A law that had been repealed was cited as current. A sports record that had been broken was still intact in the AI's memory.

The AI didn't say "I'm not sure — my information may be out of date." It answered the same way it answered everything: fluently, confidently, as if the world had simply stopped on the day its training data ended. Some users noticed. Most didn't.

What a Knowledge Cutoff Actually Means

Every AI language model is trained on a snapshot of text — a massive collection of web pages, books, articles, and other documents gathered up to a specific point in time. After that point, training stops. The model is then tested, refined, and released — which often takes months. So by the time you talk to a model, its knowledge might already be a year or more behind the current date.

This is called the training cutoff or knowledge cutoff. Think of it like a map made from aerial photographs taken on a specific day. The map is detailed and accurate for that day. But if you're using it a year later, some roads have been built, some buildings demolished, some borders redrawn. The map doesn't warn you. It just shows you what it shows you.

The AI is the map. When it answers questions about the current world, it's always drawing on that snapshot. The world has kept moving. The snapshot hasn't.

Training Cutoff: The date after which no new information was included in an AI model's training data. Everything the model "knows" is from before this date. The world keeps changing; the model's knowledge does not update automatically.
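The size of the gap is simple date arithmetic. The sketch below uses hypothetical dates for an imaginary model, not the real dates of any product, to show how the age of the snapshot keeps growing after release.

```python
from datetime import date

def months_between(earlier, later):
    """Whole calendar months from one date to another (days are ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

# Hypothetical dates for an imaginary model, chosen only for illustration.
training_cutoff = date(2024, 1, 1)
public_release = date(2024, 9, 1)
day_you_ask = date(2025, 6, 1)

gap_at_release = months_between(training_cutoff, public_release)
gap_when_asked = months_between(training_cutoff, day_you_ask)

print(f"Knowledge was {gap_at_release} months old at release")
print(f"Knowledge is {gap_when_asked} months old when you ask")
```

With these made-up dates the snapshot is already 8 months old on release day and 17 months old by the time the question is asked. The model's answers do not change in between; only the distance between the map and the territory does.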

The Gap Between Knowing and Not Knowing You Don't Know

The trickiest part of the cutoff problem isn't that the AI has old information. It's that the AI often doesn't signal this clearly. When you ask a human expert about something recent, they typically say "I haven't followed that closely" or "check the latest papers on that." They have metacognition — awareness of the limits of their own knowledge.

Early language models had very weak metacognition about time. The model didn't have a strong sense that "this question is about something that might have changed since my training." It just produced the most likely next tokens — which often meant confidently stating the last thing it learned, whether that was six months ago or two years ago.

Newer systems, including more recent versions of ChatGPT and other models, have gotten better at this. They're more likely to say "my training data has a cutoff of [date], so please verify recent developments." But this is an added behavior — a layer on top. The underlying model still only knows what it learned. The warning is a patch, not a fix.

Ethical Question — No Clean Answer

AI companies typically disclose their model's training cutoff somewhere in the documentation. But most users never read documentation. Should companies be required to display the cutoff date prominently — right in the chat interface, before every conversation? Or does that create unnecessary fear and confusion? Who is responsible for ensuring users understand this limitation?

How This Plays Out in Real Situations

The knowledge cutoff matters most when you're asking about anything that changes. Consider the difference between these two questions:

"What is photosynthesis?" — The answer hasn't changed. An AI from 2022 and an AI from 2025 will give you the same accurate answer.

"What is the current recommended treatment for a specific medical condition?" — Medical guidelines update. A recommendation from 2021 might have been revised. An AI trained before the revision will give you the old guidance with no indication it's outdated.

The same logic applies to: prices, laws and regulations, political situations, scientific consensus on emerging topics, public health guidance, company policies, sports records, technology specifications, and any field where research is active. As a rule of thumb, any question that contains or implies the word "current" carries cutoff risk.
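The "is this the kind of thing that changes?" check can be approximated with a keyword list. This is a rough sketch under a strong assumption: that time-sensitive questions announce themselves with words like "current" or "latest". Many do not, so the word list and function name here are hypothetical starting points, not a reliable detector.

```python
import re

# Hypothetical signal words; a real question can be time-sensitive without them.
TIME_SENSITIVE_WORDS = {
    "current", "currently", "latest", "newest", "recent", "recently",
    "today", "now", "price", "prices",
}

def cutoff_risk_words(question):
    """Return the time-sensitive words found in a question, sorted."""
    words = re.findall(r"[a-z']+", question.lower())
    return sorted(set(words) & TIME_SENSITIVE_WORDS)

stable = "What is photosynthesis?"
changing = ("What is the current recommended treatment "
            "for a specific medical condition?")

print(stable, "->", cutoff_risk_words(stable))
print(changing, "->", cutoff_risk_words(changing))
```

The first question returns an empty list and the second returns the word "current", matching the two examples above. A question such as "Who is the president of France?" would slip past this check, which is why the habit of asking the question yourself matters more than any filter.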

This Changes How You Read Every AI Answer

From now on, whenever you read an AI's confident statement about the current state of something, you can ask: "Is this the kind of thing that changes?" If yes, you now know to check a live source. That mental habit — the automatic question "could this be outdated?" — is something most adults never develop. You have it now. It will save you from acting on wrong information more than once in your life.

In 2023 and 2024, several AI companies began offering "web search" modes — where the AI can look things up in real time before answering. This partly addresses the cutoff problem. But even these systems can struggle: the AI has to interpret what it finds, and that interpretation can still go wrong. The search is live; the reasoning is still the model.

Understanding both limitations — the cutoff and the hallucination — means you understand something that the lawyers, journalists, and doctors who were embarrassed by AI errors in 2022 and 2023 did not. That knowledge is practical. It shapes how you use the tool.

Lesson 2 Quiz

The Map Made from Old Photographs · 5 questions

1. ChatGPT launched publicly in November 2022 with a training cutoff in 2021. What does this mean for a user who asks it in December 2022 about a law passed in September 2022?

Correct. The cutoff is a hard boundary. Laws, events, and facts after the cutoff date simply don't exist in the model's training. It won't refuse — it'll answer with what it knows, which may be incomplete or outdated.

The training cutoff is absolute — not approximate. The model has no data from after that point, and it won't automatically flag this gap. It will answer using what it knows, which may predate that law entirely.

2. Which of these questions carries the HIGHEST training-cutoff risk?

Right. Medical recommendations are updated regularly by health agencies. An AI's answer about the "current" schedule reflects the schedule as of its training cutoff — which could be outdated.

The key word is "current." Questions about things that change over time — medical guidelines, laws, prices, ongoing situations — carry the highest cutoff risk. Stable, historical, or scientific-principle questions are much safer.

3. What is "metacognition" in the context of AI, and why does its absence make the cutoff problem worse?

Exactly. Human experts say "I'm not sure about the latest developments." Early AI models had weak awareness of their own temporal limits and would confidently answer questions about "current" situations using old data.

Metacognition here means self-awareness about knowledge limits — specifically, whether the AI recognizes that it might not know recent information. When that's weak, the AI answers confidently even when the answer might be outdated.

4. Some AI products now include a "web search" mode. A student argues: "Now that AI can search the web, the training cutoff doesn't matter anymore." What is wrong with this reasoning?

Correct. Web search patches the information gap, but not the reasoning gap. The AI's interpretation of what it finds is still generated by the same model — with all its hallucination tendencies and reasoning limitations intact.

Web search is a useful patch, not a full solution. The data fetched may be current, but the AI still interprets and summarizes that data — and that process can introduce errors, misreadings, or overconfident conclusions.

5. You ask an AI: "Who is the current president of France?" The AI confidently names a president who left office eight months ago. What combination of problems caused this?

Exactly right. The name given was real — not invented — so it's not classic hallucination. It's a cutoff problem: the AI's most recent data showed that person as president, and weak metacognition meant it didn't signal the information might be stale.

This isn't a fabricated name — the AI gave a real former president. The problem is the training cutoff: the AI's knowledge of France's leadership is frozen at a date before the change. Weak metacognition made it worse by not flagging the potential for outdated information.

Lab 2: The Temporal Detective

Figure out when the AI's world froze — and what that breaks.

Your Role: Cutoff Investigator

The AI in this lab doesn't know exactly what date it is. Your job is to probe it — ask questions designed to find the edges of what it knows versus what it can't know. When you find a gap, push: what does that gap mean for the kinds of questions users ask?

The AI will challenge your probe strategy and ask you to defend your choices. This isn't a game — it's a method you can use on any AI system you encounter.

Start your investigation. What's your first probe question, and what are you trying to find out with it?

Temporal Detective Lab

AI PEER

I know my training has a cutoff — but I'm honestly not sure exactly where my knowledge gets unreliable. You've apparently learned how to probe for this. Go ahead: ask me something designed to test the edges of my knowledge. But tell me your strategy first — why that question? What are you actually trying to find out?

Lesson 3 · Module 2

The Textbook That Learned From Itself

When the data an AI learned from was already wrong

If an AI learned from millions of human-written texts — and humans have biases — what did the AI absorb along with the words?

In 2014, Amazon began building an AI system to help sort through job applications. The idea was elegant: train the system on ten years of successful Amazon hires, and it would learn what made a good candidate. Feed in a resume, get a score. The machine would find talent faster than any human recruiter.

By 2018, Amazon had quietly shut the project down. The system had discovered a pattern in the data: most of Amazon's successful hires over the previous decade were men. The AI didn't understand gender as a concept. It just saw patterns. And one of the patterns it learned was: resumes that contain the word "women's" — as in "women's chess club" or "women's college" — should score lower.

The AI had absorbed a real historical fact (Amazon had hired more men) and converted it into a rule: resumes that look like men's resumes are more likely to be good hires. It penalized female applicants not because it was programmed to, but because it faithfully learned from data that reflected a biased past. Reuters broke this story in October 2018. Amazon confirmed it had scrapped the tool.

The Problem Isn't the Machine — It's What It Learned From

The Amazon story reveals something important that doesn't get talked about enough: training data carries the past into the future. When an AI learns from text written by humans, it doesn't just learn language. It absorbs the assumptions, stereotypes, and historical inequalities embedded in that language.

Language models trained on internet text have learned that certain jobs are more often described with male pronouns, that certain names are more often associated with crime reports, that certain dialects are more often associated with informal or uneducated writing — not because these things are true about the world, but because they reflect how people have written about the world, for a long time, in a context shaped by real social inequalities.

This is called training data bias. It's not a programming error. It's not a malfunction. It's the model doing its job correctly — and its job was to learn from data that contained human bias.

Training Data Bias: When the data used to train an AI model reflects historical inequalities, stereotypes, or skewed representations, causing the model to reproduce those patterns in its outputs even without being programmed to do so.
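A few lines of Python show how a scoring rule can absorb bias without anyone writing it in. This is a toy built on an invented six-resume history, not a reconstruction of Amazon's system: the data, the scoring formula, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Invented miniature hiring history: (keywords on the resume, was the person hired?)
HISTORY = [
    ({"python", "chess", "captain"}, True),
    ({"java", "chess", "club"}, True),
    ({"python", "robotics"}, True),
    ({"python", "women's", "chess", "captain"}, False),
    ({"java", "women's", "robotics"}, False),
    ({"java", "club"}, False),
]

def learn_word_scores(history):
    """Score each word: hire rate of resumes containing it, minus the overall hire rate."""
    overall = sum(hired for _, hired in history) / len(history)
    seen, hired_with = Counter(), Counter()
    for words, hired in history:
        for w in words:
            seen[w] += 1
            hired_with[w] += hired
    return {w: hired_with[w] / seen[w] - overall for w in seen}

def score_resume(words, scores):
    """Add up the learned scores of the words on a resume."""
    return sum(scores.get(w, 0.0) for w in words)

scores = learn_word_scores(HISTORY)
without_word = score_resume({"python", "chess", "captain"}, scores)
with_word = score_resume({"python", "women's", "chess", "captain"}, scores)

print("Lowest-scoring word:", min(scores, key=scores.get))
print("Same resume without / with that word:", without_word, with_word)
```

No line of this program mentions gender, yet the lowest-scoring word is "women's" because, in the invented history, resumes containing it were never hired. Two otherwise identical resumes get different scores. The rule is an accurate summary of the past, and that is exactly the problem.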

How Bias Gets Into Language Models Specifically

The bias problem in large language models like GPT or Claude is more subtle than the Amazon hiring case, but it's just as real. The training data for these models — billions of web pages, books, and articles — reflects what kinds of content get written, by whom, about whom, and in what context.

Internet text overrepresents English. It overrepresents wealthy, educated, Western perspectives. It overrepresents people who had internet access during the decades when most of the content was written. Voices and perspectives that didn't produce text — or whose text wasn't scraped — are underrepresented or absent entirely.

This doesn't mean the AI is "evil" or that using it causes harm automatically. It means the outputs reflect the distribution of the training data. Ask an AI to describe a "typical" engineer, doctor, or scientist, and it's more likely to produce a description that matches what those roles looked like in the historical texts it learned from — not necessarily what those roles look like today, or what they should look like going forward.

Ethical Question — No Clean Answer

Amazon's AI learned from real historical hiring data. In some narrow sense, it was being "accurate" — Amazon had indeed hired more men successfully. Should an AI be trained to correct for historical inequity, even if that means its outputs don't purely reflect historical patterns? Or does adjusting the output introduce a different kind of distortion — and who decides what counts as "correcting" versus "manipulating"? This debate is happening right now in AI labs, courts, and legislatures.

What This Looks Like in Practice — and in Policy

The Amazon case is from a hiring algorithm, but the same dynamic appears in the language models you interact with daily. Researchers at Stanford, MIT, and other institutions have documented cases where language models associate certain names with criminality, certain accents with low intelligence, and certain demographic groups with negative traits — not because anyone programmed those associations, but because the training text contained them.

In 2023, the U.S. Equal Employment Opportunity Commission issued guidance specifically addressing AI tools in hiring — the first time a federal agency addressed AI bias in employment at this scale. The European Union's AI Act, which began taking effect in 2024, classifies AI tools used for hiring, education, and credit scoring as "high risk" — requiring documented bias testing before deployment. These aren't abstract academic concerns. They're the policy response to a real, documented problem that companies like Amazon ran into years ago.

This means knowing about training data bias isn't just intellectually interesting — it's relevant to how institutions make decisions that affect people's lives right now. When an AI helps decide who gets a job interview, a loan, or a recommendation letter, the question of what that AI learned from matters enormously.

You Now See What Most People Miss

When you encounter an AI system that makes recommendations about people — admissions, hiring, criminal risk assessments, loan applications — you now know to ask: what was it trained on, and whose historical reality does that data reflect? That question is the one being debated in courts and legislatures right now. Most people asking it have law degrees or research positions. You're asking it because you understand how these systems actually work.

Lesson 3 Quiz

The Textbook That Learned From Itself · 5 questions

1. Amazon's hiring AI began penalizing resumes that mentioned "women's" groups. What caused this behavior?

Correct. No one programmed the bias in. The AI learned from a decade of hiring decisions that happened to produce more male hires — and converted that pattern into a hiring rule.

The bias wasn't programmed deliberately. The AI faithfully learned from historical data — and historical data reflected a biased reality. The machine reproduced that reality as a predictive rule.

2. What does it mean to say internet training data "overrepresents" certain perspectives?

Exactly. The training corpus reflects who had internet access, who wrote extensively online, and whose content was scraped. That distribution shapes what the AI "thinks" is normal, typical, or correct.

It's not deliberate selection — it's structural. People who were more connected, more literate in English, and more economically positioned to produce large amounts of text shaped the training data disproportionately.

3. A researcher finds that an AI language model, when asked to write a story about a nurse, consistently uses "she" — and when asked to write about an engineer, consistently uses "he." What is the most likely explanation?

Right. Decades of professional writing used gendered pronouns for various roles. The AI learned those patterns. It's not intent — it's the faithful reproduction of historical language patterns.

This is a textbook case of training data bias. Historical text — textbooks, news articles, professional writing — used gendered pronouns in patterns that reflected (and reinforced) occupational gender norms. The AI learned those patterns as valid associations.

4. The EU AI Act (2024) classifies AI used in hiring and education as "high risk." Based on what you learned in this lesson, why does that classification make sense?

Correct. High-stakes decisions about people's opportunities — a job, a school placement — carry the risk that embedded bias will systematically disadvantage certain groups. That's exactly why regulation focuses there.

The "high risk" designation is about impact, not complexity or novelty. When biased AI makes decisions about jobs or education, real people are affected in ways that can be difficult to see or challenge — which is why it warrants special scrutiny.

5. An AI company decides to "fix" bias in its hiring tool by simply removing any reference to gender from resumes before the AI scores them. A critic argues this still doesn't solve the bias problem. What is the most compelling reason the critic might give?

Exactly right. This is called proxy discrimination — the model can find other variables that correlate with protected characteristics and use those as substitutes. Real bias mitigation requires more than surface-level feature removal.

Removing a direct signal doesn't remove the bias if other variables serve as proxies. A model trained on biased data may learn that certain universities, zip codes, or extracurricular activities correlate with race or gender — and use those instead.

Lab 3: The Bias Auditor

You're reviewing an AI before it gets deployed in a school.

Your Role: AI Bias Auditor

Imagine a school district is considering deploying an AI system to recommend which students should be placed in advanced classes. You've been asked to audit the system before it goes live. The AI in this lab will play the role of a system you need to probe for potential bias.

Your job is to identify what questions you'd ask, what data you'd want to see, and what patterns would concern you. The AI will push back and ask you to defend your reasoning. You need to take a clear position.

You have ten minutes with the system before the school board meeting. What is the first question you ask it — and why that one first?

Bias Auditor Lab

AI PEER

I'm the AI system under review. I was trained on five years of this school district's historical placement and outcome data. I perform well on accuracy benchmarks — better than the previous manual process. You're here to audit me. What do you want to know, and what are you actually worried about finding?

Lesson 4 · Module 2

The Echo That Keeps Getting Louder

When AI learns from its own mistakes — and nobody notices

What happens when AI-generated content becomes part of the data that the next AI learns from?

In July 2024, researchers from the University of Oxford, the University of Cambridge, and other institutions published a paper with an unusual finding. They had trained AI models on data that included AI-generated text, then trained new models on that output, and so on. With each generation, the models got worse. Not dramatically at first. But the errors accumulated. Rare knowledge degraded. The models became more confident and more homogeneous, and less able to produce diverse, nuanced responses. The researchers called this "model collapse."

This wasn't a theoretical warning. By 2023, AI-generated text had begun appearing across the internet at scale — news summaries, product descriptions, blog posts, forum answers, academic-sounding paragraphs on everything from history to medicine. When companies scraped the web to build their next generation of training datasets, some of that AI-generated text came with it.

The internet, once a record of human expression and thought, was beginning to contain a growing proportion of AI writing, some of it built on earlier AI writing. And no one had a reliable system for telling human and machine text apart.

What Model Collapse Actually Is

Model collapse sounds dramatic, but the mechanics are straightforward once you understand them. When an AI generates text, it tends to favor statistically likely outputs: the things that, based on its training, seem most probable and most appropriate. Edge cases, unusual examples, and rare-but-true information get underweighted, because they appeared infrequently in the original training data.

Now imagine the next model learns partly from that output. The rare material that was underweighted in generation one is almost completely absent from generation two's training data. By generation three, it might be gone entirely. What's left is a model that knows the common, well-represented, mainstream version of everything but has lost access to the details, the exceptions, and the edge cases that make knowledge actually useful.

Think of it like making photocopies of photocopies. The first copy looks almost perfect. By the tenth generation, the fine detail is gone — and what remains is a blurrier, more generic version of the original. The process itself degrades the information.
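The photocopy effect can be sketched in a few lines of Python. This is a toy, not how real language models are trained: here a "model" is just a table of how often each fact appears, "training" means counting, and "generating" means sampling from that table. The corpus sizes, the fact names, and the ten generations are all assumptions chosen for illustration. The sketch shows only one ingredient of collapse, the loss of rare items through repeated sampling.

```python
import random
from collections import Counter

def train(corpus):
    """'Training' here just means measuring how often each fact appears."""
    counts = Counter(corpus)
    total = len(corpus)
    return {fact: count / total for fact, count in counts.items()}

def generate(model, n, rng):
    """Sample n facts from the model in proportion to their learned frequency."""
    facts = list(model)
    weights = [model[fact] for fact in facts]
    return rng.choices(facts, weights=weights, k=n)

rng = random.Random(42)

# Hypothetical human-written corpus: 5 common facts and 50 rare ones.
human_corpus = []
for i in range(5):
    human_corpus += [f"common_{i}"] * 180   # 900 documents in total
for i in range(50):
    human_corpus += [f"rare_{i}"] * 2       # 100 documents in total

model = train(human_corpus)
distinct_per_generation = [len(model)]

# Each new model is trained only on text generated by the previous model.
for generation in range(1, 11):
    synthetic_corpus = generate(model, 1000, rng)
    model = train(synthetic_corpus)
    distinct_per_generation.append(len(model))

print(distinct_per_generation)
```

The printed list starts at 55 distinct facts and can only shrink, because a fact that fails to appear in one generation's output has no way back into the next model. The common facts survive easily. The rare ones are the first to go.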

Model Collapse: A gradual degradation that can occur when AI models are trained on AI-generated data. Rare and nuanced information disappears across successive training generations, leaving models that are more confident but less accurate and diverse.

The Feedback Loop Nobody Planned For

The model collapse problem is a specific version of a broader issue: AI systems can create feedback loops in their own training without anyone intending them to. The Amazon hiring AI is another example — it learned from past hiring, which was influenced by human bias, and then reinforced that bias in its recommendations, which could have influenced future hiring decisions, which would have produced more biased training data.

Language models have their own version of this. When a model is widely used — as ChatGPT was, with hundreds of millions of users — it shapes how people write. It introduces certain phrases, certain structures, certain ways of framing ideas. People start writing in the style the AI taught them. That writing then appears on the internet. The next model learns from it. The AI's own stylistic fingerprints get baked deeper into the training data with each generation.

Researchers call this data contamination: the training data is no longer purely human-generated but contains AI output, which makes it harder to maintain the diversity and grounding in human experience that make models trustworthy. This is an active, unsolved research problem as of 2024.

Ethical Question — No Clean Answer

If AI-generated text floods the internet, future AI models will learn from a world partly shaped by previous AI models. Human expression and AI expression will become increasingly difficult to separate. Does this matter? If the AI-generated text is good quality — accurate, clear, useful — does it matter that it wasn't written by a human? Or does something essential get lost when the record of human thought is diluted by machine output? There is no agreed answer to this. Philosophers, technologists, and writers are still arguing about it.

What All Four Lessons Add Up To

You've now traced four distinct ways an AI can get things wrong: hallucination (confident invention), training cutoffs (frozen knowledge), training data bias (absorbed inequalities), and model collapse (degradation through AI-on-AI learning). These aren't random failure modes. They're all consequences of the same fundamental architecture: a system that learns patterns from data and generates outputs based on those patterns — without a separate layer that verifies truth, tracks time, checks for equity, or monitors the quality of what it's learning from.

This doesn't make AI useless. It makes it a tool with specific, understandable limitations. And tools with specific limitations can be used well by people who understand them — and used badly by people who don't.

Every person who understands these four failure modes is better positioned to use AI responsibly, to push back on AI errors when they see them, and to ask the right questions when institutions use AI to make decisions that affect real people.

Knowing This Changes How You Read Every AI Headline

The next time you read about an AI making a medical recommendation, influencing a court case, or being used in a school — you now have a framework. You can ask: is this a hallucination problem? A cutoff problem? A bias problem? A model collapse problem? These aren't abstract academic categories. They're the exact questions that researchers, regulators, and engineers are asking right now. You're asking them too. That matters.

The scientists studying model collapse, the lawyers navigating hallucination, the regulators addressing hiring bias — they're all working on different edges of the same fundamental question: how do we build systems that learn from data without also inheriting data's flaws? That question doesn't have a complete answer yet. Some of the people reading this lesson right now will eventually work on it professionally. That's not an exaggeration — it's a reasonable prediction about where this field is going.

Lesson 4 Quiz

The Echo That Keeps Getting Louder · 5 questions

1. Oxford and Cambridge researchers found that training AI models on AI-generated data caused "model collapse." What specifically degrades in this process?

Correct. It's the rare, nuanced, edge-case knowledge that degrades first. Mainstream, frequently-occurring information survives longer — but becomes increasingly generic as the unusual details are lost.

Model collapse specifically affects the rare and unusual information — the things that were underrepresented to begin with and get further diluted with each AI-on-AI training generation. Common knowledge degrades more slowly.

2. What is "data contamination" in the context of AI training?

Exactly. As AI-generated text proliferates across the internet, it becomes harder to ensure training datasets contain primarily human-generated content — and the quality and diversity of training suffers.

In the model collapse context, data contamination refers specifically to AI-generated text entering training datasets intended to contain human expression. This dilutes the diversity and groundedness of the data.

3. An AI language model is used by millions of people who adopt its phrasing and writing style. Those users then post online, and that content is scraped for the next generation of training. What feedback loop does this create?

Right. The model's output shapes human writing, which shapes the training data, which shapes the next model. It's a slow homogenization — a narrowing of the range of expression and style across successive generations.

This is actually a meaningful feedback loop: the AI influences how people write, those people's writing enters training data, and future models absorb the AI's own stylistic fingerprints as if they were original human expression.

4. Compare hallucination and model collapse. What do both have in common as sources of AI error?

Correct. Neither is a bug in the traditional sense. Both emerge directly from the architecture: learning statistical patterns from data, without a separate verification or quality-assurance layer.

Both hallucination and model collapse emerge from the same fundamental design: pattern-based learning from whatever data was available. They're not malfunctions — they're predictable consequences of how these systems work.

5. A school district is considering using an AI tutoring system trained in 2022. A parent raises concerns about all four error types from this module. Which concern is LEAST likely to affect a tutoring system used primarily to explain math concepts?

Right. Algebra, geometry, and arithmetic haven't changed since 2022. The training cutoff is genuinely irrelevant to timeless mathematical content — unlike law, medicine, or current events. The other three concerns remain valid for a math tutoring context.

Think through which error type depends on information changing over time. Mathematical principles are stable — the quadratic formula was the same in 2022 as today. The cutoff concern disappears for timeless content, while the other three remain valid considerations.

Lab 4: The Collapse Scenario

You're advising a company whose AI is training on its own output.

Your Role: AI Systems Critic

A startup has built an AI writing assistant. They want to save money on data collection, so they plan to use their AI's own outputs to train the next version of the model. They think: "Our AI writes well, so training on its own good outputs should make it even better."

You've been brought in to tell them why this is a mistake — and propose a better approach. The AI in this lab will play the startup's lead engineer, who is skeptical of your concerns and will ask you to prove your case.

The engineer says: "Our model scores 94% on our quality benchmarks. If it's generating high-quality text, why would training on that text make things worse? Convince me." — What's your argument?

Collapse Scenario Lab

AI PEER

Look, I've been building this model for two years. Our benchmarks are solid. Our users love the output. You're telling me that training on our own high-quality text will make it worse — but that makes no intuitive sense to me. A 94% quality score means 94% of what we produce is good. So 94% of our training data would be good. That's better than most web scrapes. Convince me I'm wrong.

Module 2 Test

Why Does It Get Some Things Wrong? · 15 questions · Pass at 80%

1. What is the most accurate definition of "hallucination" in an AI language model?

Correct. Hallucination is the confident generation of false information — not a crash, not a refusal, but fluent and convincing wrong output.

Hallucination is confident wrongness — not confusion, not silence, but fluent, plausible-sounding output that happens to be factually incorrect.

2. In May 2023, lawyer Steven Schwartz used ChatGPT to research a legal brief. What happened?

Correct. The cases were entirely invented — convincing names, dates, and summaries, none of it real. Schwartz was fined and publicly reprimanded by a federal judge.

The cases were completely fabricated — not misquoted, not misattributed, simply invented. This is a landmark example of AI hallucination causing real legal consequences.

3. Why are numbers and statistics particularly high-risk elements in AI output?

Exactly. The model learned how sentences with statistics are structured — not which statistics are real. It can produce perfectly formatted false statistics.

The AI learned the syntactic pattern ("Studies show X% of...") without necessarily having reliable grounding in the actual figures. Reproducing the pattern and knowing the true number are separate things, and the model is far better at the first.

4. What is a training cutoff?

Correct. Everything the model "knows" is from before this date. The world keeps changing; the model's frozen knowledge does not.

The training cutoff is a time boundary — after this date, no new information entered the model. The model's knowledge of the world is frozen at that point.

5. Which question type has the HIGHEST risk of being affected by an outdated training cutoff?

Right. Interest rates change frequently and are set by policy decisions. An AI's answer reflects the rate as of its training cutoff, which could be significantly outdated.

The word "current" is the giveaway. Questions about things that change regularly — rates, laws, leadership positions, prices — carry the highest cutoff risk.

6. Amazon scrapped its AI hiring tool, as Reuters reported in 2018, after discovering it penalized female applicants. What was the root cause?

Correct. The bias was implicit in the historical data — not programmed in. The AI faithfully learned from a biased past and projected that past forward as a rule.

This is training data bias: the system learned from historical patterns that reflected gender inequity, and then enshrined those patterns as predictive rules for future decisions.

7. Why does internet text "overrepresent" certain perspectives in AI training data?

Exactly. It's structural, not conspiratorial. Unequal internet access and unequal text production naturally skew what a web-scraped training corpus contains.

The overrepresentation isn't the result of deliberate selection — it reflects who had access to the internet and who produced large volumes of text over the decades when most of the training data was generated.

8. The EU AI Act classifies AI used in hiring as "high risk." What does this classification require?

Correct. In the EU AI Act, "high risk" does not mean banned. It means the system must demonstrate safety, document bias assessments, and meet transparency standards before going live.

High risk doesn't mean prohibited — it means more stringently regulated. These systems must be tested for bias, documented, and demonstrate compliance before deployment in the EU.

9. What is "model collapse"?

Correct. Documented by Oxford and Cambridge researchers in 2024, model collapse is the slow loss of nuance and diversity when AI trains on its own outputs across successive generations.

Model collapse is a specific, documented phenomenon: the gradual loss of rare and nuanced knowledge when models train on AI-generated data, becoming more confident and more generic over successive generations.

10. What is "data contamination" in the context of AI training datasets?

Right. As AI text proliferates across the internet, training datasets increasingly contain AI-generated content — diluting the human diversity that makes models trustworthy and varied.

Data contamination in the model collapse context refers to AI-generated text entering human-focused training datasets, reducing the diversity and authentic human experience that the data is supposed to capture.

11. A journalist uses an AI to summarize a scientific study, then publishes the article. Other writers cite the article. AI companies scrape those articles for training data. What risk does this chain create?

Exactly. This is a real documented concern: AI errors enter human-produced text, which enters training data, which teaches future AI models to repeat those errors. The feedback loop amplifies mistakes.

This is precisely the model collapse / data contamination feedback loop in action. An AI error can propagate through human intermediaries back into training data, where it gets learned as if it were truth.

12. Which of these would be the BEST use of an AI tool, given everything you've learned in this module?

Correct. General conceptual explanations of stable, well-documented topics carry the lowest hallucination, cutoff, and bias risk. The other options all involve specific, checkable, or time-sensitive information.

The safest AI uses involve stable conceptual knowledge — not specific facts, current laws, recent papers, or precise statistics. Those are all high-risk categories that need independent verification.

13. All four error types in this module — hallucination, training cutoff, training data bias, and model collapse — share a common root cause. What is it?

Correct. Every failure mode in this module traces back to the same architecture: pattern learning from data, without independent verification. Understanding this gives you a framework for reasoning about new AI errors you haven't encountered yet.

All four failures emerge from the same design reality: these systems learn patterns from available data and generate outputs based on those patterns. There's no built-in truth-checking, time-tracking, or quality-monitoring layer — those must be added intentionally.

14. A doctor asks an AI assistant: "What is the current first-line treatment for Type 2 diabetes?" The AI gives a confident, detailed answer. What should the doctor do, and why?

Right. Two risks converge here: the training cutoff (guidelines may have updated) and hallucination (specific recommendations can be generated plausibly but incorrectly). Verification against a current authoritative source is essential.

Medical treatment questions face both cutoff risk (guidelines change) and hallucination risk (specific details can be invented confidently). Asking the AI again doesn't help — it just generates another pattern-completion, not a fact-check.

15. You read a headline: "AI System Used by City Government to Predict Which Neighborhoods Need Infrastructure Repairs Has Been Found to Systematically Underserve Low-Income Areas." Based on this module, what is the most likely explanation for this outcome?

Correct. This is the Amazon hiring AI pattern applied to infrastructure. Historical data reflects historical inequities. An AI trained on that data learns to perpetuate those inequities — not from malice, but from faithful pattern-learning.

This is training data bias in action. Infrastructure maintenance records reflect decades of unequal investment — areas that received more maintenance historically are predicted to need more maintenance going forward. The AI learns from the pattern without questioning whether the pattern is just.