In early 2023, a lawyer named Steven Schwartz prepared a legal brief that was filed in a federal court in Manhattan. The brief looked impressive: it cited more than half a dozen court cases, complete with judges' names, dates, and detailed summaries of what each ruling said. The opposing lawyers tried to look the cases up and quickly realized something was wrong.
The cases didn't exist. Not one of them. Varghese v. China Southern Airlines. Martinez v. Delta Air Lines. Shaboon v. Egypt Air. Each case had a convincing name, a plausible date, a real-sounding citation. And each one was entirely made up.
Schwartz had used ChatGPT to help research the brief. When he asked the AI whether the cases were real, it said yes, and when pressed it even generated fake text from the fake rulings. A federal judge fined Schwartz, a colleague, and their law firm $5,000 in total and publicly reprimanded the lawyers involved. Schwartz told the court he had not known that the AI could present false information as fact.
He wasn't lying. He genuinely didn't know. Most people still don't.
What Schwartz experienced has a name. Researchers and engineers call it hallucination — when an AI produces information that is factually wrong, but delivers it in confident, fluent, authoritative language. The word is a bit misleading. The AI isn't confused or dreaming. It's doing exactly what it was built to do. That's what makes this so strange.
Remember from Module 1: a language model learns by processing enormous amounts of text and finding patterns — patterns in how words follow other words, how ideas connect, how sentences are structured. It doesn't store a database of facts the way your phone's calendar stores your appointments. It learned the shape of information. The shape of how a court case citation looks. The shape of how a legal argument sounds. And when you ask it for court cases, it produces text that has exactly the right shape — even if the specific cases don't exist.
Think of it this way: imagine you read every cookbook ever written but never cooked or tasted a single dish. You'd learn the pattern so well that you could write convincing-sounding recipes, but some of the ingredient combinations might be inedible or impossible. The AI isn't lying. It doesn't have a concept of lying. It's pattern-completing, and sometimes the patterns lead somewhere that isn't real.
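The pattern-completion idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how ChatGPT is actually built: the three training citations are invented placeholders (not real cases), and the "model" only remembers which word has been seen after which. Even so, it can assemble citations with exactly the right shape that never appeared in its training text.

```python
from collections import defaultdict

# Hypothetical training text: three invented citations (not real cases).
# Parentheses are spaced out so that a plain split() treats them as words.
TRAINING = [
    "Alder v. Northern Air ( 2011 )",
    "Brook v. Coastal Air ( 2015 )",
    "Cedar v. Northern Rail ( 2018 )",
]

def learn_followers(lines):
    """Record, for each word, the set of words seen directly after it."""
    followers = defaultdict(set)
    for line in lines:
        words = ["<start>"] + line.split() + ["<end>"]
        for current, nxt in zip(words, words[1:]):
            followers[current].add(nxt)
    return followers

def all_completions(followers, word="<start>", prefix=()):
    """Enumerate every word sequence the learned patterns allow."""
    if word == "<end>":
        yield " ".join(prefix)
        return
    for nxt in sorted(followers[word]):
        step = prefix if nxt == "<end>" else prefix + (nxt,)
        yield from all_completions(followers, nxt, step)

followers = learn_followers(TRAINING)
generated = list(all_completions(followers))

# Citations that fit the learned shape but were never in the training text.
invented = [c for c in generated if c not in TRAINING]
print(len(generated), "well-formed citations,", len(invented), "of them invented")
```

Nothing in this sketch checks whether a citation exists. It only knows what a citation looks like, which is the point of the analogy.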
Here's the part that trips people up, including smart adults with law degrees: the AI sounds certain. Not uncertain. Not tentative. It doesn't say "I think maybe there might be a case called..." It says "In Varghese v. China Southern Airlines (2019), the court ruled that..." — in the same tone it would use to state that water boils at 100 degrees Celsius.
Why no hesitation? Because the model wasn't trained to track what it knows versus what it's guessing. It was trained to produce fluent, helpful-sounding text. Uncertainty appears in the output only when a model has been specifically trained or instructed to express it. Early versions of ChatGPT, including the one Schwartz used, did not reliably do this.
This matters enormously for anyone using AI as a research tool. The model can sound authoritative about history, medicine, law, science, geography — and be wrong. Not always. Not even usually. But often enough, and confidently enough, that a reader who doesn't already know the subject has no easy way to tell the difference.
Steven Schwartz was punished for filing false information in court — even though he didn't know the AI could make things up. Is that fair? He trusted a tool he didn't fully understand. But the cases he filed could have affected real people's legal outcomes. Who bears responsibility when an AI error causes real harm — the user, the company that built it, or both?
You might think: just tell the AI to be more careful. Engineers have tried. Modern AI systems are much better at hedging — saying things like "I'm not certain, but..." But they still hallucinate. Reducing it is an active research problem. No one has fully solved it yet, and some researchers argue it may never be fully eliminable given how these models work.
Hallucination isn't random. It's more likely in certain situations, and recognizing them gives you a real advantage over most AI users.
Rare or specific facts are high risk. The more niche the topic — a specific local politician, a paper published in 2004 in a small journal, a legal case from a regional court — the less training data the model saw. Less data means the pattern-completion engine has less to work with, and it fills gaps with plausible invention.
Recent events are also risky, because the model has a knowledge cutoff. Events after that cutoff simply don't exist in its training. But the model will sometimes produce confident-sounding guesses anyway.
Numbers and statistics hallucinate frequently. The model learned how to write a sentence that contains a statistic. It didn't learn which statistics are accurate. "Studies show that 73% of..." is a pattern it can generate — the number might be real or entirely invented.
When you read an AI-generated answer that contains a specific name, date, case number, statistic, or quote, you now know that any of those specific details could be hallucinated — even if the surrounding explanation is accurate. The explanation and the specific fact are produced by the same pattern engine. One can be right while the other is invented. Verifying specific claims separately from understanding general concepts is a skill most adults haven't developed yet. You now have it.
The practical answer isn't "never trust AI." It's "know what to verify." Use AI for understanding concepts, getting the shape of a topic, drafting ideas. Then verify the specific, checkable claims — names, dates, citations, statistics — using a source that actually stores facts: a library database, a news archive, an official website.
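One way to practice the "know what to verify" habit is to pull the checkable details out of an answer before reading it for meaning. The sketch below is a rough illustration only: the pattern names and regular expressions are assumptions made for this example, they will miss plenty of real cases, and the sample answer is stitched together from phrases used earlier in this lesson.

```python
import re

# Hypothetical patterns for details worth verifying; a real checker needs more.
CHECKABLE_PATTERNS = {
    "statistic": re.compile(r"\b\d+(?:\.\d+)?%"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "case name": re.compile(r"\b[A-Z][a-z]+ v\. [A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*"),
    "quote": re.compile(r'"[^"]+"'),
}

def flag_checkable(text):
    """Return (kind, matched_text) pairs for details to verify separately."""
    found = []
    for kind, pattern in CHECKABLE_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((kind, match.group()))
    return found

# Sample AI-style answer, assembled for illustration.
answer = (
    'In Varghese v. China Southern Airlines (2019), the court ruled on the matter. '
    'Studies show that 73% of readers trust fluent text.'
)
flags = flag_checkable(answer)
for kind, detail in flags:
    print(f"verify {kind}: {detail}")
```

The function does not decide whether a detail is true. It only produces the list of things to look up in a source that actually stores facts.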
Think of it like this: if a knowledgeable friend explained how the legal system works, you'd trust the explanation. But if that same friend gave you a specific case citation to use in court, you'd double-check it before filing. The AI is the knowledgeable friend. The filing is still your responsibility.
Steven Schwartz's story didn't end with the fine. He was allowed to keep practicing law, but the judge did not simply excuse him as naive: the court found that the lawyers had acted in bad faith, in large part because they kept standing by the fake cases after the court began questioning them. The case is now cited in legal ethics guidance across the country. His mistake helped write the rules that lawyers are now trained to follow. That's how field-wide learning sometimes happens: one person's very public error becomes everyone else's education.
An AI assistant has just produced a research summary. Your job is to interrogate it — not accept it. The AI in this lab plays a knowledgeable peer, not a teacher. It will push back, ask what you think, and challenge you to explain your reasoning.
You're not trying to "win." You're trying to practice the skill of spotting what's checkable, what's risky, and what a careful reader would verify before using.
When OpenAI released ChatGPT to the public in November 2022, it quickly became the fastest-growing consumer application recorded up to that point, reaching one million users in five days. But the model powering it, GPT-3.5, had a training data cutoff in 2021. It knew nothing that had happened after that point.
Within weeks, users discovered a peculiar problem. Ask it about recent news, recent scientific papers, new laws, new products, current prices — and it would answer confidently, drawing on its last known state of the world. Sometimes it described things that had since changed dramatically. A company that had gone bankrupt was described as thriving. A law that had been repealed was cited as current. A sports record that had been broken was still intact in the AI's memory.
The AI didn't say "I'm not sure — my information may be out of date." It answered the same way it answered everything: fluently, confidently, as if the world had simply stopped on the day its training data ended. Some users noticed. Most didn't.
Every AI language model is trained on a snapshot of text — a massive collection of web pages, books, articles, and other documents gathered up to a specific point in time. After that point, training stops. The model is then tested, refined, and released — which often takes months. So by the time you talk to a model, its knowledge might already be a year or more behind the current date.
This is called the training cutoff or knowledge cutoff. Think of it like a map made from aerial photographs taken on a specific day. The map is detailed and accurate for that day. But if you're using it a year later, some roads have been built, some buildings demolished, some borders redrawn. The map doesn't warn you. It just shows you what it shows you.
The AI is the map. When it answers questions about the current world, it's always drawing on that snapshot. The world has kept moving. The snapshot hasn't.
The trickiest part of the cutoff problem isn't that the AI has old information. It's that the AI often doesn't signal this clearly. When you ask a human expert about something recent, they typically say "I haven't followed that closely" or "check the latest papers on that." They have metacognition — awareness of the limits of their own knowledge.
Early language models had very weak metacognition about time. The model didn't have a strong sense that "this question is about something that might have changed since my training." It just produced the most likely next tokens — which often meant confidently stating the last thing it learned, whether that was six months ago or two years ago.
Newer systems, including more recent versions of ChatGPT and other models, have gotten better at this. They're more likely to say "my training data has a cutoff of [date], so please verify recent developments." But this is an added behavior — a layer on top. The underlying model still only knows what it learned. The warning is a patch, not a fix.
AI companies typically disclose their model's training cutoff somewhere in the documentation. But most users never read documentation. Should companies be required to display the cutoff date prominently — right in the chat interface, before every conversation? Or does that create unnecessary fear and confusion? Who is responsible for ensuring users understand this limitation?
The knowledge cutoff matters most when you're asking about anything that changes. Consider the difference between these two questions:
"What is photosynthesis?" — The answer hasn't changed. An AI from 2022 and an AI from 2025 will give you the same accurate answer.
"What is the current recommended treatment for a specific medical condition?" — Medical guidelines update. A recommendation from 2021 might have been revised. An AI trained before the revision will give you the old guidance with no indication it's outdated.
The same logic applies to prices, laws and regulations, political situations, scientific consensus on emerging topics, public health guidance, company policies, sports records, technology specifications, and any field where research is active. As a rule of thumb, any question that contains or implies the word "current" carries cutoff risk.
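The "is this the kind of thing that changes?" check can be written down as a small sketch. Everything here is an assumption made for illustration: the keyword list is a short, incomplete sample, and the dates are made up to show the arithmetic rather than taken from any particular model.

```python
from datetime import date

# Hypothetical keyword list based on the categories above; far from complete.
TIME_SENSITIVE_WORDS = {
    "current", "latest", "recent", "today", "now", "price", "prices",
    "law", "laws", "guideline", "guidelines", "record", "records",
}

def months_behind(cutoff: date, today: date) -> int:
    """Months between a knowledge cutoff and today, ignoring the day of the month."""
    return (today.year - cutoff.year) * 12 + (today.month - cutoff.month)

def carries_cutoff_risk(question: str) -> bool:
    """Crude check: does the question mention something that tends to change?"""
    words = {w.strip("?.,!\"'").lower() for w in question.split()}
    return bool(words & TIME_SENSITIVE_WORDS)

# Assumed dates, for illustration only.
gap = months_behind(cutoff=date(2022, 1, 1), today=date(2023, 3, 1))

stable = carries_cutoff_risk("What is photosynthesis?")
risky = carries_cutoff_risk(
    "What is the current recommended treatment for a specific medical condition?"
)
print(f"snapshot is {gap} months old; stable={stable}, risky={risky}")
```

A keyword match is a blunt instrument. A question can be time-sensitive without using any of these words, so the habit of asking the question yourself matters more than the list.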
From now on, whenever you read an AI's confident statement about the current state of something, you can ask: "Is this the kind of thing that changes?" If yes, you now know to check a live source. That mental habit — the automatic question "could this be outdated?" — is something most adults never develop. You have it now. It will save you from acting on wrong information more than once in your life.
In 2023 and 2024, several AI companies began offering "web search" modes — where the AI can look things up in real time before answering. This partly addresses the cutoff problem. But even these systems can struggle: the AI has to interpret what it finds, and that interpretation can still go wrong. The search is live; the reasoning is still the model.
Understanding both limitations — the cutoff and the hallucination — means you understand something that the lawyers, journalists, and doctors who were embarrassed by AI errors in 2022 and 2023 did not. That knowledge is practical. It shapes how you use the tool.
The AI in this lab doesn't know exactly what date it is. Your job is to probe it — ask questions designed to find the edges of what it knows versus what it can't know. When you find a gap, push: what does that gap mean for the kinds of questions users ask?
The AI will challenge your probe strategy and ask you to defend your choices. This isn't a game — it's a method you can use on any AI system you encounter.
In 2014, Amazon began building an AI system to help sort through job applications. The idea was elegant: train the system on ten years of successful Amazon hires, and it would learn what made a good candidate. Feed in a resume, get a score. The machine would find talent faster than any human recruiter.
By 2018, Amazon had quietly shut the project down. The system had discovered a pattern in the data: most of Amazon's successful hires over the previous decade were men. The AI didn't understand gender as a concept. It just saw patterns. And one of the patterns it learned was: resumes that contain the word "women's" — as in "women's chess club" or "women's college" — should score lower.
The AI had absorbed a real historical fact, that Amazon had hired more men, and converted it into a rule: resumes that resemble those of past male hires deserve higher scores. It penalized female applicants not because it was programmed to, but because it faithfully learned from data that reflected a biased past. Reuters broke this story in October 2018. Amazon confirmed it had scrapped the tool.
The Amazon story reveals something important that doesn't get talked about enough: training data carries the past into the future. When an AI learns from text written by humans, it doesn't just learn language. It absorbs the assumptions, stereotypes, and historical inequalities embedded in that language.
Language models trained on internet text have learned that certain jobs are more often described with male pronouns, that certain names are more often associated with crime reports, that certain dialects are more often associated with informal or uneducated writing — not because these things are true about the world, but because they reflect how people have written about the world, for a long time, in a context shaped by real social inequalities.
This is called training data bias. It's not a programming error. It's not a malfunction. It's the model doing its job correctly — and its job was to learn from data that contained human bias.
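A toy version of this mechanism fits in a few lines. The sketch below is not Amazon's system, whose details are not public. The hiring history is invented, and the scoring rule (how far a word's hire rate sits from the overall hire rate) is a deliberately simple stand-in chosen for illustration. The word "women's" says nothing about skill, yet it ends up with the lowest score because of how past decisions were distributed.

```python
from collections import Counter

# Hypothetical past decisions: (resume keywords, was_hired). Invented data
# in which resumes mentioning "women's" were hired less often.
HISTORY = [
    ({"python", "chess"}, True),
    ({"python", "robotics"}, True),
    ({"java", "chess"}, True),
    ({"java", "robotics"}, True),
    ({"python", "women's", "chess"}, False),
    ({"java", "women's", "robotics"}, False),
    ({"python", "women's", "robotics"}, True),
    ({"java", "chess"}, False),
]

def learn_word_scores(history):
    """Score each word by how much its hire rate differs from the overall rate."""
    seen, hired = Counter(), Counter()
    for words, was_hired in history:
        for word in words:
            seen[word] += 1
            hired[word] += was_hired  # True counts as 1, False as 0
    overall = sum(was_hired for _, was_hired in history) / len(history)
    return {word: hired[word] / seen[word] - overall for word in seen}

scores = learn_word_scores(HISTORY)
lowest_word = min(scores, key=scores.get)
print("most penalized word:", lowest_word)
```

No line of this code mentions gender. The penalty comes entirely from the data, which is what "the model doing its job correctly" means here.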
The bias problem in large language models like GPT or Claude is more subtle than the Amazon hiring case, but it's just as real. The training data for these models — billions of web pages, books, and articles — reflects what kinds of content get written, by whom, about whom, and in what context.
Internet text overrepresents English. It overrepresents wealthy, educated, Western perspectives. It overrepresents people who had internet access during the decades when most of the content was written. Voices and perspectives that didn't produce text — or whose text wasn't scraped — are underrepresented or absent entirely.
This doesn't mean the AI is "evil" or that using it causes harm automatically. It means the outputs reflect the distribution of the training data. Ask an AI to describe a "typical" engineer, doctor, or scientist, and it's more likely to produce a description that matches what those roles looked like in the historical texts it learned from — not necessarily what those roles look like today, or what they should look like going forward.
Amazon's AI learned from real historical hiring data. In some narrow sense, it was being "accurate" — Amazon had indeed hired more men successfully. Should an AI be trained to correct for historical inequity, even if that means its outputs don't purely reflect historical patterns? Or does adjusting the output introduce a different kind of distortion — and who decides what counts as "correcting" versus "manipulating"? This debate is happening right now in AI labs, courts, and legislatures.
The Amazon case is from a hiring algorithm, but the same dynamic appears in the language models you interact with daily. Researchers at Stanford, MIT, and other institutions have documented cases where language models associate certain names with criminality, certain accents with low intelligence, and certain demographic groups with negative traits — not because anyone programmed those associations, but because the training text contained them.
In 2023, the U.S. Equal Employment Opportunity Commission issued guidance specifically addressing AI tools in hiring, one of the first times a federal agency had addressed AI bias in employment so directly. The European Union's AI Act, which began taking effect in 2024, classifies AI tools used for hiring, education, and credit scoring as "high risk," requiring documented bias testing before deployment. These aren't abstract academic concerns. They're the policy response to a real, documented problem that companies like Amazon ran into years ago.
This means knowing about training data bias isn't just intellectually interesting — it's relevant to how institutions make decisions that affect people's lives right now. When an AI helps decide who gets a job interview, a loan, or a recommendation letter, the question of what that AI learned from matters enormously.
When you encounter an AI system that makes recommendations about people — admissions, hiring, criminal risk assessments, loan applications — you now know to ask: what was it trained on, and whose historical reality does that data reflect? That question is the one being debated in courts and legislatures right now. Most people asking it have law degrees or research positions. You're asking it because you understand how these systems actually work.
Imagine a school district is considering deploying an AI system to recommend which students should be placed in advanced classes. You've been asked to audit the system before it goes live. The AI in this lab will play the role of a system you need to probe for potential bias.
Your job is to identify what questions you'd ask, what data you'd want to see, and what patterns would concern you. The AI will push back and ask you to defend your reasoning. You need to take a clear position.
In July 2024, researchers from the University of Oxford, the University of Cambridge, and other institutions published a paper in the journal Nature with an unusual finding. They had trained AI models on data that included AI-generated text, then trained new models on the output of those models, and so on. With each generation, the models got worse. Not dramatically at first. But the errors accumulated. Rare knowledge degraded. The models became more confident and more homogeneous, less able to produce diverse, nuanced responses. The researchers called this "model collapse."
This wasn't a theoretical warning. By 2023, AI-generated text had begun appearing across the internet at scale — news summaries, product descriptions, blog posts, forum answers, academic-sounding paragraphs on everything from history to medicine. When companies scraped the web to build their next generation of training datasets, some of that AI-generated text came with it.
The internet, once a record of human expression and thought, was beginning to contain a growing proportion of AI writing about AI writing. And no one had a reliable system for telling the two apart.
Model collapse sounds dramatic, but the mechanics are straightforward once you understand them. When an AI generates text, it produces the most statistically likely outputs — the things that, based on its training, seem most probable and most appropriate. Edge cases, unusual examples, and rare-but-true information get underweighted, because they appeared infrequently in the original training data.
Now imagine the next model learns partly from that output. The rare material that was underweighted in generation one is almost completely absent from generation two's training data. By generation three, it might be gone entirely. What's left is a model that knows the common, well-represented, mainstream version of everything, and has lost access to the details, the exceptions, and the edge cases that make knowledge actually useful.
Think of it like making photocopies of photocopies. The first copy looks almost perfect. By the tenth generation, the fine detail is gone — and what remains is a blurrier, more generic version of the original. The process itself degrades the information.
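The photocopy effect can be simulated with a deliberately simplified sketch. This is not the method used in the research paper. It assumes that each generation slightly overweights whatever was already likely (modeled here by raising each share to a hypothetical power of 1.5 and renormalizing), and the starting shares are invented numbers.

```python
def next_generation(probs, sharpen=1.5):
    """One train-on-own-output step: likely items get overweighted, then renormalized."""
    weighted = {item: p ** sharpen for item, p in probs.items()}
    total = sum(weighted.values())
    return {item: w / total for item, w in weighted.items()}

# Hypothetical share of training text devoted to three kinds of content.
generation = {"common fact": 0.70, "less common fact": 0.25, "rare fact": 0.05}

rare_share = [generation["rare fact"]]
for _ in range(3):
    generation = next_generation(generation)
    rare_share.append(generation["rare fact"])

for number, share in enumerate(rare_share):
    print(f"generation {number}: rare fact share = {share:.4f}")
```

Under these assumptions the rare fact starts at 5% of the data and falls below a tenth of a percent within three generations, while the common fact crowds out almost everything else. The exact numbers depend entirely on the assumed exponent, but the direction of the drift does not.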
The model collapse problem is a specific version of a broader issue: AI systems can create feedback loops in their own training without anyone intending them to. The Amazon hiring AI is another example. It learned from past hiring, which was influenced by human bias, and then reinforced that bias in its recommendations. Had those recommendations shaped future hiring decisions, the result would have been even more biased training data.
Language models have their own version of this. When a model is widely used — as ChatGPT was, with hundreds of millions of users — it shapes how people write. It introduces certain phrases, certain structures, certain ways of framing ideas. People start writing in the style the AI taught them. That writing then appears on the internet. The next model learns from it. The AI's own stylistic fingerprints get baked deeper into the training data with each generation.
Researchers sometimes call this data contamination: the training data is no longer purely human-generated but contains AI output, which makes it harder to maintain the diversity and grounding in human experience that make models trustworthy. This is an active, unsolved research problem as of 2024.
If AI-generated text floods the internet, future AI models will learn from a world partly shaped by previous AI models. Human expression and AI expression will become increasingly difficult to separate. Does this matter? If the AI-generated text is good quality — accurate, clear, useful — does it matter that it wasn't written by a human? Or does something essential get lost when the record of human thought is diluted by machine output? There is no agreed answer to this. Philosophers, technologists, and writers are still arguing about it.
You've now traced four distinct ways an AI can get things wrong: hallucination (confident invention), training cutoffs (frozen knowledge), training data bias (absorbed inequalities), and model collapse (degradation through AI-on-AI learning). These aren't random failure modes. They're all consequences of the same fundamental architecture: a system that learns patterns from data and generates outputs based on those patterns — without a separate layer that verifies truth, tracks time, checks for equity, or monitors the quality of what it's learning from.
This doesn't make AI useless. It makes it a tool with specific, understandable limitations. And tools with specific limitations can be used well by people who understand them — and used badly by people who don't.
Every person who understands these four failure modes is better positioned to use AI responsibly, to push back on AI errors when they see them, and to ask the right questions when institutions use AI to make decisions that affect real people.
The next time you read about an AI making a medical recommendation, influencing a court case, or being used in a school — you now have a framework. You can ask: is this a hallucination problem? A cutoff problem? A bias problem? A model collapse problem? These aren't abstract academic categories. They're the exact questions that researchers, regulators, and engineers are asking right now. You're asking them too. That matters.
The scientists studying model collapse, the lawyers navigating hallucination, the regulators addressing hiring bias — they're all working on different edges of the same fundamental question: how do we build systems that learn from data without also inheriting data's flaws? That question doesn't have a complete answer yet. Some of the people reading this lesson right now will eventually work on it professionally. That's not an exaggeration — it's a reasonable prediction about where this field is going.
A startup has built an AI writing assistant. They want to save money on data collection, so they plan to use their AI's own outputs to train the next version of the model. They think: "Our AI writes well, so training on its own good outputs should make it even better."
You've been brought in to tell them why this is a mistake — and propose a better approach. The AI in this lab will play the startup's lead engineer, who is skeptical of your concerns and will ask you to prove your case.