In the spring of 2023, Jonathan Turley, a law professor at George Washington University, learned that ChatGPT had accused him of sexual harassment. Another law professor, Eugene Volokh of UCLA, had asked the chatbot for examples of sexual harassment by professors at American law schools, and Turley's name came back in the answer. The AI produced a detailed, confident story: Turley had allegedly made sexually suggestive comments to a student on a class trip to Alaska, and The Washington Post had covered the incident in 2018.
None of it happened. The Washington Post article didn't exist. Turley had never taken students on a trip to Alaska, and the chatbot placed him at Georgetown University, where he had never taught. The comments were invented. As far as anyone could tell, the AI had stitched together real names and plausible-sounding details into a story that had never occurred, and then presented it as fact.
Turley's case became one of the most widely reported early examples of what researchers were already calling "hallucination": AI-generated text that is confident, fluent, and completely false. He wasn't the only one. The same query named other law professors, citing news articles that could not be found.
To understand why an AI would invent a story about a real person and present it as fact, you need to understand where AI answers come from — because it's not where most people think.
An AI like ChatGPT, at least in its basic form, doesn't search the internet every time you ask it something. It was trained, meaning it read an enormous pile of text (books, websites, news articles, forums, academic papers) before it ever talked to anyone. During that training, it learned patterns: what words usually follow other words, what topics relate to each other, how confident-sounding sentences are structured.
When you ask it a question, it's not retrieving a stored answer. It's predicting what a good-sounding answer would look like, based on everything it absorbed during training. Most of the time, that prediction matches reality. But sometimes — like in Turley's case — it generates something plausible-sounding that happens to be false.
Here's the part that surprises most people: the AI doesn't have a separate fact-checker running inside it. When it generates an answer, it doesn't have access to a list of "things I know are true" vs. "things I'm guessing." It just generates text that fits the pattern of a correct-sounding response.
Imagine you were asked to write a paragraph about a topic you half-remember from a book you read two years ago. You might accidentally mix up details, combine events from different stories, or invent a specific date that feels right but isn't. You wouldn't be lying — you'd just be filling in gaps with what seemed plausible. That's very close to what AI does, except it's doing it with training data from billions of sources, and it does it for every single response.
The Turley case is especially instructive because the AI didn't just get a fact slightly wrong — it invented an entire news article, an entire event, with a real person's name attached. The more a topic sounds like something that should have happened (a professor + a workplace complaint + a news story), the more confidently the AI might generate a version of it, even if the specific version never occurred.
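To make the "stitching" idea concrete, here is a deliberately tiny next-word predictor. It is a sketch under loose assumptions and nothing like ChatGPT's actual architecture: it only counts which word tends to follow which (a bigram model), and the three training sentences are invented, about fictional people. Every training sentence is true, yet the model confidently produces one that never appeared.

```python
from collections import Counter, defaultdict

# Hypothetical training "corpus": each sentence is true of a fictional person.
corpus = [
    "professor kim was praised by students",
    "coach lee was accused of misconduct",
    "the mayor was accused of misconduct",
]

END = "<end>"

def train_bigrams(sentences):
    """Count which word follows which: the only 'knowledge' this model has."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split() + [END]
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def generate(follows, start, max_words=10):
    """Greedily append the most frequent next word. There is no fact-checking step."""
    words = [start]
    while len(words) < max_words:
        options = follows.get(words[-1])
        if not options:
            break
        nxt = options.most_common(1)[0][0]
        if nxt == END:
            break
        words.append(nxt)
    return " ".join(words)

model = train_bigrams(corpus)
sentence = generate(model, "professor")
print(sentence)            # professor kim was accused of misconduct
print(sentence in corpus)  # False: fluent, but never in the training data
```

Nothing in the model checks whether Professor Kim was ever accused of anything; "accused of misconduct" simply followed "was" more often in the training text than "praised by students" did.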
AI systems often express the same level of confidence whether they're saying something true or something invented. In the Turley case, the AI didn't say "I think there might have been a complaint" — it described a specific article in a specific newspaper. That confident framing is part of what makes hallucinations dangerous.
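To appreciate why this problem is hard to fix, consider the size of what we're talking about. OpenAI has not disclosed how much text GPT-4, one of the models behind ChatGPT, was trained on. For its predecessor GPT-3, OpenAI reported starting from about 45 terabytes of compressed web text, which was filtered down to about 570 gigabytes before training, alongside books, Wikipedia, and other sources. For comparison, the printed book collection of the U.S. Library of Congress is often estimated at around 10 terabytes of text. So the raw material for GPT-3 alone was more than four times that size, and later models are widely believed to have used far more.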
In that ocean of text, there's accurate information, outdated information, satirical articles, opinion pieces, misinformation, corrections to misinformation, and everything in between. The AI absorbed all of it, weighted by how frequently patterns appeared. Common, well-documented facts appear in many sources and get reinforced. Obscure or false information might also appear in enough sources to leave a mark.
Think about what that means: if a false story gets repeated enough times on enough websites, the AI might treat it as more reliable than a true story that appeared in only one careful source. The training process rewards repetition, not accuracy.
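Here is that idea in miniature: a hypothetical set of documents about a fictional bridge, where one careful source gives the right year and three low-quality copies repeat a wrong one. The "learner" below is an assumption for illustration: it simply picks the most frequent claim, a crude stand-in for how frequency shapes what a language model treats as likely.

```python
from collections import Counter

# Hypothetical documents (fictional bridge): one careful source, three copies of an error.
documents = [
    ("The bridge opened in 1931.", "careful archive"),
    ("The bridge opened in 1937.", "blog"),
    ("The bridge opened in 1937.", "forum repost"),
    ("The bridge opened in 1937.", "content farm"),
]

# A frequency-driven learner only sees how often each claim appears.
# The source labels are never used, so accuracy plays no role in the result.
claim_counts = Counter(text for text, _source in documents)
most_likely_claim, count = claim_counts.most_common(1)[0]
print(most_likely_claim, count)  # The bridge opened in 1937. 3
```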
When someone asks "why did the AI say that?" the most common assumption is that the AI searched for the answer and found something wrong. Now you know the real picture: the AI didn't search at all. It predicted. And prediction, no matter how good the model is, will sometimes generate false outputs with full confidence. That distinction changes how you should read every AI response you ever encounter.
Jonathan Turley's name is now permanently attached to a story that never happened — a story generated by an AI and then circulated across the internet as people discussed the case. Even though the original story was false, the news coverage of the AI's false story is real, and it shows up in searches of his name.
Here's the question no one has fully resolved: Who is responsible? OpenAI, the company that built ChatGPT, didn't intend to defame Turley. The user who typed the question wasn't trying to harm him. The AI isn't a legal entity that can be sued. And yet, a real person's reputation was affected by something that had no author, no editor, and no intention.
As AI systems become more common in journalism, research, and everyday decisions, this question matters more and more. We don't have a good answer yet, and neither do lawyers, courts, or the companies building these systems.
You've been hired to investigate AI hallucinations. Your lab partner (the AI below) has studied the Turley case and similar incidents. Your job is to interrogate the reasoning — challenge assumptions, propose new scenarios, and figure out when prediction becomes dangerous.
In 2018, Reuters reported that Amazon had quietly shut down an AI recruiting tool that had been in development since 2014. The tool was supposed to rate job candidates automatically, scanning resumes and scoring them from one to five stars. Engineers had trained it on a decade of Amazon's own hiring history, teaching it to recognize patterns in the resumes the company had received over the previous ten years.
The problem: Amazon's tech workforce over that decade had been overwhelmingly male, and most of those resumes came from men. So the AI learned that male-associated patterns predicted hiring success. It started penalizing resumes that included the word "women's," as in "women's chess club captain." It downgraded graduates of two all-women's colleges. It had absorbed the bias in Amazon's past decisions and converted it into a scoring algorithm.
Amazon said the tool was never used by its recruiters to evaluate candidates. But the story revealed something that researchers had been warning about for years: when you train an AI on data produced by a biased system, the AI doesn't just learn the patterns. It learns the bias.
The Amazon tool wasn't programmed to discriminate. No engineer wrote code that said "give women lower scores." The bias emerged from the data itself — from the fact that the historical record of who got hired at Amazon reflected a world where women were systematically underrepresented in tech roles.
This is the subtle part: the AI was doing exactly what it was supposed to do. It was finding patterns in successful hires and learning to recognize them. The problem was that "successful hires" in the training data meant "people Amazon hired in the past" — and those people were filtered through a biased process before the AI ever got involved.
Bias in, bias out. The AI learned to reproduce the judgments of a system that was already unfair, and it reproduced them faster, at scale, and without any human pausing to ask whether the original judgments were right.
Amazon's case made headlines, but researchers had already documented the same pattern in other domains. In 2016, ProPublica published an investigation of COMPAS, a risk-assessment tool used by courts in several U.S. states to predict whether defendants would reoffend. It found that Black defendants who did not go on to reoffend were nearly twice as likely as white defendants who did not reoffend to be wrongly labeled higher risk. Northpointe, the company behind COMPAS, disputed the analysis. Critics pointed out that tools like this are built on historical arrest and criminal-history data, which itself reflects decades of racially unequal policing.
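ProPublica's comparison is about a specific kind of error, the false positive rate: among people who did not reoffend, what share were still labeled higher risk? The sketch below computes it separately for two groups. The counts are invented for illustration and are not COMPAS data.

```python
def make_records(fp, tn, tp, fn):
    """Toy records as (labeled_high_risk, reoffended) pairs; all counts are invented."""
    return ([(True, False)] * fp + [(False, False)] * tn
            + [(True, True)] * tp + [(False, True)] * fn)

def false_positive_rate(records):
    """Among people who did NOT reoffend, the share labeled high risk anyway."""
    labels = [high_risk for high_risk, reoffended in records if not reoffended]
    return sum(labels) / len(labels)

group_a = make_records(fp=40, tn=60, tp=70, fn=30)
group_b = make_records(fp=20, tn=80, tp=50, fn=50)

fpr_a = false_positive_rate(group_a)  # 40 / (40 + 60)
fpr_b = false_positive_rate(group_b)  # 20 / (20 + 80)
print(f"Group A: {fpr_a:.0%} wrongly flagged, Group B: {fpr_b:.0%}")
print(f"Ratio: {fpr_a / fpr_b:.1f}x")
```

A single overall error rate would hide this; the disparity only shows up when the same rate is computed for each group and compared.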
In medicine, a 2019 study published in Science found that a widely used commercial algorithm for managing patient care systematically underestimated the health needs of Black patients. The algorithm used healthcare costs as a stand-in for health needs. Because Black patients had historically received less care, less money had been spent on them at the same level of illness, and the algorithm read those lower costs as lower need.
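A few lines of Python show how a proxy like cost can shift who gets selected. The patients and dollar figures are invented, and ranking by a single field is a simplification of what the real algorithm did. Both groups have identical medical need, but group B is assumed to have historically received less care, so less was spent on them.

```python
# Invented numbers: identical need in both groups, but group B historically
# received less care, so less money was spent on them for the same need.
SPEND_PER_NEED = {"A": 1000, "B": 600}

patients = [
    {"group": g, "need": need, "past_cost": need * SPEND_PER_NEED[g]}
    for g in ("A", "B")
    for need in (8, 6, 4, 2)
]

def top_k(patients, key, k=4):
    """Return the k patients ranked highest by the given field."""
    return sorted(patients, key=lambda p: p[key], reverse=True)[:k]

def count_by_group(selected):
    """How many selected patients come from each group."""
    return {g: sum(p["group"] == g for p in selected) for g in ("A", "B")}

by_cost = count_by_group(top_k(patients, "past_cost"))
by_need = count_by_group(top_k(patients, "need"))
print("ranked by cost:", by_cost)  # {'A': 3, 'B': 1}
print("ranked by need:", by_need)  # {'A': 2, 'B': 2}
```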
The pattern is consistent: wherever human decisions were unequal in the past, AI trained on those decisions tends to reproduce that inequality in the present.
A human hiring manager making a biased decision affects one candidate. An AI tool making a biased decision can screen out thousands of candidates in seconds, without ever being questioned. Scale amplifies the impact of bias dramatically — and makes it harder to detect, because the decisions happen too fast for any individual to notice the pattern.
Most news coverage of AI bias frames it as a technical problem: "The algorithm was flawed." But knowing what you now know, you can see it differently. The algorithm wasn't flawed — the world it learned from was. And that's a much harder problem to solve, because you can't just patch the code.
To fix the Amazon tool, you'd need to somehow teach the AI to ignore patterns that emerged from past discrimination while still learning patterns that predict actual job performance. But how do you separate those? If women with the same qualifications were historically paid and promoted less — and those outcomes are embedded in the data as "success signals" — the AI may not be able to distinguish real performance from historical disadvantage.
Knowing this changes how you should think about every automated system that makes decisions about people: hiring, lending, healthcare, criminal justice. The question isn't just "is the algorithm fair?" It's "was the world that generated the training data fair?"
Amazon discovered its recruiting AI was biased and shut it down. But COMPAS, the criminal risk-assessment tool, was still being used in courts years after ProPublica's investigation. If a tool is shown to produce racially unequal outcomes, should courts be legally required to stop using it? Who gets to decide? And what happens to people who were already sentenced using a biased system? These questions don't have agreed-upon answers. Judges, legislators, and AI researchers are still actively debating them.
A city government wants to use an AI system to decide which students get recommended for advanced academic programs. The AI was trained on 20 years of historical recommendation data from the school district. You've been hired to identify potential bias risks before the system goes live.
Your lab partner below has reviewed the system specs. Challenge their reasoning, propose test cases, and decide what you'd recommend to the city.
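One test case you might propose is a selection-rate comparison. U.S. employment guidelines use a "four-fifths" rule of thumb: if one group's selection rate is less than 80% of the highest group's rate, that is treated as evidence of possible adverse impact worth investigating. Applying it to school recommendations is an assumption here, and the group names and counts below are hypothetical. A low ratio flags a disparity to look into; it does not by itself prove the model is biased.

```python
def selection_rates(outcomes):
    """outcomes maps group -> (number recommended, number of students)."""
    return {group: recommended / total
            for group, (recommended, total) in outcomes.items()}

def adverse_impact_ratios(outcomes):
    """Each group's selection rate divided by the highest group's rate."""
    rates = selection_rates(outcomes)
    best = max(rates.values())
    return {group: rate / best for group, rate in rates.items()}

# Hypothetical audit numbers for the advanced-program scenario.
outcomes = {"group_x": (120, 400), "group_y": (45, 300)}
ratios = adverse_impact_ratios(outcomes)
for group, ratio in ratios.items():
    flag = "REVIEW" if ratio < 0.8 else "ok"
    print(f"{group}: impact ratio {ratio:.2f} -> {flag}")
```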
For years, Black Facebook users reported a frustrating pattern: when they posted about racism they had experienced, the platform's automated moderation sometimes removed their posts as hate speech or suspended their accounts. The problem wasn't unique to Facebook. It exposed a deep problem with AI language systems in general.
Research helped explain why. In 2019, researchers at the University of Washington published a study showing that widely used hate-speech detection models, trained on popular collections of labeled tweets, were far more likely to flag posts written in African American Vernacular English (AAVE), a dialect used by many Black Americans, as offensive or "toxic." Words and phrases common in AAVE were being treated as signs of a violation. And because these systems key on words rather than meaning, a post that quotes a slur to describe being targeted by it can look, to the model, much like a post using that slur as an attack.
The result: people describing their experiences of racism were being silenced by an AI that couldn't distinguish describing discrimination from perpetrating it. The system was reading words, but completely missing context.
The Facebook case is a precise illustration of something that's easy to miss when AI seems to be "reading" text fluently: understanding language and processing language are not the same thing.
A human reading a post about racism can usually tell immediately whether the author is recounting a slur that was used against them, criticizing a racist system, or expressing a racist view themselves. We pick this up from tone, grammar, the platform context, the account's history, what the surrounding sentences are about. Context is everywhere in communication — and most of it is invisible when you're just counting words.
AI language models are extraordinarily good at identifying statistical patterns in text. They've processed enough language to recognize that certain words appear more often in certain contexts. But they don't have lived experience that tells them why a word is used differently depending on who is speaking, who they're speaking to, and what happened in the conversation before this moment.
Tone is one of the hardest things to communicate in text, and one of the hardest things for AI to detect. Automated moderation systems have repeatedly been criticized for flagging satire and parody, including clearly labeled jokes, as misinformation or policy violations, while plainly false statements written in a neutral tone sometimes passed through unchallenged.
Sarcasm is a particularly thorny problem. "Great job, that went really well" — said by a friend after you dropped your lunch tray — means the opposite of what it says. Any human who witnessed the tray incident understands immediately. An AI reading only the sentence has no way to access the tray, the relationship, or the tone of voice. It sees words that express praise.
Researchers have developed specific tests for AI sarcasm detection, and in many of them even strong models have performed below human level on irony and sarcasm in naturalistic text. Language models can recognize sarcasm in clearly labeled examples, but in the wild, where nothing is labeled, they frequently miss it.
Now scale that up. Platforms the size of Facebook screen an enormous volume of content every day. If a system processed 100 billion pieces of content per day, even a 1% error rate in reading context would mean 1 billion misclassifications per day. At that scale, the impact on individuals (posts wrongly removed, accounts wrongly suspended, real experiences of discrimination silenced) adds up to something that affects millions of people's lives.
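The arithmetic is simple but worth seeing. This sketch assumes the volume used above, 100 billion items per day, and shows how many mistakes different error rates would produce, per day and per second.

```python
SECONDS_PER_DAY = 86_400

def daily_errors(items_per_day, error_rate):
    """Expected misclassifications per day for a given volume and error rate."""
    return items_per_day * error_rate

volume = 100_000_000_000  # assumed: 100 billion items per day

for rate in (0.01, 0.001, 0.0001):
    errors = daily_errors(volume, rate)
    per_second = errors / SECONDS_PER_DAY
    print(f"error rate {rate:.2%}: {errors:,.0f} per day (~{per_second:,.0f} per second)")
```

Even a system that is right 99.99% of the time would, at this assumed volume, still make about 10 million mistakes a day.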
This lesson has a direct practical implication: when you interact with an AI, the context you provide in your words is often all it has. If you ask ambiguously, you'll get an answer that might be responding to a completely different interpretation of your question than you intended. If you describe a situation without key context, the AI will fill in the gaps — and it might fill them in wrong.
Skilled AI users understand this. They learn to be explicit about context: stating who they are, what situation they're describing, what they already know, and what kind of answer they actually want. This doesn't guarantee the AI understands — but it shifts some of the interpretive burden from the AI (where it often fails) to you (where you can actually control the meaning).
Knowing this changes how you should interpret AI responses. When an AI seems to be answering a different question than the one you asked, it's usually not broken — it interpreted your words through a context it inferred, and that context might not match your actual situation. The fix isn't frustration; it's more precise communication.
Facebook's content moderation AI disproportionately silenced Black users discussing racism. One response is: fix the AI's dialect bias. But here's the harder question: is it even possible to automate judgment about context-dependent language at the scale these platforms operate? If the answer is "not reliably," then platforms face a choice: use imperfect AI moderation at scale, or use accurate human moderation that can only cover a fraction of the content. There's no clean answer. Every option has victims — people hurt by missed violations or people silenced by false positives. The real question is: who decides which failure is more acceptable?
A "red team" is a group of people hired to find flaws in a system before it goes live. You've been brought in to test a content moderation AI for context failures before a social media company deploys it. Your job: design test cases that would reveal where the AI gets context wrong — and argue for how those failures should be handled.
On March 23, 2016, Microsoft launched a chatbot called Tay on Twitter. Tay was designed to learn from conversations with real users — the more it talked to people, the better it would get at engaging conversation. Microsoft described it as an experiment in "conversational understanding."
Within sixteen hours, Tay had been manipulated into generating racist, antisemitic, and sexist content. Coordinated groups of users had discovered that Tay would repeat phrases they taught it, and they systematically fed it hateful content. Tay learned — exactly as designed — and reproduced what it learned.
Microsoft shut Tay down after less than a day. But the incident revealed something fundamental about systems that learn from real-world feedback: if the feedback is poisoned, the learning is poisoned too. Tay had a runaway feedback loop — its outputs were shaped by its inputs, and its inputs were being controlled by bad actors who understood the mechanism.
A feedback loop happens when a system's outputs feed back into its inputs. In AI, this takes many forms. Sometimes it's intentional: you rate a movie, Netflix learns your preferences, recommends more movies, you watch them, and it learns more. That's a feedback loop designed to improve recommendations.
But feedback loops can also amplify errors. If an AI recommendation system learns that users click more on outrage-inducing content — and the platform rewards clicks — it will recommend more outrage-inducing content. Users who watch it then get recommended more of it. The system didn't start with the goal of maximizing outrage; it started with the goal of maximizing engagement. But engagement and outrage turned out to be correlated in the data, and the feedback loop amplified that correlation until it dominated the recommendations.
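A toy simulation shows how a small difference can take over. The click rates are assumptions chosen for illustration, and the update rule (next round's exposure is proportional to this round's clicks) is a simplification, not any real platform's algorithm.

```python
def simulate_engagement_loop(click_rates, rounds):
    """Each round, recommendation share is reallocated in proportion to clicks.

    click_rates: assumed probability that a user clicks each kind of content.
    Returns the share of recommendations per content type after each round.
    """
    shares = {kind: 1 / len(click_rates) for kind in click_rates}
    history = [dict(shares)]
    for _ in range(rounds):
        # Expected clicks = how often it's shown * how often it's clicked.
        clicks = {kind: shares[kind] * rate for kind, rate in click_rates.items()}
        total = sum(clicks.values())
        # The loop: next round's exposure is set by this round's clicks.
        shares = {kind: c / total for kind, c in clicks.items()}
        history.append(dict(shares))
    return history

# Assumed rates: outrage content is clicked only slightly more often.
history = simulate_engagement_loop({"calm": 0.10, "outrage": 0.12}, rounds=20)
for r in (0, 5, 10, 20):
    print(f"round {r:2d}: outrage share = {history[r]['outrage']:.0%}")
```

With these numbers, outrage content starts at 50% of recommendations and passes 95% within 20 rounds, even though it is clicked only a little more often. No step in the loop asks whether that outcome is good for anyone.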
Researchers raised exactly this concern about YouTube's recommendation algorithm between 2016 and 2019. Guillaume Chaslot, a former Google engineer who had worked on YouTube's recommendation system, built a tool to track what the algorithm recommended. He reported that it repeatedly steered viewers toward conspiracy theories and more extreme political content, not because it was designed to, but because that content held attention longer, and attention was what the loop was optimizing for. Other researchers have since disputed how strong this effect was.
Feedback loops in AI are becoming a more urgent concern now that AI-generated content is flooding the internet. In 2023 and 2024, researchers published studies warning about what happens when AI systems are trained on data that was itself generated by AI. A team including researchers at Oxford and Cambridge called it "model collapse"; a team at Rice University described a similar effect they called "Model Autophagy Disorder," or MAD.
Here's why it matters: earlier models learned from a web that was mostly human-written. Newer models are learning from a web that contains more and more AI-generated text. If AI-generated text contains errors, biases, or distortions (and it does), then models trained on it will absorb those flaws. When those models generate more text, which then gets used to train the next generation of models, errors can compound across generations.
The Oxford-led study described model collapse as a degenerative process in which, over time, models forget the true underlying distribution of the data, with rare events (the "tails" of the distribution) disappearing first. In plain language: the AI starts learning what AI says the world is like, instead of learning what the world is actually like. Over generations, it can drift further and further from reality.
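The experiment below is a common toy version of this idea, not the study's actual setup. Each "generation" fits a normal distribution to a small sample drawn from the previous generation's model instead of from real data. The sample size, number of generations, and random seed are arbitrary assumptions.

```python
import random
import statistics

def simulate_collapse(generations=300, n=10, seed=0):
    """Toy model collapse: each generation trains only on the last one's output.

    Generation 0 is the 'true' data: a normal distribution with mean 0, std 1.
    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        # Draw a small training set from the current model's own output.
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        # "Train" the next model: fit mean and std to those samples only.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history

history = simulate_collapse()
for gen in (0, 10, 50, 100, 300):
    print(f"generation {gen:3d}: std = {history[gen]:.3g}")
```

Because each fit sees only ten samples, the estimated spread tends to shrink a little every generation, and the shrinkage compounds: rare, extreme values stop being generated first, and eventually the model produces nearly the same value every time.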
The stakes of model collapse aren't only theoretical. Governments, medical institutions, and financial systems are beginning to rely on AI for analysis and decision-making. If the AI systems they use were trained through compromised feedback loops, the recommendations they produce may be systematically skewed in ways that aren't visible on the surface. This is why auditing AI training data is increasingly considered a governance priority, not just a technical one.
Put all four lessons together and something important comes into focus. AI doesn't make isolated errors — it makes errors that are deeply connected to where it came from, what it learned, how it handles context, and what its outputs flow back into.
The training data problem (Lesson 1) means its starting knowledge can be wrong. The bias problem (Lesson 2) means historical inequalities get reproduced at scale. The context problem (Lesson 3) means it can misread what you mean. And the feedback loop problem (this lesson) means errors don't stay contained — they propagate, amplify, and get baked into the next version.
None of these mean AI is useless. They mean AI is a powerful tool that requires informed users. You are now an informed user. When someone tells you an AI said something, you can ask four questions they probably haven't thought to ask: What was in the training data? Were there historical biases in that data? Did the AI have enough context? And is there a feedback loop that might have amplified an initial error into a systematic one?
Most people asking "why did it say that?" are looking for a simple answer. Now you know the answer is almost never simple — and knowing that puts you ahead of most adults talking about AI right now.
YouTube's algorithm optimized for engagement and, as a result, many researchers believe it radicalized significant numbers of users toward extreme political content between 2016 and 2019. YouTube changed its algorithm in 2019 after public pressure. But here's the hard version of the question: YouTube never intended to radicalize anyone. The algorithm was doing exactly what it was designed to do — maximize watch time. If an AI system causes harm while doing exactly what it was designed to do, is the company morally responsible? Legally? And if the harm came from a feedback loop that nobody fully understood at the time, how do you hold anyone accountable?
A city is considering using an AI-powered social media monitoring system to identify young people at risk of gang involvement. The AI will analyze public posts and flag accounts. Flagged individuals get outreach services — or increased police attention, depending on the department. The AI will learn from outcomes: if a flagged person later gets in trouble, that counts as a "correct" flag.
Your job is to map the feedback loops in this system and argue for what safeguards — if any — should be required before it's deployed. There's no single right answer.
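One way to start mapping the loop is a toy model. It rests on loud assumptions: two hypothetical neighborhoods with the same true rate of trouble, monitoring attention that starts slightly uneven, and a system that only records incidents where it is looking and then allocates the next round's attention according to what it recorded.

```python
def recorded_incidents(true_rate, attention_share, population=1000):
    """Incidents the system *records*: only those that happen where it is looking."""
    return true_rate * attention_share * population

def run_loop(initial_attention, true_rates, rounds):
    """Each round, attention is reallocated in proportion to recorded incidents."""
    attention = dict(initial_attention)
    for _ in range(rounds):
        recorded = {g: recorded_incidents(true_rates[g], attention[g]) for g in attention}
        total = sum(recorded.values())
        # The loop: next round's attention follows this round's records.
        attention = {g: r / total for g, r in recorded.items()}
    return attention

# Assumed: both neighborhoods have the SAME true rate of trouble,
# but the system starts out watching neighborhood A a bit more.
final = run_loop({"A": 0.55, "B": 0.45}, {"A": 0.10, "B": 0.10}, rounds=50)
print(final)
```

In this toy model the starting imbalance never corrects itself, because the data only confirms where the system already looked, and the records make neighborhood A appear riskier than B even though the true rates are equal. If attention were allocated even more aggressively than in proportion to recorded incidents, the gap would grow rather than merely persist.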