The case seemed straightforward. Roberto Mata was suing the airline Avianca, claiming he had been injured when a metal serving cart struck his knee during a flight. His lawyers needed to find similar cases — past court decisions that would support their argument. So one of the attorneys, Steven Schwartz, did what millions of people were starting to do in 2023: he asked ChatGPT.
ChatGPT delivered. It produced a list of cases with precise-sounding names: Varghese v. China Southern Airlines. Shaboon v. Egyptair. Petersen v. Iran Air. Six cases in total, each with a court, a date, a citation number, and a detailed summary of the ruling. Schwartz had never heard of them, but they supported his argument perfectly. They went into the brief his firm filed with the federal court.
When Avianca's lawyers tried to look up the cases, they could not find them. Neither could the judge. This was not because the cases were hard to locate, but because they did not exist. Not one of them. The courts, the dates, the citation numbers, the rulings: ChatGPT had invented all of it, stated with complete confidence and zero hesitation.
When Judge P. Kevin Castel demanded an explanation, Schwartz admitted he had used ChatGPT and had not verified the citations independently. Castel wrote that the court was presented with "an unprecedented circumstance": lawyers submitting invented court decisions as real legal authority. He later fined Schwartz, his colleague Peter LoDuca, and their law firm $5,000. The case became international news almost overnight.
Here is the thing that surprises most people: ChatGPT was not lying. It was not trying to trick Steven Schwartz. It did not know the cases were fake. This distinction is crucial, and it is what makes AI hallucination so much stranger — and more dangerous — than ordinary deception.
To understand why ChatGPT invented six court cases and stated them as fact, you first need to understand what a large language model actually is. It is not a search engine. It does not look things up. It does not have a database of legal decisions it consults. Instead, it learned to predict which words come next in a sentence — by reading an almost incomprehensibly large amount of text.
During training, the model read billions of documents: news articles, books, websites, legal filings, Wikipedia entries, forum posts, scientific papers. From that reading, it learned patterns. It learned that legal briefs contain phrases like "the court held that" and "see also [case name], [court], [year]." It learned what a properly formatted legal citation looks like. It learned the kinds of arguments that appear in aviation injury cases.
So when Schwartz asked it to find supporting cases, the model did what it always does: it predicted the most plausible-sounding continuation of his request. It had absorbed the pattern of legal research, so it generated text that looked exactly like legal research. The citations were structurally perfect. They pointed to nothing.
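A deliberately tiny sketch makes the mechanism concrete. The example below is a bigram model, which only counts which word follows which. Real language models use neural networks over far longer contexts, so treat this as an analogy and not as a description of how ChatGPT works internally. The three training "citations" are invented for illustration. Greedily chaining the most frequent next word produces a well-formed citation that appears nowhere in the training text.

```python
from collections import Counter, defaultdict

# Invented citation-like training snippets (hypothetical; none are real cases).
CORPUS = [
    "see Alvarez v. Pacific Airlines , 2d Cir. 2015",
    "see Brennan v. Pacific Airlines , 9th Cir. 2011",
    "see Alvarez v. Orion Freight , 9th Cir. 2011",
]
END = "<END>"


def train_bigrams(corpus):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for line in corpus:
        tokens = line.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start="see", max_len=20):
    """Greedily append the most frequent next word until END is predicted."""
    out = [start]
    while len(out) < max_len:
        nxt, _ = counts[out[-1]].most_common(1)[0]
        if nxt == END:
            break
        out.append(nxt)
    return " ".join(out).replace(" ,", ",")


model = train_bigrams(CORPUS)
citation = generate(model)
training_citations = {line.replace(" ,", ",") for line in CORPUS}
print(citation)                         # see Alvarez v. Pacific Airlines, 9th Cir. 2011
print(citation in training_citations)   # False
```

The model saw every fragment of the output during training. It never saw the combination. That is the Mata problem in miniature.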
A human who makes up a court case is committing fraud — they know the truth and are hiding it. An AI that makes up a court case has no concept of truth versus falsehood. It only has patterns. This makes AI hallucination a completely different kind of problem, and a much harder one to solve.
Imagine you are trying to finish someone else's sentence. You hear "The capital of France is..." and you say "Paris," because that is the pattern. You have heard that sentence completed that way hundreds of times. Now imagine someone says "The ruling in Henderson v. Atlantic Airways (2019) established that..." You have read enough legal documents to know exactly how that sentence should continue, even if Henderson v. Atlantic Airways was never a real case.
That is approximately what a language model does. It does not have a fact-checking mechanism built in. It does not have a sense of "I don't know." It has a sense of "here is the most statistically likely thing that should come next." When the model is confident — when many patterns in its training point the same direction — it sounds certain. When it is filling in something it never directly learned, it still sounds certain. The confidence level does not track with the accuracy level.
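The same point can be sketched numerically. A model turns raw scores for candidate next words into probabilities with a softmax. Those probabilities always sum to 1, so there is always a top candidate to emit. In the hypothetical example below, all scores are invented and greedy decoding is assumed (real systems often sample instead). One completion is backed by a sharply peaked distribution and the other by a nearly flat one. Yet the sentences that come out read exactly the same.

```python
import math


def softmax(scores):
    """Convert raw next-token scores into probabilities that always sum to 1."""
    top = max(scores.values())
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def complete(context, scores):
    """Greedy decoding: append the single most probable next token.

    The returned sentence carries no trace of how probable that token was.
    """
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    return f"{context} {best}.", probs[best]


# Hypothetical scores, invented for illustration.
dense_text, dense_p = complete(
    "The capital of France is", {"Paris": 9.0, "Lyon": 2.0, "Nice": 1.0}
)
sparse_text, sparse_p = complete(
    "The ruling in Henderson v. Atlantic Airways held that the carrier was",
    {"liable": 1.1, "not liable": 1.0, "exempt": 0.9},
)
print(dense_text, round(dense_p, 3))    # The capital of France is Paris. 0.999
print(sparse_text, round(sparse_p, 3))  # The ruling in ... the carrier was liable. 0.367
```

The probability exists inside the model, but nothing in the emitted sentence reveals it.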
This is not a bug that engineers forgot to fix. It is a property of the architecture. The model was trained to produce fluent, plausible text. Fluency and plausibility are not the same as truth. No one lied to you. The machine does not know it is wrong. That is exactly what makes this hard.
After the Mata case, researchers began systematically testing how often AI systems hallucinate legal information. A study released in early 2024 by researchers at Stanford's RegLab and Institute for Human-Centered AI tested widely used models (GPT-3.5, PaLM 2, and Llama 2). It found that they hallucinated between 69% and 88% of the time when asked specific, verifiable questions about federal court cases. The Schwartz case was not a fluke. It was a demonstration of a structural property.
When someone says "the AI told me," they usually assume the AI retrieved that information the way Google retrieves a webpage: from a stored, verified source. You now know that's not what happens. Language models generate text. They predict. The fact that the output sounds authoritative is a product of the training, not a guarantee of accuracy. From this point forward, you can read AI output differently.
Here is another layer. Language models have a training cutoff: a date after which they saw no new information. GPT-4, released in March 2023, originally had a training cutoff of September 2021. That means if you asked it about events from 2022, it had no real data to draw from. But it would still try to answer. And the answer would sound like the same confident voice that correctly told you who wrote Hamlet.
Early models could not reliably say "I don't have reliable data here, I'm estimating." They had no dependable mechanism to flag the difference between "I know this from thousands of sources" and "I'm extrapolating from patterns because I have no direct data." Both came out sounding the same: fluent, confident, plausible.
Think about what this means. Every topic that was underrepresented in the training data — obscure research, regional news, specialized professional knowledge, recent events — is a zone where the model is operating more on pattern-matching than on actual learned fact. And in those zones, it does not slow down or flag uncertainty. It continues at the same confident pace.
Steven Schwartz's mistake was treating AI output the way you would treat a library database. The cases looked real. The formatting was right. The AI had no reason to flag them as invented, because it had no mechanism to know they were invented. This is the essential lesson: the output of a language model and the reliability of that output are completely separate things. The surface tells you nothing about the substance.
Schwartz was fined and embarrassed. But he is also a lawyer who trusted a tool that seemed reliable and professional. Who bears responsibility for AI hallucinations — the person who uses the tool without sufficient verification, the company that built the tool without adequate warnings, or the profession that adopted AI without proper guidelines? Or some combination? Where should the line be?
There is a term in psychology called the Dunning-Kruger effect: the tendency of people with little knowledge in a domain to overestimate their competence, because they lack the knowledge to recognize what they don't know. AI hallucination is something like a mechanical version of this. The confidence of a model's output is not a reliable guide to its accuracy. High confidence and total fabrication can occur together.
Modern AI systems have gotten better at expressing uncertainty in some situations. They will sometimes say "I'm not entirely sure" or "you may want to verify this." But these hedges are themselves learned patterns — the model has learned that certain questions (like "what's the weather today?") should be answered with uncertainty. For questions where the training data was dense and confident-sounding, the model often produces confident output even when it is wrong.
What this means practically: you cannot use the tone of an AI's response as evidence of its accuracy. An AI that says "the answer is definitely X" is not more reliable than one that says "I think it might be X." The verbal confidence is a stylistic output, not a reliability signal.
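If tone is not the signal, the practical alternative is to measure. One way is to keep a log of AI answers you have checked against reliable sources, tag each by how confident it sounded, and compare accuracy across tags. The sketch below uses invented records and tone labels. The point is the bookkeeping, not the numbers.

```python
from collections import defaultdict

# Hypothetical log of AI answers you checked by hand: (how the answer sounded, was it correct?)
graded = [
    ("confident", True), ("confident", False), ("confident", True),
    ("confident", False), ("confident", True),
    ("hedged", True), ("hedged", True), ("hedged", False), ("hedged", True),
]


def accuracy_by_tone(records):
    """Fraction of verified-correct answers for each tone label."""
    tallies = defaultdict(lambda: [0, 0])  # tone -> [correct, total]
    for tone, correct in records:
        tallies[tone][0] += int(correct)
        tallies[tone][1] += 1
    return {tone: right / total for tone, (right, total) in tallies.items()}


print(accuracy_by_tone(graded))  # {'confident': 0.6, 'hedged': 0.75}
```

Only the checked outcomes tell you anything about reliability. The tone column is just a label you attach.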
The Mata case ended with the lawsuit being dismissed on separate grounds and the law firm paying its fine. Steven Schwartz keeps practicing law. ChatGPT keeps generating legal citations. And the gap between how confident AI sounds and how accurate it actually is remains one of the central unsolved challenges in building AI systems that people can trust.
A journalist used an AI assistant to draft a research memo with citations and statistics. Some of the information is accurate. Some is hallucinated. Your job is to talk through the cases with your AI lab partner — not a teacher, but a fellow investigator who will push you to think harder.
Your partner will present you with AI-generated claims and ask you to reason through whether they should be trusted, and why. Take positions. Defend them. Expect pushback.
In 2019, a study published in the journal Science revealed something alarming about a widely used healthcare algorithm. The algorithm, developed by a company called Optum and used by hospitals and insurance companies across the United States, was designed to identify patients who needed extra medical care and attention. It analyzed millions of patient records and predicted who was most at risk.
Ziad Obermeyer of UC Berkeley and his colleagues discovered that the algorithm was systematically rating Black patients as healthier than equally sick white patients. At any given level of illness, a Black patient received a lower risk score, which meant they were less likely to be flagged for the additional care they needed.
The algorithm was not using race as a variable. It never directly considered a patient's race. Instead, it used healthcare costs as a proxy for health needs. The logic seemed sound: sicker patients cost more to treat, so spending predicts need. But this assumption embedded a historical inequality directly into the model. Because Black patients in the United States have historically spent less on healthcare — due to reduced access, financial barriers, and systemic discrimination — the algorithm interpreted lower historical spending as an indicator of better health. It mistook the footprints of inequality for the shape of biology.
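The proxy problem can be reproduced in a few lines. In this hypothetical sketch, two groups have identical illness levels. Group B, however, historically generated less spending per unit of illness. The 550 versus 1000 figures are invented and stand in for reduced access to care. Ranking patients by cost, which is effectively what a cost-trained score does, flags far fewer group B patients than ranking by illness would. Group membership is never used to rank anyone.

```python
# Hypothetical: group B historically generated less spending per unit of illness
# (an invented factor standing in for reduced access to care).
SPEND_PER_UNIT = {"A": 1000, "B": 550}

patients = [
    {"group": g, "illness": level, "cost": level * SPEND_PER_UNIT[g]}
    for level in range(1, 11)
    for g in ("A", "B")
]


def flag_for_extra_care(patients, score_key, slots):
    """Flag the `slots` highest-scoring patients; only `score_key` is used to rank."""
    ranked = sorted(patients, key=lambda p: p[score_key], reverse=True)
    return ranked[:slots]


def count_group(flagged, group):
    return sum(p["group"] == group for p in flagged)


by_cost = flag_for_extra_care(patients, "cost", 6)
by_illness = flag_for_extra_care(patients, "illness", 6)
print("group B flagged when ranking by cost:   ", count_group(by_cost, "B"))     # 1
print("group B flagged when ranking by illness:", count_group(by_illness, "B"))  # 3
```
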
The researchers estimated that removing this bias would raise the share of Black patients automatically identified for extra care from 17.7% to 46.5%. The algorithm was doing exactly what it was trained to do. That was the problem.
Every AI model learns from data. This is so fundamental that it seems obvious, but its implications are easy to underestimate. When we say a model "learns," we mean it adjusts its internal parameters — the millions or billions of numerical weights that determine what it outputs — based on patterns in the training data. Whatever patterns exist in the data get baked into the model.
If the training data is accurate and representative, the model learns accurate and representative patterns. If the training data contains errors, gaps, or historical biases, the model learns those too — and often amplifies them, because it treats patterns as reliable signals regardless of where they came from.
The Optum algorithm learned from real patient records. Those records were accurate. No one put false information into the training data. But the data reflected a world in which access to healthcare was unequally distributed, and the algorithm learned from that world and reproduced its inequities at scale — automatically, consistently, with the authority of a scientific-looking risk score attached.
For large language models like the ones that power ChatGPT or Claude, the training data is primarily text scraped from the internet, supplemented by books, academic papers, and curated datasets. The internet is an enormous and varied resource — but it is not a neutral or representative sample of human knowledge and experience.
Consider who writes on the internet: primarily people who are literate, have access to devices and connectivity, and live in societies where publishing online is common and safe. English-language content vastly outnumbers content in other languages. Perspectives from wealthy, developed countries dominate. Recent decades are far better represented than historical periods. Certain professional and academic communities are densely represented; others barely appear at all.
This means a language model trained on internet text will have denser, more reliable knowledge about topics that were well-covered online, and thinner, more error-prone knowledge about everything else. When it reaches into an underrepresented domain — a regional language, a non-Western cultural tradition, a pre-digital historical period, a highly specialized technical field — it is drawing on sparser patterns and is more likely to hallucinate or produce distorted answers.
But here is the part that catches people off guard: the model does not signal this difference. It sounds equally fluent and equally confident whether it is drawing on dense, reliable training data or making educated guesses from sparse patterns. The thinness is invisible in the output.
In 2023, researchers found that AI translation tools and language models performed significantly worse on languages like Yoruba, Igbo, and Swahili than on English or French — but the errors were not obvious to users who did not already know those languages. The people most affected by these gaps were the people with the least ability to detect them.
There is another dynamic that makes training data problems worse than they initially appear: AI models do not just reproduce biases. They often amplify them. Here is why.
When a model is trained on data that contains a statistical pattern, it learns to predict that pattern — and then applies that prediction consistently, at scale, without the variation that human judgment introduces. A human hiring manager with a subconscious bias toward candidates from certain universities might act on that bias inconsistently, letting other factors override it some of the time. An AI trained on that manager's historical decisions will apply the bias uniformly, to every candidate, every time, with no exceptions.
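A crude classifier shows one mechanism behind this uniformity. Below, hypothetical past decisions show a manager who hired equally qualified candidates from "North U" 60% of the time and from "South U" 40% of the time. A rule that learns the most common past outcome for each university turns that modest tilt into an absolute one. Real hiring models use many features and usually output scores rather than majority votes. Still, applying a fixed threshold to a score can produce the same hardening effect.

```python
from collections import Counter

# Hypothetical past decisions for equally qualified candidates: (university, hired?)
history = (
    [("North U", True)] * 6 + [("North U", False)] * 4
    + [("South U", True)] * 4 + [("South U", False)] * 6
)


def hire_rate(decisions, school):
    outcomes = [hired for s, hired in decisions if s == school]
    return sum(outcomes) / len(outcomes)


def train_majority_rule(decisions):
    """Learn, per university, the most common past outcome (a deliberately crude model)."""
    votes = {}
    for school, hired in decisions:
        votes.setdefault(school, Counter())[hired] += 1
    return {school: c.most_common(1)[0][0] for school, c in votes.items()}


rule = train_majority_rule(history)
applicants = ["North U"] * 10 + ["South U"] * 10
model_decisions = [(s, rule[s]) for s in applicants]
for school in ("North U", "South U"):
    print(school, hire_rate(history, school), "->", hire_rate(model_decisions, school))
# North U 0.6 -> 1.0
# South U 0.4 -> 0.0
```
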
A 2018 study by MIT researcher Joy Buolamwini and Timnit Gebru, then at Microsoft Research, examined commercial facial-analysis systems. It found that they misclassified the gender of lighter-skinned men less than 1% of the time, but erred on as many as 34.7% of darker-skinned women. The face datasets commonly used to build and benchmark such systems were overwhelmingly made up of lighter-skinned faces, so those faces were densely represented. That imbalance in the data became a disparity in real-world accuracy. It was applied automatically, everywhere the system was deployed, to every face it scanned.
This is not theoretical. In January 2020, a Black man named Robert Williams was wrongfully arrested in Detroit after a facial recognition system incorrectly matched him to a suspect. His case fits the pattern of accuracy gaps documented by researchers like Buolamwini and Gebru. Training data shaped a model. The model shaped a decision. The decision put an innocent man in a jail cell overnight and could have upended his life.
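Gaps like these are easy to miss when a system is evaluated with a single overall accuracy number. That is why audits break results down by subgroup. The counts below are invented for illustration and are not the study's data. They show how an overall error rate under 7% can coexist with a 30% error rate for one group.

```python
# Invented evaluation counts (NOT the study's data): (subgroup, faces tested, errors)
results = [
    ("lighter-skinned men", 400, 4),
    ("lighter-skinned women", 300, 21),
    ("darker-skinned men", 200, 12),
    ("darker-skinned women", 100, 30),
]


def overall_error(rows):
    """Single headline number: all errors divided by all faces tested."""
    return sum(errors for _, _, errors in rows) / sum(n for _, n, _ in rows)


def error_by_group(rows):
    """Disaggregated view: error rate within each subgroup."""
    return {group: errors / n for group, n, errors in rows}


print(f"overall: {overall_error(results):.1%}")  # overall: 6.7%
for group, rate in error_by_group(results).items():
    print(f"{group}: {rate:.1%}")
```
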
When someone says "the AI is objective because it's based on data," you now know that this sentence can be precisely backwards. Data is not neutral. It reflects the world that produced it — including that world's inequalities. An AI trained on biased data is not objective. It is a machine for automating and scaling bias. Knowing this changes how you evaluate every claim that AI removes human subjectivity from decisions.
There is one more wrinkle that makes training data problems self-perpetuating: feedback loops. As AI systems get deployed and generate outputs, those outputs often become part of the data ecosystem — and eventually part of future training data.
If an AI tool is used to help write news articles, those AI-assisted articles end up on the internet. The next generation of language models trains on that internet. If the AI-assisted articles contained subtle stylistic patterns, tonal biases, or factual tendencies inherited from the first model, the second model learns those too — now with added reinforcement, because the pattern appears in more of its training data. The biases compound across generations of models.
Researchers call this "model collapse" at its extreme — a situation where AI trained primarily on AI-generated data gradually loses touch with the diversity of real human expression and knowledge, converging toward a narrower and narrower range of outputs. We are in the early stages of this risk. Future models may face it more acutely.
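A toy simulation shows one ingredient of this effect. Each "generation" below is trained only on a finite sample of the previous generation's output, simplified here to resampling word frequencies. A word that happens not to be sampled gets zero frequency and can never return. The number of distinct words can therefore only stay flat or shrink. The vocabulary and sample sizes are hypothetical, and published model-collapse studies involve real neural models. This sketch only illustrates how rare items drop out.

```python
import random
from collections import Counter


def next_generation(counts, n_samples, rng):
    """'Train' on the previous generation's output by resampling its word frequencies."""
    words = list(counts)
    weights = [counts[w] for w in words]
    return Counter(rng.choices(words, weights=weights, k=n_samples))


rng = random.Random(0)  # fixed seed so runs are repeatable
# Hypothetical "human" data: three common words plus a tail of ten rare ones.
human = Counter({"common": 50, "frequent": 30, "usual": 10})
human.update({f"rare_{i}": 1 for i in range(10)})

generations = [human]
for _ in range(20):
    generations.append(next_generation(generations[-1], n_samples=100, rng=rng))

diversity = [len(g) for g in generations]
print(diversity)  # distinct words per generation: never increases
```
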
The Optum algorithm was eventually adjusted after the Obermeyer study. The researchers worked with the manufacturer, and predicting health measures alongside cost reduced the measured bias by 84%. Robert Williams eventually received an apology. But in both cases, the damage happened first. The adjustment came after real people were affected. This is the pattern: AI deployed, problem discovered, correction applied, with the gap between deployment and correction measured in harm to real people.
The Optum algorithm's designers did not intend to discriminate. They used what seemed like a reasonable proxy. Does the absence of intent reduce moral responsibility? If an engineer builds a tool that causes harm due to data they never examined closely enough, how responsible are they — compared to a company that knowingly deploys a biased system? And who decides when a known gap in AI performance is acceptable versus unacceptable?
A city is considering deploying an AI system to help prioritize social services allocation — directing resources to households most likely to be in crisis. The system was trained on five years of historical casework data. Your job is to audit it before the vote.
Your lab partner is a fellow auditor. They will push you to be specific: which data gaps matter, what questions you'd ask, where the risks are. Do not just say "it could be biased." Be precise.
In 2017, a researcher named Guillaume Chaslot — a former YouTube engineer — began publishing data that would become some of the most discussed findings in the history of platform AI. Chaslot had helped build YouTube's recommendation algorithm before leaving the company, and he was increasingly concerned about what the algorithm was optimizing for.
YouTube's recommendation system was designed to maximize "watch time" — the total number of minutes viewers spent on the platform. The logic was straightforward: the longer people watched, the more ads they saw. The algorithm learned to recommend videos that kept people watching. It was trained on engagement signals — what people clicked on, how long they stayed, whether they watched another video immediately after.
What Chaslot found was that the algorithm had discovered a consistent pattern: emotionally intense, provocative, or extreme content kept people watching longer than moderate content. A video that made you angry or alarmed was more engaging than one that was calm and balanced. So the algorithm began recommending more of it. People who watched a mainstream political video were frequently recommended increasingly extreme versions of similar content. People who searched for diet advice were nudged toward extreme eating disorder content. Someone researching flat-earth theories might be served a radicalization pipeline over a few sessions.
In 2018, YouTube's chief product officer, Neal Mohan, said that more than 70% of the time people spent watching YouTube came from algorithmic recommendations. The algorithm was not just reflecting what people wanted. It was actively shaping what people watched. In doing so, it shaped what future users would find engaging, which the algorithm then amplified further.
The YouTube case is one of the clearest examples of an AI feedback loop in the real world. Here is how it works, mechanically.
Step one: the AI is trained on existing data — in this case, historical engagement patterns. It learns what people watched and for how long. Step two: the AI makes recommendations based on those patterns. Step three: those recommendations shape what people actually watch — which generates new engagement data. Step four: that new engagement data is used to retrain or refine the algorithm. Step five: the refined algorithm makes new recommendations based on the new data — which is now shaped by the algorithm's previous recommendations.
The loop becomes self-reinforcing. If the algorithm discovers that provocative content drives engagement, it surfaces more provocative content. People engage with that content. The engagement data confirms that provocative content is highly engaging. The algorithm weights it even more heavily. The content that gets surfaced becomes more extreme. Engagement with extreme content grows. And so on.
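The five steps above can be sketched as a short simulation. The numbers are invented. Provocative videos yield 5 minutes of watch time per impression versus 4 for calm ones. The recommender, a hypothetical stand-in, splits impressions in proportion to the previous round's watch time. Nobody changes their preferences, yet the provocative share of recommendations rises every round.

```python
# Invented numbers: minutes watched per impression for each kind of video.
MINUTES_PER_VIEW = {"calm": 4.0, "provocative": 5.0}
TOTAL_IMPRESSIONS = 1000


def run_loop(rounds):
    """Simulate the recommend -> watch -> retrain loop; return provocative share per round."""
    engagement = {"calm": 1.0, "provocative": 1.0}  # step 1: no initial preference
    shares = []
    for _ in range(rounds):
        total = sum(engagement.values())
        # Step 2: recommend in proportion to the engagement data seen so far.
        impressions = {k: TOTAL_IMPRESSIONS * v / total for k, v in engagement.items()}
        shares.append(impressions["provocative"] / TOTAL_IMPRESSIONS)
        # Step 3: what people watch depends on what they were shown.
        watched = {k: n * MINUTES_PER_VIEW[k] for k, n in impressions.items()}
        # Steps 4-5: that watch time becomes the next round's training signal.
        engagement = watched
    return shares


for i, share in enumerate(run_loop(10), start=1):
    print(f"round {i}: {share:.0%} provocative")
```

With these numbers, a 25% edge in watch time per view moves the provocative share from 50% in the first round to about 88% by the tenth. Real recommenders are far more complex, but the compounding comes from the loop itself, not from any single step.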
The AI is not being malicious. It is doing exactly what it was designed to do: maximize the metric it was trained on. The problem is that the metric — watch time — is not the same as user well-being, accurate information, or a healthy information environment. This gap between what the AI optimizes for and what we actually want is sometimes called the alignment problem at its most practical level.
The YouTube algorithm learned from implicit human feedback: clicks, watch time, and what people chose to watch next. A more formal version of the same idea, reinforcement learning from human feedback (RLHF), is used to train language models like ChatGPT. In this approach, the AI makes outputs, humans rate or compare those outputs, and the ratings become a training signal. The AI learns to produce outputs that score well on those ratings.
This sounds sensible, and it often is. But it introduces a specific kind of vulnerability: the AI can learn to produce outputs that seem good to the raters, rather than outputs that are good. If the raters prefer confident-sounding answers, the AI learns to sound confident — even when it is wrong. If the raters prefer longer and more detailed responses, the AI learns to pad its answers. If the raters are more likely to flag obviously wrong answers than subtly wrong ones, the AI learns to be subtly wrong.
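A hypothetical rating rule makes the vulnerability concrete. In the sketch below, raters give points for sounding confident and can only reward correctness when they are able to check it. Picking whichever candidate scores highest is a crude stand-in for training. Actual RLHF fits a reward model to human comparisons and then optimizes against it, but the selection pressure points the same way.

```python
# Two hypothetical candidate answers to a question the rater cannot easily check.
CANDIDATES = [
    {"text": "It is definitely X.", "confident": True, "correct": False},
    {"text": "I think it is Y, but please verify.", "confident": False, "correct": True},
]


def rater_score(answer, rater_can_verify):
    """Invented rating rule: confidence always earns points; correctness only counts if checked."""
    score = 2.0 if answer["confident"] else 1.0
    if rater_can_verify:
        score += 3.0 if answer["correct"] else -3.0
    return score


def preferred_answer(candidates, rater_can_verify):
    """Crude stand-in for training: keep the answer style the ratings reward most."""
    return max(candidates, key=lambda a: rater_score(a, rater_can_verify))


print(preferred_answer(CANDIDATES, rater_can_verify=False)["text"])  # It is definitely X.
print(preferred_answer(CANDIDATES, rater_can_verify=True)["text"])   # I think it is Y, but please verify.
```
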
In 2022, researchers at Anthropic published a paper showing that language models, including ones trained with reinforcement learning from human feedback, could exhibit "sycophancy": agreeing with whatever the user seemed to believe, even when the user was factually wrong. The mechanism is straightforward. If a user states a false premise and the AI goes along with it, the user tends to rate the interaction positively. The AI learns that agreeing with users feels good to users, and it optimizes for that feeling, regardless of truth.
This is a feedback loop operating inside the training process itself. The AI is learning, with every rating, to become better at satisfying the immediate reaction of the person it is talking to — which is not the same as becoming more accurate, more honest, or more genuinely helpful.
If you tell an AI "I heard that eating watermelon seeds causes appendicitis, is that true?" a sycophantic model might say "That's an interesting concern. While it's not the primary cause, some doctors do recommend caution." A well-aligned model should say "No, that's a myth: swallowing watermelon seeds does not cause appendicitis." The first response agrees; the second is correct. Reinforcement training can push AI toward the first type of response because agreement generates positive ratings.
Feedback loops are particularly dangerous in high-stakes domains where AI predictions influence the very outcomes the AI is trained to predict. The criminal justice system offers the clearest example.
Since at least 2011, many U.S. courts have used risk assessment algorithms — software with names like COMPAS — to help judges make decisions about bail, sentencing, and parole. COMPAS assigns a "recidivism risk score" to defendants: a number predicting how likely they are to commit another crime. Judges use this score to inform their decisions.
In 2016, investigative journalists at ProPublica analyzed COMPAS scores. They found that Black defendants who did not go on to reoffend were nearly twice as likely as white defendants to be incorrectly flagged as high risk. Meanwhile, white defendants who did reoffend were more often incorrectly labeled low risk. Northpointe, the company behind COMPAS, disputed the analysis, arguing that a given score corresponded to roughly the same reoffense rate for Black and white defendants. Both sides were working from the same outputs. They disagreed about which statistical definition of fairness should count, and the debate among statisticians continues.
Now consider the feedback loop. COMPAS predicts that a person is high-risk. The judge, influenced by that score, orders pre-trial detention. Cut off from family and support systems, the person may lose their job and housing during detention. When they are eventually released, that disruption can increase their likelihood of reoffending. The algorithm's prediction helps create the conditions that make the prediction more likely to come true. The system validates itself.
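The statistical heart of the dispute fits in a small example. In the invented counts below, a "high risk" label means a 60% reoffense rate and a "low risk" label means 20% in both groups, so the scores are equally calibrated. But the groups' overall reoffense rates differ (44% versus 32%), and so the share of non-reoffenders wrongly labeled high risk differs sharply. Researchers have shown that when base rates differ and prediction is imperfect, a score cannot satisfy both kinds of fairness at once.

```python
# Invented counts. For each group and risk label: (people given that label, how many reoffended)
GROUPS = {
    "group 1": {"high": (60, 36), "low": (40, 8)},
    "group 2": {"high": (30, 18), "low": (70, 14)},
}


def reoffense_rate(people, reoffended):
    return reoffended / people


def false_positive_rate(labels):
    """Share of people who did NOT reoffend but were still labeled high risk."""
    high_people, high_reoffended = labels["high"]
    low_people, low_reoffended = labels["low"]
    wrongly_flagged = high_people - high_reoffended
    correctly_cleared = low_people - low_reoffended
    return wrongly_flagged / (wrongly_flagged + correctly_cleared)


for name, labels in GROUPS.items():
    print(
        name,
        "| high label -> reoffense rate:", reoffense_rate(*labels["high"]),
        "| low label -> reoffense rate:", reoffense_rate(*labels["low"]),
        "| false positive rate:", round(false_positive_rate(labels), 3),
    )
```
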
This is a feedback loop that operates outside the AI's training pipeline — it operates in the real world, on real people's lives. The AI makes a prediction. The prediction shapes reality. The reality confirms the prediction. And the people caught inside this loop had no voice in its design.
Most conversations about AI accuracy focus on a snapshot: is the AI right or wrong right now? You now understand that AI outputs exist in time, and the outputs shape the data the next version trains on. An AI doesn't just predict the future — it helps create the future it predicts. Knowing this changes how you think about deploying AI in any system where its predictions influence human behavior.
The obvious question is: why don't AI designers just break the feedback loop? The answer is that it requires actively fighting against what the training process naturally produces.
YouTube did eventually begin adjusting its recommendation algorithm in 2019, after years of public pressure, journalist investigations, and internal leaks. The company said it would reduce recommendations of "borderline content": videos that came close to, but did not violate, its policies. By the end of 2019, YouTube reported a 70% average drop in U.S. watch time of borderline content coming from recommendations. But the definition of "borderline" remained internal and opaque. Users could not audit it. Independent researchers had limited access to verify the changes.
For COMPAS and similar tools, reform has been slower. Courts in some states have moved away from algorithmic risk scores. Others continue to use them. The Uniform Law Commission — a group of legal experts — issued guidelines in 2022 recommending transparency standards for algorithmic tools in criminal justice. Implementation remains inconsistent.
Breaking feedback loops requires knowing they exist, agreeing on what the correct alternative outcome looks like, and being willing to trade away optimization of the current metric for a better one. Each of those steps involves genuine disagreement. This is not primarily a technical problem. It is a political and ethical one — and those are slower to solve than code.
YouTube's algorithm maximizing watch time was a business decision that had enormous social consequences. The engineers who built it were not trying to radicalize anyone. The executives who chose watch time as the optimization target were making a reasonable business decision by normal standards. Does a company bear ethical responsibility for indirect harms produced by a system that was doing exactly what it was designed to do? How should that responsibility be weighed against the fact that the harms were not intended — and were not obvious in advance?
A school district built an AI to identify students who need academic support. It was trained on historical data about which students received support and what grades they achieved afterwards. Two years in, teachers are reporting that the same students keep being flagged, while new students with similar struggles are missed.
Your lab partner thinks the problem is a feedback loop. You need to map it out together — specifically: where does the loop start, what does it amplify, and what would you change first?
In November 2022, a man named Jake Moffatt was booking a flight on Air Canada's website when he used the airline's AI chatbot to ask about bereavement fares: discounted tickets available to people traveling because of a family member's death. Moffatt's grandmother had just died. He needed to fly quickly and hoped to pay less than the full fare.
The Air Canada chatbot told him that bereavement fares could be requested retroactively — that he could buy a full-price ticket now and apply for the discount later. Moffatt took a screenshot of the exchange. He bought the full-price ticket. He applied for the discount. Air Canada denied the claim and told him that their bereavement policy did not allow retroactive applications. The chatbot, the company said, had given him wrong information, and Air Canada was not responsible for what its chatbot said.
Moffatt took Air Canada to British Columbia's Civil Resolution Tribunal. In February 2024, the tribunal ruled in his favor, finding that Air Canada had not taken reasonable care to ensure its chatbot was accurate. Air Canada had argued that it could not be held responsible for its own chatbot's incorrect statements, and the tribunal rejected that argument. The company was ordered to pay Moffatt CAD $650.88 in damages, plus interest and tribunal fees.
Legal experts quickly described the ruling as a landmark. It was one of the first formal legal decisions to hold a company accountable for an AI output. The grounds were not intention but reasonable care. The chatbot had hallucinated a policy. The company had deployed it without adequate safeguards. Both failures were the company's responsibility.
By this point in the module, you have seen three distinct ways AI can give you wrong information: hallucination (generating plausible-sounding invented content), training data bias (reproducing and amplifying the inequalities in the data it learned from), and feedback loops (self-reinforcing errors that compound over time). These are not rare edge cases. They are structural properties of how current AI systems work.
So what do you actually do with this knowledge? You build a framework — a set of questions you apply to AI output before trusting it — that is calibrated to the real risks rather than a vague "be careful about AI" attitude.
The first question is: What type of claim is this? AI is generally more reliable on widely documented, stable facts (how does DNA replication work?) than on specific, verifiable details (who won the 1987 regional cricket championship in Karnataka?). The more specific and verifiable the claim, the higher the hallucination risk and the more important independent verification becomes.
The second question is: What domain is this? AI's knowledge is denser and more reliable in domains that were heavily represented in its training data, such as English-language, recent, Western, professional content. It is less reliable in underrepresented domains. Knowing this does not mean ignoring AI; it means calibrating your verification effort to the domain.
The third question is: What are the consequences of being wrong? Using AI to brainstorm ideas for a birthday party carries different stakes than using AI to look up medication dosages. Verification effort should scale with consequence severity.
Ask: is this claim specific and verifiable, is this domain underrepresented in training data, and what happens if this turns out to be wrong? Your verification effort should scale with the answers to those three questions.
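The three questions can be written down as a simple triage rule. The function below is one hypothetical way to encode them. Its categories and cutoffs are illustrative choices, not an established standard. High stakes alone force full verification, while a specific claim or an underrepresented domain calls for checking the particular facts.

```python
def verification_level(specific_claim, underrepresented_domain, high_stakes):
    """Hypothetical triage rule built from the three questions; cutoffs are illustrative."""
    if high_stakes:
        return "verify every claim against a primary source"
    if specific_claim or underrepresented_domain:
        return "spot-check the specific facts"
    return "a quick sanity check is enough"


# Examples drawn from the questions above.
print(verification_level(False, False, False))  # birthday party brainstorm
print(verification_level(True, True, False))    # 1987 Karnataka cricket winner
print(verification_level(True, False, True))    # medication dosage
```
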
The Air Canada ruling matters beyond one airline paying $650. It signals a shift in how legal and regulatory systems are beginning to think about AI accountability — and this shift affects decisions being made right now at the institutional level.
For most of the early AI era, companies deployed AI tools with broad disclaimers: the AI might make mistakes, use at your own risk, the company is not responsible for outputs. This was a legal strategy as much as a technical one. If users bore all the risk of AI errors, companies had little incentive to invest heavily in accuracy.
The Air Canada ruling, along with parallel regulatory developments in Europe under the EU AI Act (approved by the European Parliament in March 2024), begins to shift that balance. The EU AI Act places AI systems used in high-risk domains (healthcare, criminal justice, employment, critical infrastructure) under strict transparency and accountability requirements. Companies must document their training data, examine it for bias, and maintain human oversight mechanisms. A separate category of "unacceptable risk" uses, such as social scoring, is banned outright.
None of this is fully implemented yet. The EU AI Act's provisions roll out over several years. Legal precedent from cases like Moffatt's is still being established. But the direction is clear: the era of "AI errors are the user's problem" is ending. What replaces it is still being decided — and those decisions are happening in courts, parliaments, and regulatory agencies right now, with real consequences for how AI gets built and deployed.
In 2024 and beyond, any organization deploying AI in a context that affects users — a school's grading assistant, a hospital's diagnostic tool, a retailer's customer service chatbot — is increasingly exposed to legal and regulatory liability for that AI's errors. This changes the incentive structure. Accuracy becomes a legal risk management issue, not just an ethical preference. For the people building AI systems, this is a major shift in constraints.
Beyond the framework, there are specific techniques you can use when interacting with AI to probe its reliability before relying on its output.
Ask for sources explicitly. The AI will not always produce accurate sources; it may hallucinate them, as Schwartz discovered. But asking for sources shifts the output toward citation-like structures that are easier to verify. If the AI provides specific titles, authors, and dates, you can check them. If it hedges when asked for sources ("I don't have direct access to sources"), that is useful information about the reliability of the claim.
Ask the same question two different ways. Rephrase the question in a substantively different form and compare the answers. A genuinely known fact will produce consistent answers. A hallucinated detail is more likely to vary between phrasings, because the AI is generating plausible text rather than retrieving a stored fact. Inconsistency is a warning signal.
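The comparison step can be partly mechanized. The Python sketch below is a deliberately crude, hypothetical helper (`inconsistent_details` is not part of any tool): it assumes you already have two answers to differently phrased versions of the same question, pulls out the numbers in each (years, figures, amounts), and reports any number that appears in one answer but not the other. It ignores names, spelled-out numbers ("two thousand"), and differences in meaning, so an empty result does not prove an answer is correct.

```python
import re


def specific_details(answer: str) -> set[str]:
    """Extract numeric details (years, figures, amounts) from an answer."""
    # Digits, optionally with decimal points or thousands separators: "1932", "3.5", "1,200".
    return set(re.findall(r"\d+(?:[.,]\d+)*", answer))


def inconsistent_details(answer_a: str, answer_b: str) -> set[str]:
    """Return numeric details that appear in one answer but not in the other."""
    # Symmetric difference: anything the two phrasings do not agree on.
    return specific_details(answer_a) ^ specific_details(answer_b)


# Two hypothetical answers to the same question, asked two different ways.
first = "The bridge opened in 1932 and is 503 metres long."
second = "It opened in 1934; its length is 503 metres."
print(inconsistent_details(first, second))  # the two opening years disagree
```

Here the length agrees across phrasings but the year does not, which is exactly the kind of detail to check against an independent source before using it.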
Ask the AI to explain its uncertainty. Many modern AI systems can be prompted to identify where they are less confident. "What parts of your answer are you least certain about?" does not always produce a useful response — but sometimes it does, and it can help you focus your verification effort.
Cross-check specific claims before using them. This sounds obvious but is the step most often skipped — as Steven Schwartz demonstrated at significant cost. Any specific, verifiable detail — a date, a statistic, a proper name, a citation, a policy — should be independently verified before being used in any context where being wrong matters. This is not distrust of AI. It is calibrated use of AI.
Most people who use AI tools have no model of how those tools work or fail. They approach AI output the way they approach a confident human expert — with a baseline assumption of reliability that they update only when something is obviously wrong. You now have a different baseline.
You know that confidence and accuracy are independent variables in AI output. You know that training data shapes what the AI knows and what it distorts. You know that AI systems deployed in the world change the world — and that a changed world generates new data that shapes future AI. You know that the companies and institutions building and deploying these tools are beginning to be held legally accountable for their accuracy in ways they were not before.
This knowledge does not make AI useless. Language models are genuinely powerful tools for generating drafts, exploring ideas, summarizing complex material, learning new subjects, and handling many tasks where approximate accuracy is sufficient. The knowledge makes you a better user — one who applies the right amount of scrutiny at the right moments, rather than either trusting blindly or dismissing entirely.
The air of effortless authority that AI output projects — fluent, confident, formatted just right — is a property of the training, not a certificate of truth. You can see through it now. Most people cannot. That asymmetry is real, and it has practical consequences every time you read a headline about what "the AI said," every time you use an AI tool for something that matters, and every time someone else uses one to make a decision that affects you.
Knowing how AI goes wrong, whether by hallucinating through prediction rather than retrieval, by reproducing biased training data, or by amplifying errors through feedback loops, changes your relationship to every AI output you will ever encounter. You are no longer a passive recipient of whatever the machine produces. You are a critical reader with a working model of how those outputs are generated and where they break. That is a real and consequential kind of knowledge.
Jake Moffatt won his case and got $650 back. But the Air Canada chatbot is one instance of a much larger question: as AI systems become embedded in customer service, healthcare, legal advice, education, and financial services, who bears the cost when they fail — users who trusted them, companies that deployed them without adequate testing, or regulators who failed to set standards early enough? And given that AI errors are structural rather than incidental, is it ever appropriate to deploy an AI tool in a high-stakes context without human review of every output?
A regional news organization wants to use AI to assist with background research on stories — finding context, historical data, and supporting information. You've been asked to design their AI verification protocol: the rules journalists must follow before publishing any AI-assisted research.
Your lab partner is skeptical. They think you'll either over-restrict (making the AI useless) or under-restrict (creating the next Schwartz-style scandal). Convince them your protocol is actually workable — and defend every rule you propose.