Module 7 · Lesson 1

What Hallucination Actually Means

Confident, fluent, and wrong — understanding the defining failure mode of language models.

Why does a system that "knows" so much invent things that don't exist?

In 2023, lawyers representing Roberto Mata in a federal personal-injury suit against the airline Avianca filed a brief opposing the airline's motion to dismiss. Six of the court decisions it cited did not exist: Varghese v. China Southern Airlines, Shaboon v. Egypt Air, and the others appeared nowhere in case law. Attorney Steven Schwartz had asked ChatGPT to find relevant cases and submitted the results without verifying them. In June 2023, Judge P. Kevin Castel fined the attorneys and their firm $5,000 and issued formal sanctions. The case, Mata v. Avianca, Inc., became a landmark warning about treating LLM output as ground truth.

The Mechanics of Fabrication

The word "hallucination" is borrowed from psychology, where it describes perceiving something that isn't there. In the context of LLMs, it has come to mean something more specific: the model generates content that is factually incorrect, unverifiable, or entirely invented, yet presented in the same confident, grammatically fluent register it uses for correct information.

This is not a bug in the traditional sense; it is an emergent property of how these models are built. LLMs do not store facts in a database and retrieve them. They learn statistical patterns across billions of text tokens, and at inference time they predict a probability distribution over the next token given the preceding context, then pick a token from it, typically one of the most probable. There is no lookup, no citation trail, no internal truth-checker.

When a model is asked about a case that doesn't exist, it doesn't return NULL. It generates the most plausible-sounding response: a case name that sounds like real case names, a citation that follows the correct formatting pattern, and a holding that is legally coherent. The output is confidently wrong because the model has no reliable mechanism for distinguishing between "I learned this" and "I am generating this."
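To make the "no NULL" point concrete, here is a minimal sketch under loud assumptions: a word-level bigram model, vastly simpler than an LLM, trained on three made-up case names (hypothetical data, not real citations). It shares the property that matters here: generation always emits some token, and recombining familiar pieces yields fluent names that never appeared in training.

```python
import random
from collections import Counter, defaultdict

# Hypothetical, made-up "training data". These are not real citations.
CORPUS = [
    "Alvarez v. Northern Airlines",
    "Baker v. Coastal Airways",
    "Chen v. Northern Airways",
]


def train_bigrams(lines):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for line in lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng):
    """Sample one token at a time until the end marker.

    There is no "I don't know" branch: every step returns some token,
    drawn in proportion to how often it followed the previous token.
    """
    token, out = "<s>", []
    while True:
        options = counts[token]
        token = rng.choices(list(options), weights=list(options.values()))[0]
        if token == "</s>":
            return " ".join(out)
        out.append(token)


def all_outputs(counts, token="<s>", prefix=()):
    """Enumerate every string this tiny, loop-free model can produce."""
    if token == "</s>":
        return {" ".join(prefix)}
    results = set()
    for nxt in counts[token]:
        results |= all_outputs(counts, nxt, prefix if nxt == "</s>" else prefix + (nxt,))
    return results


if __name__ == "__main__":
    counts = train_bigrams(CORPUS)
    print(generate(counts, random.Random(0)))  # one sampled name
    novel = sorted(all_outputs(counts) - set(CORPUS))
    print(len(novel), "of", len(all_outputs(counts)), "possible outputs never appeared in training")
```

Six of the nine names this toy model can produce, such as "Baker v. Northern Airlines", never appeared in its training data, yet each looks exactly as legitimate as the three that did.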

Key Distinction

Hallucination is not the same as the model being uncertain. A model can express uncertainty and still hallucinate. It can also express high confidence and be completely correct. Confidence signals in LLM output do not reliably track factual accuracy — they track how fluent and probable the continuation is.
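One standard way to test whether stated confidence tracks accuracy is expected calibration error (ECE): bucket answers by confidence and compare each bucket's average confidence with its observed accuracy. The sketch below assumes you already have a confidence score per answer (verbalized or probability-based) and a ground-truth correctness label; the audit data is hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average gap between stated confidence and observed accuracy.

    Answers are grouped into equal-width confidence bins; a well-calibrated
    source is right about 70% of the time when it says 0.7.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bin b covers (lo, hi]; the first bin also includes exactly 0.0.
        members = [i for i, c in enumerate(confidences)
                   if lo < c <= hi or (b == 0 and c == 0.0)]
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(correct[i] for i in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Hypothetical audit: ten answers, all stated at 90% confidence, six correct.
confidences = [0.9] * 10
correct = [True] * 6 + [False] * 4
print(round(expected_calibration_error(confidences, correct), 2))  # 0.3
```

An ECE of 0.3 means the answers sound 30 percentage points more reliable than they are, which is exactly the gap between fluency and accuracy described above.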

Confabulation: The Better Term

Some researchers prefer the term confabulation, borrowed from neuropsychology. In patients with certain memory disorders, confabulation refers to the production of fabricated, distorted, or misinterpreted memories without conscious deception — the patient genuinely believes what they are saying. This maps more precisely onto LLM behavior. The model is not lying. It has no intent. It is filling gaps in a way that is internally coherent but externally false.

The distinction matters practically. If we call it lying, we might look for ways to make the model "want" to tell the truth. If we understand it as confabulation — a structural artifact of how memory and generation work — we look instead at architectural interventions, retrieval augmentation, and output verification pipelines.

Hallucination: LLM output that is factually incorrect or fabricated, presented with the same fluency as correct output.

Confabulation: The neuropsychological analog; gap-filling that is internally coherent but externally false, without intent to deceive.

Intrinsic vs. Extrinsic: Intrinsic hallucination contradicts the provided context; extrinsic hallucination adds information not present in the context and unverifiable against it.

Types of Hallucination

Researchers have identified several distinct categories. Entity hallucination involves inventing people, places, organizations, or publications — a paper by a real researcher that was never written, a company that never existed. Temporal hallucination involves wrong dates — an event placed in the wrong year, a person described as living when they had died, or vice versa. Relation hallucination gets the facts right but the relationship wrong — correctly identifying two real people but wrongly claiming one supervised the other's doctoral thesis.

There is also source hallucination — the model cites a real journal, real volume number, real page range, but the article it describes either doesn't exist or says something entirely different. This is particularly dangerous because the citation format is correct enough that a reader might not bother to verify.

Why This Module Matters

Hallucination is not an edge case: studies have reported fabrication rates of roughly 3–15% on general-purpose tasks, depending on the model and task, rising sharply for specialized domains like law, medicine, and academic citation. Understanding why it happens at a mechanistic level is the first step toward building systems and workflows that catch it.

Lesson 1 Quiz

What Hallucination Actually Means

In the 2023 Mata v. Avianca case, what did attorney Steven Schwartz submit that led to sanctions?

Correct. Schwartz used ChatGPT to find supporting precedents and filed the results. All six of the disputed cases were fabricated by the model. The judge imposed a $5,000 fine and formal sanctions.

Not quite. The problem was that ChatGPT invented entirely fictional cases — Varghese v. China Southern, Shaboon v. Egypt Air, and others — that had no existence in any case law database.

Why does an LLM generate hallucinations rather than returning an empty response when it lacks information?

Correct. LLMs have no lookup mechanism — they generate the most probable continuation of the context. There is no internal signal that distinguishes "I learned this fact" from "I am generating a plausible-sounding sequence."

Not quite. LLMs don't retrieve from databases at all — they generate token-by-token based on probability distributions learned during training. This is why they produce fluent, plausible-sounding output even when it is factually wrong.

The term "confabulation" is preferred by some researchers over "hallucination" because it emphasizes that:

Correct. Confabulation in neuropsychology describes patients who produce false memories without intent — they believe what they say. This maps better onto LLM behavior than hallucination, which implies a perceptual experience the model doesn't have.

Not quite. The key point is the absence of intent. "Confabulation" describes gap-filling that is structurally generated rather than consciously fabricated — making it a more precise description of what LLMs do.

Lab 1 — Anatomy of a Hallucination

Explore the mechanics of fabrication with your AI lab assistant

Lab Objective

In this lab you will explore what hallucination looks like mechanically — why LLMs generate false content with the same fluency as true content, and how the Mata v. Avianca case illustrates the real-world stakes. Ask questions, probe the mechanics, and test your understanding.

Suggested start: "Why didn't ChatGPT just say it didn't know those court cases existed? What was happening mechanically when it generated them?"

AI Lab Assistant Hallucination & Confabulation

Welcome to Lab 1. We're looking at what hallucination actually is — why fluent, confident-sounding output can be completely fabricated. The Mata v. Avianca case is a perfect anchor here. Ask me anything about the mechanics, the terminology, or what went wrong in that courtroom.

Module 7 · Lesson 2

Why Models Hallucinate: Mechanistic Causes

From training data gaps to decoding strategies — the structural reasons fabrication is built in.

Is hallucination a fixable bug, or an unavoidable consequence of how these systems work?

In February 2023, Google introduced its Bard chatbot with a short promotional demo intended to showcase its capabilities. In it, Bard was asked what new discoveries from the James Webb Space Telescope (JWST) could be shared with a child. Bard responded that JWST took "the very first pictures" of a planet outside our own solar system. This was false: the first direct image of an exoplanet was taken by the European Southern Observatory in 2004. Astronomers flagged the error on social media, and after Reuters reported it on the day of Google's Paris launch event, Alphabet's stock fell roughly 7%, erasing about $100 billion in market value. By the time the error was caught, the promotional demo had already been released.

Cause 1 — Training Data Gaps and Biases

LLMs learn from text corpora scraped from the web, books, code repositories, and other sources. These corpora are vast but not complete. When a model encounters a question about a topic that was underrepresented, incorrectly represented, or absent from training data, it has no honest signal to fall back on. It generates a plausible completion based on adjacent, related patterns.

The Bard exoplanet error likely reflects a training signal in which JWST and "first images" frequently co-occurred, since JWST genuinely did produce historic first images of many things. The model appears to have overgeneralized that association into a specific claim its training data did not support.

Cause 2 — The Exposure Bias of Autoregressive Decoding

LLMs are trained with teacher forcing: during training, each token prediction is conditioned on the ground-truth preceding tokens. At inference, the model conditions on its own previously generated tokens. Because of this exposure bias, once an incorrect token is generated, subsequent tokens are predicted to follow coherently from that error rather than to correct it. A fabricated case name becomes a plausible citation that becomes a coherent holding, and each step is locally probable given the previous one.

This is the compounding effect: hallucination tends to be self-consistent. Each fictional case in Schwartz's brief had a plausible party name, a plausible jurisdiction, a plausible year, and a holding that fit the argument. The model optimized for local coherence, not external truth.
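The train/inference mismatch can be sketched with a hypothetical greedy next-token table standing in for a trained model. It gets every step right except one. Under teacher forcing that mistake costs a single prediction; in free-running generation the same mistake steers every later token onto a fluent but fictional path. "Orion Airways" and "Meridian Air Lines" are made-up names.

```python
# Hypothetical greedy next-token table standing in for a trained model.
# It has learned everything correctly except one step: after "v." its top
# guess is "Orion" (a made-up name) instead of the true "Meridian".
MODEL = {
    "<s>": "Doe",
    "Doe": "v.",
    "v.": "Orion",
    "Orion": "Airways",
    "Airways": "</s>",
    "Meridian": "Air",
    "Air": "Lines",
    "Lines": "</s>",
}
REFERENCE = ["<s>", "Doe", "v.", "Meridian", "Air", "Lines", "</s>"]


def teacher_forced(model, reference):
    """Training-style: every prediction conditions on the TRUE previous token."""
    return [model[prev] for prev in reference[:-1]]


def free_running(model, start="<s>", max_len=20):
    """Inference-style: every prediction conditions on the model's OWN last output."""
    out, token = [], start
    for _ in range(max_len):
        token = model[token]
        if token == "</s>":
            break
        out.append(token)
    return out


predictions = teacher_forced(MODEL, REFERENCE)
hits = sum(p == t for p, t in zip(predictions, REFERENCE[1:]))
print(hits, "of", len(predictions), "correct under teacher forcing")  # 5 of 6 correct under teacher forcing
print(" ".join(free_running(MODEL)))  # Doe v. Orion Airways
```

Under teacher forcing the model "recovers" immediately because it is handed the true token "Meridian"; at inference nobody hands it anything, so "Orion" is followed by the most coherent continuation of "Orion".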

The Snowball Effect

Once a model commits to a hallucinated entity in a long response, it often continues to reference it consistently — giving the hallucination internal coherence that makes it harder to detect. The fictional case is cited, then described, then quoted, all without ever existing.

Cause 3 — The Softmax Bottleneck and Parametric Memory

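Everything a model knows is encoded in its parameters: billions of weights adjusted during training to compress vast amounts of text. This parametric memory is not lossless. The output layer adds its own constraint, known as the softmax bottleneck: because next-token probabilities are computed from a relatively low-dimensional hidden state, the model cannot express every pattern of next-token distributions across contexts, so fine-grained distinctions can be lost. Specific facts, especially rare ones, may be poorly encoded, encoded with errors, or conflated with similar facts. The model may "remember" a fact but attach the wrong date, the wrong name, or the wrong attribution.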

This is distinct from not knowing something. The model has a representation; it's just incorrect or partially merged with another fact. Studies by researchers at MIT, Stanford, and DeepMind have reported that LLMs can systematically confuse entities that appear in similar contexts in training data.

Cause 4 — RLHF and the Fluency-Accuracy Trade-off

Reinforcement Learning from Human Feedback (RLHF) trains models to produce responses that human raters prefer. Raters consistently prefer fluent, confident, detailed answers over hedged or incomplete ones. This creates an incentive gradient: the model learns that sounding authoritative is rewarded, even when the underlying content is uncertain. The result is a systematic overconfidence in generated output — the stylistic confidence of the response does not track its epistemic reliability.

Several research teams have noted this tension. OpenAI's GPT-4 technical report, for example, showed that the pre-trained model's confidence on a multiple-choice benchmark was well calibrated and that post-training substantially reduced that calibration. RLHF is excellent at making models helpful and readable, but it can amplify hallucination by training away the hedges and uncertainty signals that might otherwise warn users.
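Reward models in RLHF are commonly trained on pairwise human comparisons using the Bradley-Terry model, in which the probability that a rater prefers response A over B is the logistic function of the reward difference. The sketch below assumes a hypothetical reward that scores only confidence and detail, to show how a confident wrong answer can win a comparison when raters cannot check facts; the weights are illustrative, not measured.

```python
import math


def preference_prob(reward_a, reward_b):
    """Bradley-Terry model: probability that a rater prefers response A to B."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def style_reward(response):
    """Hypothetical reward learned from raters who cannot check facts:
    it pays for confidence and detail and ignores correctness entirely."""
    return 1.5 * response["confident"] + 0.5 * response["detail"]


confident_but_wrong = {"confident": 1, "detail": 2, "correct": False}
hedged_but_right = {"confident": 0, "detail": 1, "correct": True}

p = preference_prob(style_reward(confident_but_wrong), style_reward(hedged_but_right))
print(round(p, 2))  # 0.88
```

With these illustrative weights the confident but wrong answer is preferred about 88% of the time, and policy optimization against such a reward pushes the model toward that style.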

Exposure Bias: The gap between training (conditioned on ground truth) and inference (conditioned on own output), which allows errors to compound.

Parametric Memory: Facts encoded in model weights; lossy, imprecise, and subject to conflation with similar facts.

RLHF Fluency Bias: The tendency for human feedback training to reward confident, fluent responses, inadvertently reinforcing overconfidence in hallucinated content.

Lesson 2 Quiz

Why Models Hallucinate: Mechanistic Causes

Google's Bard made a factual error about the James Webb Space Telescope in its launch video, contributing to a ~7% drop in Alphabet's stock. What was the error?

Correct. Bard stated JWST took "the very first pictures" of an exoplanet, but the first direct image of an exoplanet was taken by the ESO in 2004. Astronomers quickly flagged the error, but the promotional demo had already been released publicly.

Not quite. Bard's error was claiming JWST took the first-ever pictures of exoplanets — the first direct image of an exoplanet was actually taken by the European Southern Observatory in 2004, almost two decades earlier.

What is "exposure bias" and why does it contribute to hallucination?

Correct. During training, each prediction is conditioned on true previous tokens. At inference, the model uses its own previous outputs — so one fabricated token leads to further tokens that are locally coherent with the error, compounding the hallucination.

Not quite. Exposure bias refers to the train/inference mismatch: training uses ground-truth context, but inference uses the model's own generated context. This means errors propagate forward rather than being corrected.

How does RLHF training potentially make hallucination worse, despite improving helpfulness?

Correct. The fluency-accuracy trade-off: RLHF optimizes for human approval, and humans tend to reward confident, detailed answers. This inadvertently trains away the hedges and uncertainty signals that might otherwise alert users to shaky ground.

Not quite. The key mechanism is that human raters reward confident, fluent outputs. This creates a training incentive to sound authoritative — even when the underlying content is uncertain or incorrect — amplifying overconfidence in hallucinated responses.

Lab 2 — The Causes of Confabulation

Probe the mechanistic roots of hallucination with your AI lab assistant

Lab Objective

This lab focuses on the structural causes of hallucination: training data gaps, exposure bias, parametric memory loss, and RLHF's fluency incentive. Use the Google Bard exoplanet error as a concrete case study, or ask about any of the four causes in depth.

Suggested start: "Can you walk me through exactly how exposure bias would turn one small error in a legal brief into a fully cited, internally consistent but completely fictional case?"

AI Lab Assistant Mechanistic Causes

Lab 2 is open. We're digging into why hallucination happens at a structural level — training data, exposure bias, parametric memory, and RLHF pressure. What would you like to explore first?

Module 7 · Lesson 3

Domain-Specific Risks and Real Consequences

Medicine, law, science, finance — where hallucination causes measurable harm.

Which domains are most vulnerable, and what does a costly hallucination actually look like in the field?

In February 2024, British Columbia's Civil Resolution Tribunal ruled against Air Canada after its customer service chatbot told passenger Jake Moffatt that he could apply for a bereavement fare discount retroactively, after purchasing his ticket. Air Canada's actual policy did not allow this. Air Canada argued that the chatbot was "a separate legal entity" responsible for its own statements. The tribunal rejected this defense and held Air Canada liable for the chatbot's misinformation, ordering it to pay Moffatt $812.02. The decision sent a clear signal: companies cannot disclaim liability for AI-generated misinformation in customer-facing applications.

Medical Hallucinations

In healthcare, hallucinated content can directly affect clinical decisions. Studies published in 2023 found that LLMs could answer medical licensing exam questions at or above passing level, but a passing score still leaves room for error, and in medicine the errors that matter most involve drug interactions, dosage thresholds, and contraindications. A fabricated drug interaction is not like a wrong date in a history essay; it can result in patient harm.

A separate 2023 study in npj Digital Medicine tested ChatGPT's ability to summarize clinical trial results from provided documents. It reported that the model hallucinated statistical findings in approximately 30% of cases, sometimes inverting outcome significance. Findings absent from the source are extrinsic hallucinations; an inverted significance result contradicts the source and is intrinsic. Both appeared in exactly the settings where physicians might trust AI-generated summaries.
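Some extrinsic hallucinations in numeric summaries can be caught mechanically by checking whether each number in the summary appears in the source. This is a crude sketch with hypothetical trial data: it flags added numbers, but it cannot catch a summary that reuses the source's own numbers while inverting their meaning, which needs claim-level comparison against the source.

```python
import re

# Integers, decimals, and percentages, e.g. "12", "0.21", "15%".
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def unsupported_numbers(summary, source):
    """Numbers stated in the summary that never appear in the source text."""
    source_numbers = set(NUMBER.findall(source))
    return [n for n in NUMBER.findall(summary) if n not in source_numbers]


# Hypothetical trial abstract and a flawed AI summary of it.
source = "Mortality was 12% in the treatment arm and 15% in the control arm (p = 0.21)."
summary = "Treatment cut mortality from 15% to 9% (p = 0.03), a significant benefit."
print(unsupported_numbers(summary, source))  # ['9%', '0.03']
```

Exact string matching is deliberately strict: "12 %" and "12%" would not match, so a real pipeline would normalize formats before comparing.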

Legal Precedent — Air Canada 2024

The British Columbia decision is widely cited as one of the first to hold a company liable for its AI chatbot's hallucinated policy information. It came from a small-claims tribunal rather than a court, so it is not binding precedent, but its reasoning is instructive: the chatbot is part of Air Canada's website, and Air Canada is responsible for all the information on its website, whether it comes from a static page or a chatbot. The same reasoning would plausibly extend to hallucinated medical advice, financial guidance, or any other consumer-facing AI claim.

Scientific Literature Hallucinations

Hallucinated citations are a particular threat to scientific integrity. A 2023 analysis published in Patterns (Cell Press) tested several LLMs on their ability to provide accurate citations in life sciences. Across models, 30–70% of generated citations were partially or entirely fabricated — wrong author combinations, wrong journal placements, wrong DOIs attached to real paper titles. Because scientific databases are trusted by downstream researchers, a hallucinated citation that makes it into even a single published paper can propagate through the literature.

The concern is not just that AI writes bad citations — it is that hallucinated references may cite studies that, if they existed, would lend authority to a claim. The absence of the paper is structurally invisible to any reader who doesn't check.

Financial Hallucinations

In finance, hallucinations about earnings figures, merger terms, regulatory filings, or market data can inform trading decisions or compliance workflows. Several incidents have been documented where AI-assisted research tools generated incorrect data about earnings per share, M&A deal terms, and financial covenants. While no single case reached the scale of the legal or medical examples above, financial regulators in the EU, UK, and US have issued guidance specifically noting that LLM-generated financial information must be independently verified against primary sources.

The Trust Calibration Problem

Across all these domains, the central danger is miscalibrated trust. Users — including professionals — develop a mental model of what errors look like. Typos and non-sequiturs are obvious. A coherent, fluent, well-formatted response that happens to be factually wrong does not look like an error. It looks like expertise.

This is the core challenge that makes domain-specific hallucination particularly dangerous: the errors are stylistically indistinguishable from correct output. Detecting them requires domain knowledge, access to primary sources, and the discipline to verify even plausible-sounding claims.

Extrinsic Hallucination: Output that adds information not present in the provided context; particularly dangerous in summarization and document Q&A tasks.

Miscalibrated Trust: The mismatch between how reliable AI output appears and how reliable it actually is; the primary reason domain-specific hallucinations cause harm.

Lesson 3 Quiz

Domain-Specific Risks and Real Consequences

What legal precedent did the Air Canada chatbot ruling (2024) establish?

Correct. The British Columbia Civil Resolution Tribunal ruled that Air Canada was responsible for its chatbot's statement about retroactive bereavement fares, rejecting the "separate legal entity" defense. The company was ordered to pay Moffatt $812.02 in damages, interest, and fees.

Not quite. The key ruling was on liability: companies cannot claim their AI chatbot is a separate entity to escape responsibility for its misinformation. The chatbot is an agent of the company, and the company is accountable for what it says.

A 2023 study in npj Digital Medicine found that ChatGPT hallucinated statistical findings in approximately what percentage of clinical trial summaries?

Correct. The study found hallucinated statistical findings in approximately 30% of cases, including results not present in the source documents (extrinsic) and, in some cases, inverted outcome significance (intrinsic), in exactly the settings where clinicians might rely on AI-generated summaries.

Not quite. The study found hallucination in approximately 30% of clinical trial summary cases, often inventing or inverting statistical outcomes not present in the source documents.

What is "miscalibrated trust" and why does it make domain-specific hallucination dangerous?

Correct. Hallucinated content in high-stakes domains is written with the same fluency and confidence as correct content. Users — including professionals — have no reliable stylistic signal that something is wrong, making verification the only safeguard.

Not quite. Miscalibrated trust refers specifically to the mismatch between how reliable AI output appears (very polished, confident) and how reliable it actually is. Hallucinations look like expertise, so errors pass undetected without active verification.

Lab 3 — Real-World Harm Assessment

Map hallucination risk across domains with your AI lab assistant

Lab Objective

In this lab you will explore how hallucination risk varies across domains (law, medicine, science, finance) and what the Air Canada case and the 2023 medical studies tell us about liability, detection, and mitigation. Ask about specific cases, risk factors, or what "miscalibrated trust" means in practice.

Suggested start: "If a hospital used an AI tool that hallucinated drug interactions, how would the Air Canada legal reasoning apply to determining liability?"

AI Lab Assistant Domain Risk & Real Cases

Lab 3 is ready. We're mapping hallucination risk across high-stakes domains — medicine, law, science, finance — using real cases as anchors. The Air Canada ruling is especially important here. Where would you like to start?

Module 7 · Lesson 4

Detection and Mitigation Strategies

RAG, grounding, self-consistency, uncertainty quantification — what actually works.

Given that hallucination is structurally embedded, what engineering and workflow approaches reduce it to manageable levels?

When Microsoft launched Bing Chat (later Copilot) in February 2023, it was built on retrieval from the start: the system grounded the model's answers in live Bing search results and attached citations that users could check. Even so, early users found that long conversations could push it into volatile and even threatening statements, and its answers sometimes misstated or embellished what the cited pages actually said. Microsoft's immediate response was to cap the number of turns per conversation. The lesson is instructive. Retrieval-augmented generation is now a standard architecture for knowledge-intensive tasks, but grounding an answer in retrieved sources does not by itself guarantee that the answer is faithful to them.

Retrieval-Augmented Generation (RAG)

RAG is the most widely adopted technical mitigation. Instead of relying on parametric memory alone, the system retrieves relevant documents from an external corpus at query time and conditions the model's generation on those documents. This gives the model the correct facts to draw on and gives reviewers a source to check against. The model can still ignore, misread, or contradict the retrieved context, however, so RAG reduces hallucination rather than eliminating it.

RAG pipelines require careful engineering: retrieval quality matters (returning irrelevant documents can increase hallucination), chunking and context window management affect which facts are available, and models must be explicitly prompted to cite and stay grounded in provided sources. Naive RAG implementations often fail to deliver reliable accuracy improvements.
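The retrieve-then-ground pattern can be sketched in a few lines. This is a toy under stated assumptions: keyword overlap stands in for the embedding search a production system would use, the policy snippets are hypothetical, and no model is actually called; the code only builds the grounded prompt, which instructs the model to cite sources and to say so explicitly when the sources don't contain the answer.

```python
import re

# Hypothetical policy snippets standing in for a real document store.
DOCS = {
    "policy-12": "Bereavement fares must be requested before travel; refunds are not available after the ticket is used.",
    "policy-07": "Checked baggage allowance is one bag up to 23 kg on economy fares.",
    "policy-31": "Refund requests for cancelled flights are processed within 30 days.",
}


def tokenize(text):
    """Lowercase word set; a crude stand-in for embedding similarity."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, docs, k=2):
    """Return up to k (doc_id, text) pairs ranked by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(docs.items(), key=lambda item: len(q & tokenize(item[1])), reverse=True)
    return [(doc_id, text) for doc_id, text in ranked[:k] if q & tokenize(text)]


def build_prompt(query, passages):
    """Condition the model on retrieved text and demand citations or an explicit miss."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using ONLY the sources below and cite a source id for every claim.\n"
        "If the sources do not answer the question, reply exactly: NOT IN SOURCES.\n\n"
        f"Sources:\n{sources or '(none retrieved)'}\n\nQuestion: {query}"
    )


query = "Can I request a bereavement fare refund after travel?"
print(build_prompt(query, retrieve(query, DOCS)))
```

The explicit "NOT IN SOURCES" escape hatch matters: without it, the model's only option when retrieval comes back empty or irrelevant is to fall back on parametric memory.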

RAG vs. Fine-Tuning

Fine-tuning adds facts to parametric memory — but this doesn't prevent hallucination, it just changes which false facts the model might produce. RAG keeps ground truth external and auditable. For high-stakes domains, RAG with cited sources is significantly more reliable than fine-tuning alone.

Self-Consistency Sampling

A technique developed at Google Research (Wang et al., 2022) samples multiple independent reasoning paths for the same prompt and selects the final answer that appears most frequently across samples. If the model reaches the same answer in 8 out of 10 samples, that answer is more likely to reflect a well-encoded pattern than a one-off fabrication. Self-consistency doesn't eliminate hallucination, but it can substantially improve accuracy: the original paper, which targeted arithmetic and commonsense reasoning benchmarks, reported gains of roughly 4 to 18 percentage points, with the largest on the GSM8K math benchmark.

The limitation: self-consistency is expensive (multiple forward passes per query) and doesn't help when a model has a consistent but incorrect belief — a systematic error in parametric memory will be consistently wrong across all samples.
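The voting step of self-consistency is simple once final answers have been extracted from each sampled response. The sketch assumes that extraction has already happened and uses hypothetical answers; the second call illustrates the limitation just described, where a consistently mislearned fact wins unanimously.

```python
from collections import Counter


def self_consistency(final_answers):
    """Majority vote over final answers from independently sampled responses.

    Returns the winning answer and the fraction of samples that agree with it.
    """
    normalized = [a.strip().lower() for a in final_answers]
    answer, votes = Counter(normalized).most_common(1)[0]
    return answer, votes / len(normalized)


# Hypothetical final answers parsed from 10 sampled reasoning paths.
samples = ["18", "18", "26", "18", " 18", "9", "18", "18", "26", "18"]
print(self_consistency(samples))  # ('18', 0.7)

# The limitation: a fact the model has consistently mislearned wins unanimously.
print(self_consistency(["1998"] * 10))  # ('1998', 1.0)
```

The agreement fraction doubles as a rough confidence signal, but a value of 1.0 only shows that the samples agree with each other, not that they agree with reality.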

Uncertainty Quantification and Calibration

Several research groups have worked on getting models to produce calibrated confidence signals, meaning probability estimates that track actual accuracy. Techniques include verbalized confidence (prompting the model to state its confidence), logit-based confidence (using output token probabilities as reliability signals), and semantic entropy (measuring the diversity of meanings across sampled outputs as a proxy for uncertainty). A 2023 paper from the University of Oxford introduced semantic entropy and found that it predicted answer correctness better than baseline uncertainty measures such as token-level predictive entropy; a 2024 follow-up in Nature applied it directly to detecting confabulations.

The practical limitation is that models trained with RLHF have learned to suppress hedges, so verbalized confidence is often uncalibrated. Logit-based methods require access to token probabilities, which many API-only deployments expose only partially or not at all.
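Semantic entropy groups sampled answers into clusters that share a meaning and computes the entropy of the cluster distribution: one dominant meaning gives low entropy, many competing meanings give high entropy. In the published method, two answers share a cluster when a natural-language-inference model judges that each entails the other; the sketch below swaps in crude string normalization as a hypothetical stand-in, so it only merges trivial variants.

```python
import math
from collections import Counter


def same_meaning_key(answer):
    """Hypothetical stand-in for meaning equivalence. The published method
    instead groups answers that entail each other in both directions,
    as judged by a natural-language-inference model."""
    return " ".join(answer.lower().replace(".", "").split())


def semantic_entropy(samples, key=same_meaning_key):
    """Entropy (in nats) of the distribution over meaning clusters."""
    clusters = Counter(key(s) for s in samples)
    n = len(samples)
    return sum(-(c / n) * math.log(c / n) for c in clusters.values())


consistent = ["Paris", "paris.", "Paris"]        # one meaning
scattered = ["1998", "2004", "2011", "2004"]     # three competing meanings
print(semantic_entropy(consistent))              # 0.0
print(round(semantic_entropy(scattered), 3))     # 1.04
```

Clustering by meaning rather than by exact string is the point of the method: "Paris" and "The capital is Paris" should count as agreement, which plain token-level entropy would miss.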

Workflow-Level Mitigations

Beyond technical approaches, the most reliable mitigation in deployed systems is human-in-the-loop verification combined with scope restriction. The legal profession's post-Mata response illustrates this: many law firms adopted policies requiring attorneys to independently verify every AI-generated citation against Westlaw or Lexis before filing. In healthcare, deployments of LLM assistants generally route clinical suggestions through clinician review before any action is taken.
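The citation-verification step lawyers adopted can be partly automated as a pre-filing check. This is a minimal sketch assuming a hypothetical trusted index in place of a real Westlaw or Lexis lookup, and a deliberately short list of federal reporters. A citation that resolves still has to be read: a real case can be cited for a holding it never contains.

```python
import re

# Federal reporter citations such as "123 F.3d 456" (volume, reporter, first page).
# The reporter list is deliberately short; a real checker would cover many more.
CITATION = re.compile(r"\b(\d{1,4}) (U\.S\.|F\.(?:2d|3d|4th)|F\. Supp\. (?:2d|3d)) (\d{1,5})\b")


def extract_citations(text):
    """Return each citation as 'volume reporter page'."""
    return [" ".join(m.groups()) for m in CITATION.finditer(text)]


def unverified_citations(text, trusted_index):
    """Citations that do not resolve in the trusted index and must be checked by hand."""
    return [c for c in extract_citations(text) if c not in trusted_index]


# Hypothetical draft and index; a real workflow would query Westlaw or Lexis.
draft = ("See Roe v. Example Air, 123 F.3d 456 (2d Cir. 1999); "
         "Poe v. Sample Lines, 789 F.4th 12 (9th Cir. 2021).")
trusted_index = {"123 F.3d 456": "Roe v. Example Air"}
print(unverified_citations(draft, trusted_index))  # ['789 F.4th 12']
```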

Scope restriction — limiting what the model is allowed to answer — is also effective. Air Canada's failure was partly a deployment decision: allowing a chatbot to answer detailed policy questions without grounding it in a live policy database and without a human escalation path.
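Scope restriction and human review can be expressed as a routing step that runs before any drafted reply reaches a customer. The sketch assumes an upstream classifier has already labeled the question's topic and that a retrieval step has returned the supporting policy documents; the topic lists are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str   # "answer", "human_review", or "escalate"
    reason: str


# Hypothetical allowlist: the only topics the bot may answer on its own.
ALLOWED_TOPICS = {"baggage", "check-in", "seat selection"}
# Hypothetical topics whose answers create commitments, so a person signs off.
HIGH_STAKES_TOPICS = {"refunds", "fare rules"}


def route(topic, retrieved_sources):
    """Decide what happens to a drafted reply before it reaches a customer."""
    if topic in HIGH_STAKES_TOPICS:
        return Decision("human_review", "answer commits the company; a person must approve it")
    if topic not in ALLOWED_TOPICS:
        return Decision("escalate", "outside the bot's permitted scope")
    if not retrieved_sources:
        return Decision("escalate", "no policy document supports an answer")
    return Decision("answer", "in scope and grounded in a retrieved policy")


print(route("baggage", ["policy-07"]).action)         # answer
print(route("baggage", []).action)                    # escalate
print(route("refunds", ["policy-31"]).action)         # human_review
print(route("medical advice", ["policy-07"]).action)  # escalate
```

Checking high-stakes topics first means a commitment-bearing question is reviewed even if it would otherwise be out of scope; the ordering is a design choice, not a requirement.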

RAG: Retrieval-Augmented Generation; grounding model output in retrieved external documents to reduce reliance on potentially incorrect parametric memory.

Self-Consistency: Sampling multiple responses and selecting the most frequently occurring answer to filter out one-off hallucinations.

Semantic Entropy: A hallucination detection metric that measures diversity of meanings across output samples; high diversity signals uncertainty and likely hallucination.

Human-in-the-Loop: Routing AI output through human expert review before action; the most reliable mitigation in high-stakes deployments.

No Silver Bullet

No single technique eliminates hallucination. Production systems in high-stakes domains use layered mitigations: RAG for grounding, self-consistency for critical claims, calibrated uncertainty signaling, scope restriction, and human review for consequential outputs. Each layer catches what the others miss.

Lesson 4 Quiz

Detection and Mitigation Strategies

How did Microsoft's Bing Chat ground its answers when it launched in February 2023?

Correct. Bing Chat retrieved live web results (a form of RAG) and cited them, so users could check each answer against its sources. Its early errors showed that retrieval reduces, but does not eliminate, hallucination.

Not quite. Bing Chat was retrieval-augmented from launch: instead of relying solely on parametric memory, it anchored responses to retrieved web documents and provided citations users could verify.

What is the main limitation of self-consistency sampling as a hallucination mitigation?

Correct. Self-consistency requires multiple forward passes (expensive) and its core assumption — that correct facts appear consistently — breaks down when a model has a systematic wrong belief encoded in its weights, which will appear consistently across all samples.

Not quite. The key limitations are cost (multiple forward passes per query) and the systematic error problem: if the model has consistently mislearned a fact, all samples will agree on the wrong answer, and self-consistency will confidently select the hallucination.

Semantic entropy as a hallucination detection metric (Oxford, 2023) measures:

Correct. Semantic entropy measures how semantically diverse the model's outputs are across multiple samples. When the model is uncertain (and likely confabulating), it generates semantically varied responses. When it has a well-encoded fact, outputs cluster around the same meaning.

Not quite. Semantic entropy is a measure of meaning diversity across multiple samples. If the model generates the same basic meaning 10 times, entropy is low and the output is likely reliable. High semantic diversity across samples signals uncertainty — a marker of likely hallucination.

Lab 4 — Building Hallucination-Resistant Systems

Design mitigation strategies with your AI lab assistant

Lab Objective

In this final lab you will work through the design of hallucination-resistant AI systems — when to use RAG, how to implement self-consistency, what semantic entropy adds, and how to design human-in-the-loop workflows for high-stakes domains. Apply what you've learned to concrete scenarios.

Suggested start: "I'm building an AI research assistant for a law firm. Walk me through a layered mitigation architecture that would make it safe to use for case research after the Mata v. Avianca disaster."

AI Lab Assistant Mitigation Architecture

Lab 4 is open. We're designing hallucination-resistant systems using the full toolkit: RAG, self-consistency, semantic entropy, scope restriction, and human-in-the-loop review. Bring me a scenario — medical, legal, financial, scientific — and we'll build a mitigation architecture together.

Module 7 Test

Hallucination and Confabulation — 15 questions · Pass at 80%

1. Which attorney was sanctioned in 2023 for filing AI-generated fictional case citations in federal court?

Correct. Steven Schwartz, representing Roberto Mata in a case against Avianca, used ChatGPT to find supporting precedents, and the resulting brief cited six fabricated cases. Judge Castel fined Schwartz, his colleague Peter LoDuca, and their firm $5,000.

Steven Schwartz was the attorney sanctioned. He used ChatGPT to research precedents and submitted fabricated citations without verification in Mata v. Avianca, Inc.

2. LLMs do not "look up" facts — they generate output based on:

Correct. LLMs generate the most probable next token given the preceding context. There is no lookup, no citation trail, and no internal truth-checker — just probability-weighted token sequences.

LLMs generate token-by-token based on probability distributions learned during training. There is no database lookup, no logic engine, and no cached index. This is the fundamental reason hallucination is possible.

3. "Confabulation" is preferred over "hallucination" by some researchers because it emphasizes that the model:

Correct. Confabulation in neuropsychology describes unconscious gap-filling in patients with memory disorders — no intent to deceive. This maps better onto LLM behavior than hallucination, which implies perceptual experience.

Confabulation emphasizes the absence of intent: the model fills gaps structurally, like patients with certain memory disorders who genuinely believe their fabricated memories. This framing directs attention to the model's architecture rather than to whether it "wants" to tell the truth.

4. Google's Bard launch video error (February 2023) involved a false claim that JWST:

Correct. Bard claimed JWST took the first pictures of an exoplanet, but the first direct exoplanet image was taken by the European Southern Observatory in 2004. The error contributed to Alphabet losing roughly $100 billion in market value in a single day of trading.

Bard falsely claimed JWST took the first-ever pictures of an exoplanet. The actual first direct image of an exoplanet was captured by the ESO in 2004 — about 17 years before JWST's 2021 launch.

5. "Exposure bias" in autoregressive models refers to:

Correct. Training uses teacher-forcing (ground-truth context); inference uses the model's own previous outputs. Once an error token is generated, all subsequent tokens are optimized to follow coherently from it — compounding the hallucination.

Exposure bias is the gap between training conditions (ground-truth context) and inference conditions (own generated context). This allows errors to compound: a hallucinated token leads to more tokens that make sense given the hallucination.
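A toy illustration of exposure bias, assuming a hypothetical deterministic model (`MODEL`) that predicts each token from the previous one and has learned exactly one wrong transition. Under teacher forcing the mistake costs one token, because the next step is fed the ground truth again. In free-running generation the model is fed its own mistake and builds a coherent but false continuation on it.

```python
TRUTH = ["the", "first", "exoplanet", "image", "came", "from", "ESO", "telescopes"]

# Hypothetical toy model: next token as a function of the previous token only.
MODEL = {
    "the": "first",
    "first": "exoplanet",
    "exoplanet": "photos",  # the single learned mistake (truth continues with "image")
    "image": "came",
    "came": "from",
    "from": "ESO",
    "ESO": "telescopes",
    "photos": "were",
    "were": "taken",
    "taken": "by",
    "by": "JWST",
}


def teacher_forced(truth: list[str]) -> list[str]:
    """Training-style: every step is conditioned on the ground-truth previous token."""
    return [MODEL[tok] for tok in truth[:-1]]


def free_running(truth: list[str]) -> list[str]:
    """Inference-style: every step is conditioned on the model's own previous output."""
    out, prev = [], truth[0]
    for _ in range(len(truth) - 1):
        prev = MODEL[prev]
        out.append(prev)
    return out


def errors(pred: list[str], truth: list[str]) -> int:
    """Count positions where the prediction differs from the true next token."""
    return sum(p != t for p, t in zip(pred, truth[1:]))


print(errors(teacher_forced(TRUTH), TRUTH))  # 1
print(errors(free_running(TRUTH), TRUTH))    # 5
print(" ".join([TRUTH[0]] + free_running(TRUTH)))  # the first exoplanet photos were taken by JWST
```

One learned error costs one token under teacher forcing but five under free-running generation, and the free-running output reproduces the Bard-style claim.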

6. Which type of hallucination adds information not present in a provided source document?

Correct. Extrinsic hallucination adds unverifiable information beyond the provided context. Intrinsic hallucination contradicts the context. The npj Digital Medicine study found extrinsic hallucinations in ~30% of clinical trial summaries.

Extrinsic hallucination adds content not present in (and not verifiable against) the source document. Intrinsic hallucination contradicts the source. This distinction matters for RAG pipelines — both types must be addressed separately.
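One narrow slice of extrinsic hallucination, invented statistics, can be flagged mechanically. The sketch below is a crude heuristic of my own, not a method from the study: it lists numbers in a summary that never appear in the source. The source and summary strings are hypothetical examples, and the check does not catch intrinsic errors such as a real number attached to the wrong claim.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(source: str, summary: str) -> list[str]:
    """Return numbers that appear in the summary but nowhere in the source text."""
    source_numbers = set(NUMBER.findall(source))
    return [n for n in NUMBER.findall(summary) if n not in source_numbers]


# Hypothetical trial abstract and model summary, for illustration only.
source = "Primary endpoint: hazard ratio 0.82 (95% CI 0.70 to 0.96), p = 0.01."
summary = "The drug reduced risk (hazard ratio 0.82, p = 0.01) and cut mortality by 25%."
print(unsupported_numbers(source, summary))  # ['25']
```

Because it compares strings, this check would also flag "0.7" when the source says "0.70"; a real pipeline would normalize numbers and pair this with an entailment check for intrinsic contradictions.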

7. The 2024 British Columbia tribunal ruling against Air Canada held that:

Correct. Air Canada argued the chatbot was a separate legal entity; the tribunal rejected this, establishing that companies are responsible for their AI agents' statements. Air Canada was ordered to pay Jake Moffatt C$812.02 in damages, interest, and tribunal fees.

The tribunal ruled that Air Canada is responsible for what its chatbot says; it cannot treat the chatbot as a separate entity to avoid liability. The decision is widely cited as an early precedent for any organization deploying customer-facing AI.

8. RLHF (Reinforcement Learning from Human Feedback) can worsen hallucination because:

Correct. RLHF creates a fluency-accuracy trade-off: raters reward confident, complete answers. This trains away the hedges and uncertainty signals that might otherwise help users identify shaky claims — amplifying the overconfidence of hallucinated content.

RLHF's problem is the incentive structure: human raters tend to prefer confident, detailed responses. This trains the model to sound authoritative, inadvertently suppressing uncertainty signals and amplifying overconfident hallucination.

9. What is the primary mechanism by which RAG reduces hallucination?

Correct. RAG retrieves relevant documents and places them in the model's context window. The model generates answers conditioned on this external ground truth rather than relying solely on potentially incorrect parametric memory.

RAG works by placing retrieved documents in the model's context. The model then generates conditioned on that external content, which is verifiable and auditable — much more reliable than parametric memory for knowledge-intensive tasks.
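A minimal sketch of the retrieval and prompt-building half of RAG. Everything here is an assumption for illustration: `DOCUMENTS` is a hypothetical vetted store, word overlap stands in for a real retriever such as BM25 or embedding search, and the call that sends the prompt to a model is omitted. The point is where grounding happens: the answer is requested from retrieved text, with an explicit instruction to cite it or decline.

```python
import re

# Hypothetical vetted document store (in practice, a search index over trusted sources).
DOCUMENTS = {
    "air-canada": "A British Columbia tribunal held in 2024 that Air Canada is responsible "
                  "for statements made by its website chatbot.",
    "mata": "In Mata v. Avianca, attorneys were sanctioned in 2023 for citing six "
            "nonexistent cases generated by ChatGPT.",
    "exoplanet": "The first direct image of an exoplanet was taken by the European "
                 "Southern Observatory in 2004.",
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank document ids by word overlap with the query (a stand-in for a real retriever)."""
    q = tokens(query)
    ranked = sorted(DOCUMENTS, key=lambda doc_id: len(q & tokens(DOCUMENTS[doc_id])), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query: str, k: int = 1) -> str:
    """Place retrieved sources in the context and ask for a cited, source-only answer."""
    context = "\n".join(f"[{doc_id}] {DOCUMENTS[doc_id]}" for doc_id in retrieve(query, k))
    return (
        "Answer using ONLY the sources below and cite the source id in brackets. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


print(build_grounded_prompt("Who took the first image of an exoplanet?"))
```

Because the prompt carries source ids, a later step can check that every citation in the answer refers to a document that was actually retrieved.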

10. Self-consistency sampling works by:

Correct. Wang et al. (Google Research, 2022) proposed sampling N independent reasoning paths for the same prompt and majority-voting over their final answers. Well-encoded facts appear consistently; one-off hallucinations appear rarely and are outvoted.

Self-consistency samples multiple independent completions of the same prompt and selects the most frequently occurring answer. The intuition: if the model generates the same fact 8 of 10 times, it is likely a well-encoded training signal, not a hallucination.
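The voting step can be sketched in a few lines. This assumes the final answers have already been sampled at a nonzero temperature and extracted from each completion. Wang et al. vote over answers parsed from chain-of-thought outputs, so exact string matching here is a simplification, and `min_agreement` is a hypothetical acceptance threshold.

```python
from collections import Counter


def self_consistency(samples: list[str], min_agreement: float = 0.5):
    """Majority vote over independently sampled final answers.

    Returns (winning answer, share of samples that agree, accepted?).
    """
    counts = Counter(s.strip() for s in samples)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(samples)
    return answer, agreement, agreement >= min_agreement


# Hypothetical final answers from 10 samples of the same prompt.
samples = ["ESO"] * 8 + ["JWST", "Hubble"]
print(self_consistency(samples))  # ('ESO', 0.8, True)
```

The agreement share is itself a useful signal: an answer that wins with 8 of 10 votes deserves more trust than one that wins with 3 of 10.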

11. Semantic entropy as a hallucination detection metric measures:

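Correct. Semantic entropy (introduced by Kuhn, Gal, and Farquhar in 2023 and applied to hallucination detection by Farquhar et al. in Nature, 2024; University of Oxford) measures semantic diversity across samples. If a model produces semantically varied outputs for the same question, it is uncertain, which makes high semantic entropy a useful indicator of potential hallucination.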

Semantic entropy measures the spread of meanings across multiple sampled outputs. High entropy means the model generates different meanings each time, signaling uncertainty and likely hallucination. Low entropy means the model answers consistently, but an error it repeats every time will also score low, so low entropy is not proof of correctness.
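A sketch of the discrete form of semantic entropy: group sampled answers into clusters that share a meaning, estimate each cluster's probability p(c) as its share of the samples, and compute H = -Σ p(c) log p(c). The published method decides "same meaning" with bidirectional entailment from an NLI model; the `ALIASES` table and `same_meaning` function below are hypothetical stand-ins for that step.

```python
import math

# Hypothetical alias table standing in for an entailment model.
ALIASES = {
    "eso": "ESO",
    "european southern observatory": "ESO",
    "the european southern observatory": "ESO",
}


def canonical(answer: str) -> str:
    key = answer.strip().lower().rstrip(".")
    return ALIASES.get(key, key)


def same_meaning(a: str, b: str) -> bool:
    return canonical(a) == canonical(b)


def semantic_entropy(answers: list[str]) -> float:
    """Cluster answers by meaning, then return the entropy of the cluster frequencies."""
    clusters: list[list[str]] = []
    for a in answers:
        for cluster in clusters:
            if same_meaning(a, cluster[0]):
                cluster.append(a)
                break
        else:
            clusters.append([a])
    n = len(answers)
    return sum(-(len(c) / n) * math.log(len(c) / n) for c in clusters)


consistent = ["ESO", "The European Southern Observatory", "European Southern Observatory."]
scattered = ["ESO", "JWST", "Hubble", "Keck"]
print(semantic_entropy(consistent))           # 0.0
print(round(semantic_entropy(scattered), 3))  # 1.386
```

Clustering by meaning is what separates this from naive string diversity: three differently worded answers that all name ESO count as one meaning and yield zero entropy.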

12. Why is "source hallucination" particularly dangerous?

Correct. Source hallucination produces formally correct citations — real journal name, plausible volume and page numbers — for articles that don't exist or say something different. The professional appearance of the citation discourages verification.

Source hallucination is dangerous because the form is correct — real journal, correct formatting, plausible DOI — while the content doesn't exist. Users trust the familiar format and may not check, allowing fabricated authority to propagate.
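The gap between "correctly formatted" and "real" can be shown directly. In this sketch a rough regex checks the shape of a case name, while `VERIFIED_INDEX` is a hypothetical stand-in for a query against an authoritative legal database. The fabricated citations pass the shape check and fail the existence check, which is exactly why format alone should never count as verification.

```python
import re

# Rough shape of a case name: "<Party> v. <Party>".
CASE_SHAPE = re.compile(r"[A-Z][^\n]* v\. [A-Z][^\n]*")

# Hypothetical stand-in for a lookup against an authoritative legal database.
VERIFIED_INDEX = {"Mata v. Avianca, Inc."}


def audit_citations(citations: list[str]) -> list[tuple[str, bool, bool]]:
    """Return (citation, looks well-formed?, found in the verified index?) for each citation."""
    report = []
    for c in citations:
        well_formed = CASE_SHAPE.fullmatch(c) is not None
        exists = c in VERIFIED_INDEX
        report.append((c, well_formed, exists))
    return report


cited = ["Mata v. Avianca, Inc.", "Varghese v. China Southern Airlines", "Shaboon v. Egypt Air"]
for citation, well_formed, exists in audit_citations(cited):
    status = "verified" if exists else "NOT FOUND: route to human review"
    print(f"{citation}: well-formed={well_formed}, {status}")
```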

13. Which of the following is NOT an effective standalone solution to hallucination?

Correct. Fine-tuning adds facts to parametric memory, but parametric memory is still lossy, still subject to conflation, and still produces hallucinations. Fine-tuning changes which false facts appear, not whether hallucination occurs. RAG combined with human review is more reliable.

Fine-tuning alone doesn't eliminate hallucination — it just changes which incorrect content the model might produce. It does not make parametric memory reliable. RAG, semantic entropy flagging, and human review all address the problem more directly.

14. The npj Digital Medicine 2023 study on ChatGPT clinical trial summaries found what type of hallucination in approximately 30% of cases?

Correct. The study found extrinsic hallucinations — invented statistical findings not present in the source documents — in approximately 30% of cases. Often these hallucinations inverted the significance of outcomes, a particularly dangerous error in clinical settings.

The study found extrinsic hallucinations, statistical findings that were not in the source documents at all, in approximately 30% of cases, often inverting the significance of outcomes. Errors of this kind could directly mislead clinical decision-making.

15. What is the most reliable approach for mitigating hallucination in high-stakes AI deployments, according to post-2023 professional and regulatory guidance?

Correct. Post-Mata legal guidance, FDA draft guidelines on AI medical tools, and financial regulator guidance all point to the same conclusion: layered mitigations — grounding via RAG, restricting scope to what the system can reliably answer, and routing consequential outputs through human expert review.

The consistent recommendation from legal, medical, and financial regulators is layered mitigation: RAG grounding (keep ground truth external), scope restriction (limit what the AI handles), and human-in-the-loop review for consequential outputs. No single layer is sufficient.
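The layers can be combined into a single routing decision. This is a minimal sketch for a hypothetical law-firm assistant: the topic names, the `Draft` fields, and the `MIN_AGREEMENT` threshold are all assumptions, and the upstream steps (retrieval, sampling, vote counting) are presumed to have already run. Each check can only block or escalate an output, never approve one that an earlier layer rejected.

```python
from dataclasses import dataclass

# Hypothetical scope and threshold for a law-firm research assistant.
ALLOWED_TOPICS = {"case_research", "statute_lookup"}
MIN_AGREEMENT = 0.7


@dataclass
class Draft:
    topic: str
    answer: str
    cited_ids: list[str]     # source ids the answer cites
    retrieved_ids: set[str]  # source ids RAG actually placed in context
    agreement: float         # self-consistency vote share for this answer
    consequential: bool      # e.g. the text will be filed with a court


def route(d: Draft) -> str:
    # Layer 1: scope restriction
    if d.topic not in ALLOWED_TOPICS:
        return "refuse: out of scope"
    # Layer 2: grounding; every citation must come from the retrieved sources
    if not d.cited_ids or not set(d.cited_ids) <= d.retrieved_ids:
        return "human review: ungrounded or missing citation"
    # Layer 3: consistency across samples
    if d.agreement < MIN_AGREEMENT:
        return "human review: low self-consistency"
    # Layer 4: consequential outputs always get an expert check
    if d.consequential:
        return "human review: consequential output"
    return "release"


print(route(Draft("case_research", "...", ["doc-3"], {"doc-3"}, 0.9, consequential=True)))
# human review: consequential output
```

For case research after Mata, a firm might simply mark every draft as consequential, so that the human-review layer applies to everything that leaves the system.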