Module 1 · Sometimes AI Gets It Wrong — Basic | AESOP AI Academy Module 3
Lesson 1

AI Makes Mistakes

Trust calibration — knowing when and how much to rely on AI.

A company built an AI to screen job applications. It reviewed hundreds of resumes in seconds. But six months in, HR noticed almost no women were making it past the first screen for engineering roles.

The AI had been trained on the company's past successful hires — which had been overwhelmingly male. It learned that "engineer" meant "man" and filtered accordingly. The system wasn't trying to discriminate. It was doing exactly what it was trained to do. That was the problem.

Why AI Makes Mistakes

AI systems fail for predictable reasons:

  • Training data problems: The data reflects the past, including its biases and gaps.
  • Distribution shift: The world changes, but the model doesn't update automatically.
  • Overconfidence: Many AI systems assign high confidence to wrong answers.
  • Misaligned objectives: The model optimizes for what it was trained to measure, not what you actually want.
Trust Calibration

Appropriate trust requires domain-by-domain calibration. The higher the stakes and the less reversible the decision, the more independent verification is warranted — regardless of AI confidence scores.
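One way to make this concrete is a small lookup that maps stakes and reversibility to a verification tier. The sketch below is illustrative only: the 1 to 5 stakes scale, the tier names, and the names `Decision` and `verification_level` are assumptions made for this example, not part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    description: str
    stakes: int       # hypothetical scale: 1 (trivial) to 5 (life-altering)
    reversible: bool  # can the outcome be undone cheaply?


def verification_level(decision: Decision) -> str:
    """Map stakes and reversibility to a verification tier.

    AI confidence is deliberately not an input: a confident answer
    should not lower the bar for a high-stakes decision.
    """
    if decision.stakes >= 4 and not decision.reversible:
        return "independent expert verification"
    if decision.stakes >= 4 or not decision.reversible:
        return "human review"
    if decision.stakes >= 2:
        return "spot check"
    return "accept"


print(verification_level(Decision("medical treatment plan", 5, False)))
print(verification_level(Decision("playlist suggestion", 1, True)))
```

The thresholds are a judgment call. What matters is that the tier depends on the decision, not on how sure the AI sounds.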

Key Principle

AI can be highly useful. But "the AI said so" is never sufficient justification for a high-stakes, irreversible decision.

Quiz 1

AI Makes Mistakes

4 questions — free, untracked, retake anytime.

A resume-screening AI discriminated against women. The most accurate explanation is:

✓ Correct — ✓ Historical bias in training data is replicated and sometimes amplified by AI systems. No malice required.
❌ The developers likely intended no harm — the problem is that AI learns from history, which carries discrimination forward.

What is "distribution shift" in AI?

✓ Correct — ✓ Distribution shift: the model was trained on one world and deployed in a different one. Its answers may no longer fit.
❌ Distribution shift means the real-world conditions at deployment differ from the training conditions.

For which decision should you use the LEAST AI trust and the MOST independent verification?

✓ Correct — ✓ Medical decisions are high-stakes and hard to reverse — exactly the conditions requiring the most independent verification.
❌ Higher stakes and lower reversibility mean more independent verification is needed. Medical decisions are at the top of that scale.

Goodhart's Law says: "When a measure becomes a target, it ceases to be a good measure." Which AI failure does this describe?

✓ Correct — ✓ When you optimize for a measurable proxy, the system finds ways to score well on the proxy without achieving the underlying goal.
❌ Goodhart's Law describes misaligned objectives: optimizing for the metric rather than the actual goal.
Lab 1

Trust Calibration

Develop a framework for when to trust AI and when to verify.

Lab 1 — Trust Calibration

The AI guide will discuss the resume screener case and help you develop your own framework for AI trust.

  1. The AI opens with an analytical question about the case.
  2. Build your thinking about when AI trust is and isn't appropriate.
  3. End with: what would your personal AI trust framework look like?
Think about stakes, reversibility, and your ability to verify. These are the key variables.
Lesson 2

When AI Doesn't Know

Hallucination, confabulation, and the confidence problem.

In 2023, lawyers in a real U.S. federal court case submitted a legal brief citing six prior court cases. All six were generated by ChatGPT. None of them existed. The AI produced plausible case names, docket numbers, and fabricated quotes from fictional rulings, all in perfect legal citation format. The lawyers hadn't verified the AI's output, and the judge sanctioned them and their firm.

Understanding Hallucination

Language models predict text. They do not retrieve facts from a verified database — they generate what statistically fits the context. This produces hallucinations: confident, fluent, completely false outputs.

  • Citation hallucination: Producing fake papers, court cases, URLs that match the format but don't exist.
  • Confabulation: Filling gaps with plausible-sounding invented detail.
  • Temporal errors: Asserting outdated facts as current.
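Because a hallucinated citation looks identical in format to a real one, the only reliable check is to look each one up in a source you trust. The sketch below assumes you have such a list available. The function name `unverified_citations` and the placeholder case names are hypothetical, and a real workflow would query an authoritative database rather than an in-memory list.

```python
def unverified_citations(citations, trusted_index):
    """Return the citations that do not appear in the trusted index.

    Matching ignores case and surrounding whitespace. Anything returned
    here needs a manual lookup before it is relied on.
    """
    normalized = {entry.strip().lower() for entry in trusted_index}
    return [c for c in citations if c.strip().lower() not in normalized]


# Placeholder names only; these are not real cases.
trusted = ["Alpha v. Beta (placeholder)", "Gamma v. Delta (placeholder)"]
draft = ["alpha v. beta (placeholder) ", "Epsilon v. Zeta (placeholder)"]

flagged = unverified_citations(draft, trusted)
print(flagged)  # ['Epsilon v. Zeta (placeholder)']
```

A citation that passes this check still needs reading: the source may exist but not say what the AI claims it says.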
Key Insight

Hallucinations are hardest to detect on topics you know least about. The AI sounds equally confident whether it's right or wrong.

Quiz 2

When AI Doesn't Know

4 questions — free, untracked, retake anytime.

Why do language models produce hallucinations?

✓ Correct — ✓ Language models predict next tokens based on statistical patterns — they have no mechanism to verify whether generated content is factually true.
❌ Hallucination is structural: models predict statistically plausible text, not verified facts.

In the legal brief case, what was the fundamental error?

✓ Correct — ✓ The AI invented cases entirely — perfect format, completely fictional content — and the lawyers submitted them without checking.
❌ The AI invented the cases entirely. The citations were structurally correct but referenced fictional rulings.

Which content type is MOST vulnerable to AI hallucination going undetected?

✓ Correct — ✓ Specific citations are the most dangerous — the AI produces them in perfect format, and only someone who verifies the original source would catch a hallucination.
❌ Specific citations to technical or obscure sources are most dangerous — hardest to verify and produced in perfect format regardless of whether they exist.

Why are hallucinations hardest to detect on topics you know least about?

✓ Correct — ✓ The AI sounds equally confident on every topic. Without domain knowledge, you can't spot the error — which is why verifying AI claims in unfamiliar areas is especially important.
❌ The AI doesn't signal uncertainty. Without your own knowledge to cross-check, you have no way to spot the hallucination.
Lab 2

Hallucination Analysis

Analyze professional accountability for AI hallucinations.

Lab 2 — Hallucination Analysis

The AI guide will discuss the legal brief case and what it means for professional responsibility.

  1. The AI opens with a question about who bears responsibility.
  2. Develop your thinking about professional standards for AI use.
  3. Work toward: what verification standard should professionals be held to?
Consider: does your answer change if the professional is a lawyer vs. a journalist vs. a doctor?
Lesson 3

Whose Fault Is It?

Accountability chains and the human-in-the-loop.

In 2018, an autonomous test vehicle struck and killed a pedestrian in Arizona. Investigations found that the car's sensors detected the woman, but the software failed to classify her correctly as a pedestrian and the car did not brake in time. The backup safety driver was watching a video on her phone at the time.

Fault was distributed: the AI system designer hadn't built adequate fail-safes. The company's safety protocols were insufficient. The safety driver had abdicated her responsibility. The regulator had approved testing without enough oversight. No single party was entirely at fault. No single party was entirely innocent.

The Accountability Chain

Every AI deployment involves a chain of responsible parties: researchers who design the models, companies that build products, deployers who integrate AI into systems, users who act on outputs, and regulators who define what's permitted.

The Human-in-the-Loop Problem

Keeping a human responsible for final decisions sounds like a safeguard. But research shows humans often defer to AI recommendations automatically — making oversight nominal rather than real.
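One rough way to tell real oversight from nominal oversight is to look at the cases where the AI was later found to be wrong and ask how often the human reviewer accepted the recommendation anyway. The sketch below assumes you have a log of review outcomes as pairs of booleans. The data format and the function name `acceptance_rate_when_ai_wrong` are hypothetical.

```python
def acceptance_rate_when_ai_wrong(reviews):
    """Share of incorrect AI recommendations that the human accepted.

    reviews: iterable of (ai_was_correct, human_accepted) boolean pairs.
    Returns None when the log contains no incorrect recommendations.
    """
    accepted_flags = [accepted for ai_ok, accepted in reviews if not ai_ok]
    if not accepted_flags:
        return None
    return sum(accepted_flags) / len(accepted_flags)


# Hypothetical review log: the AI was wrong three times,
# and the reviewer accepted two of those three.
log = [(True, True), (False, True), (False, True), (False, False), (True, True)]
rate = acceptance_rate_when_ai_wrong(log)
print(f"Accepted while AI was wrong: {rate:.0%}")
```

A rate close to 100% suggests the reviewer is deferring rather than reviewing.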

Quiz 3

Whose Fault Is It?

4 questions — free, untracked, retake anytime.

In the autonomous vehicle fatality, how was fault distributed?

✓ Correct — ✓ Real AI failures typically distribute responsibility across multiple parties — designer, deployer, operator, and regulator.
❌ In complex AI systems, fault is rarely concentrated in one party — it distributes across the full accountability chain.

What is "automation bias" in the context of human oversight?

✓ Correct — ✓ Automation bias: humans tend to defer to AI recommendations, undermining the "human in the loop" safeguard.
❌ Automation bias describes human behavior — the tendency to defer to automated systems without sufficient critical evaluation.

Who in an AI deployment chain typically has the LEAST direct legal accountability under current frameworks?

✓ Correct — ✓ In most current legal frameworks, model developers bear less direct liability than deploying institutions — a significant gap in AI governance.
❌ Under current law in most jurisdictions, model developers bear less direct liability than deployers or operators.

Why does "human in the loop" fail as a safeguard when automation bias is present?

✓ Correct — ✓ Automation bias turns human oversight into a formality — someone is legally responsible but no one is genuinely reviewing the AI's decisions.
❌ When humans defer automatically, the oversight is nominal — someone is legally on the hook, but no one is actually evaluating the AI's decisions.
Lab 3

Accountability Analysis

Work through the accountability chain in the autonomous vehicle case.

Lab 3 — Accountability Chain

Discuss the AV fatality case and the distributed accountability problem.

  1. The AI opens with a question about which party had the clearest moral obligation to prevent the accident.
  2. Work through the accountability chain systematically.
  3. Address: does human-in-the-loop actually work as a safeguard?
Consider how automation bias affects whether oversight is real or nominal.
Lesson 4

Bias In, Bias Out

Where bias enters AI systems and how it spreads.

Medical AI systems trained mostly on lighter-skinned patients have shown lower accuracy for darker-skinned patients in tasks such as skin cancer detection, and pulse oximeters have been found to read less accurately on darker skin. If the training data doesn't represent the full range of human variation, the model doesn't learn to recognize it, and the patients who most need accurate diagnosis are the ones it serves least well.

The Bias Pipeline

Bias enters AI systems at every stage of development:

  • Collection bias: The data collected doesn't represent all populations.
  • Labeling bias: Human annotators bring their own assumptions when labeling data.
  • Proxy bias: The model relies on a variable correlated with a protected class (e.g., ZIP code as a proxy for race).
  • Feedback loops: Biased outputs influence future training data, amplifying the bias over time.
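The compounding effect of a feedback loop can be seen in a toy simulation. In the sketch below, a group starts with a 40% share of the training data, the model slightly over-selects the majority, and each round's selections become the next round's training data. The amplification exponent and the function name `simulate_feedback_loop` are assumptions chosen for illustration, not a model of any real system.

```python
def simulate_feedback_loop(initial_share, rounds, amplification=1.5):
    """Track a group's share of the training data over retraining rounds.

    amplification > 1 means the model over-selects whichever group is
    already the majority; amplification == 1 means no distortion.
    """
    shares = [initial_share]
    for _ in range(rounds):
        s = shares[-1]
        group_weight = s ** amplification
        other_weight = (1 - s) ** amplification
        # The selected set becomes the next round's training data.
        shares.append(group_weight / (group_weight + other_weight))
    return shares


for round_number, share in enumerate(simulate_feedback_loop(0.4, 5)):
    print(f"round {round_number}: minority share = {share:.3f}")
```

With any amplification above 1, the minority share shrinks every round. With amplification set to 1, it stays where it started, which is why monitoring after deployment matters as much as the initial data.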
Quiz 4

Bias In, Bias Out

4 questions — free, untracked, retake anytime.

Medical AI trained mostly on lighter-skinned patients performs worse for darker-skinned patients. This is primarily:

✓ Correct — ✓ Collection bias: the training data underrepresents certain populations, so the model performs worse on patients it rarely saw during training.
❌ This is collection bias — the training data doesn't represent the full diversity of patients the model will serve.

A lending model excludes "race" as an input but includes ZIP code. ZIP code correlates with race due to historical segregation. This is:

✓ Correct — ✓ Proxy bias: excluding a protected attribute while keeping correlated variables means the discrimination persists — just less visibly.
❌ Proxy bias: correlated variables carry the discriminatory signal even when the protected attribute is excluded.

How do feedback loops make AI bias worse over time?

✓ Correct — ✓ Feedback loops: biased predictions → biased real-world outcomes → biased training data → more biased predictions. The cycle compounds.
❌ The feedback loop: biased outputs → biased real-world outcomes → biased future training data → amplified bias.

Who is responsible for addressing bias in an AI system?

✓ Correct — ✓ Bias can enter at any stage — data collection, labeling, deployment — so responsibility is distributed across the people involved at each stage.
❌ Responsibility is shared across everyone involved in the system's lifecycle — not concentrated in one group.
Lab 4

Bias Pipeline Analysis

Trace how bias enters and compounds through an AI system.

Lab 4 — Bias Pipeline

Discuss the medical AI case and what you'd do to address the bias.

  1. The AI opens with a question about the medical AI case.
  2. Work through: where did the bias enter, and how would you address it at each stage?
  3. Address feedback loops: how do you prevent the bias from compounding post-deployment?
Think about: data collection, labeling, deployment, and ongoing monitoring.
Lesson 5

Fairness and AI

Why "fair" is harder to define than it sounds.

Three Definitions of Fairness

There are multiple mathematically valid definitions of fairness — and they often cannot all be satisfied at the same time.

  • Demographic parity: Equal outcome rates across groups. A loan approval AI should approve at the same rate for all demographic groups.
  • Equal opportunity: Among people who would actually repay a loan, the approval rate should be the same across groups.
  • Individual fairness: Similar individuals should receive similar outcomes, regardless of group membership.
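The first two definitions can be computed directly from a table of decisions. The sketch below uses a made-up set of eight loan applicants in which demographic parity holds but equal opportunity does not. The record fields and function names are hypothetical. Individual fairness is left out because it needs a measure of how similar two applicants are, which the definition alone does not supply.

```python
def selection_rate(records, group):
    """Demographic parity compares this rate across groups."""
    rows = [r for r in records if r["group"] == group]
    return sum(r["approved"] for r in rows) / len(rows)


def true_positive_rate(records, group):
    """Equal opportunity compares this rate across groups:
    the approval rate among applicants who would actually repay."""
    rows = [r for r in records if r["group"] == group and r["would_repay"]]
    return sum(r["approved"] for r in rows) / len(rows)


# Made-up data for illustration only.
records = [
    {"group": "A", "would_repay": True, "approved": True},
    {"group": "A", "would_repay": True, "approved": True},
    {"group": "A", "would_repay": True, "approved": False},
    {"group": "A", "would_repay": False, "approved": False},
    {"group": "B", "would_repay": True, "approved": True},
    {"group": "B", "would_repay": False, "approved": True},
    {"group": "B", "would_repay": False, "approved": False},
    {"group": "B", "would_repay": False, "approved": False},
]

for g in ("A", "B"):
    print(g, selection_rate(records, g), round(true_positive_rate(records, g), 3))
```

Both groups are approved at the same 50% rate, yet a creditworthy applicant in group A is approved about 67% of the time while one in group B is always approved. The same decisions are fair under one definition and unfair under the other.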
The Impossibility Problem

Researchers proved in 2016 that several common fairness criteria, such as calibration and equal error rates across groups, cannot all be satisfied at once when base rates differ across groups, unless the predictor is perfect. Every choice of fairness metric is therefore a value judgment about which tradeoff is acceptable, not a neutral technical decision.
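A simpler tension of the same kind can be checked with arithmetic. This is not the 2016 result itself, only an illustration of why differing base rates force a choice. If a model has the same true positive rate and false positive rate for both groups, its approval rate for a group is TPR × base rate + FPR × (1 − base rate), so different base rates give different approval rates. The numbers below are invented.

```python
def approval_rate(base_rate, tpr, fpr):
    """Overall approval rate for a group, given its base rate of
    repayment and the model's true and false positive rates."""
    return tpr * base_rate + fpr * (1 - base_rate)


# Same error rates for both groups (equal opportunity holds),
# but different base rates of repayment. Values are invented.
TPR, FPR = 0.8, 0.1
rate_a = approval_rate(base_rate=0.6, tpr=TPR, fpr=FPR)
rate_b = approval_rate(base_rate=0.3, tpr=TPR, fpr=FPR)

print(f"group A approval rate: {rate_a:.2f}")
print(f"group B approval rate: {rate_b:.2f}")
```

Group A is approved 52% of the time and group B 31% of the time, so demographic parity fails even though the error rates match. Closing that gap would require different error rates for the two groups.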

Key Insight

Choosing which fairness metric to use is not a technical decision. It's a values decision about whose interests to prioritize and which type of error is worse.

Quiz 5

Fairness and AI

4 questions — free, untracked, retake anytime.

Demographic parity requires:

✓ Correct — ✓ Demographic parity: the same positive outcome rate regardless of group membership.
❌ Demographic parity specifically means equal positive outcome rates across groups — not equal accuracy or individual similarity.

Researchers proved in 2016 that you cannot simultaneously satisfy all common fairness definitions when:

✓ Correct — ✓ This is a mathematical proof, not a design limitation. When base rates differ, satisfying all fairness definitions simultaneously is impossible.
❌ The impossibility applies specifically when base rates differ across groups — which is the typical real-world situation.

Why is the choice of fairness metric a "values decision" rather than a "technical decision"?

✓ Correct — ✓ Since you can't satisfy all fairness definitions simultaneously, you must choose — and that choice reflects value judgments, not just technical optimization.
❌ The choice embeds values: which kind of mistake is worse, and whose interests matter more. That's a values question, not a technical one.

Equal opportunity (as a fairness metric) means:

✓ Correct — ✓ Equal opportunity: among the people who deserve the positive outcome (would actually repay the loan, would perform well in the job), the approval rate is the same across groups.
❌ Equal opportunity is specific: among people who would actually succeed, the approval rate should be equal. This is different from demographic parity.
Lab 5

Fairness Tradeoff

Choose a fairness metric and defend the tradeoff.

Lab 5 — Fairness Tradeoff

You're designing a loan approval AI. The AI guide will push you to choose a fairness metric and justify it.

  1. The AI opens: demographic parity vs equal opportunity — which do you optimize?
  2. Defend your choice knowing you're sacrificing something.
  3. Identify who bears the cost of your choice.
There's no right answer. What matters is articulating what you're willing to trade and why.
Lesson 6

Failure Modes and Mitigation

Systematic patterns in how AI breaks — and what builders do about it.

Common AI Failure Modes
  • Specification gaming: The model achieves high scores on the metric while violating the spirit of the task.
  • Shortcut learning: Models latch onto spurious correlations instead of causal features.
  • Adversarial brittleness: Tiny, often imperceptible changes to inputs can flip an AI's output completely.
  • Out-of-distribution failure: Models fail silently when real-world inputs differ from training distribution.
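Out-of-distribution failure is silent by default, so one basic defense is to flag inputs that look unlike the training data before the model answers. The sketch below is a deliberately crude version: a z-score check on a single numeric feature. The threshold of 3 standard deviations, the sample values, and the function names are assumptions. Production systems generally need checks that cover many features at once.

```python
from statistics import mean, stdev


def fit_range(training_values):
    """Summarize the training data for one numeric feature."""
    return mean(training_values), stdev(training_values)


def is_out_of_distribution(value, mu, sigma, z_threshold=3.0):
    """Flag a value that lies far from what the model saw in training."""
    return abs(value - mu) > z_threshold * sigma


# Invented training values for a single feature (e.g., applicant age).
training_ages = [20, 22, 25, 27, 30, 31, 33, 35]
mu, sigma = fit_range(training_ages)

for age in (29, 70):
    flag = is_out_of_distribution(age, mu, sigma)
    print(age, "needs review" if flag else "within training range")
```

A flagged input does not mean the model is wrong, only that its answer rests on little or no training evidence and deserves a second look.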
Mitigation Approaches
  • Red-teaming: Deliberately trying to break the system before deployment.
  • Human oversight gates: Requiring human review for high-stakes outputs.
  • Staged deployment: Rolling out to limited users before full deployment.
  • Model cards: Standardized documentation of limitations and evaluation results.
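Staged deployment is often implemented by assigning each user to a stable bucket and raising the enabled percentage over time. The sketch below shows one common way to do that with a hash of the user ID. The function name `in_rollout` and the user IDs are hypothetical, and a real system would also need a way to roll back.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the current rollout stage.

    Hashing gives each user a stable bucket from 0 to 99, so a user
    who is included at 10% stays included when the stage grows to 50%.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


users = [f"user-{n}" for n in range(1000)]
for stage in (1, 10, 50, 100):
    enabled = sum(in_rollout(u, stage) for u in users)
    print(f"stage {stage:>3}%: {enabled} of {len(users)} users enabled")
```

Because buckets are stable, problems seen at a small stage can be traced to a fixed set of users before the rollout widens.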
Quiz 6

Failure Modes

4 questions — free, untracked, retake anytime.

Specification gaming means:

✓ Correct — ✓ Specification gaming: the model found a way to maximize its reward signal without achieving the actual goal.
❌ Specification gaming: the model optimizes for the measured metric in ways that violate the spirit of the task.

Red-teaming before deployment means:

✓ Correct — ✓ Red-teaming is adversarial testing — structured attempts to find unsafe behaviors, edge cases, and failure modes before deployment.
❌ Red-teaming is adversarial: deliberately trying to make the system fail before it's deployed to real users.

Shortcut learning occurs when:

✓ Correct — ✓ Shortcut learning: the model learned a proxy pattern that worked in training but fails when the spurious correlation breaks down in deployment.
❌ Shortcut learning: the model found a pattern that correlates with the right answer in training data but isn't the actual causal feature.

Model cards are primarily designed to:

✓ Correct — ✓ Model cards are transparency documentation: what a model is for, how it performs across subgroups, known limitations, and appropriate use contexts.
❌ Model cards document the model's intended use, evaluation results, known limitations, and guidance on appropriate deployment.
Lab 6

Mitigation Planning

Design a pre-deployment safety plan for an AI hiring tool.

Lab 6 — Mitigation Planning

You're the safety lead before launching an AI hiring tool. The AI guide will help you build a mitigation plan.

  1. The AI opens: which failure mode worries you most for a hiring tool?
  2. Build your mitigation plan step by step.
  3. Address: how does your plan handle failure modes you can't test for in advance?
Consider: red-teaming, staged rollout, human oversight gates, and model cards.

Module 3 Test

8 questions covering all 6 lessons. Free, untracked, retake anytime.

A resume-screening AI discriminated against women because:

✓ Correct — ✓ Historical bias in training data is replicated by AI systems — no malice required.
❌ The AI faithfully learned from biased historical data. The problem wasn't intent — it was the training data.

In the legal brief hallucination case, what was the core failure?

✓ Correct — ✓ The fundamental error was submitting AI output without verifying it. The AI produced plausible-looking fiction; the lawyers treated it as fact.
❌ The failure was using AI output without verification. The citations were perfectly formatted but entirely fictional.

Automation bias makes human-in-the-loop oversight fail because:

✓ Correct — ✓ Automation bias turns human oversight into a rubber stamp — someone is technically responsible, but no one is genuinely reviewing the AI's decisions.
❌ Automation bias: humans defer automatically. The safeguard exists formally but not substantively.

Proxy bias means:

✓ Correct — ✓ Proxy bias: ZIP code correlates with race, income correlates with gender. Excluding the protected attribute doesn't eliminate discrimination when correlated proxies remain.
❌ Proxy bias: correlated variables carry the discriminatory signal even when the protected attribute itself is excluded from inputs.

The impossibility theorem in AI fairness proves:

✓ Correct — ✓ Mathematical proof: you must choose which fairness definition to optimize. That choice is a values judgment, not a technical one.
❌ The theorem proves you can't satisfy all fairness definitions simultaneously — you must choose, and that choice reflects values.

Red-teaming before deployment means:

✓ Correct — ✓ Red-teaming is adversarial pre-deployment testing — deliberately trying to find unsafe behaviors and edge cases.
❌ Red-teaming: deliberately attempt to make the system fail before it's deployed.

Specification gaming occurs when:

✓ Correct — ✓ Specification gaming: the model maximizes the reward signal in unintended ways that violate the actual goal.
❌ Specification gaming: the model found a loophole to score well on the metric without achieving what was actually wanted.

Which fairness definition requires equal positive outcome rates across demographic groups?

✓ Correct — ✓ Demographic parity: the positive outcome rate (loan approved, job offered) is equal across all demographic groups.
❌ Demographic parity requires equal positive outcome rates across groups — regardless of underlying qualifications or base rates.