Trust calibration — knowing when and how much to rely on AI.
A company built an AI to screen job applications. It reviewed hundreds of resumes in seconds. But six months in, HR noticed almost no women were making it past the first screen for engineering roles.
The AI had been trained on the company's past successful hires — which had been overwhelmingly male. It learned that "engineer" meant "man" and filtered accordingly. The system wasn't trying to discriminate. It was doing exactly what it was trained to do. That was the problem.
AI systems fail for predictable reasons:
Appropriate trust requires domain-by-domain calibration. The higher the stakes and the less reversible the decision, the more independent verification is warranted — regardless of AI confidence scores.
AI can be highly useful. But "the AI said so" is never sufficient justification for a high-stakes, irreversible decision.
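One way to make this rule concrete is a small decision function. The sketch below is illustrative only: the function name, the "low"/"high" stakes labels, and the three verification levels are assumptions made for this example, not an established standard. The AI's confidence score is accepted as an input but deliberately never consulted.

```python
def verification_level(stakes: str, reversible: bool, ai_confidence: float) -> str:
    """Return how much independent checking a decision needs.

    Hypothetical rule of thumb: only stakes and reversibility matter.
    ai_confidence is accepted but intentionally ignored, because a
    confident score does not make a wrong answer any safer to act on.
    """
    if stakes not in ("low", "high"):
        raise ValueError("stakes must be 'low' or 'high'")
    if stakes == "high" and not reversible:
        return "independent review required"
    if stakes == "high" or not reversible:
        return "spot-check before acting"
    return "light review"


# A hiring rejection is high-stakes and hard to undo, so even a 0.99
# confidence score does not lower the verification requirement.
print(verification_level("high", reversible=False, ai_confidence=0.99))
print(verification_level("low", reversible=True, ai_confidence=0.60))
```

The point of the design is that raising ai_confidence never changes the answer; only changing the stakes or the reversibility of the decision does.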
4 questions — free, untracked, retake anytime.
The resume-screening AI discriminated against women. The most accurate explanation is:
What is "distribution shift" in AI?
For which decision should you use the LEAST AI trust and the MOST independent verification?
Goodhart's Law says: "When a measure becomes a target, it ceases to be a good measure." Which AI failure does this describe?
Develop a framework for when to trust AI and when to verify.
The AI guide will discuss the resume screener case and help you develop your own framework for AI trust.
Hallucination, confabulation, and the confidence problem.
In 2023, lawyers in a real U.S. federal court case submitted a legal brief citing six prior court cases. All six were generated by ChatGPT. None of them existed. The AI produced plausible case names, docket numbers, and fabricated quotes from fictional rulings, all in perfect legal citation format. The judge sanctioned the lawyers and their firm. The lawyers hadn't verified the AI's output.
Language models predict text. They do not retrieve facts from a verified database — they generate what statistically fits the context. This produces hallucinations: confident, fluent, completely false outputs.
Hallucinations are hardest to detect on topics you know least about. The AI sounds equally confident whether it's right or wrong.
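The idea that a model "generates what statistically fits" can be shown with a toy next-word predictor. This is a deliberately tiny sketch, not how production language models are built: real systems are large neural networks, while this one only records which word followed which in three made-up training sentences. The corpus, function names, and random seed are all assumptions for the illustration.

```python
import random
from collections import defaultdict

# Hypothetical training text: three invented sentences in a legal register.
corpus = [
    "the court held that the airline was liable",
    "the court found that the claim was untimely",
    "the panel held that the claim was barred",
]


def build_table(sentences):
    """Map each word to the list of words that followed it in training."""
    table = defaultdict(list)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for current, following in zip(words, words[1:]):
            table[current].append(following)
    return table


def generate(table, rng, max_words=20):
    """Sample one word at a time; stop at the end marker or the length cap."""
    word, output = "<s>", []
    while len(output) < max_words:
        word = rng.choice(table[word])
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


table = build_table(corpus)
rng = random.Random(0)  # fixed seed so repeated runs give the same samples
samples = [generate(table, rng) for _ in range(5)]
for sample in samples:
    status = "seen in training" if sample in corpus else "never seen in training"
    print(f"{sample!r} ({status})")
```

Every pair of adjacent words in each sample appeared somewhere in the training text, so the output always reads fluently. Nothing in the procedure checks whether the sentence as a whole is true or was ever written, which is why recombined sentences can look exactly as authoritative as the real ones.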
4 questions — free, untracked, retake anytime.
Why do language models produce hallucinations?
In the legal brief case, what was the fundamental error?
Which content type is MOST vulnerable to AI hallucination going undetected?
Why are hallucinations hardest to detect on topics you know least about?
Analyze professional accountability for AI hallucinations.
The AI guide will discuss the legal brief case and what it means for professional responsibility.
Accountability chains and the human-in-the-loop.
In 2018, an autonomous test vehicle struck and killed a pedestrian in Arizona. Investigators found that the car's sensors detected the woman, but the software never correctly classified her as a pedestrian and did not brake in time. The backup safety driver was watching a video on her phone at the time.
Fault was distributed: the AI system designer hadn't built adequate fail-safes. The company's safety protocols were insufficient. The safety driver had abdicated her responsibility. The regulator had approved testing without enough oversight. No single party was entirely at fault. No single party was entirely innocent.
Every AI deployment involves a chain of responsible parties: researchers who design the models, companies that build products, deployers who integrate AI into systems, users who act on outputs, and regulators who define what's permitted.
Keeping a human responsible for final decisions sounds like a safeguard. But research shows humans often defer to AI recommendations automatically — making oversight nominal rather than real.
4 questions — free, untracked, retake anytime.
In the autonomous vehicle fatality, how was fault distributed?
What is "automation bias" in the context of human oversight?
Who in an AI deployment chain typically has the LEAST direct legal accountability under current frameworks?
Why does "human in the loop" fail as a safeguard when automation bias is present?
Work through the accountability chain in the autonomous vehicle case.
Discuss the AV fatality case and the distributed accountability problem.
Where bias enters AI systems and how it spreads.
Medical AI systems trained mostly on lighter-skinned patients have shown lower accuracy for darker-skinned patients, for example when detecting skin cancer. Pulse oximeters show a similar gap: they tend to overestimate blood oxygen in patients with darker skin. If the training data doesn't represent the full range of human variation, the model doesn't learn to recognize it, and the patients who most need accurate diagnosis are the ones it serves least well.
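A gap like this is easy to miss if only an overall accuracy figure is reported. The sketch below breaks accuracy down by group. The counts are invented for illustration and do not come from any real study; the group labels and the function name are likewise assumptions.

```python
from collections import defaultdict


def accuracy_by_group(results):
    """results: iterable of (group, prediction_was_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, was_correct in results:
        totals[group] += 1
        correct[group] += int(was_correct)
    return {group: correct[group] / totals[group] for group in totals}


# Hypothetical test set: 80 lighter-skinned cases and 20 darker-skinned cases.
results = (
    [("lighter", True)] * 76 + [("lighter", False)] * 4
    + [("darker", True)] * 13 + [("darker", False)] * 7
)

overall = sum(was_correct for _, was_correct in results) / len(results)
by_group = accuracy_by_group(results)

print(f"overall accuracy: {overall:.2f}")
for group, accuracy in by_group.items():
    print(f"{group}: {accuracy:.2f}")
```

With these made-up numbers the headline figure is 89 correct out of 100, while the smaller group is served far worse than the larger one. Because the underrepresented group contributes few test cases, its errors barely move the overall score.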
Bias enters AI systems at every stage of development:
4 questions — free, untracked, retake anytime.
Medical AI trained mostly on lighter-skinned patients performs worse for darker-skinned patients. This is primarily:
A lending model excludes "race" as an input but includes ZIP code. ZIP code correlates with race due to historical segregation. This is:
How do feedback loops make AI bias worse over time?
Who is responsible for addressing bias in an AI system?
Trace how bias enters and compounds through an AI system.
Discuss the medical AI case and what you'd do to address the bias.
Why "fair" is harder to define than it sounds.
There are multiple mathematically valid definitions of fairness — and they often cannot all be satisfied at the same time.
Researchers proved in 2016 that no algorithm can simultaneously satisfy all common fairness definitions when base rates differ across groups.
Choosing which fairness metric to use is therefore not a neutral technical decision. It's a values decision about whose interests to prioritize and which type of error is worse.
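A small worked example can make the tension visible. The sketch below uses invented loan data in which two groups have different repayment base rates, and compares two common metrics: demographic parity (equal approval rates) and equal opportunity (equal approval rates among applicants who would repay). It illustrates the tradeoff only and is not a proof of the 2016 result. All names and numbers are assumptions for the example.

```python
def rate(rows):
    """Share of the given applicants who were approved."""
    rows = list(rows)
    return sum(row["approved"] for row in rows) / len(rows)


def selection_rate(records, group):
    # Demographic parity compares this value across groups.
    return rate(r for r in records if r["group"] == group)


def true_positive_rate(records, group):
    # Equal opportunity compares this value across groups.
    return rate(r for r in records if r["group"] == group and r["repaid"])


def false_positive_rate(records, group):
    # Approvals among applicants who did not repay.
    return rate(r for r in records if r["group"] == group and not r["repaid"])


def make_group(group, n_total, n_repaid, n_approved):
    """First n_repaid applicants repay; first n_approved are approved."""
    return [
        {"group": group, "repaid": i < n_repaid, "approved": i < n_approved}
        for i in range(n_total)
    ]


# Policy 1: approve exactly the applicants who repay (a perfect predictor).
policy_1 = make_group("A", 10, 6, 6) + make_group("B", 10, 3, 3)

# Policy 2: raise group B's approvals until approval rates match group A's.
policy_2 = make_group("A", 10, 6, 6) + make_group("B", 10, 3, 6)

for name, records in [("policy 1", policy_1), ("policy 2", policy_2)]:
    for group in ("A", "B"):
        print(
            name, group,
            f"selection={selection_rate(records, group):.2f}",
            f"TPR={true_positive_rate(records, group):.2f}",
            f"FPR={false_positive_rate(records, group):.2f}",
        )
```

Under policy 1 both groups have the same true positive rate, but approval rates differ (0.6 versus 0.3), so demographic parity fails. Under policy 2 approval rates are equal, but only because three group B applicants who did not repay were approved, so false positive rates now differ. Which outcome counts as "fair" depends on which error the designer considers worse.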
4 questions — free, untracked, retake anytime.
Demographic parity requires:
Researchers proved in 2016 that you cannot simultaneously satisfy all common fairness definitions when:
Why is the choice of fairness metric a "values decision" rather than a "technical decision"?
Equal opportunity (as a fairness metric) means:
Choose a fairness metric and defend the tradeoff.
You're designing a loan approval AI. The AI guide will push you to choose a fairness metric and justify it.
Systematic patterns in how AI breaks — and what builders do about it.
4 questions — free, untracked, retake anytime.
gaming means:
before deployment means:
learning occurs when:
Model cards are primarily designed to:
Design a pre-deployment safety plan for an AI hiring tool.
You're the safety lead before launching an AI hiring tool. The AI guide will help you build a mitigation plan.
8 questions covering all 6 lessons. Free, untracked, retake anytime.
The resume-screening AI discriminated against women because:
In the legal brief hallucination case, what was the core failure?
Automation bias makes human-in-the-loop oversight fail because:
bias means:
The impossibility theorem in AI fairness proves:
before deployment means:
gaming occurs when:
Which fairness definition requires equal positive outcome rates across demographic groups?