Lesson 1 · Module 2

The Architecture of Trust

How humans decide whether to believe, rely on, or delegate to an AI system — and what designers can do about it.

When does trust in AI become dangerous — and when does distrust become its own kind of failure?

At 9:58 p.m. on March 18, 2018, a self-driving Uber Volvo SUV struck and killed Elaine Herzberg as she walked her bicycle across Mill Avenue. The vehicle's perception system had detected her 5.6 seconds before impact and had reclassified her as a vehicle, then a bicycle, then a vehicle again — cycling through labels without triggering an emergency stop. The safety driver, Rafaela Vasquez, was monitoring a video stream on her personal phone. She looked up less than a second before impact.

The National Transportation Safety Board's 2019 report traced the crash not to a sensor failure but to what this lesson calls a trust calibration failure. Uber's system had disabled Volvo's built-in emergency braking. Vasquez had been trained to trust the automation. The company had deployed the vehicle in a mode that produced frequent spurious braking alerts, so engineers had suppressed the alert system entirely to reduce false positives. Every choice made trust easier. None of them made the system safer.

What Is Trust in AI?

Trust in AI systems is not a single variable. Researchers distinguish at minimum three separable dimensions: competence trust (does the system do what it claims?), integrity trust (does it operate within agreed constraints?), and benevolence trust (does it act in the user's interest rather than against it?). These dimensions can diverge sharply — a system can be highly competent while having low integrity, as when a recommendation algorithm maximizes engagement at the expense of user wellbeing.

A fourth dimension, increasingly important in deployed systems, is predictability: users extend more trust to systems whose failures they can anticipate. This is why pilots who understand the failure modes of autopilot systems maintain safer manual reversion than pilots who merely trust the system to work.

Over-trust: Reliance on an AI system beyond its demonstrated competence, often produced by smooth interfaces, confident outputs, or habituation that suppresses vigilance.

Under-trust: Rejection of valid AI outputs due to unfamiliarity, prior bad experience, or perceived opacity. This is a real failure mode that causes users to ignore beneficial recommendations.

Calibrated trust: Trust that matches actual system reliability (neither over- nor under-reliant), updated as evidence about the system's competence accumulates.

The COMPAS Recidivism Case: Trust Miscalibrated at Scale

In May 2016, ProPublica published an analysis of COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a risk-scoring tool used by judges in Broward County, Florida and hundreds of other jurisdictions. The algorithm assigned defendants a "recidivism risk" score of 1–10. ProPublica found that Black defendants were nearly twice as likely as white defendants to be falsely flagged as high risk, while white defendants were more likely to be incorrectly labeled low risk when they later reoffended.

The failure was not purely algorithmic — it was a trust design failure. Judges were given a numeric score without adequate disclosure of the model's training data, its error rate distribution across demographic groups, or the fact that the score's predictive accuracy for violent recidivism was only marginally better than chance. The interface communicated false precision. Judicial users had no mechanism for appropriate skepticism. The system had been designed to be trusted rather than designed to be understood.

Design Implication

Interfaces that present AI outputs as crisp numbers or binary recommendations without surfacing uncertainty, training limitations, or known failure modes are not neutral — they actively manufacture over-trust. Designing for calibrated trust requires building skepticism affordances directly into the interface.

Trust Calibration as a Design Target

Lee and See's foundational 2004 framework in Human Factors described trust calibration as requiring three things: that the system's actual reliability be knowable, that the interface convey that reliability accurately, and that users have sufficient cognitive resources to integrate this information. In practice, AI systems routinely fail all three conditions. Reliability is difficult to measure because it varies across input distributions. Interfaces typically display confidence scores calibrated on training data that may not match deployment contexts. And users operating under time pressure default to heuristic trust — they trust whatever worked last time.
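One way to check the second condition, whether displayed confidence matches reliability in deployment, is a binned calibration check. The sketch below is illustrative only: the function name `expected_calibration_error`, the bin count, and the sample log are assumptions introduced here, not part of Lee and See's framework.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average gap between stated confidence and observed accuracy."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical deployment log: the system displays 90% confidence
# but is right only 6 times out of 10.
deployment_conf = [0.9] * 10
deployment_correct = [True] * 6 + [False] * 4
gap = expected_calibration_error(deployment_conf, deployment_correct)
print(f"calibration gap: {gap:.2f}")
```

A large gap on deployment data, alongside a small gap on training data, is one signal that the displayed confidence scores no longer describe the context in which users are relying on them.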

Contemporary research by Bansal et al. (2021, "Does the Whole Exceed Its Parts?") demonstrated that AI explanations improve human-AI team accuracy only when they help users identify when the AI is wrong, not when they simply increase understanding of how the AI works. This is a crucial distinction: transparency that explains correct outputs adds little value; transparency that reveals error signatures is what enables calibration.

Core Principle

The goal of trust design is not maximum trust; it is accurate trust. A user working with an 85%-reliable AI system who rejects its outputs in the roughly 15% of cases where it is actually wrong outperforms both the user who always trusts and the user who never trusts.
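The arithmetic behind this principle can be sketched under simplifying assumptions: a binary decision (so overriding a wrong output yields the right answer and overriding a correct output yields the wrong one), an AI that is right 85% of the time, and an unaided human accuracy of 70%. The 70% figure, the catch rate, and the false-override rate are hypothetical values chosen for illustration.

```python
def team_accuracy(ai_accuracy, catch_rate, false_override_rate):
    """Accuracy of a human-AI team on a binary decision.

    catch_rate: share of the AI's wrong outputs the user overrides.
    false_override_rate: share of the AI's correct outputs the user overrides.
    """
    kept_correct = ai_accuracy * (1.0 - false_override_rate)
    fixed_errors = (1.0 - ai_accuracy) * catch_rate
    return kept_correct + fixed_errors


AI_ACCURACY = 0.85
HUMAN_ALONE = 0.70  # assumed unaided accuracy, for comparison only

always_trust = team_accuracy(AI_ACCURACY, catch_rate=0.0, false_override_rate=0.0)
never_trust = HUMAN_ALONE
calibrated = team_accuracy(AI_ACCURACY, catch_rate=0.6, false_override_rate=0.05)

print(f"always trust: {always_trust:.4f}")
print(f"never trust:  {never_trust:.4f}")
print(f"calibrated:   {calibrated:.4f}")
```

Under these assumptions the calibrated user comes out ahead only because overrides are concentrated on the AI's errors. A user who overrides half of all outputs regardless of correctness would land at 50%, worse than either extreme.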

Lesson 1 Quiz

The Architecture of Trust — 4 questions

1. The 2018 Uber autonomous vehicle fatality was identified by the NTSB as primarily a failure of:

Correct. The NTSB report identified the disabling of Volvo's emergency braking, suppression of the alert system, and the safety driver's trained over-reliance on automation as the causal chain — a systemic trust calibration failure, not a sensor failure.

Incorrect. The NTSB identified trust calibration as the primary failure: suppressed emergency braking, eliminated alerts, and a driver habituated to trust the automation unconditionally.

2. In the COMPAS recidivism scoring case, the core design failure was:

Correct. COMPAS presented a numeric score that communicated confidence without disclosing its weak accuracy for violent recidivism, its differential error rates across racial groups, or the nature of its training data — a trust design failure that manufactured unwarranted judicial reliance.

Incorrect. COMPAS did not use race explicitly. The failure was that the interface was designed to communicate confident precision without surfacing the model's real limitations and disparate error rates.

3. According to Bansal et al. (2021), AI explanations improve human-AI team performance primarily when they:

Correct. Bansal et al. found that explanations only improve team accuracy when they surface error signatures — helping users know when to override the AI — not when they merely increase understanding of correct outputs.

Incorrect. Their finding was that transparency about correct outputs adds little value. What matters is whether explanations help users recognize the AI's error conditions.

4. "Calibrated trust" in an AI system is best described as:

Correct. Calibrated trust means neither over- nor under-relying on a system — matching reliance to demonstrated competence and adjusting as the system's performance in different contexts becomes known.

Incorrect. Calibrated trust is not maximizing trust, minimizing it, or setting a fixed threshold — it is accurately matching trust to actual system reliability and updating that estimate with experience.

Lab 1: Trust Calibration Analysis

Identify over-trust and under-trust failure modes in AI interface designs

Your Task

You're a design reviewer evaluating AI systems for trust calibration problems. Discuss real or hypothetical AI interface scenarios with the assistant — identify whether each case represents over-trust, under-trust, or miscalibration, and propose specific design interventions.

Start by describing an AI interface you've encountered (a navigation app, a medical tool, a content recommendation system, a fraud detection alert) and analyze its trust design. The assistant will probe your analysis and push you toward concrete design improvements.

Trust Calibration Lab

Welcome to the Trust Calibration Lab. I'm your design review partner for this session. Trust failures in AI interfaces are subtle — they're often designed in, not just stumbled into. Pick any AI-powered system you've used or know about, and let's examine how it manages user trust. Does it show confidence scores? Does it explain its reasoning? Does it tell users when it might be wrong? Start wherever you like, and I'll help you dig into the trust design implications.

Lesson 2 · Module 2

Explainability and Its Discontents

What AI explanations actually do to human decision-making — and why "explainable AI" often fails the people it is meant to help.

When an AI explains itself, does that make users smarter — or just more confident in the wrong answers?

Article 22 of the EU's General Data Protection Regulation, which took effect in May 2018, granted individuals a right "not to be subject to a decision based solely on automated processing" that significantly affects them — and, in contested readings, a right to "meaningful information about the logic involved." Within months, legal scholars and computer scientists were publicly disagreeing about what this meant. Could a GDPR-compliant explanation be a post-hoc rationalization of a black-box decision? Did "meaningful information" require technical accuracy — or was it enough to give a reason that felt meaningful to the user, even if it didn't reflect the model's actual computation?

The resulting implementations diverged wildly. Some companies provided genuine feature importance scores. Others produced what Wachter, Mittelstadt and Russell called "legally compliant but informationally vacuous" outputs: sentences like "You were declined because your financial profile did not meet our criteria." The regulation created a market for the appearance of explainability without creating a standard for its substance.

The Gap Between Explanation and Understanding

The most consequential finding from a decade of explainable AI (XAI) research is that explanations do not straightforwardly transfer understanding. Chromik and Butz (2021) surveyed 54 XAI user studies and found that the majority measured user satisfaction with explanations, not whether users had gained actionable understanding of the model's behavior. User satisfaction is a weak proxy — users rate confident, fluent explanations as more helpful regardless of whether those explanations are accurate.

The phenomenon of explanation-induced over-trust is particularly well-documented in medical AI. Cai et al. (2019) found that physicians shown AI diagnostic outputs with accompanying explanations accepted more AI errors than physicians shown outputs without explanations. The explanation created a sense of auditability that suppressed independent clinical judgment. The researchers called this the "explanation paradox": the explanation was doing the opposite of what it was designed to do.

LIME: Local Interpretable Model-agnostic Explanations (Ribeiro et al., 2016). Generates local approximations of complex model behavior by perturbing inputs and observing output changes. Widely used but known to produce unstable explanations for similar inputs.

SHAP: SHapley Additive exPlanations (Lundberg & Lee, 2017). Assigns feature contribution values based on game-theoretic Shapley values. More theoretically grounded than LIME but computationally expensive and difficult to communicate to non-experts.

Counterfactual explanation: An explanation of the form "if X had been different, the outcome would have changed." Often more actionable than feature importance scores, but may not reflect actual causal mechanisms in the model.
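To make the counterfactual form concrete, here is a minimal sketch that searches for the smallest change to a single feature that flips a decision. Everything in it is hypothetical: `toy_credit_model`, its weights, and the feature names are invented for illustration and do not correspond to any real scoring system. This is a brute-force single-feature search, not LIME or SHAP.

```python
def toy_credit_model(applicant):
    """Hypothetical black-box scorer using integer weights; approve when score >= 0."""
    score = 4 * applicant["income_k"] - 50 * applicant["missed_payments"] - 92
    return score >= 0


def single_feature_counterfactual(model, applicant, feature, step, max_steps=100):
    """Return the nearest value of one feature (in multiples of step) that flips the decision."""
    original = model(applicant)
    for n in range(1, max_steps + 1):
        candidate = dict(applicant)
        candidate[feature] = applicant[feature] + n * step
        if model(candidate) != original:
            return candidate[feature]
    return None  # no flip found within the search range


applicant = {"income_k": 30, "missed_payments": 2}
needed_income = single_feature_counterfactual(
    toy_credit_model, applicant, "income_k", step=5
)
needed_missed = single_feature_counterfactual(
    toy_credit_model, applicant, "missed_payments", step=-1
)
print(f"Approved if income were {needed_income}k, or missed payments were {needed_missed}.")
```

The output describes where the model's decision boundary sits, not what would happen in the world if the applicant's circumstances changed, which is the limitation the definition above points to.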

The Instability Problem: When Explanations Contradict Each Other

A 2019 study by Alvarez-Melis and Jaakkola demonstrated that LIME-generated explanations for nearly identical inputs could differ substantially — producing different feature rankings for inputs that differed by a single pixel. If a user could generate a different explanation by slightly rephrasing their query, what does the explanation actually represent? Ghassemi, Oakden-Rayner and Beam (2021) made this point forcefully in The Lancet Digital Health, arguing that current XAI methods are insufficient for clinical deployment not because they are technically flawed but because they provide a false sense of auditability — they perform transparency without producing it.

This has direct interface design consequences. An interface that presents a LIME explanation without conveying its inherent instability is making an implicit claim about the explanation's reliability that the underlying method cannot support. Designing responsibly requires either not presenting such explanations, or designing interfaces that make their limitations legible — for example, showing confidence ranges around feature importance scores or providing comparison explanations across similar inputs.
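One way to surface instability is to run the explainer several times on the same or nearby inputs and report the spread rather than a single ranking. The sketch below assumes the attribution runs already exist as dictionaries; the feature names and scores are made up, and `summarize_attributions` is a hypothetical helper, not part of any LIME implementation.

```python
from statistics import mean


def summarize_attributions(runs):
    """Per-feature mean and range across repeated explanation runs, plus top-feature agreement."""
    features = sorted(runs[0])
    summary = {
        name: {
            "mean": mean(run[name] for run in runs),
            "low": min(run[name] for run in runs),
            "high": max(run[name] for run in runs),
        }
        for name in features
    }
    # Which feature ranks first (by absolute score) in each run?
    top_features = [max(run, key=lambda name: abs(run[name])) for run in runs]
    most_common = max(set(top_features), key=top_features.count)
    agreement = top_features.count(most_common) / len(runs)
    return summary, most_common, agreement


# Hypothetical attribution scores from four runs on near-identical inputs.
runs = [
    {"income": 0.40, "age": 0.10, "zip_code": 0.35},
    {"income": 0.20, "age": 0.15, "zip_code": 0.45},
    {"income": 0.42, "age": 0.05, "zip_code": 0.30},
    {"income": 0.38, "age": 0.12, "zip_code": 0.33},
]
summary, top, agreement = summarize_attributions(runs)
print(f"Top feature: {top} (agreed in {agreement:.0%} of runs)")
print(f"income range: {summary['income']['low']} to {summary['income']['high']}")
```

An interface could show the range alongside each score and flag explanations whose top feature changes between runs, instead of presenting one run as if it were definitive.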

Case: Skin Lesion Classifiers

Multiple studies of deep learning skin cancer classifiers (notably Narla et al., 2018, and Winkler et al., 2019) found that models achieved high accuracy partly by learning to associate surgical ruler markings and colored calibration patches with malignancy — a spurious correlation present in training data. When saliency map explanations were generated, they highlighted clinically irrelevant regions. Dermatologists shown these explanations could not detect the spurious correlation. The explanation system was functioning correctly; it was accurately depicting what the model had learned. The problem was that users had no reason to distrust it.

Designing Explanations That Actually Help

The research suggests several design principles that diverge from common practice. First, explanations should target decision support, not comprehension — users need to know what to do with an AI output, not how the model computed it. Second, explanations should be contrastive: "the system predicted X rather than Y because of Z" is more actionable than "the system predicted X because of Z." Third, explanations should be honest about their own limitations — an explanation that says "this feature was identified as important, but this explanation has a 30% probability of being unstable for similar inputs" is more trustworthy than a confident feature list.

The EU AI Act of 2024, which goes further than GDPR in addressing transparency requirements for high-risk AI systems, requires that AI systems be designed with "appropriate human oversight measures" rather than simply providing explanations. This framing — oversight rather than explanation — reflects a growing consensus that what users need is not understanding of the model but the ability to catch and correct model errors.

Design Principle

The question to ask of any AI explanation is not "does this help the user understand the model?" but "does this help the user catch the model's mistakes?" These are different design targets, and most existing XAI approaches optimize for the former while claiming to achieve the latter.
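This design target can be turned into an evaluation metric: instead of asking users how satisfied they are, measure how often they reject the AI's wrong outputs. The sketch below is a hedged illustration. The trial data is invented and does not reproduce results from any study cited in this lesson, and `reliance_metrics` is a hypothetical helper name.

```python
def reliance_metrics(trials):
    """trials: list of (ai_correct, user_accepted) pairs of booleans."""
    on_wrong = [accepted for ai_ok, accepted in trials if not ai_ok]
    on_right = [accepted for ai_ok, accepted in trials if ai_ok]
    return {
        # Share of the AI's errors that the user rejected.
        "error_catch_rate": (
            sum(1 for a in on_wrong if not a) / len(on_wrong) if on_wrong else None
        ),
        # Share of the AI's correct outputs that the user accepted.
        "correct_accept_rate": (
            sum(1 for a in on_right if a) / len(on_right) if on_right else None
        ),
    }


# Invented data for two interface conditions.
with_explanations = [(True, True)] * 8 + [(False, True)] * 3 + [(False, False)] * 1
without_explanations = (
    [(True, True)] * 7 + [(True, False)] * 1 + [(False, True)] * 2 + [(False, False)] * 2
)

print("with:", reliance_metrics(with_explanations))
print("without:", reliance_metrics(without_explanations))
```

In this invented example the explanation condition looks better on acceptance of correct outputs but worse on catching errors, which is the pattern a satisfaction survey alone would not reveal.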

Lesson 2 Quiz

Explainability and Its Discontents — 4 questions

1. The "explanation paradox" documented by Cai et al. (2019) refers to:

Correct. The paradox is that explanations — intended to enable oversight — instead created a sense of auditability that suppressed independent clinical judgment, causing users to accept more errors.

Incorrect. The paradox was that physicians given explanations were more likely to accept AI errors, not less — the explanation created false confidence rather than enabling oversight.

2. Alvarez-Melis and Jaakkola's 2019 findings about LIME are most significant because:

Correct. Instability means that users cannot trust that the explanation reflects something consistent about the model's reasoning — which undermines the core purpose of providing explanations at all.

Incorrect. The key finding was instability: minimally different inputs could produce very different explanations, suggesting LIME explanations may not reliably reflect the model's actual decision logic.

3. The skin lesion classifier studies (Narla et al., Winkler et al.) demonstrated that saliency map explanations:

Correct. The explanations were technically accurate — they showed what the model attended to. The problem was that users trusted these explanations and could not use them to identify that the model had learned irrelevant features like ruler markings.

Incorrect. The explanations were technically functioning — they accurately showed the model had learned spurious correlations. The failure was that users had no reason to distrust what the explanations showed them.

4. According to the lesson, what is the key design question that should guide AI explanation design?

Correct. Understanding the model and catching its errors are different targets. Most XAI systems optimize for comprehension or satisfaction, but the operationally important question is whether the explanation helps users identify when the AI is wrong.

Incorrect. The lesson's core design principle is that the right question is whether the explanation helps users catch errors — not whether it satisfies regulation, produces understanding, or generates user satisfaction.

Lab 2: Explanation Design Critique

Evaluate AI explanation approaches against the "error detection" standard

Your Task

You are critiquing explanation designs for high-stakes AI systems. For each scenario you discuss, analyze whether the explanation approach enables users to catch errors — or merely creates the appearance of transparency. Propose specific redesigns.

Choose a domain: credit scoring, medical diagnosis, content moderation, hiring screening, or criminal justice. Describe how the AI system currently explains its decisions to users, then we'll evaluate whether that explanation design actually helps users detect errors.

Explanation Design Lab

Welcome to the Explanation Design Lab. We're going to stress-test AI explanation designs against the most important question: does this explanation help users catch mistakes? Pick a domain — credit, medicine, hiring, content moderation, or criminal justice — and describe how AI decisions are currently explained to the people affected by them. We'll then work through what's actually useful about that explanation, what's misleading, and how you'd redesign it.

Lesson 3 · Module 2

Designing for Appropriate Reliance

Interface patterns that help users delegate to AI when it is right and override it when it is wrong — the operational challenge of human-AI teaming.

What does an interface look like that makes it easier to be appropriately skeptical than to blindly follow?

IBM's Watson for Oncology was deployed in hospitals across the United States, India, South Korea, Thailand, and elsewhere between 2015 and 2018, promoted as a tool that could recommend cancer treatments aligned with leading oncologists' judgment. In 2018, STAT News reported on internal IBM documents showing that Watson had recommended "unsafe and incorrect" treatment options, including suggesting a patient with severe bleeding be given a drug that could worsen hemorrhaging. An IBM slide deck from 2017 described Watson's recommendations as "sometimes wrong" and noted that the system had been trained on a limited set of hypothetical patient cases rather than real clinical data.

Multiple hospitals quietly discontinued use. The University of Texas MD Anderson Cancer Center ended its Watson contract after spending $62 million. The Memorial Sloan Kettering collaboration, which had trained Watson's recommendations, was revealed to have an undisclosed financial relationship with IBM. What the case illustrated was not simply bad AI — it was the deployment of AI in an interface that provided no affordances for appropriate skepticism. Physicians were given recommendations without access to the reasoning, the training data's limitations, or the confidence distribution across different cancer types where Watson's accuracy varied enormously.

The Complementarity Hypothesis

The theoretical basis for human-AI teaming is complementarity: humans and AI systems have different error profiles, and a well-designed collaboration should outperform either alone. AI systems tend to be consistent across high-volume repetitive judgments but fail on distributional shifts, rare cases, and inputs far from training data. Humans are inconsistent across repetition (a well-documented finding in clinical and legal judgment studies) but often better at reasoning about novel situations, ethical edge cases, and contextual factors that fall outside training distributions.

The practical challenge is building interfaces that activate this complementarity. Dietvorst, Simmons and Massey showed in a series of studies (2015–2018) that humans frequently abandon algorithms after observing a single error, even when the algorithm significantly outperforms human judgment in aggregate. This is called algorithm aversion. Counterintuitively, when users were allowed to slightly modify algorithm outputs, aversion decreased dramatically, not because the modifications improved accuracy but because users' sense of agency reduced their psychological need to distance themselves from the system after errors.

Algorithm aversion: The tendency to lose confidence in an algorithmic system after observing a single error, even when the algorithm's aggregate accuracy exceeds human judgment. Documented by Dietvorst et al. in forecasting, medical, and legal contexts.

Automation bias: The tendency to over-rely on automated systems, accepting their outputs without sufficient independent scrutiny. This is the opposite failure mode from algorithm aversion, and more dangerous in high-stakes deployment contexts.

Complementarity: The design goal in which human-AI collaboration produces better outcomes than either agent alone, by deliberately pairing human strengths with AI capabilities in ways that compensate for each system's distinctive weaknesses.

Cockpit Design as a Model

Aviation's transition from manual flight to increasing automation over 50 years provides the most thoroughly studied natural experiment in appropriate reliance design. Following a series of accidents from the 1970s onward that involved automation over-reliance, aviation human factors researchers developed a set of design principles that have broadly influenced AI interface design. Air France Flight 447 (2009), where pilots responded incorrectly to an autopilot disconnect because they had lost situational awareness, is a prominent later example of the same failure.

The key principle is mode awareness: the interface must make the current automation state, its constraints, and the conditions under which it will disengage legible to the operator at all times. Mode confusion — where pilots were unaware of which automation mode was active — was a contributing factor in multiple fatal accidents. Applied to AI systems, this principle argues that interfaces should continuously communicate not just what the AI recommends but what kind of task the AI is currently performing, what its confidence level is, and under what conditions it would expect to be wrong.

Design Pattern: Confidence-Qualified Displays

Google Maps' navigation system displays different visual treatments for high-confidence routing (a solid blue line) and lower-confidence estimates (a dotted or grayed display during signal loss). This is a simple implementation of confidence-qualified display — the interface communicates not just a recommendation but the epistemic status of that recommendation. The same principle, applied to medical AI or legal risk tools, would require displaying not just a risk score but a clear indication of the confidence range and the conditions under which the score is most likely to be inaccurate.
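A confidence-qualified display for a risk score might be sketched as follows. The thresholds, style names, and the `display_treatment` function are assumptions made for illustration; they are not drawn from Google Maps or from any deployed risk tool.

```python
def display_treatment(score, low, high, input_is_atypical=False):
    """Choose a visual treatment and label for a score and its confidence range."""
    if not (low <= score <= high):
        raise ValueError("score must lie inside its confidence range")
    width = high - low
    if input_is_atypical or width > 0.30:
        style = "grayed"  # weakest visual commitment
    elif width > 0.10:
        style = "dashed"
    else:
        style = "solid"  # strongest visual commitment
    text = f"{score:.0%} (range {low:.0%} to {high:.0%})"
    return {"style": style, "text": text}


print(display_treatment(0.62, 0.58, 0.66))
print(display_treatment(0.62, 0.50, 0.70))
print(display_treatment(0.62, 0.58, 0.66, input_is_atypical=True))
```

The design choice worth noting is that the range is always part of the label, and an input flagged as atypical downgrades the treatment even when the numeric range is narrow.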

Structured Override Mechanisms

One of the most consequential design decisions in human-AI systems is how the interface handles human disagreement with AI recommendations. Systems that make override easy tend toward under-reliance; systems that make override difficult or invisible tend toward over-reliance. The optimal design, empirically, appears to be structured override: the ability to override is prominent and requires minimal friction, but the interface asks users to briefly categorize their reason for overriding.

This approach has been tested in radiology AI deployment at Massachusetts General Hospital and in ICU alarm systems at Emory University. In both cases, requiring clinicians to briefly indicate why they were overriding the AI served two functions: it modestly slowed impulsive overrides without preventing deliberate ones, and it generated a dataset of override reasons that could be used to identify and address the model's failure modes. The override mechanism was simultaneously a user support tool and a system improvement mechanism.
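A structured override can be sketched as a small data structure: the override itself is a single call, but it must carry a categorized reason, and the accumulated reasons can be tallied later. The reason categories and class names below are hypothetical and are not taken from the hospital deployments described above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class OverrideReason(Enum):
    AI_MISSED_CONTEXT = "ai_missed_context"
    POOR_INPUT_QUALITY = "poor_input_quality"
    DISAGREE_WITH_FINDING = "disagree_with_finding"
    OTHER = "other"


@dataclass(frozen=True)
class OverrideRecord:
    case_id: str
    ai_recommendation: str
    user_decision: str
    reason: OverrideReason


class OverrideLog:
    def __init__(self):
        self._records = []

    def record(self, case_id, ai_recommendation, user_decision, reason):
        """Store one override; the reason category is required, free text is not."""
        if user_decision == ai_recommendation:
            raise ValueError("not an override: decision matches the recommendation")
        if not isinstance(reason, OverrideReason):
            raise TypeError("reason must be an OverrideReason")
        entry = OverrideRecord(case_id, ai_recommendation, user_decision, reason)
        self._records.append(entry)
        return entry

    def reason_counts(self):
        """Tally override reasons, e.g. to prioritize which failure modes to review."""
        return Counter(r.reason for r in self._records)


log = OverrideLog()
log.record("scan-001", "flag", "clear", OverrideReason.POOR_INPUT_QUALITY)
log.record("scan-002", "clear", "flag", OverrideReason.AI_MISSED_CONTEXT)
log.record("scan-003", "flag", "clear", OverrideReason.POOR_INPUT_QUALITY)
print(log.reason_counts().most_common(1))
```

Keeping the reason to a single required category is what holds friction low while still producing data that can be aggregated.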

Design Principle

The interface should make scrutiny the easiest option: questioning an AI recommendation should take slightly less effort than blindly accepting it or blindly rejecting it. Most current interfaces invert this: they present AI outputs in ways that require active effort to question, while making acceptance the path of least resistance.

Lesson 3 Quiz

Designing for Appropriate Reliance — 4 questions

1. IBM Watson for Oncology's deployment failure was primarily a demonstration of:

Correct. The case illustrates how AI can be deployed in interfaces that present recommendations without their limitations, training data constraints, or confidence variations — making appropriate skepticism structurally difficult for users.

Incorrect. The lesson's focus is on the interface design failure: physicians were given recommendations without access to reasoning, training data limitations, or confidence variation — the interface was designed to be trusted, not questioned.

2. Dietvorst et al.'s research on algorithm aversion found that aversion decreased when:

Correct. The modification condition reduced aversion even when modifications didn't improve accuracy — the psychological mechanism was agency, not accuracy. Users needed to feel they were collaborating rather than being overridden.

Incorrect. Dietvorst et al. found that allowing users to modify outputs — even slightly, even without accuracy improvement — significantly reduced algorithm aversion by preserving users' sense of agency.

3. The aviation principle of "mode awareness" translates to AI interface design as:

Correct. Mode confusion — not knowing what the automation is doing or when it will fail — contributed to multiple aviation accidents. In AI interfaces, this translates to making the system's current state, confidence level, and known failure conditions continuously legible.

Incorrect. Mode awareness means the interface continuously communicates the automation's current state, confidence, and expected failure conditions — making the system's epistemic status legible at all times.

4. The "structured override" approach, as tested at Massachusetts General Hospital and Emory University, was valuable because:

Correct. Structured override served dual purposes: it modestly slowed impulsive rejections (improving aggregate decision quality) while capturing override reasons as a dataset for model improvement — making the human oversight mechanism simultaneously a system improvement loop.

Incorrect. Structured override kept override ability prominent and frictionless but asked users to briefly categorize their reason — this slowed impulsive overrides and generated improvement-relevant data without blocking human judgment.

Lab 3: Appropriate Reliance Design

Design interface patterns that activate human-AI complementarity

Your Task

You're a UX designer asked to redesign an AI-assisted decision-making interface to better support appropriate reliance. Work through specific design choices with the assistant — what information to show, how to display confidence, how to structure overrides, how to prevent both over-trust and algorithm aversion.

Pick a concrete AI decision-support context: a radiologist reviewing AI-flagged scans, a judge consulting a risk score, a loan officer reviewing an AI credit assessment, or a content moderator checking AI decisions. Describe the current interface (real or plausible), and we'll redesign it together.

Appropriate Reliance Lab

Welcome to the Appropriate Reliance Design Lab. We're going to design interfaces that make it easier to be a good human collaborator with AI — neither blindly trusting nor reflexively skeptical. Choose a concrete context: radiology AI, criminal risk scoring, credit underwriting, or content moderation. Describe the decision-maker, what the AI tells them, and how they currently interact with the system. We'll work through specific interface design choices to improve the calibration of their reliance.

Lesson 4 · Module 2

Transparency at Scale

When AI systems affect millions of people, individual-level interface transparency is insufficient — and institutional transparency requirements become a design problem.

Who is transparency for — the individual user, the affected community, regulators, or the historical record?

Between 2018 and 2022, Facebook's automated content moderation systems made approximately 100 billion content decisions per year. The individual transparency design was consistent: when a post was removed, users received a notification explaining the violated policy category ("This post goes against our Community Standards on hate speech"). Appeals were possible through an automated review queue. This satisfied the surface requirements of individual transparency.

What it concealed was the system's aggregate behavior. Frances Haugen, a former Facebook product manager, released internal documents in October 2021 showing that the company's own research had found three things: its systems consistently under-moderated harmful content in non-English languages; they disproportionately flagged content from political conservatives in the United States (producing political controversy over the system's intent); and the appeals process resolved fewer than 1% of appeals in favor of users. Individual transparency (telling each user why their specific post was removed) coexisted with near-complete opacity about how the system behaved across populations, which communities it failed, and what error rates were acceptable.

The Levels of Transparency

Zerilli et al. (2019) distinguish between local transparency (explanation of a specific decision to the individual affected) and global transparency (disclosure of how a system behaves across populations). These require different design approaches. Local transparency is an interface design problem: how do you communicate this decision to this person? Global transparency is an institutional and regulatory design problem: what aggregate performance data should be disclosed, to whom, in what form, and how often?

Current practice is heavily weighted toward local transparency. Privacy laws like GDPR and CCPA create rights to individual explanation. But individual explanations can be accurate about a specific case while concealing systemic failures — COMPAS could accurately tell an individual defendant "your score was 7 because of factors X, Y, Z" while the system's differential false positive rates across racial groups remained undisclosed. The explanation is locally accurate but systemically misleading.

Algorithmic auditing: Third-party examination of AI system behavior across populations, typically using statistical analysis of outputs rather than access to model internals. It is the primary mechanism for achieving global transparency.

Model cards: Structured documentation proposed by Mitchell et al. (2019) at Google disclosing a model's intended use, performance across demographic groups, known limitations, and evaluation data. It is a transparency artifact targeting developers and deployers rather than end users.

Disparate impact: When a facially neutral policy or algorithm produces disproportionately negative outcomes for a protected group. It is detectable through population-level analysis that individual explanation mechanisms cannot surface.
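One common way an audit quantifies disparate impact is to compare favorable-outcome rates between groups. The sketch below is a minimal illustration under stated assumptions: the group labels, the data, and the function names are hypothetical, and the 0.8 threshold is borrowed from the "four-fifths" rule of thumb used in US employment-selection guidance, not from this lesson's sources.

```python
def selection_rates(outcomes):
    """outcomes: iterable of (group, favorable: bool). Returns {group: favorable rate}."""
    totals, favorable = {}, {}
    for group, is_favorable in outcomes:
        totals[group] = totals.get(group, 0) + 1
        favorable[group] = favorable.get(group, 0) + (1 if is_favorable else 0)
    return {g: favorable[g] / totals[g] for g in totals}


def disparate_impact_ratio(outcomes, protected, reference):
    """Ratio of the protected group's favorable rate to the reference group's."""
    rates = selection_rates(outcomes)
    if rates[reference] == 0:
        raise ValueError("reference group has no favorable outcomes")
    return rates[protected] / rates[reference]


# Hypothetical audit sample: 100 decisions per group
outcomes = (
    [("reference", True)] * 60 + [("reference", False)] * 40
    + [("protected", True)] * 30 + [("protected", False)] * 70
)

ratio = disparate_impact_ratio(outcomes, "protected", "reference")
FOUR_FIFTHS = 0.8  # assumed rule-of-thumb threshold, not a universal legal test
print(f"ratio={ratio:.2f}, flagged={ratio < FOUR_FIFTHS}")
```

Because the calculation needs only outputs and group membership, it fits the auditing approach described above: statistical analysis of decisions without access to model internals.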

The EU AI Act's Transparency Architecture

The EU AI Act (2024) creates a tiered transparency regime based on risk level. High-risk AI systems, defined to include systems used in employment, credit, education, criminal justice, and critical infrastructure, must maintain technical documentation, keep logs of system operation, provide information sufficient for human oversight, and in some categories undergo conformity assessment before deployment. Providers of general-purpose AI models must publish summaries of their training data and comply with copyright obligations; models above a compute threshold are presumed to pose systemic risk and carry additional obligations.

Notably, the Act places its transparency obligations largely on providers and deployers, and the resulting disclosures flow mainly to regulators and downstream users (businesses deploying the AI), not to affected individuals. Individual transparency remains governed chiefly by GDPR. The result is a layered transparency architecture in which regulatory transparency (disclosure to oversight bodies), market transparency (disclosure to business customers), and user transparency (disclosure to individuals) are legally distinct, each with different requirements.

Case: Palantir's Predictive Policing and Community Transparency

Palantir Technologies' predictive policing software was deployed in New Orleans from 2012 to 2018 under a secret arrangement that became public only through a 2018 investigation by The Verge. The city's residents, and their elected representatives, had no knowledge of the system's existence during this period. This represents a transparency failure at the institutional level: neither the individual affected (a person placed on a watch list) nor the community governed by the system had access to information about it. When transparency operates only at the individual level, it is possible to maintain complete opacity about whether a system exists at all.

Designing for Institutional Transparency

The practical design challenge of institutional transparency involves three components that are distinct from individual explanation design. First, audit trails: systems must log decisions in forms that support retrospective analysis of aggregate behavior. This is a data architecture problem that must be built into systems from the start, not added later. Second, disclosure mechanisms: there must be channels through which aggregate performance data reaches appropriate audiences — regulators, civil society, affected communities — not just the organizations that deploy the system. Third, contestability structures: there must be mechanisms for affected communities to challenge not just individual decisions but the deployment of the system itself.
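The first component, an audit trail that supports retrospective aggregate analysis, can be sketched as an append-only log of structured decision records. This is a minimal sketch under stated assumptions: the field names, the `DecisionRecord` type, and the JSON Lines format are illustrative choices, not a standard or any deployed system's schema, and an in-memory stream stands in for durable storage.

```python
import io
import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    decision_id: str
    timestamp: str      # ISO 8601, UTC
    model_version: str  # needed to attribute behavior to a specific release
    group: str          # hypothetical cohort label, used only for aggregate analysis
    outcome: str        # e.g. "approved" or "denied"
    overridden: bool    # whether a human reviewer changed the outcome


def append_record(stream, record):
    """Write one decision as a single JSON line (append-only by convention)."""
    stream.write(json.dumps(asdict(record), sort_keys=True) + "\n")


def outcome_counts_by_group(stream):
    """Retrospective aggregate: count outcomes per (group, outcome) pair."""
    counts = Counter()
    for line in stream:
        if line.strip():
            row = json.loads(line)
            counts[(row["group"], row["outcome"])] += 1
    return counts


log = io.StringIO()  # stand-in for a durable, write-once log store
append_record(log, DecisionRecord("d-001", "2024-01-05T09:30:00Z", "v1.2", "district_north", "denied", False))
append_record(log, DecisionRecord("d-002", "2024-01-05T09:31:00Z", "v1.2", "district_north", "denied", True))
append_record(log, DecisionRecord("d-003", "2024-01-05T09:32:00Z", "v1.2", "district_south", "approved", False))

log.seek(0)
counts = outcome_counts_by_group(log)
print(counts)
```

The design point is that the aggregate question (how outcomes distribute across groups and model versions) can be answered later only if those fields were captured at decision time, which is why the logging schema has to be settled before deployment.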

Amsterdam and Helsinki's AI registers — public databases of AI systems used in municipal governance — represent one approach to institutional transparency design. Published since 2020, these registers require city departments to disclose the purpose, data sources, vendor, and known limitations of AI systems they deploy. The registers are publicly accessible, creating a form of systemic transparency that individual explanation mechanisms cannot provide. By 2023, Amsterdam's register listed over 50 active AI applications used in municipal services.
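A register entry of the kind described above can be modeled as a small record plus a completeness check. The sketch below is hypothetical: it uses the four disclosure fields named in the text (purpose, data sources, vendor, known limitations), but the class, the `system_name` field, and the example values are assumptions and do not reproduce either city's actual schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RegisterEntry:
    system_name: str
    purpose: str = ""
    data_sources: list = field(default_factory=list)
    vendor: str = ""
    known_limitations: list = field(default_factory=list)


def missing_disclosures(entry):
    """Return the names of fields left empty, in declaration order."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Hypothetical entry with the vendor left undisclosed
entry = RegisterEntry(
    system_name="Parking permit triage",
    purpose="Prioritize permit applications for manual review",
    data_sources=["permit applications", "address registry"],
    known_limitations=["not evaluated on applications filed on paper"],
)

print(missing_disclosures(entry))
```

A check like this is one way a register could enforce disclosure before an entry is published, so that an incomplete record is visible as incomplete rather than silently accepted.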

Design Principle

Individual transparency is necessary but not sufficient. Systems that affect large populations create obligations of institutional transparency that require audit trail architecture, disclosure mechanisms, and contestability structures — none of which can be satisfied by improving individual explanation interfaces alone.

Lesson 4 Quiz

Transparency at Scale — 4 questions

1. The Facebook content moderation case (2018–2022) illustrated that individual transparency mechanisms can coexist with:

Correct. Facebook told individual users why their posts were removed — local transparency — while concealing systemic under-moderation in non-English languages, differential political impact, and a less-than-1% appeal success rate — global opacity.

Incorrect. The case showed that local transparency can mask global failures: individual policy-category notices coexisted with complete opacity about aggregate error rates, community disparities, and the appeals system's near-total failure.

2. According to Zerilli et al. (2019), "global transparency" differs from "local transparency" in that it:

Correct. Global transparency addresses aggregate population-level behavior, error rate distributions, and systemic disparities — questions that cannot be answered by improving individual explanation interfaces and require institutional disclosure mechanisms.

Incorrect. Global transparency is about population-level behavior and systemic performance, which requires institutional mechanisms (audits, registers, regulatory disclosure) that are different in kind from individual explanation interfaces.

3. The Palantir predictive policing case in New Orleans demonstrated which transparency failure mode?

Correct. The New Orleans deployment ran secretly for six years. Individual transparency was impossible because affected people didn't know the system existed. Community contestability was impossible for the same reason. This is the limit case of why institutional transparency must precede individual transparency.

Incorrect. The failure was institutional: the system operated under a secret contract for six years, unknown to the public or elected officials. You cannot contest or exercise rights over a system whose existence is undisclosed.

4. Amsterdam and Helsinki's AI registers are significant as transparency mechanisms because they:

Correct. The registers address institutional transparency by creating a public record of what AI systems exist, what they do, whose data they use, and what their known limitations are — enabling democratic oversight at the community level.

Incorrect. The registers are public databases of municipal AI systems disclosing purpose, data sources, vendors, and limitations — creating institutional-level transparency about the existence and nature of AI systems used in governance.

Lab 4: Institutional Transparency Design

Design audit trails, disclosure mechanisms, and contestability structures for AI systems at scale

Your Task

You're advising a city, healthcare system, or large organization on designing institutional transparency for an AI system that will affect large numbers of people. Go beyond individual explanation interfaces to design the full transparency architecture: what gets logged, what gets disclosed, to whom, in what form, and how affected communities can contest the system.

Choose a deployment context: a city deploying AI in social services allocation, a hospital system deploying AI in triage prioritization, a court system using AI risk scoring, or an employer using AI in hiring screening. Describe the system and its affected population, then we'll design the institutional transparency architecture together.

Institutional Transparency Lab

Welcome to the Institutional Transparency Lab. We're moving beyond individual explanation interfaces to design the full transparency architecture for AI systems that affect large populations. Local transparency — telling individuals why a specific decision was made — is necessary but not sufficient. We need audit trails, population-level disclosure, and contestability mechanisms. Choose a deployment context: city social services AI, hospital triage AI, court risk scoring, or employer hiring AI. Describe the system and who it affects, and we'll build a transparency architecture that operates at the institutional level.

Module 2 Test

Trust and Transparency — 15 questions · 80% to pass

1. Which of the following best defines "calibrated trust" in an AI system?

Correct. Calibrated trust tracks actual system performance — neither over- nor under-reliant, and updated as the system's behavior in different contexts becomes known.

Incorrect. Calibrated trust means accurately matching reliance to demonstrated reliability — neither maximizing nor minimizing trust.

2. The NTSB's 2019 investigation of the Uber autonomous vehicle fatality identified the primary causal factor as:

Correct. The NTSB identified systemic trust design failures: disabled safety systems, alert suppression to reduce false positives, and a driver trained to over-rely on automation.

Incorrect. The NTSB identified trust calibration as the root cause — a series of design decisions that eliminated the conditions for appropriate human oversight.

3. ProPublica's analysis of COMPAS (2016) found that the algorithm's core interface design problem was:

Correct. COMPAS displayed a 1–10 score that implied precision without disclosing its weak accuracy for violent recidivism, its differential error rates, or the nature of its training data.

Incorrect. The design failure was false precision: a numeric score that manufactured confidence without disclosing real accuracy limitations and demographic disparities.

4. The "explanation paradox" documented by Cai et al. (2019) in medical AI refers to the finding that:

Correct. Explanations created a false sense of auditability that suppressed independent clinical judgment, causing physicians to accept AI errors they would otherwise have caught.

Incorrect. The paradox was that explanations — intended to enable oversight — instead caused physicians to accept more AI errors by creating confidence in the system's auditability.

5. LIME (Local Interpretable Model-agnostic Explanations) was shown to have a critical limitation by Alvarez-Melis and Jaakkola (2019):

Correct. This instability means LIME explanations cannot be trusted to represent something consistent about the model's reasoning — undermining their value as transparency tools.

Incorrect. The key limitation is instability: minimally different inputs produce very different explanations, calling into question whether LIME explanations reliably represent the model's actual decision logic.

6. The skin lesion classifier studies (Narla et al., Winkler et al.) found that saliency map explanations:

Correct. The explanations were technically functioning and showed models had learned to associate surgical rulers with malignancy — but users trusted these explanations and couldn't detect the spurious correlation from them.

Incorrect. The explanations accurately reflected the model's (flawed) reasoning. The problem was users trusted them and could not detect from the explanation alone that the model had learned irrelevant features.

7. According to Bansal et al. (2021), AI explanations improve human-AI team accuracy when they:

Correct. Transparency about correct outputs adds little team performance value. What matters is whether explanations help users recognize the conditions under which the AI makes errors.

Incorrect. Bansal et al.'s key finding is that only explanations that help users catch errors improve team accuracy — explanations of correct outputs do not significantly help.

8. IBM Watson for Oncology's deployment failure (2015–2018) is best characterized as:

Correct. Watson was deployed in interfaces that presented recommendations as authoritative without disclosing the system's training data limitations, accuracy variation across cancer types, or mechanisms for appropriate physician skepticism.

Incorrect. The lesson emphasizes the interface design failure: physicians had no mechanism for appropriate skepticism — no reasoning access, no accuracy variation disclosure, no structured override.

9. Algorithm aversion, documented by Dietvorst et al., refers to:

Correct. Dietvorst et al. found users would abandon significantly superior algorithms after a single observed error — a calibration failure that trades aggregate performance for psychological comfort after witnessing failure.

Incorrect. Algorithm aversion is specifically the loss of confidence after a single observed error — a psychological response that causes users to abandon superior algorithms.

10. The aviation principle of "mode awareness" translates to AI interface design as a requirement to:

Correct. Mode confusion contributed to multiple fatal aviation accidents. Applied to AI, this means the interface must make the system's epistemic status — what it's doing, how confident it is, when it expects to fail — continuously legible.

Incorrect. Mode awareness means the interface continuously surfaces what the automation is currently doing, its confidence, and its expected failure conditions — not user control over mode selection.

11. "Structured override" mechanisms, as tested in clinical AI deployments, are valuable because they:

Correct. Structured override keeps override friction low but asks users to categorize their reason — this dual mechanism slows impulsive rejections while generating training data about error conditions.

Incorrect. Structured override makes disagreement easy but asks for a brief reason — creating a feedback loop for system improvement while slightly slowing impulsive overrides.

12. Zerilli et al.'s distinction between "local" and "global" transparency is important because:

Correct. COMPAS could accurately explain an individual's score while its differential false positive rates across racial groups remained undisclosed. These require different transparency mechanisms that cannot substitute for each other.

Incorrect. The distinction matters because local accuracy can coexist with global failure — individual explanations cannot surface population-level disparities, which require separate institutional transparency mechanisms.

13. "Model cards," proposed by Mitchell et al. (2019) at Google, are primarily transparency artifacts targeted at:

Correct. Model cards address transparency at the developer-to-deployer layer — structured documentation of demographic performance, intended use, and limitations that enables responsible deployment decisions.

Incorrect. Model cards target the developer-deployer layer: structured disclosure of performance characteristics, known limitations, and intended use contexts — not end user explanation.

14. The Palantir predictive policing deployment in New Orleans (2012–2018) represents which transparency failure mode?

Correct. You cannot exercise rights over or contest a system whose existence is unknown. The New Orleans case shows that institutional transparency — public disclosure that a system exists — must precede all other transparency mechanisms.

Incorrect. The system operated secretly for six years. Neither individual transparency nor community contestability is possible when the system's existence itself is undisclosed.

15. Amsterdam and Helsinki's AI registers are significant as transparency design because they:

Correct. The registers implement institutional transparency through public disclosure of what AI systems exist in municipal governance and what they do — enabling the democratic oversight that individual explanation interfaces cannot provide.

Incorrect. The registers are public databases disclosing the existence, purpose, data sources, vendors, and limitations of AI systems in municipal governance — a form of institutional transparency enabling community oversight.