Human-AI Interaction Design

1. Google's AI Overviews' controversial "eat rocks" and "glue on pizza" outputs in 2024 most directly illustrate which design failure?

Correct. AI Overviews violated minimalism by substituting lengthy synthesis for simple answers, and failed to signal uncertainty — causing users to trust fluent, confidently-presented but incorrect outputs.

The failure combined a violation of minimalism (using AI synthesis where a direct answer sufficed) with absent uncertainty signals, leading users to treat fluent but incorrect outputs as authoritative.

2. Algorithm aversion, documented by Dietvorst et al., refers to:

Correct. Dietvorst et al. found users would abandon significantly superior algorithms after a single observed error — a calibration failure that trades aggregate performance for psychological comfort after witnessing failure.

Incorrect. Algorithm aversion is specifically the loss of confidence after a single observed error — a psychological response that causes users to abandon superior algorithms.

3. The Stanford Copilot security study hypothesized that AI-generated code contained more vulnerabilities because of:

Correct. Cognitive distance: code you didn't write is evaluated less deeply than code you constructed, leading to shallower security review of AI outputs.

Incorrect. The proposed mechanism was cognitive distance — externally-generated content receives shallower scrutiny than self-generated content, causing security logic flaws to pass undetected.

4. In Wang et al.'s (2019) AI failure taxonomy, a "context-caused failure" refers to which situation?

Correct. Context-caused failures are the scope drift problem: the model works as designed within its training distribution, but deployment exposes it to inputs outside that distribution. The fix is often scope boundary design rather than model retraining.

Context-caused failure in Wang et al.'s taxonomy means the model performed correctly within its design scope but was deployed in a context outside that scope. This is a deployment design problem. Review L4 section 4.3.

5. The NTSB's 2019 investigation of the Uber autonomous vehicle fatality identified the primary causal factor as:

Correct. The NTSB identified systemic trust design failures: disabled safety systems, alert suppression to reduce false positives, and a driver trained to over-rely on automation.

Incorrect. The NTSB identified a trust calibration failure as the root cause: a series of design decisions eliminated the conditions for appropriate human oversight.

6. The skin lesion classifier studies (Narla et al., Winkler et al.) found that saliency map explanations:

Correct. The explanations were technically functioning and showed models had learned to associate surgical rulers with malignancy — but users trusted these explanations and couldn't detect the spurious correlation from them.

Incorrect. The explanations accurately reflected the model's (flawed) reasoning. The problem was users trusted them and could not detect from the explanation alone that the model had learned irrelevant features.

7. Which design intervention involves requiring users to document independent reasoning before acting on an AI output?

Correct. The Epic sepsis challenge protocol is the canonical example of a forced verification moment — clinicians must document independent clinical support before acting on the AI's alert.

Incorrect. This is the definition of a forced verification moment — the intervention used in Epic sepsis deployments to prevent algorithmic scores from bypassing clinical judgment.
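
A forced verification moment can be sketched as a gate in code. The example below is a minimal illustration, not the Epic protocol itself: the names `AIAlert`, `VerificationRequired`, and `act_on_alert`, and the `min_chars` threshold, are all hypothetical and chosen only to show the pattern of blocking an action until independent reasoning is documented.

```python
from dataclasses import dataclass


@dataclass
class AIAlert:
    """A hypothetical AI-generated alert, e.g. a risk score shown to a clinician."""
    patient_id: str
    risk_score: float


class VerificationRequired(Exception):
    """Raised when the user tries to act without documenting independent reasoning."""


def act_on_alert(alert: AIAlert, independent_rationale: str, min_chars: int = 20) -> dict:
    """Allow an action only after the user records their own supporting reasoning.

    min_chars is an arbitrary illustrative threshold, not a clinical standard.
    """
    rationale = independent_rationale.strip()
    if len(rationale) < min_chars:
        raise VerificationRequired(
            "Document independent reasoning before acting on the AI alert."
        )
    # The rationale is returned with the action so it can be stored and audited.
    return {
        "patient_id": alert.patient_id,
        "risk_score": alert.risk_score,
        "rationale": rationale,
        "status": "action_recorded",
    }
```

A length check is a weak proxy for real reasoning. A production design would more plausibly ask for structured evidence, but the gate itself (no documented rationale, no action) is the point of the sketch.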

8. The distinction between "productive" and "corrosive" cognitive offloading is based on:

Correct. Productive offloading: non-core tasks, freeing expertise. Corrosive offloading: core expertise tasks, causing atrophy of the skills that make the human valuable in the collaboration.

Incorrect. The distinction is whether the task is core expertise. Offloading core expertise causes it to atrophy; offloading peripheral tasks frees cognitive resources for higher-order work.

9. The Amazon Alexa recording incident (2018) arose primarily from:

Correct. The incident was a mental model mismatch: users believed the system only processed audio after the wake word, but it processed audio continuously in order to detect the wake word.

The incident was a mental model mismatch — users' understanding of when Alexa listened differed from the system's actual continuous audio processing behavior.

10. LIME (Local Interpretable Model-agnostic Explanations) was shown to have a critical limitation by Alvarez-Melis and Jaakkola (2018):

Correct. The limitation is instability: minimally different inputs can produce very different explanations. As a result, LIME explanations cannot be trusted to represent something consistent about the model's reasoning, which undermines their value as transparency tools.

Incorrect. The key limitation is instability: minimally different inputs produce very different explanations, calling into question whether LIME explanations reliably represent the model's actual decision logic.

11. A contrastive explanation answers which question, and why is this structure preferable?

Correct. Contrastive framing maps onto the decision-making question users actually face: "Should I go with this AI output or a different one?" It makes explanations immediately actionable rather than informational.

Contrastive explanations answer "why X rather than Y" — the question embedded in every real decision about whether to act on an AI output. This alignment with actual decision structure is what makes them more useful. Review L3 section 3.2.
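
For a simple linear scorer, the "why X rather than Y" structure can be made concrete. The sketch below assumes a hypothetical model whose score for each outcome is a weighted sum of features. The function name `contrastive_explanation` and the example features and weights are illustrative only and do not come from any particular library.

```python
def contrastive_explanation(features, weights, chosen, alternative):
    """Explain why `chosen` outscored `alternative` under a linear scorer.

    Returns (feature, contribution) pairs, where contribution is that feature's
    share of score(chosen) - score(alternative), sorted by absolute size.
    """
    diffs = {
        name: value * (weights[chosen][name] - weights[alternative][name])
        for name, value in features.items()
    }
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical loan example: why "deny" rather than "approve"?
features = {"income": 2, "debt": 3}
weights = {
    "approve": {"income": 3, "debt": -1},
    "deny": {"income": 1, "debt": 2},
}
explanation = contrastive_explanation(features, weights, "deny", "approve")
```

Here `explanation` ranks `debt` first, since it pushes hardest toward "deny" over "approve". That ranking is the piece of information a user needs when deciding whether to accept the output or argue for the alternative.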

12. The February 2024 Air Canada chatbot tribunal ruling established which principle most relevant to AI interaction designers?

Correct. The tribunal ruled that Air Canada was responsible for its AI's incorrect policy descriptions, establishing that the absence of feedback and escalation architecture carries legal consequences.

The ruling established that companies are legally responsible for AI representations, elevating feedback loop and escalation path design from UX best practice to legal requirement.

13. GitHub Copilot's design of presenting code as an explicit suggestion requiring user acceptance satisfies which heuristic?

Correct. Requiring explicit acceptance preserves developer control and freedom to reject AI-generated code — heuristic #3.

The correct answer is User Control and Freedom. Requiring explicit acceptance keeps the human in control rather than automating code insertion.

14. In Kasparov's Advanced Chess experiments, what was the primary factor that determined the strongest competitive performance?

Correct. Kasparov's key finding: process quality dominated both human skill level and hardware power.

Incorrect. Process quality — how well the human used the tool — was the determining factor, not skill level or hardware.

15. Which of Nielsen's original ten heuristics most directly addresses the need for AI systems to communicate their current processing state and uncertainty level?

Correct. Visibility of System Status requires that users always know what the system is doing — for AI, this extends to uncertainty and confidence levels.

The correct answer is Visibility of System Status — the heuristic requiring users to always know the system's current state, including AI uncertainty.

16. What term describes a common understanding of task requirements, roles, and environment among team members that enables coordination without constant explicit communication?

Correct. Shared mental model (SMM), formalized by Cannon-Bowers et al. (1993): the common understanding that allows teams to coordinate with minimal explicit communication.

Incorrect. This is the definition of a shared mental model — the foundational concept from Cannon-Bowers et al. (1993), well-established in aviation and surgery research.

17. The design goal for trust in AI interfaces is best described as:

Correct. Trust calibration — matching user confidence to actual system reliability — is the goal. Both overtrust and undertrust are calibration failures. An interface that maximizes trust without improving reliability has made the product more dangerous.

The goal is calibration accuracy, not a specific trust level. Maximizing or minimizing trust both miss the target. Review L2 section 2.1.
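
The idea of calibration as a gap rather than a level can be sketched numerically. The example below assumes that a user's reliance rate (the fraction of AI outputs they accept) serves as a rough proxy for trust. That proxy, the function names, and the `tolerance` value are all illustrative assumptions rather than an established metric.

```python
def calibration_gap(reliance_rate: float, system_accuracy: float) -> float:
    """Return reliance minus accuracy: positive is overtrust, negative is undertrust."""
    for name, value in (("reliance_rate", reliance_rate),
                        ("system_accuracy", system_accuracy)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return reliance_rate - system_accuracy


def classify_trust(reliance_rate: float, system_accuracy: float,
                   tolerance: float = 0.05) -> str:
    """Label the trust state; `tolerance` is an arbitrary illustrative band."""
    gap = calibration_gap(reliance_rate, system_accuracy)
    if gap > tolerance:
        return "overtrust"
    if gap < -tolerance:
        return "undertrust"
    return "calibrated"
```

Under these assumptions, a user who accepts 95% of outputs from a system that is right 70% of the time is overtrusting, while one who accepts 40% of outputs from a 90% accurate system is undertrusting. Raising reliance alone, without changing accuracy, only widens the first gap.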

18. According to Bansal et al. (2021), AI explanations improve human-AI team accuracy when they:

Correct. Transparency about correct outputs adds little team performance value. What matters is whether explanations help users recognize the conditions under which the AI makes errors.

Incorrect. Bansal et al.'s key finding is that only explanations that help users catch errors improve team accuracy — explanations of correct outputs do not significantly help.

19. The Air France 447 accident is classified as primarily a failure of which SMM component?

Correct. The autopilot disengaged without conveying aircraft state, leaving all three crew members with different situational models in the critical minutes before impact.

Incorrect. AF447 is a situational awareness failure — the transition from automated to manual flight without adequate state communication left the crew with incompatible world-models.

20. Hancock et al.'s (2011) meta-analysis of human-automation trust studies identified three factor clusters that determine trust. Which set correctly names them?

Correct. The three clusters — performance, process, and purpose — are important because all three are addressable through UX design, even when the underlying model's performance cannot be changed.

Hancock et al.'s three clusters are performance (actual reliability), process (predictability and appropriateness of behavior), and purpose (whether the system seems oriented toward user benefit). Review L2 section 2.1.

Final Exam