Making AI Explainable

1. What did the University of Washington's SHAP audit of a hospital sepsis prediction model reveal about its primary feature?

Correct. The model had learned that sicker patients generate more nursing documentation — a true correlation but a causally meaningless one, making the model's logic clinically unsound.

SHAP revealed that note count was the top feature — the model had picked up that sicker patients generate more documentation, using this as a proxy for illness severity.

2. The academic debate between Jain & Wallace (2019) and Wiegreffe & Pinter (2019) concluded that:

Correct. The debate is genuinely unresolved. Practical consensus: don't rely on raw attention weights alone for high-stakes explanations. Use gradient-based or SHAP-based methods as primary attribution; treat attention as supplementary diagnostic information.

The academic debate ended without resolution. Jain & Wallace showed attention doesn't reliably correlate with feature importance; Wiegreffe & Pinter challenged the criteria. Practical takeaway: attention alone is insufficient for high-stakes XAI; use gradient or SHAP methods as primary attribution.

3. The three conditions for AI trust recovery identified from organisational trust repair research are:

Correct. Kim, Dirks, and Cooper's research on trust repair, applied to AI systems, identifies acknowledgment, structural change, and accountability as the three essential recovery conditions.

Incorrect. The three conditions are: explicit acknowledgment of the failure and its impacts, structural change in processes (observable by affected parties), and visible accountability for what occurred.

4. TCAV (Testing with Concept Activation Vectors) addresses what gap in standard feature attribution methods like SHAP?

Correct. A radiologist cannot act on "pixel 312 contributed 0.004 to the prediction." TCAV bridges this gap by letting experts define meaningful concepts — "irregular borders," "mass effect" — and then testing whether the model uses those concepts, producing explanations in clinical vocabulary.

TCAV's distinguishing contribution is its explanation vocabulary — human-defined concepts rather than raw features. It does require internal access (to activation vectors) and is not faster than SHAP by design. SHAP also works on images and text.

5. TreeSHAP achieves exact Shapley values in polynomial time by:

Correct. TreeSHAP (Lundberg et al., 2020) recursively tracks how each feature's decision nodes split the prediction from the root expectation, avoiding exponential coalition enumeration.

TreeSHAP exploits tree structure, recursively computing marginal contributions at each node, to achieve exact values in O(TLD²) time (T trees, at most L leaves per tree, maximum depth D) rather than enumerating exponentially many feature coalitions.

6. Alvarez-Melis and Jaakkola (2019) recommended practitioners should address LIME's instability by:

Correct. Their paper proposed stability metrics and recommended running LIME multiple times; if feature rankings vary substantially across runs, the explanation is unreliable and should not be used for high-stakes decisions without further investigation.

Alvarez-Melis and Jaakkola proposed stability metrics: run LIME multiple times on the same instance and measure agreement across runs. If explanations vary substantially, don't trust a single run. A fixed seed would mask the instability rather than quantify it.
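One way to quantify agreement across runs is the mean pairwise Jaccard overlap of the top-k feature sets. The sketch below is a minimal illustration under stated assumptions: `noisy_explainer` is a hypothetical stand-in for a sampling-based explainer (it is not the LIME library), and the choice of Jaccard overlap, `runs=20`, and `k=3` are illustrative rather than taken from the paper.

```python
import random
from itertools import combinations

def top_k(attributions, k):
    """Indices of the k features with the largest absolute attribution."""
    ranked = sorted(range(len(attributions)),
                    key=lambda j: abs(attributions[j]), reverse=True)
    return frozenset(ranked[:k])

def jaccard(a, b):
    """Overlap of two feature sets: 1.0 means identical, 0.0 means disjoint."""
    return len(a & b) / len(a | b)

def stability(explain, instance, runs=20, k=3):
    """Mean pairwise Jaccard overlap of top-k feature sets across repeated runs.

    Each run uses a different seed on purpose: the goal is to measure the
    variation, not to hide it behind one fixed seed.
    """
    tops = [top_k(explain(instance, seed), k) for seed in range(runs)]
    pairs = list(combinations(tops, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def noisy_explainer(noise):
    """Hypothetical explainer: fixed 'true' attributions plus sampling noise."""
    true_attr = [0.9, 0.7, 0.5, 0.1, 0.05, 0.02]

    def explain(instance, seed):
        rng = random.Random(seed)
        return [a + rng.gauss(0.0, noise) for a in true_attr]

    return explain

x = [0.0] * 6  # placeholder instance; the stand-in explainer ignores it
stable = stability(noisy_explainer(0.01), x)
unstable = stability(noisy_explainer(1.0), x)
print(f"low-noise stability:  {stable:.2f}")
print(f"high-noise stability: {unstable:.2f}")
```

A score near 1.0 suggests the top features are reproducible; a low score is a signal not to trust any single run.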

7. The UK's Algorithmic Transparency Recording Standard (ATRS) differs from previous transparency approaches by:

Correct. ATRS inverts the traditional burden: institutions must proactively disclose AI use rather than waiting for individuals to discover and challenge it.

Incorrect. ATRS requires proactive disclosure as a default obligation — government bodies publish AI tool information as standard practice, not in response to individual challenges.

8. Proxy discrimination occurs when:

Correct. Proxy discrimination is particularly dangerous because it produces discriminatory outcomes without technically encoding protected characteristics — making it harder to detect and challenge.

Incorrect. Proxy discrimination occurs through correlated variables — using postcode as a proxy for race, or healthcare cost as a proxy for health need — producing discriminatory outcomes without directly encoding the protected characteristic.

9. GDPR Article 22's right against automated decision-making is triggered only when decisions are made in what manner?

Correct. The "solely automated" threshold is a critical limitation — human review, even superficial, can remove a decision from Article 22's scope.

The trigger is a decision based "solely" on automated processing that produces legal or similarly significant effects. Human review, even nominal, can take a decision outside this threshold.

10. LIME for images uses superpixels rather than individual pixels as the unit of perturbation because:

Correct. Superpixels group contiguous pixels into semantically coherent regions (a bird's wing, a background tree). Masking an entire superpixel produces interpretable ablations; toggling individual pixels would create imperceptible noise without meaningful explanations.

LIME uses superpixels because they are semantically meaningful units. A superpixel might represent "the frog's eye" or "the water background." Masking it on/off produces an interpretable test. Individual pixel toggling creates imperceptible noise, not a meaningful ablation.
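The masking step can be sketched on a toy image. This is only the ablation idea, not the full LIME procedure (which samples many random on/off masks and fits a weighted linear surrogate). The square-block segmentation and `toy_score` classifier are hypothetical stand-ins chosen to keep the example self-contained.

```python
def block_segments(height, width, block):
    """Assign each pixel to a square block; a crude stand-in for a real superpixel algorithm."""
    blocks_per_row = (width + block - 1) // block
    return [[(r // block) * blocks_per_row + (c // block) for c in range(width)]
            for r in range(height)]

def mask_segment(image, segments, segment_id, fill=0.0):
    """Return a copy of the image with one whole segment replaced by a fill value."""
    return [[fill if segments[r][c] == segment_id else image[r][c]
             for c in range(len(image[0]))]
            for r in range(len(image))]

def toy_score(image):
    """Hypothetical classifier score: mean brightness of the top-left 2x2 region."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4

image = [[1.0] * 4 for _ in range(4)]   # 4x4 all-white toy image
segments = block_segments(4, 4, 2)       # four 2x2 "superpixels", ids 0..3
baseline = toy_score(image)

# Score drop caused by masking each superpixel in turn.
drops = {sid: baseline - toy_score(mask_segment(image, segments, sid))
         for sid in range(4)}
print(drops)
```

Only the segment covering the region the toy classifier looks at produces a drop, which is the kind of region-level answer a person can read.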

11. Generalized Additive Models (GAMs) are interpretable because:

Correct. The additivity constraint means each feature's contribution to the prediction is independent of all others — you can plot the effect of age on predicted outcome without worrying about how it interacts with income. This separability is the source of interpretability.

GAMs allow non-linear per-feature effects (unlike linear regression) but require those effects to be additive (no interactions). This separability — not linearity or parameter count — is what makes them interpretable.
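A small sketch makes the separability concrete. The shape functions and intercept below are invented for illustration (a fitted GAM would learn them from data). The point is only that the effect of changing age is the same whatever the income value is.

```python
import math

# Hypothetical shape functions, one per feature.
shape_functions = {
    "age": lambda v: 0.002 * (v - 40) ** 2,            # non-linear, U-shaped effect
    "income": lambda v: -0.5 * math.log1p(v / 1000),   # non-linear, diminishing effect
}
INTERCEPT = 1.0

def gam_predict(x):
    """Additive prediction: intercept plus the sum of per-feature effects."""
    return INTERCEPT + sum(f(x[name]) for name, f in shape_functions.items())

def effect_of_changing(feature, old, new, other_values):
    """Change in prediction when one feature moves from old to new, others held fixed."""
    base = dict(other_values, **{feature: old})
    changed = dict(other_values, **{feature: new})
    return gam_predict(changed) - gam_predict(base)

low_income = effect_of_changing("age", 30, 60, {"income": 20_000})
high_income = effect_of_changing("age", 30, 60, {"income": 90_000})
print(low_income, high_income)  # equal up to floating-point rounding
```

Because the income term cancels in the difference, the age curve can be plotted once and read on its own.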

12. Which three axioms does SHAP satisfy that most other attribution methods do not?

Correct. These three Shapley axioms are what distinguish SHAP as theoretically principled — they guarantee consistent and fair attribution across features.

The three Shapley axioms are efficiency, symmetry, and dummy. Together they guarantee that attributions are consistent and fair, and that they sum to the difference between the model's prediction and its baseline (expected) output.
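The axioms can be checked directly on a small example. The sketch below computes exact Shapley values by brute-force coalition enumeration, which is exponential in the number of features and only practical for toy cases. `toy_value` is a hypothetical value function, chosen so that features 0 and 1 are interchangeable and feature 2 is never used.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_players):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    players = range(n_players)
    phi = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size.
            weight = (factorial(size) * factorial(n_players - size - 1)
                      / factorial(n_players))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are present."""
    both = 0 in coalition and 1 in coalition
    return 3.0 * (0 in coalition) + 3.0 * (1 in coalition) + 2.0 * both

phi = shapley_values(toy_value, 3)
full = toy_value(frozenset({0, 1, 2}))
empty = toy_value(frozenset())

print("attributions:", phi)
print("efficiency:", sum(phi), "==", full - empty)   # attributions sum to full minus baseline
print("symmetry:", phi[0], "==", phi[1])             # interchangeable features get equal credit
print("dummy:", phi[2], "== 0")                      # an unused feature gets zero credit
```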

13. What legal protection did Northpointe use to prevent defendants from accessing COMPAS's algorithm?

Correct. Northpointe invoked trade-secret law throughout the litigation, and courts generally upheld this protection.

Northpointe relied on trade-secret law, arguing that commercial algorithms are proprietary business information.

14. ProPublica's 2016 COMPAS investigation found that Black defendants were flagged as high risk for violent recidivism at what false-positive rate compared to white defendants?

Correct. ProPublica's analysis of 7,000 Florida defendants found this stark disparity in violent recidivism false-positive rates.

Incorrect. ProPublica found a 77.4% false-positive rate for Black defendants versus 41.4% for white defendants in violent recidivism predictions, nearly double.

15. ProPublica's 2016 investigation found COMPAS was how much more likely to falsely flag Black defendants as high-risk compared to white defendants?

Correct. The 2× false-positive rate disparity is the core empirical finding of the ProPublica investigation.

ProPublica found the false-positive rate for Black defendants was approximately twice that for white defendants.
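The false-positive rate behind this comparison is FP / (FP + TN): among people who did not reoffend, the share who were nonetheless flagged high risk. The sketch below shows the arithmetic on invented toy counts; the group names and numbers are illustrative only and are not ProPublica's data.

```python
def false_positive_rate(records):
    """FPR = FP / (FP + TN), computed over people who did NOT reoffend."""
    negatives = [r for r in records if not r["reoffended"]]
    false_positives = [r for r in negatives if r["flagged_high_risk"]]
    return len(false_positives) / len(negatives)

def make_group(fp, tn, tp, fn):
    """Build toy records from confusion-matrix counts (illustrative only)."""
    return (
        [{"flagged_high_risk": True, "reoffended": False}] * fp
        + [{"flagged_high_risk": False, "reoffended": False}] * tn
        + [{"flagged_high_risk": True, "reoffended": True}] * tp
        + [{"flagged_high_risk": False, "reoffended": True}] * fn
    )

# Invented counts for two hypothetical groups.
group_a = make_group(fp=40, tn=60, tp=50, fn=20)
group_b = make_group(fp=20, tn=80, tp=50, fn=20)

fpr_a = false_positive_rate(group_a)
fpr_b = false_positive_rate(group_b)
print(f"group A FPR: {fpr_a:.2f}, group B FPR: {fpr_b:.2f}, ratio: {fpr_a / fpr_b:.1f}x")
```

Note that both toy groups have identical true-positive and false-negative counts, so the disparity shows up only in how non-reoffenders are treated.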

16. The "illusion of explanatory depth" in AI contexts refers to:

Correct. This phenomenon — where confidence in understanding exceeds actual understanding — was demonstrated in the 2020 CHI study on loan-rejection feature importance displays.

Incorrect. The illusion of explanatory depth is about users' subjective confidence in understanding exceeding their actual ability to understand or predict model behaviour after seeing an explanation.

17. Distributed representations in neural networks make interpretation difficult because:

Correct. Distributed representation is the architectural property that makes neural networks powerful and illegible simultaneously. Concepts are not localised; they emerge from the interaction of many parameters, which is why reading individual weights yields no interpretable information.

Distributed representation is an architectural property of how information is encoded — not infrastructure, not training noise, not cross-layer communication. The key fact is that knowledge is spread non-locally, making any particular location uninformative in isolation.

18. Adebayo et al.'s "Sanity Checks for Saliency Maps" (2018) tested whether saliency methods:

Correct. The sanity check is a randomisation test: if you randomise model weights, a faithful saliency method should produce clearly different maps. Guided backpropagation and Guided Grad-CAM failed: their maps look similar regardless of whether weights are trained or random. Vanilla gradients and Grad-CAM were sensitive to the randomisation and passed.

The sanity check randomises model weights and checks whether the saliency map changes. If a method produces similar-looking maps for trained and randomly-weighted models, its maps reflect input structure (e.g., edges), not what the model learned. Guided backpropagation and Guided Grad-CAM failed this test; vanilla gradients and Grad-CAM passed.
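The logic of the randomisation test can be sketched on a toy linear scorer. Everything here is a simplified assumption: `gradient_saliency` is the input gradient of a linear score (which is just the weight vector), and `edge_saliency` is a deliberately model-blind map built from neighbouring input differences. Neither is a reproduction of the methods evaluated in the paper.

```python
import random

def gradient_saliency(weights, x):
    """For a linear score w.x, the gradient with respect to the input is w itself."""
    return [abs(w) for w in weights]

def edge_saliency(weights, x):
    """Input-only map that ignores the model: circular neighbouring differences."""
    return [abs(x[i] - x[i - 1]) for i in range(len(x))]

def max_abs_change(map_a, map_b):
    """Largest per-position difference between two saliency maps."""
    return max(abs(a - b) for a, b in zip(map_a, map_b))

rng = random.Random(0)
x = [rng.random() for _ in range(16)]
trained = [rng.gauss(0, 1) for _ in range(16)]      # stands in for trained weights
randomised = [rng.gauss(0, 1) for _ in range(16)]   # weights re-drawn at random

grad_change = max_abs_change(gradient_saliency(trained, x),
                             gradient_saliency(randomised, x))
edge_change = max_abs_change(edge_saliency(trained, x),
                             edge_saliency(randomised, x))

print("model-dependent map changed by:", grad_change)  # non-zero: sensitive to the weights
print("input-only map changed by:", edge_change)       # zero: would fail the sanity check
```

A map that does not move when the weights are replaced cannot be telling you anything about what the model learned.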

19. Which explanation format does Miller's 2019 research suggest is most cognitively natural for humans, based on social science literature?

Correct. Miller found that human explanation is naturally contrastive — people ask "why this rather than that?" not "what are all the causes of this?" This supports counterfactual explanation formats.

Incorrect. Miller's synthesis found contrastive explanations most natural — humans explain by contrast, asking "why this rather than that?", not by enumerating causal chains or feature rankings.

20. Adversarial examples — images perturbed imperceptibly to fool classifiers — demonstrate:

Correct. The fact that noise invisible to humans causes catastrophically wrong confident predictions reveals that the network's internal concept space does not map onto human concept space — which matters enormously for explainability and trust calibration.

Adversarial examples are primarily a window into the nature of learned representations — they show that the network's understanding of categories differs structurally from human understanding, which is the core XAI implication of Szegedy et al.'s 2013 findings.
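A toy calculation shows how tiny per-pixel changes can add up to a flipped decision. This is a sign-of-gradient perturbation on a hypothetical linear scorer with 1,000 inputs, not a reproduction of the original experiments. The weights, input, and `EPS` budget are all invented for illustration.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def score(weights, x):
    """Linear score; positive means class A, negative means class B (hypothetical)."""
    return sum(w * xi for w, xi in zip(weights, x))

N = 1000
EPS = 0.02  # maximum change allowed per input value

weights = [1.0 if i % 2 == 0 else -1.0 for i in range(N)]
x = [0.5 + 0.01 * w for w in weights]   # clean input, scored as class A

# For a linear score the input gradient is the weight vector, so stepping
# against the sign of each weight lowers the score fastest under this budget.
x_adv = [xi - EPS * sign(w) for xi, w in zip(x, weights)]

clean_score = score(weights, x)
adv_score = score(weights, x_adv)
largest_change = max(abs(a - b) for a, b in zip(x, x_adv))

print("clean score:", clean_score)
print("perturbed score:", adv_score)
print("largest per-input change:", largest_change)
```

No single input moves by more than 0.02, yet the sign of the score reverses because a thousand small nudges all push in the same direction.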

Final Exam