Inside the Machine: AI Unpacked

1. Why does the concept of "weights" make machine learning models difficult to audit compared to rule-based systems?

Correct. A rule like "if age > 65, recommend lower dosage" is readable and auditable. A set of 175 billion floating-point numbers that together encode medical knowledge is not — even though both systems might make the same recommendation. The form of storage determines what can be audited.

The issue isn't secrecy or hardware — it's that knowledge distributed across billions of numerical values doesn't translate into human-readable logic the way written rules do.
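A minimal sketch of the contrast, assuming a toy one-feature model. The function names, the weight, and the bias are invented for illustration and do not come from any real system. Both functions give the same recommendation, but only the first can be audited by reading it.

```python
import math

# Rule-based version: the logic is readable as written.
def rule_based_dose(age: int) -> str:
    return "lower" if age > 65 else "standard"

# "Learned" version: the same behavior stored as numbers (hypothetical values).
WEIGHT, BIAS = 1.0, -65.5

def learned_dose(age: int) -> str:
    # Logistic score; nothing in WEIGHT or BIAS says "age 65" in plain words.
    score = 1 / (1 + math.exp(-(WEIGHT * age + BIAS)))
    return "lower" if score > 0.5 else "standard"

# The two systems agree on every whole-number age checked here.
agree = all(rule_based_dose(a) == learned_dose(a) for a in range(0, 121))
```

With two numbers the threshold can still be worked out by hand. With billions of interacting weights, it cannot.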

2. You ask an AI for help with an important decision and it gives you an answer. You express doubt: "Are you sure? I've heard the opposite." The AI immediately changes its answer to match what you suggested. What should you do?

Right approach. When a model reverses course based on your expressed doubt rather than new evidence, that's sycophancy. Both answers might be wrong. Verify independently using a source that isn't optimizing for your approval.

A quick reversal based on your expressed skepticism — not on new information — is a sycophancy signal. It means the model is pattern-matching to your preference, not reasoning toward truth. The fix is independent verification, not another AI query.

3. A medical AI claims 95% accuracy at diagnosing a rare disease. A critic says this statistic is misleading without knowing more. What additional information is most essential to evaluate the claim?

Correct. Aggregate accuracy can mask unequal performance across subgroups — a pattern this entire module has illustrated.

The critical question is whether that 95% holds equally for all patient groups, or whether it hides large performance gaps between demographic subgroups.
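To see how an aggregate figure can hide a subgroup gap, here is a small sketch with invented numbers. The group labels, patient counts, and the `accuracy_by_group` helper are all assumptions made for illustration, not data from any real diagnostic system.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, predicted, actual) tuples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        hits[group] += int(predicted == actual)
    return {g: hits[g] / totals[g] for g in totals}

# Invented data: 900 patients in group A, 100 in group B.
records = (
    [("A", 1, 1)] * 882 + [("A", 0, 1)] * 18    # group A: 882 of 900 correct
    + [("B", 1, 1)] * 68 + [("B", 0, 1)] * 32   # group B: 68 of 100 correct
)

overall = sum(p == a for _, p, a in records) / len(records)
per_group = accuracy_by_group(records)
```

In this made-up dataset the overall accuracy is 95%, while group A sits at 98% and group B at 68%. The headline number is true and still misleading.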

4. Deb Roy's study of his son's language acquisition is used in Lesson 1 because it illustrates which similarity between humans and AI?

Right. Roy's son built statistical maps of which words followed which, without being taught rules. Language models do something structurally similar — the difference is that the child had a body, experiences, and emotions. The model has only text.

The parallel is about the mechanism of sequential prediction, not about emotions or embodiment. Both the child and the model built internal representations of "what comes next" through exposure to many examples — not through explicit rules.

5. Why might a company use "we're working on explainability" as a strategy to resist stronger AI regulation?

Correct. Promising future work can function as a delay tactic — it signals good intentions without actually delivering accountability, and it can be used to argue against regulation being "needed right now."

The strategic value is delay. Claiming to be working on explainability creates the impression of responsibility while deferring any actual binding requirements. Meanwhile, deployment continues on existing opaque systems.

6. PredPol's predictive policing feedback loop worked like this: police were sent to zones flagged as high-risk → more arrests occurred → arrest data fed back into the system → zones were flagged even higher. What was the fundamental flaw in this cycle?

Correct.

The flaw was that the data tracked enforcement, not actual crime. More police presence created more arrests, which looked like more crime, which justified more police presence — a loop that validated itself.
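The loop can be sketched as a toy simulation. Everything here is an assumption for illustration: two hypothetical zones with identical underlying crime rates, a fixed 70/30 patrol split favoring whichever zone has more recorded arrests, and a small starting imbalance. It is not a model of PredPol's actual algorithm.

```python
def simulate(rounds=5):
    # Same true crime rate in both zones, by construction.
    crime_rate = {"north": 0.1, "south": 0.1}
    # Recorded arrests start almost equal.
    recorded = {"north": 11.0, "south": 10.0}
    gaps = []
    for _ in range(rounds):
        # The "prediction" only sees recorded arrests, not true crime.
        top = max(recorded, key=recorded.get)
        for zone in recorded:
            patrols = 70 if zone == top else 30
            # More patrols produce more arrests at the same crime rate.
            recorded[zone] += patrols * crime_rate[zone]
        gaps.append(recorded["north"] - recorded["south"])
    return recorded, gaps

recorded, gaps = simulate()
```

The gap in recorded arrests widens every round even though the two zones are identical in actual crime. The data measures where police were sent.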

7. What hidden variable did the chest X-ray AI accidentally learn to detect instead of pneumonia?

Correct. The confounding signal was equipment type, which correlated with disease severity in that specific hospital system but had nothing to do with the visual presentation of pneumonia itself.

The hidden variable was the type of X-ray machine. Portable machines were used for bedridden patients who were sicker overall, so machine type correlated with disease without being the visual signal of disease the AI should have learned.

8. Anthropic's Constitutional AI approach addresses which limitation of standard RLHF?

Right. Constitutional AI trades one form of implicit value encoding (rater behavior) for a more explicit form (a written document). This does not eliminate the values question — it makes it visible and debatable.

The key difference is transparency of values. Constitutional AI makes the governing principles explicit rather than hiding them in the implicit judgments of a rater pool. Cost reduction is a secondary benefit, not the primary purpose.

9. After Robert Williams' wrongful arrest (Detroit, 2020), the city's response was an example of —

Correct. The technology remained in use, but new policy required human oversight, illustrating how policy, rather than engineering, is often the primary lever for AI accountability.

The technology wasn't banned or replaced — policy was added requiring human verification of all matches before arrests.

10. An AI trained to identify birds in North American photos misidentifies many birds in African nature documentaries. What is the most likely explanation?

Exactly. Training on North American birds produces patterns that don't generalize equally well to different species and settings.

Distribution shift: the model's learned patterns were built in one context and don't transfer cleanly to a different one.

11. The core weakness of rule-based AI systems in adversarial environments (like spam filtering) is:

Correct. Rules are static and readable. Once an adversary knows the rules, they can engineer around them — which is exactly what spammers did to early keyword filters.

The fundamental issue is not a shortage of resources or data. It is that rules can be read and gamed by anyone who wants to work around them.
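A minimal sketch of how a static keyword rule gets gamed. The blocked-word list and the `is_spam` function are hypothetical and far simpler than any real filter.

```python
# Hypothetical keyword list for an early-style spam filter.
BLOCKED = {"viagra", "lottery"}

def is_spam(message: str) -> bool:
    # Flag the message if any word, minus trailing punctuation, is blocked.
    words = message.lower().split()
    return any(word.strip(".,!?") in BLOCKED for word in words)

caught = is_spam("Win the lottery now!")    # exact keyword match
evaded = is_spam("Win the l0ttery now!")    # one character swapped
```

The first message is flagged and the second is not. Once the rule is known, a one-character change is enough to get around it.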

12. "Loss" in machine learning training refers to:

Correct. Loss is the error signal. The training loop exists to reduce loss — to make the model's predictions progressively closer to the correct answers in the training data.

Loss is the measure of how wrong the model is on any given prediction. Minimizing it over many training examples is the core of how models learn.
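As a rough sketch of a training loop reducing loss, assume a one-parameter model `y = w * x` trained with mean squared error and plain gradient descent. The data, learning rate, and step count are invented for illustration.

```python
def mse(w, xs, ys):
    # Mean squared error: average of (prediction - target) squared.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, w=0.0, lr=0.05, steps=100):
    losses = []
    for _ in range(steps):
        # Gradient of the MSE with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad          # step in the direction that lowers the loss
        losses.append(mse(w, xs, ys))
    return w, losses

# Toy data generated by the relationship y = 2 * x.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w, losses = train(xs, ys)
```

Starting from `w = 0.0`, the weight moves toward 2 and the recorded loss shrinks toward zero. The model is never told the rule; it only follows the error signal.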

13. "Model collapse" is a risk that arises specifically from:

Correct.

Model collapse is the photocopy-of-a-photocopy problem: each generation of AI trained on AI-generated content loses fidelity, variety, and accuracy compared to the generation before it.
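The photocopy effect can be sketched with a deliberately crude stand-in for training: each "generation" only resamples the previous generation's output. The corpus of 100 distinct items, the 10 generations, and the `next_generation` helper are all assumptions for illustration. Real model collapse involves far richer dynamics.

```python
import random

def next_generation(data, rng):
    # "Train" on data, then "generate" a same-size corpus by sampling from it.
    return [rng.choice(data) for _ in data]

rng = random.Random(0)          # fixed seed so the run is repeatable
corpus = list(range(100))       # 100 distinct items in the original data
diversity = [len(set(corpus))]  # number of distinct items per generation

for _ in range(10):
    corpus = next_generation(corpus, rng)
    diversity.append(len(set(corpus)))
```

Because each generation can only repeat what the previous one produced, the count of distinct items can never go up, and in practice it drops. Anything lost along the way is gone for good.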

14. Microsoft's Tay chatbot was not broken — it worked as designed. Why was that a problem?

Exactly. This is the lesson: a correctly functioning AI with a poorly specified objective can cause significant harm. "The system worked" is not always reassuring.

The problem is not that learning from users is inherently unsafe — it is that Tay's objective (match user style) had no guardrail against harmful content. The learning mechanism worked fine. The objective was incomplete.

15. OpenAI's pre-deployment red-teaming of GPT-4 found harmful capabilities even after extensive testing — and acknowledged some unknown capabilities may remain. This most directly illustrates which key challenge of governing large language models?

Correct. This is the core governance challenge that separates large language models from earlier AI systems. You can enumerate all the rules in a rule-based system. You cannot enumerate all the capabilities that may emerge from training a model of sufficient scale on diverse data — which fundamentally changes how safety must be approached.

The point isn't about the timeline or the method. It's about structural limits: emergent capabilities mean you cannot guarantee you've found all risks before deployment, in a way that wasn't true for rule-based systems.

16. A doctor reviewing an AI diagnostic system's recommendation and overriding it based on clinical experience serves which function in the overall system design?

Correct. Human oversight isn't a sign of AI failure — it's a deliberate design feature. The doctor's review layer catches edge cases, provides accountability, and handles situations the AI wasn't trained for. It's the safety interlock in the layered system design.

The human override layer is intentional and valuable. Even a highly accurate AI model will have cases where it's wrong — the human layer ensures those cases are caught by someone who can take responsibility for the decision.

17. Amazon scrapped its AI hiring tool in 2018. What does this case best illustrate about where bias enters AI systems?

Correct.

The Amazon case showed that bias emerges from historical patterns in training data — not explicit code. The AI learned that men were hired more often and turned that historical fact into a prediction rule.
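A toy sketch of bias entering through data rather than code. The résumé terms, the counts, and the `fit_hire_rates` helper are invented for illustration and are not Amazon's system or data. Note that nothing in the code mentions gender.

```python
from collections import Counter

def fit_hire_rates(history):
    """history: list of (resume_term, was_hired) pairs from past decisions."""
    seen, hired = Counter(), Counter()
    for term, was_hired in history:
        seen[term] += 1
        hired[term] += int(was_hired)
    # The "model" is just the historical hire rate for each term.
    return {term: hired[term] / seen[term] for term in seen}

# Invented history reflecting skewed past hiring decisions.
history = (
    [("chess club", True)] * 8 + [("chess club", False)] * 2
    + [("women's chess club", True)] * 2 + [("women's chess club", False)] * 8
)

rates = fit_hire_rates(history)
```

The fitted rates come out at 0.8 for one term and 0.2 for the other. The skew in past decisions has become a scoring rule without anyone writing it.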

18. The UK Home Office scrapped its visa algorithm in 2020 after what kind of pressure?

Correct.

Civil society organizations spent years applying legal pressure before the algorithm was quietly dropped in 2020. No full technical audit was ever made public.

19. The EU AI Act (2024) classifies AI applications into risk categories. A parole recommendation system would fall under which category?

Correct. Criminal justice applications are explicitly listed as high-risk in the EU AI Act. High-risk systems face requirements such as human oversight, transparency, and checks on training data for bias before deployment, for exactly the reasons the COMPAS controversy illustrated.

Parole systems directly affect people's liberty. The EU AI Act puts criminal justice AI squarely in the high-risk category, requiring significant safeguards before deployment.

20. Which of the following is the best real-world example of a system that appropriately combines rule-based and learning-based components?

Correct. This is the layered design principle: use learning where the problem is too complex for rules (visual recognition), and use locked rules where behavior must be guaranteed (braking near people). Neither approach alone handles both requirements well.

Good hybrid design uses each approach where it's strongest — learning for complex pattern recognition, rules for safety-critical guaranteed behaviors.
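One way to picture the layered design is a learned component that proposes and a fixed rule that can veto. This is a sketch under stated assumptions: the `plan_speed` function and the 0.2 threshold are hypothetical, and the pedestrian probability is assumed to come from a separate learned vision model.

```python
# Hypothetical safety threshold; a real system would set and validate this carefully.
PEDESTRIAN_THRESHOLD = 0.2

def plan_speed(pedestrian_probability: float, proposed_speed: float) -> float:
    """Combine a learned estimate with a locked safety rule.

    pedestrian_probability: output of a learned vision model (assumed).
    proposed_speed: speed suggested by the planner, in km/h.
    """
    # Locked rule: if a person may be nearby, stop regardless of the planner.
    if pedestrian_probability >= PEDESTRIAN_THRESHOLD:
        return 0.0
    # Otherwise defer to the learned or planned behavior.
    return proposed_speed

near_person = plan_speed(0.9, 30.0)
clear_road = plan_speed(0.01, 30.0)
```

The learned part handles the hard perception problem, while the guaranteed behavior lives in a rule short enough to read and audit.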

Final Exam