Building with AI — Final Exam

1. What are the two documented benefits of chain-of-thought prompting for complex reasoning tasks?

Correct. CoT improves accuracy by preventing shortcut pattern-matching, and it makes reasoning auditable — you can see exactly where a multi-step chain goes wrong and fix that specific step.

Chain-of-thought provides two benefits: improved accuracy (the model works through steps rather than pattern-matching to surface answers) and auditability (you can inspect and diagnose where reasoning fails).

2. What does "progressive disclosure of confidence" mean in AI UX design?

Correct. Otter.ai's visual fading of low-quality transcript segments is a clean implementation: it signals uncertainty without requiring users to understand transcription models.

Progressive disclosure of confidence means encoding uncertainty into the visual layer — different visual treatment for high vs. low confidence outputs — so users know where to apply review attention.
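
One way to picture "different visual treatment for high vs. low confidence" is a simple mapping from a confidence score to text opacity. The sketch below is illustrative only: the function names, the 0.35 opacity floor, and the (text, confidence) segment shape are assumptions, not a description of any real product's implementation.

```python
def opacity_for_confidence(confidence: float, floor: float = 0.35) -> float:
    """Map a confidence score in [0, 1] to an opacity in [floor, 1].

    The floor (an assumed value) keeps low-confidence text legible
    while still making it visibly fainter than high-confidence text.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return floor + (1.0 - floor) * confidence


def render_segments(segments):
    """Attach a display opacity to each (text, confidence) pair."""
    return [
        {"text": text, "opacity": round(opacity_for_confidence(conf), 2)}
        for text, conf in segments
    ]


if __name__ == "__main__":
    transcript = [("Welcome to the meeting.", 0.97), ("uh the kwarterly thing", 0.41)]
    for segment in render_segments(transcript):
        print(segment)
```

The user never sees a number; the faded segment simply tells them where to spend review attention.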

3. What input modality best captures temporal dynamics like rhythm and intonation?

Correct!

Incorrect. Review Lesson 1 for more information.

4. Why does Lesson 3 recommend version-controlling your prompts?

Correct. Version control enables reversion and causal attribution. If you change a prompt and performance drops, you need to know what changed and be able to undo it. Treat prompts as code: track changes, test before promoting, document why.

Version control enables you to revert when changes cause regressions and to understand what specifically caused a quality change. Prompts are load-bearing code and deserve the same engineering discipline.
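
The two properties named here, reversion and causal attribution, can be illustrated with a tiny in-memory registry. This is a sketch under stated assumptions: PromptRegistry and its methods are hypothetical names, and in a real project the same job would normally be done by keeping prompt files in an ordinary version control system such as git.

```python
import difflib
import hashlib
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    """Hypothetical in-memory prompt history (illustration only)."""
    versions: list = field(default_factory=list)

    def commit(self, text: str, reason: str) -> str:
        # A content hash gives each prompt version a stable identifier.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self.versions.append({"id": digest, "text": text, "reason": reason})
        return digest

    def current(self) -> str:
        return self.versions[-1]["text"]

    def diff_latest(self) -> list:
        # Causal attribution: show exactly what changed in the last commit.
        if len(self.versions) < 2:
            return []
        old = self.versions[-2]["text"].splitlines()
        new = self.versions[-1]["text"].splitlines()
        return list(difflib.unified_diff(old, new, lineterm=""))

    def revert(self) -> str:
        # Reversion: drop the latest version and return the previous one.
        if len(self.versions) < 2:
            raise RuntimeError("no earlier version to revert to")
        self.versions.pop()
        return self.current()


if __name__ == "__main__":
    registry = PromptRegistry()
    registry.commit("Summarize the ticket.", "initial version")
    registry.commit("Summarize the ticket in 3 bullets.", "shorter output")
    print("\n".join(registry.diff_latest()))
    print(registry.revert())
```

Recording a reason with every commit covers the "document why" part of the discipline.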

5. How does multimodal redundancy improve AI system reliability?

Correct!

Incorrect. Review Lesson 3 for more information.

6. Which four components does the lesson identify as consistently present in high-performing prompts?

Correct. Role, Task, Context, and Format are the four anatomy components identified in research from Google DeepMind, Anthropic, and OpenAI.

The four components are Role (who the model should be), Task (what to do), Context (background information), and Format (shape of the output).
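
As a rough illustration, the four components can be assembled by a small helper that refuses to build a prompt when any one of them is missing. The function name build_prompt, the labels, and the example values are assumptions made for this sketch, not a prescribed template.

```python
def build_prompt(role: str, task: str, context: str, output_format: str) -> str:
    """Assemble a prompt from Role, Task, Context, and Format.

    Raises ValueError if any component is empty, so an incomplete
    prompt is caught before it is sent to a model.
    """
    parts = {
        "Role": role,
        "Task": task,
        "Context": context,
        "Format": output_format,
    }
    missing = [name for name, value in parts.items() if not value.strip()]
    if missing:
        raise ValueError("missing prompt components: " + ", ".join(missing))
    return "\n\n".join(f"{name}: {value.strip()}" for name, value in parts.items())


if __name__ == "__main__":
    print(build_prompt(
        role="Senior employment attorney at an early-stage startup",
        task="Review the offer letter below for risky clauses.",
        context="The company has 12 employees and no HR department.",
        output_format="A numbered list, one clause per item, with a one-line risk note.",
    ))
```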

7. According to the documented tradeoff from Salesforce's AI ethics review process (2019), what is the correct framing of upfront ethical review costs?

Correct. Salesforce teams documented this tradeoff explicitly: upfront friction bought avoided remediation, and remediation cycles are more expensive than planning cycles.

The documented tradeoff was positive: 2–4 weeks of upfront friction eliminated expensive post-launch remediation cycles, producing net positive ROI for the ethics review process.

8. The Clearview AI regulatory actions (2022) established which important legal principle?

Correct. Multiple regulators — UK ICO, France's CNIL, Canada's OPC — found that scraping publicly posted photos to build a biometric database violated privacy law. "Publicly available" does not mean "available for any purpose."

Not correct. The Clearview AI rulings established that the public availability of images on social media does not grant permission for commercial use in biometric databases. Purpose limitation and consent requirements apply to publicly visible data.

9. What does the "Expertise Calibration" element of TRACE require you to do?

Correct. Expertise Calibration is about honest self-assessment: where you have deep expertise, knowledge-based review may suffice; where you don't, you must compensate with more external verification. Using your expertise as a substitute for verification in unfamiliar domains is a dangerous overconfidence pattern.

Expertise Calibration requires you to honestly assess your own knowledge relative to the claim's domain, then adjust external verification effort accordingly — more verification where your knowledge can't serve as a reliable check, less where it can.

10. A responsible builder's "misuse report" should document, for each significant system capability:

Correct. A misuse report systematically addresses: adversarial user profiles, attack vectors, harm magnitude, harm probability, and mitigation trade-offs. This structured analysis should be completed before the launch plan is finalized.

Not correct. A misuse report addresses the four key questions: Who are likely adversarial users? How would they misuse this capability? What harm would result (and how likely)? What mitigations are available, and what do those mitigations cost legitimate users?

11. What is the functional description of what a large language model does when generating text?

Correct. An LLM performs token-by-token probabilistic prediction based on patterns in its training data. This functional description explains the behaviors every builder must design around: knowledge cutoffs, sensitivity to framing, and a tendency to hallucinate.

Language models predict the most plausible next token given a context, based on statistical patterns in training data. They do not retrieve, reason formally, or run symbolic systems.
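
A toy version of next-token prediction makes the idea concrete. The scores below are invented for illustration; a real model computes scores over a vocabulary of many thousands of tokens from the full context. The sketch only shows the final step: turning scores into probabilities and sampling one token.

```python
import math
import random


def softmax(logits: dict) -> dict:
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - highest) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def sample_next_token(logits: dict, rng: random.Random) -> str:
    """Sample one token in proportion to its probability."""
    probs = softmax(logits)
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Made-up scores for the context "The capital of France is"
    toy_logits = {" Paris": 6.0, " Lyon": 2.5, " a": 1.0, " pizza": -1.0}
    rng = random.Random(0)
    print(softmax(toy_logits))
    print(sample_next_token(toy_logits, rng))
```

Because the output is sampled from a distribution, a plausible but wrong token always has some nonzero chance of being chosen.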

12. What is the relationship between input quality and system reliability?

Correct!

Incorrect. Review Lesson 4 for more information.

13. The AI-Fit Test asks whether the output is variable. What does this criterion reveal about a task?

Correct. Output variability is the core condition: AI earns its cost when the correct answer changes based on context, tone, or incomplete input.

Incorrect. Output variability determines whether AI adds value over simpler tools — fixed outputs belong in lookup tables, not language models.

14. "Capability overhang" in the context of LLM safety refers to:

Correct. Capability overhang describes the condition where a model can produce harmful content but safety training makes it unlikely under normal prompting. The underlying capability remains — adversarial attacks exploit this gap.

Not correct. Capability overhang means the model retains the ability to generate harmful content even after safety fine-tuning — the fine-tuning shifts probability distributions, it doesn't remove the capability. Adversarial attacks like suffix attacks exploit this latent capability.

15. Goodhart's Law most directly predicts which failure pattern in AI products?

Correct. YouTube's watch-time optimization producing extreme content, and Instagram's engagement optimization harming teen wellbeing, are both documented Goodhart failures at scale.

Goodhart's Law predicts metric decoupling: AI systems optimize proxies so efficiently that the proxies decouple from the values they represented, as YouTube's watch-time failure demonstrated.

16. The Cambridge Analytica case involved data from approximately how many Facebook users?

Correct. An API loophole allowing the quiz app to harvest friends' data — not just the quiz-taker's — resulted in data from approximately 87 million Facebook users being collected without their knowledge or consent.

Not correct. The API loophole that allowed Cambridge Analytica to harvest friend data without consent resulted in data from approximately 87 million Facebook users being collected and used for political profiling.

17. In the five-stage workflow framework, which stage is responsible for preventing malformed or refused AI outputs from reaching end users?

Correct. Output processing parses, validates, filters, and handles malformed or refused outputs before delivery — the critical stage most first-time builders skip.

Incorrect. Output processing is where raw model responses are validated and sanitized. Skipping it is the most common first-project failure point.
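
A minimal sketch of an output-processing step, assuming the model was asked to return a JSON object with "summary" and "priority" keys. The key names, the refusal phrases, and the result shape are all assumptions for illustration; a production system would tune each of them to its own model and schema.

```python
import json

# Assumed refusal openers; real refusals vary by model and should be tuned.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i am unable")


def process_output(raw: str, required_keys=("summary", "priority")) -> dict:
    """Parse and validate a raw model response before it reaches a user."""
    text = raw.strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not valid JSON: distinguish a refusal from plain malformed output.
        if text.lower().startswith(REFUSAL_MARKERS):
            return {"ok": False, "error": "refused", "data": None}
        return {"ok": False, "error": "malformed", "data": None}
    if not isinstance(data, dict):
        return {"ok": False, "error": "malformed", "data": None}
    missing = [key for key in required_keys if key not in data]
    if missing:
        return {"ok": False, "error": "missing keys: " + ", ".join(missing), "data": None}
    return {"ok": True, "error": None, "data": data}


if __name__ == "__main__":
    print(process_output('{"summary": "Login fails on mobile", "priority": "high"}'))
    print(process_output("I cannot help with that request."))
    print(process_output('{"summary": "truncated respon'))
```

The caller can then retry, fall back to a default message, or escalate, instead of showing raw model text to the end user.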

18. What is a "token" in the context of AI API billing?

Correct. Tokens are the billing unit; one token is roughly 0.75 words or 4 characters of English text.

Incorrect. Tokens are sub-word units, each roughly 0.75 English words, used to denominate both input and output for billing purposes.
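
The rule of thumb above supports a back-of-the-envelope cost estimate. This sketch is an approximation only: exact counts come from the provider's own tokenizer, and the prices passed in are placeholders supplied by the caller, not real rates.

```python
import math


def estimate_tokens(text: str) -> int:
    """Estimate token count from the ~4 characters / ~0.75 words heuristics.

    Takes the larger of the two estimates to stay on the cautious side.
    """
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return math.ceil(max(by_chars, by_words))


def estimate_cost(input_text: str, expected_output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate the cost of one request; prices are per million tokens."""
    input_tokens = estimate_tokens(input_text)
    total = (input_tokens * input_price_per_million
             + expected_output_tokens * output_price_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    prompt = "Summarize the following support ticket in three bullet points."
    print(estimate_tokens(prompt))
    # Placeholder prices, chosen only to make the arithmetic visible.
    print(estimate_cost(prompt, expected_output_tokens=150,
                        input_price_per_million=1.0,
                        output_price_per_million=2.0))
```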

19. What is the documented performance effect of role specificity in prompts?

Correct. Role specificity compounds: each additional dimension (domain, seniority, organization, geography) narrows the output distribution further toward the patterns you actually need.

More specific roles activate narrower, more relevant patterns in training data. "Senior employment attorney at an early-stage startup" produces more targeted output than "you are an expert."

20. What is the core mechanism of Retrieval-Augmented Generation (RAG)?

Correct. RAG retrieves relevant text at query time and passes it as context — no retraining required, knowledge can be updated by modifying the document store.

RAG retrieves relevant passages from an external document store and includes them in the prompt. The model reasons over retrieved text rather than relying on training memory alone.
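
The retrieve-then-prompt mechanism can be sketched in a few lines. To stay self-contained, this example ranks passages by simple word overlap, which is a stand-in chosen for illustration; real RAG systems typically use embedding-based or other search. The function names and sample documents are hypothetical.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_words & tokenize(doc)),
                    reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, documents: list, k: int = 2) -> str:
    """Place the retrieved passages in the prompt ahead of the question."""
    passages = retrieve(query, documents, k)
    context = "\n".join(f"- {passage}" for passage in passages)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}")


if __name__ == "__main__":
    store = [
        "Refunds are issued within 14 days of purchase.",
        "Our office is closed on public holidays.",
        "Shipping takes 3 to 5 business days.",
    ]
    print(build_rag_prompt("How many days do refunds take?", store, k=1))
```

Updating what the system knows means editing the document store; the model itself is never retrained.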