1. What is a "prompt regression" and why is a golden dataset the standard defense against it?
Correct. Prompt regressions are the core failure mode in iterative prompt development. A golden dataset covering the full input distribution — including edge cases and previously failed cases — is the standard defense, as demonstrated by Stripe's 40% catch rate.
Regressions happen when you optimize for the test cases you're watching and break the ones you're not. A golden dataset representing the full distribution catches these before they reach production.
2. Why did Johann Rehberger's 2023 markdown exfiltration attack on ChatGPT plugins succeed?
Correct. This was a testing and detection failure. Adversarial documents were never part of the pre-launch red-team exercise, and no output monitoring existed to catch the model generating malicious markdown exfiltration links in response to hostile document content.
Incorrect. The vulnerability was entirely at the application layer — no authentication bypass or browser exploit was needed. Plugin developers simply never tested what happened when the model processed adversarial documents, and had no output monitoring in place.
3. Schema validation is preferable to exact-match string comparison for JSON outputs because:
Correct.
Two JSON objects with the same content but different key order or whitespace are logically identical but string-comparison-different. Schema validation checks what matters: structure and types.
4. In a multi-tenant layered prompt architecture, tenant-specific overrides are validated against:
Correct.
Incorrect. Tenant overrides must be validated against the global policy layer to preserve safety invariants.
5. A minimal production eval system requires which four components?
Correct.
The four components are: test dataset (input/expected pairs), scoring function, runner (executes prompt against dataset), and results store (enables comparison across versions).
6. What technical term describes the technique of placing a small number of input-output examples directly in a prompt, with no model weight updates?
Correct. Few-shot in-context learning — placing examples in the context window without updating weights — is the defining technique.
The technique is called few-shot in-context learning. Fine-tuning involves updating model weights; this technique does not.
7. GitHub Copilot's primary evaluation metric for code suggestions was functional correctness. What made this metric especially powerful for their use case?
Correct.
The power of functional correctness for code is its binary, externally verifiable nature — tests either pass or fail with no human needed, and the existing test infrastructure means zero extra tooling cost.
8. Which of the following is NOT one of the five architectural defense tiers discussed in Lesson 2?
Correct. The five tiers are: input sanitization, delimiter isolation, least-privilege prompting, output/action gating, and two-LLM pipeline separation. Fine-tuning on injection-resistant data is a research direction but was not among the five architectural tiers — and would not substitute for them even if effective.
Incorrect. That defense tier was discussed in Lesson 2. Model weight fine-tuning on injection-resistant data was not among the five tiers — the tiers are all architectural and prompt-design controls that work independently of model fine-tuning.
9. BLEU measures n-gram precision. What is the fundamental quality dimension it cannot capture?
Correct.
BLEU requires surface n-gram overlap. A paraphrase that conveys identical meaning with different words receives a low BLEU score, making it unreliable for tasks where paraphrase is common.
10. Shadow testing differs from canary deployment in that:
Correct.
Incorrect. Shadow testing runs both prompts but protects users by only returning the old prompt's output.
11. Zero-shot CoT was introduced in which paper?
Correct. Kojima et al.'s paper showed that "Let's think step by step" alone could trigger effective reasoning without any worked examples.
Zero-shot CoT came from Kojima et al. 2022, "Large Language Models are Zero-Shot Reasoners."
12. Delimiter isolation (using XML tags or triple backticks to wrap user input) is best described as:
Correct. Delimiter isolation forces injected text into the data section and raises attack complexity — but sufficiently creative attackers can sometimes escape delimiter context. It is a necessary baseline, not a complete solution.
Incorrect. Delimiters are a useful baseline but not a complete solution. Attackers can craft injections that escape delimiter context. Multiple independent defenses are required for robust protection.
13. Riley Goodside's 2023 LangChain agent attack demonstrated which specific vulnerability in early agentic frameworks?
Correct. The agent browsed a legitimate page that linked to an attacker-controlled page with hidden instructions. The agent read those instructions and acted on them — including attacker URLs in its response — with no confirmation step. Early agentic frameworks had powerful capabilities but essentially no injection defenses.
Incorrect. The attack was purely prompt-level: the agent browsed a page containing hidden adversarial instructions and executed them without any validation. The lesson is that tool capabilities must be accompanied by commensurate defensive architecture.
14. Which of the following is the most critical quality dimension to audit when few-shot output is inconsistent, before adding more examples?
Correct. Format inconsistency is the most common cause of inconsistent few-shot outputs and should be audited before any other intervention.
Format consistency is the primary thing to audit. Mismatched delimiters, casing, or field order across examples is the most common source of inconsistent outputs.
15. Why do researchers recommend placing the Identity zone first in a system prompt?
Correct. Anthropic's 2024 model card documentation and independent research confirm early-context instructions receive stronger attention weighting, making the Identity zone's first position mechanistically significant.
This has a real technical basis: transformer attention is not uniform across context, and early-context instructions receive stronger weighting — documented in Anthropic's 2024 model card and interpretability research.
16. Why does RLHF training cause LLMs to add explanatory prose around structured output even when instructed not to?
Correct. RLHF trains on human preferences — raters prefer contextual explanations, so models learn to include them.
RLHF rewards what human raters find helpful. Raters prefer context, so models learn to add it even when developers do not want it.
17. Which attack technique involves embedding injection instructions across multiple conversation turns rather than in a single message?
Correct. Many-shot jailbreaking (identified in Anthropic's 2024 research) uses a sequence of turns — each individually borderline or benign — to gradually prime the model's context until a later injection instruction succeeds.
Incorrect. Many-shot jailbreaking is the technique that spreads injection instructions across multiple turns to gradually prime the model's context, bypassing single-turn detection.
18. Why does BERTScore handle paraphrase better than BLEU?
Correct.
BERTScore uses contextual embeddings from BERT, where semantically similar tokens have high cosine similarity even with different surface forms. This captures paraphrase that n-gram overlap misses entirely.
19. What is the recommended API temperature setting for self-consistency sampling?
Correct. Temperature > 0 is essential for self-consistency — at temperature 0, all N chains would be identical, making voting meaningless.
Self-consistency requires temperature > 0 (typically 0.5–0.8) to generate diverse reasoning paths. At temperature 0, all chains would be identical.
20. Why should few-shot examples include a case where an optional field is null?
Correct. Models generalize from the distribution of examples — if no example shows null, the model learns to avoid null outputs.
Models learn what is "normal" from examples. Without null cases, they resist producing null even when the input calls for it.