AI Code Review Fundamentals

1. Step C3 of the checklist addresses Confident Hallucination in dynamic languages by requiring:

Correct. In Python and JavaScript, a hallucinated method call on a real object is syntactically valid and only fails at runtime. A simple smoke test that actually executes the call catches this class of error before approval.

Not quite. Step C3 is specifically about running the code before approval — because in dynamic languages, hallucinated methods are syntactically valid and only surface at execution time.
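A minimal sketch of the kind of smoke test this step implies. The helper `normalize_name` and the runner `smoke_test` are hypothetical names chosen for illustration; the only fact relied on is that Python strings have `strip()` but no `trim()` method.

```python
def normalize_name(raw: str) -> str:
    # Hypothetical AI-generated helper. str.trim() does not exist in Python
    # (the real method is str.strip()), yet this line parses without complaint.
    return raw.trim().lower()


def smoke_test(func, *args):
    """Call func once and report whether it ran without raising."""
    try:
        func(*args)
    except Exception as exc:  # broad on purpose: the only question is "does it run?"
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


passed, detail = smoke_test(normalize_name, "  Ada ")
print(passed, detail)  # the hallucinated method surfaces as an AttributeError
```

A smoke test only covers the paths it executes, so it works best when it calls each AI-generated function at least once with realistic input.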

2. Atul Gawande's surgical checklist research is cited in this module to support what claim about code review?

Correct. Gawande's research shows experts under cognitive load skip known-important steps — the same dynamic makes a dedicated documentation review pass necessary for code review.

The research shows that even experts skip important steps under load. A separate documentation pass counters this by removing documentation review from the cognitive competition with correctness review.

3. In mutation testing, what does a high "kill rate" indicate?

Correct.

Kill rate measures test quality: the percentage of artificially introduced mutations that at least one test detects. High kill rate means the suite genuinely validates correctness.
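The definition above can be sketched in a few lines. This is an illustrative toy, not the interface of any real mutation testing tool: `kill_rate`, `is_adult`, and the hand-written mutants are all assumptions made for the example.

```python
def kill_rate(original, mutants, tests):
    """Fraction of mutants that at least one test detects (fails on)."""
    if not all(test(original) for test in tests):
        raise ValueError("every test must pass on the original code first")
    killed = sum(1 for mutant in mutants if not all(test(mutant) for test in tests))
    return killed / len(mutants)


def is_adult(age):
    return age >= 18


# Hand-written mutants: each changes one operator or constant.
mutants = [
    lambda age: age > 18,   # >= replaced by >
    lambda age: age <= 18,  # >= replaced by <=
    lambda age: age >= 19,  # constant 18 replaced by 19
]

# Each test takes an implementation and returns True when it passes.
weak_tests = [lambda f: f(30) is True]
strong_tests = weak_tests + [lambda f: f(18) is True, lambda f: f(17) is False]

print(kill_rate(is_adult, mutants, weak_tests))
print(kill_rate(is_adult, mutants, strong_tests))
```

The weak suite passes on the original but lets two of the three mutants survive; adding the boundary cases at 17 and 18 kills all three.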

4. What three sub-items should accompany every AI-delegated checklist category?

Correct. These three transform AI assistance from trust-and-forget into a managed handoff with clear accountability for what the human reviewer still owns.

The three required sub-items are: (1) which tool is approved, (2) what human verification you perform after reviewing AI output, and (3) the data classification level above which this tool should not be used.

5. What is "version-surface mismatch"?

Correct.

Version-surface mismatch occurs when AI-generated code references API signatures or constants from library versions different from those in the deployment environment — exemplified by the Pillow ANTIALIAS incident.
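One way to make such a mismatch visible is to probe for the name before relying on it. The sketch below uses stand-in namespaces rather than importing Pillow, so it runs anywhere; `old_api`, `new_api`, and `first_available` are hypothetical. The stand-ins mirror the fact that Pillow 10 removed `Image.ANTIALIAS` in favor of `Image.Resampling.LANCZOS`.

```python
from types import SimpleNamespace

# Stand-ins for two versions of a library module (assumed shapes, not real imports).
old_api = SimpleNamespace(ANTIALIAS=1)
new_api = SimpleNamespace(Resampling=SimpleNamespace(LANCZOS=1))


def first_available(root, dotted_names):
    """Return (name, value) for the first dotted attribute path that exists on root."""
    for dotted in dotted_names:
        obj = root
        try:
            for part in dotted.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            continue  # this name belongs to a different version surface
        return dotted, obj
    raise AttributeError(f"none of {dotted_names} exist in the installed version")


candidates = ["Resampling.LANCZOS", "ANTIALIAS"]
print(first_available(new_api, candidates))
print(first_available(old_api, candidates))
```

In a real project, pinning the library version and running the code against that pinned version is the more direct check; a probe like this is mainly useful when several versions must be supported.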

6. What is the primary reason hallucinated API method names pass superficial code review?

Correct.

The answer is that hallucinated names follow library naming conventions — the model learned these conventions from training data and applies them convincingly.

7. In the Cursor AI authentication incident, how many generated tests were present, and what did they fail to test?

Correct.

The Cursor incident had 14 tests, all passing, but none tested the JWT null-algorithm path — a well-known security vulnerability the model never considered testing.
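A sketch of the kind of test that was missing, assuming the "null-algorithm path" refers to the well-known case of a token whose header declares `"alg": "none"` and carries no signature. The toy HS256 verifier below is written with the standard library purely for illustration; it is not the code from the incident and not a substitute for a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"illustrative-secret"  # hypothetical key for the example


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(header: dict, payload: dict, secret: bytes) -> str:
    signing_input = ".".join(
        b64url_encode(json.dumps(part).encode()) for part in (header, payload)
    )
    if header.get("alg") == "HS256":
        sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
        return f"{signing_input}.{b64url_encode(sig)}"
    return f"{signing_input}."  # unsigned token, as an attacker would forge it


def verify(token: str, secret: bytes) -> dict:
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    # Allow-list the algorithm instead of trusting whatever the header claims.
    if header.get("alg") != "HS256":
        raise ValueError("unsupported or missing algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))


def rejects_unsigned_token() -> bool:
    """The negative test: a forged alg=none token must not verify."""
    forged = make_token({"alg": "none", "typ": "JWT"}, {"sub": "admin"}, SECRET)
    try:
        verify(forged, SECRET)
    except ValueError:
        return True
    return False


print(rejects_unsigned_token())
```

The point is the shape of the test: it feeds the verifier an input that should be rejected, which none of the happy-path tests would exercise.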

8. Why must decision records be stored version-controlled alongside code rather than in a separate wiki?

Correct. Records in the repository appear in PR diffs — making them visible during review of changes and requiring a conscious update decision rather than silent obsolescence.

The key reason is drift prevention: wiki-based records become silently obsolete when code changes. Version-controlled records appear in diffs and demand attention during code review.

9. The FSE 2023 study found that AI-generated functions contained off-by-one errors in loop bounds at 2.3× the human rate. Which mutation was most effective at catching these?

Correct. Flipping < to <= in loop conditions caught 91% of the off-by-one errors in the study corpus.

The mutation that caught 91% of off-by-one errors was flipping < to <= in loop conditions — directly targeting the boundary comparison that AI models most often get wrong.

10. Bacchelli and Bird's 2013 ICSE study found reviewers defaulted to which type of comments when lacking explicit review structure?

Correct. Without structure, reviewers gravitate toward surface-level style issues. Logic and security defects — the categories developers most want peers to catch — fall through.

Without explicit structure, reviewers default to style and surface issues. This is the core finding that motivates structured checklist design: the gap between what reviewers produce and what they're most needed to catch.

11. Why do AI code generators produce logic errors more frequently than syntax errors?

Correct. LLMs predict likely continuations. Syntax correctness is strongly correlated with likelihood in training data; semantic correctness for your specific domain is not.

Not quite. The asymmetry is due to the training objective: next-token prediction strongly enforces syntactic patterns but cannot enforce semantic correctness for a domain-specific problem the model has never seen.

12. What does "missing failure documentation" mean in the context of AI-generated code?

Correct. Missing failure documentation means callers cannot determine from documentation alone how to handle the function's error states.

Missing failure documentation is the absence of documented failure modes — what exceptions are intentional, what error states exist, and what callers must do about them.

13. How should documentation checklist rigor be calibrated across different types of AI-generated code?

Correct. Risk-tiered documentation standards apply proportionally scaled rigor — preventing over-documentation of trivial utilities while ensuring comprehensive coverage of high-risk code.

Risk-tiered calibration is the documented approach: low-risk utilities need minimal documentation, while high-risk compliance and safety code requires comprehensive coverage, including external ADR files.

14. Which of the following is an example of a "phantom configuration key" as defined in this module?

Correct.

A phantom configuration key is a fabricated kwarg that looks like it configures something but is silently ignored because the library never defined it.
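A sketch of how this can happen, using a hypothetical `connect` function and a made-up option name `retry_on_failure`; neither belongs to a real library. Any callable that accepts `**kwargs` without validating them behaves this way.

```python
def connect(host, timeout=5.0, **options):
    """Hypothetical client factory that accepts and silently drops unknown options."""
    return {"host": host, "timeout": timeout}


def connect_strict(host, timeout=5.0, **options):
    """Same factory, but unknown options are rejected instead of ignored."""
    if options:
        raise TypeError(f"unknown options: {sorted(options)}")
    return {"host": host, "timeout": timeout}


# The fabricated kwarg looks meaningful, but nothing ever reads it.
conn = connect("db.internal", retry_on_failure=True)
print(conn)

try:
    connect_strict("db.internal", retry_on_failure=True)
except TypeError as exc:
    print("rejected:", exc)
```

When reviewing, checking each keyword argument against the library's documented parameters for the installed version is the manual equivalent of the strict variant.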

15. The 2022 Stanford Perry et al. study showed that developers using AI assistants:

Correct. Perry et al. is a landmark study: AI users had more vulnerabilities and higher confidence simultaneously.

Perry et al. found AI-assisted developers introduced more vulnerabilities and were simultaneously more confident their code was secure — the fluency illusion in a controlled study.

16. Python's PEP 484 type hints are described as "gradual" because:

Correct.

Incorrect. Gradual typing means annotations are optional and unenforced at runtime. Review Lesson 1.
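A short demonstration of what "unenforced at runtime" means in practice. The function `double` is a made-up example.

```python
def double(x: int) -> int:
    # The annotations are stored as metadata; the interpreter does not check them.
    return x * 2


result = double("ab")  # a str is passed where int is annotated, and no error is raised
print(repr(result))
print(double.__annotations__)
```

A static checker such as mypy would flag the call, but only if it is actually run as part of the review or CI process.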

17. Microsoft's 2023 internal playbooks found AI-generated service integration code consistently encoded what assumption?

Correct. Binary outcome assumptions — success or clean exception — were the documented pattern, missing the common real-world case of partial or malformed responses.

The documented assumption was binary outcomes: full success or clean exception. Real-world APIs frequently return partial, malformed, or ambiguous responses that this assumption cannot handle.
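One way to avoid the binary assumption is to give partial and malformed responses their own explicit outcomes. The sketch below assumes a hypothetical JSON payload with `price` and `currency` fields; the function name and the three status labels are illustrative choices, not taken from the playbooks.

```python
import json


def parse_price_response(body: str):
    """Classify a response body as 'ok', 'partial', or 'malformed'."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return "malformed", None
    if not isinstance(data, dict):
        return "malformed", None

    price = data.get("price")
    currency = data.get("currency")
    # bool is a subclass of int in Python, so exclude it explicitly.
    price_ok = isinstance(price, (int, float)) and not isinstance(price, bool)
    if price_ok and isinstance(currency, str):
        return "ok", {"price": price, "currency": currency}
    # Valid JSON object, but required fields are missing or have the wrong type.
    return "partial", data


print(parse_price_response('{"price": 9.5, "currency": "EUR"}'))
print(parse_price_response('{"price": 9.5}'))
print(parse_price_response('<html>502 Bad Gateway</html>'))
```

Callers then have to decide what to do with each status, which is exactly the decision the binary assumption skips.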

18. What is consumer-driven contract testing?

Correct.

Consumer-driven contracts (as implemented by Pact.io) require each consumer to publish expectations that the provider must verify, ensuring both sides agree on the API shape at CI time.
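The idea can be sketched without any tooling. The code below is a hand-rolled illustration, not Pact's API: the contract format (field name mapped to expected type), `verify_contract`, and the sample provider response are all assumptions made for the example.

```python
# Published by the consumer: the fields it reads and the types it expects.
consumer_contract = {"id": int, "email": str}


def provider_response():
    """Stand-in for the provider's real handler output."""
    return {"id": 7, "email": "ada@example.com", "created": "2024-01-01"}


def verify_contract(contract, response):
    """Return a list of contract violations; an empty list means the provider complies."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return problems


# Run on the provider side in CI: extra fields are fine, missing ones are not.
print(verify_contract(consumer_contract, provider_response()))
print(verify_contract(consumer_contract, {"id": "7"}))
```

The provider may add fields freely; it only fails verification when it removes or retypes something a consumer has declared it depends on.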

19. A function catches all exceptions and returns 0 as a price. What category of error is this?

Correct. Returning a default value (0) when an error occurs is the canonical silent failure pattern — the caller cannot distinguish "price is legitimately 0" from "price lookup failed."

Not quite. This is a silent failure: an error condition is masked by returning a plausible-looking default value, giving the caller no signal that something went wrong.
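A sketch of the pattern and one possible alternative. The function names, the `prices` mapping, and the SKU values are hypothetical.

```python
def get_price_silent(prices, sku):
    """Silent failure: any error is turned into a plausible-looking price of 0."""
    try:
        return prices[sku]
    except Exception:
        return 0


def get_price_explicit(prices, sku):
    """Alternative: let the caller see that the lookup failed."""
    try:
        return prices[sku]
    except KeyError as exc:
        raise LookupError(f"no price found for {sku!r}") from exc


prices = {"free-sample": 0, "widget": 12.5}

# Both calls return 0, so the caller cannot tell a free item from a failed lookup.
print(get_price_silent(prices, "free-sample"), get_price_silent(prices, "missing-sku"))

try:
    get_price_explicit(prices, "missing-sku")
except LookupError as exc:
    print("lookup failed:", exc)
```

Raising is one option; returning `None` or a result object would also work, as long as the failure stays distinguishable from a legitimate value.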

20. Frances Allen's 1970 data flow analysis framework was later applied to security by:

Correct.

Incorrect. Coverity and Fortify applied Allen's data flow methods to security analysis. Review Lesson 3.

Final Exam