Module 6 · Lesson 1

Why AI-Generated Code Lacks Documentation Instinct

Language models predict tokens — they do not document intentions, trade-offs, or the reasons a design was chosen over its alternatives.
What does a function's comment tell you that its code cannot?

An Amazon internal post-mortem on CodeWhisperer-assisted development noted a recurring problem: generated functions were syntactically correct and passed unit tests, yet reviewers consistently flagged them for missing intent documentation — no explanation of why a particular algorithm was chosen, what edge cases the author had consciously accepted, or what the function was not designed to handle. The code said what; nothing said why.

The Structural Gap Between Code and Documentation

Documentation in professional codebases serves at least three distinct purposes: it records intent (what the author meant to accomplish), constraints (what the code explicitly does not handle), and rationale (why this approach rather than alternatives). A language model trained to predict syntactically plausible completions has no mechanism for generating authentic content in any of these categories.

When an LLM writes a docstring, it is producing a statistically likely description of what a function with that name and signature would probably do — not a record of deliberate design choices. This distinction becomes critical during incident response: engineers reading generated code have no documented trail of the original assumptions.

Critical Distinction

A model can describe what code does by reading it. It cannot describe why the code was written this way rather than another — because that reason lives in the human's mind, not the token stream.

The 2021 GitHub Copilot Study: Documentation Completeness

A 2021 analysis by researchers at NYU studying early Copilot output found that generated code produced docstrings at a rate comparable to human-written code — but those docstrings were almost exclusively descriptive (restating what parameters were) rather than normative (explaining constraints, failure modes, or intent). Human-written docstrings in the same dataset included constraint language ("does not handle null input"), assumption language ("assumes sorted input"), and rationale ("uses insertion sort for n<10 for cache efficiency") at rates ten to twenty times higher.
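
To make the descriptive/normative distinction concrete, here is a minimal Python sketch. The function names and the SMALL_INPUT_CUTOFF constant are hypothetical, chosen only to echo the constraint, assumption, and rationale phrases quoted above; they are not taken from the study. The first docstring restates the signature, while the second records what the function refuses, what it assumes, and why it is written this way.

```python
def sort_small_descriptive(items):
    """Sort a list.

    Args:
        items: The list to sort.

    Returns:
        The sorted list.
    """
    return sorted(items)


# Hypothetical cutoff for illustration; a real value would come from profiling.
SMALL_INPUT_CUTOFF = 10


def sort_small_normative(items):
    """Return a new sorted list, tuned for very short inputs.

    Constraints: does not handle None; raises TypeError instead.
    Assumes: elements are mutually comparable with `<`.
    Rationale: uses insertion sort below SMALL_INPUT_CUTOFF elements
        (an illustrative threshold) and falls back to sorted() above it.
    """
    if items is None:
        raise TypeError("items must be a list, not None")
    if len(items) >= SMALL_INPUT_CUTOFF:
        return sorted(items)
    result = list(items)  # copy so the caller's list is not mutated
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot right to make room for current.
        while j >= 0 and current < result[j]:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result
```

Both functions return the same values for valid input. Only the second tells a reviewer what happens with None and why the short-input branch exists.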

Three Documentation Failures in AI-Generated Code

Reviewers should watch for three specific absences in AI-generated documentation:

  • Missing constraints: The code handles the happy path, but no docstring or comment states what inputs are invalid, what size limits apply, or what concurrent access behavior is assumed.
  • Missing rationale: An algorithm was selected (e.g., quicksort over mergesort) with no comment explaining the trade-off. Future engineers cannot judge whether the choice is still appropriate after requirements change.
  • Missing failure documentation: No comment identifies what exceptions are thrown intentionally, what error states are expected, or what the caller must do when the function returns an error code.

Reviewer Practice

For every function in an AI-assisted pull request, ask: if the original developer were unavailable, could an engineer reading only this code understand what it deliberately excludes? If not, documentation is incomplete regardless of how syntactically correct the code is.
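
As a first-pass aid for this question, a reviewer could run a rough keyword scan over docstrings. The sketch below uses Python's standard ast module; the REQUIRED_MARKERS word lists and the SAMPLE source are assumptions invented for illustration, and a keyword match says nothing about whether the documentation is accurate. It can only point a human at functions that deserve a closer look.

```python
import ast

# Hypothetical marker words; a team would tune these to its own docstring style.
REQUIRED_MARKERS = {
    "constraints": ("does not", "assumes", "must", "limit"),
    "rationale": ("because", "rationale", "trade-off", "instead of"),
    "failures": ("raises", "returns none", "error"),
}


def find_documentation_gaps(source):
    """Map each function name in `source` to the categories its docstring lacks.

    This is a keyword heuristic: it can flag a function for human attention,
    but it cannot judge whether the documentation that exists is correct.
    """
    gaps = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = (ast.get_docstring(node) or "").lower()
            missing = [
                category
                for category, markers in REQUIRED_MARKERS.items()
                if not any(marker in doc for marker in markers)
            ]
            if missing:
                gaps[node.name] = missing
    return gaps


SAMPLE = '''
def total(prices):
    """Return the sum of prices."""
    return sum(prices)

def mean(values):
    """Return the arithmetic mean.

    Assumes values is non-empty. Raises ZeroDivisionError otherwise.
    Uses a plain sum instead of math.fsum because inputs are small integers.
    """
    return sum(values) / len(values)
'''

print(find_documentation_gaps(SAMPLE))
# prints {'total': ['constraints', 'rationale', 'failures']}
```

The scan reports total and stays silent about mean, which matches what a reviewer would conclude by reading the two docstrings.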

Why This Matters for Code Review

Code review is not merely defect detection — it is the primary checkpoint for ensuring that institutional knowledge is encoded into the codebase. When AI generates the code, no institutional knowledge was ever present to begin with. The reviewer must supply or elicit it. This means the reviewer's job expands: not only checking correctness but reconstructing and documenting intent before the PR merges.

Teams at Stripe and Shopify that publicly discussed their AI code review policies in 2023 each independently converged on the same rule: AI-assisted PRs require a mandatory documentation section in the PR description that a human author explicitly writes — not generated by the model.

Lesson 1 Quiz

Why AI-Generated Code Lacks Documentation Instinct
1. According to the NYU Copilot study, how did AI-generated docstrings differ most from human-written ones?
Correct. Human docstrings included constraint language, assumption statements, and rationale at rates far higher than AI-generated ones.
Not quite. The key finding was that AI docstrings described parameters but omitted normative content: constraints, assumptions, and rationale.
2. Why can't a language model produce authentic rationale documentation?
Correct. The model has no design intent to document — it generated statistically plausible code, not the output of a deliberate trade-off analysis.
The deeper reason is that rationale requires an actual decision-maker with alternatives considered. A generative model has no such decision process.
3. Which of the following best describes "missing constraints" in AI-generated code?
Correct. Missing constraints means the code handles valid inputs but no documentation records what the code explicitly does not handle.
Missing constraints specifically refers to the absence of documentation about invalid inputs, size limits, and concurrency assumptions — not syntax or naming issues.
4. What did Stripe and Shopify require for AI-assisted pull requests in their 2023 policies?
Correct. Both companies required human-authored documentation in the PR description to capture intent that the model could not supply.
The key rule was that humans must write the documentation — requiring a human-authored PR description section rather than relying on AI to document its own output.
5. A reviewer's job in AI-assisted PRs expands beyond defect detection to include what additional responsibility?
Correct. Because the AI had no institutional intent to encode, the reviewer must reconstruct what the intent should be and ensure it is documented.
The expanded responsibility is specifically about documentation of intent — ensuring the codebase records what the code is designed to do and not do.

Lab 1 — Documentation Gap Analysis

Practice identifying missing intent, constraints, and rationale in AI-generated code snippets.

Your Objective

You'll be given short AI-generated code snippets. Your job is to identify what documentation is missing and explain what a human reviewer would need to add before the PR could be approved.

The AI assistant will give you a code snippet. You analyze its documentation gaps, and the assistant responds with feedback and a new snippet. Complete at least 3 exchanges.

Start by typing: "Give me my first code snippet to review for documentation gaps."
Module 6 · Lesson 2

Assumption Auditing: What the Model Silently Believed

Every AI-generated function encodes assumptions about its environment. None of them are written down.
If a function works perfectly in the model's training distribution but fails in yours, whose fault is that?

Air Canada's AI chatbot told a customer that a bereavement fare discount could be applied retroactively after purchase. The chatbot's response was generated from policy documents — but the model inferred a policy behavior that did not exist, encoding an assumption about how the refund system worked. The assumption was never documented. Air Canada's customer service team had no record that the bot was operating on this inference. A British Columbia tribunal ruled Air Canada liable for the refund.

The deeper engineering lesson: the system's assumed behavior was nowhere written down — not in the bot's configuration, not in any system document, not in any code comment. Nobody had audited what the model silently believed about the refund workflow.

What Is an Assumption Audit?

An assumption audit is the practice of systematically identifying the implicit beliefs encoded in AI-generated code — beliefs about data formats, environmental conditions, system states, user behavior, and external service behavior that the code relies on but does not verify or document.

Unlike a traditional code review that checks correctness against stated requirements, an assumption audit asks: what must be true about the world for this code to work correctly? The answers are almost never written in the code itself.

Definition

An assumption audit catalogs every implicit precondition in a piece of code — data shape, range, encoding, concurrency model, API contract, user privilege level — and verifies that each assumption is either validated at runtime or explicitly documented as a known constraint.
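
The two outcomes in this definition, validated at runtime or documented as a known constraint, can be sketched in a few lines of Python. The function average_age and its record layout are hypothetical. The docstring names the one assumption that is accepted without a check, and guard clauses enforce the rest.

```python
def average_age(records):
    """Return the mean of the "age" field across a list of dict records.

    Accepted constraint (documented, not checked): `records` is a list of
    dicts small enough to fit in memory.
    Validated at runtime: input is non-empty, "age" is present, and age is
    a non-negative int or float.
    """
    if not records:
        raise ValueError("records must be non-empty")
    total = 0.0
    for index, record in enumerate(records):
        age = record.get("age")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(age, bool) or not isinstance(age, (int, float)):
            raise ValueError(f"record {index}: 'age' missing or not numeric")
        if age < 0:
            raise ValueError(f"record {index}: age must be non-negative")
        total += age
    return total / len(records)
```

Which assumptions get a guard and which get a sentence in the docstring is a judgment call for the reviewer; the point is that none of them stays implicit.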

The Toyota Unintended Acceleration Case and Implicit Assumptions

The NASA analysis of Toyota's Electronic Throttle Control System source code, conducted for NHTSA and published in 2011, covered a codebase described as 400,000 lines of embedded C and identified over 7,000 violations of MISRA C coding standards. More critically, expert witnesses in the litigation that followed testified that the code contained numerous implicit assumptions about task scheduling timing that were never documented. When real-world timing violated those undocumented assumptions, the system could enter unintended states. This case, though predating LLMs, established the legal and engineering precedent: undocumented assumptions in safety-relevant code constitute a defect, regardless of whether the code is otherwise syntactically correct.

Categories of Implicit Assumptions in AI-Generated Code

A structured audit should examine five categories:

Data Shape Assumptions

The code expects a specific JSON structure, column order, or array length. The model generated code matching its training data patterns — not your actual schema.

Range and Encoding Assumptions

Numeric ranges (age must be positive), string encodings (assumes UTF-8), date formats (assumes ISO 8601) — none verified, none documented.

Concurrency Assumptions

The code assumes single-threaded execution, or that a shared resource is accessed sequentially, without documenting or enforcing that constraint.

External Service Assumptions

The code assumes an API will always return a specific field, or that a service will respond within a timeout window, without defensive handling or documentation.

Environmental Assumptions

The code assumes a specific OS, filesystem layout, environment variable presence, or library version. These assumptions may differ between development and production.

Running an Assumption Audit in Practice

A practical assumption audit proceeds in three passes. In the first pass, read the function signature and body and list every variable or parameter that the code does not validate before using. In the second pass, trace every external call and ask what the code assumes about the response. In the third pass, read any error handling — the absence of error handling often reveals the most consequential assumptions: that the operation will always succeed.
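
The three passes can be walked through on a small example. The sketch below is hypothetical: get_display_name, the fetch callable, and the response layout are invented for illustration, and fetch is passed in as a parameter so the example runs without a network. The comments record what each pass would surface.

```python
def get_display_name(user_id, fetch):
    # Pass 1 (unvalidated inputs): user_id is used without a type or range
    # check; the code assumes it is a value the service accepts.
    response = fetch(f"/users/{user_id}")
    # Pass 2 (external calls): assumes fetch returns a dict with a "profile"
    # key whose value contains "first" and "last" strings.
    profile = response["profile"]
    # Pass 3 (error handling): there is none, so the code assumes the call
    # always succeeds and the response is always complete.
    return profile["first"].title() + " " + profile["last"].title()


def complete_fetch(path):
    # Happy path: every assumption holds, so the function looks correct.
    return {"profile": {"first": "ada", "last": "lovelace"}}


def partial_fetch(path):
    # A partial response violates the pass 2 assumption.
    return {"profile": {"first": "ada"}}


print(get_display_name(1, complete_fetch))  # prints Ada Lovelace
try:
    get_display_name(1, partial_fetch)
except KeyError as missing:
    print("undocumented assumption violated:", missing)
```

The function passes any test built on complete_fetch. The audit comments are what reveal that it has no defined behavior for anything else.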

Microsoft's responsible AI team documented this three-pass method in their internal engineering playbooks in 2023 after finding that AI-generated service integration code consistently omitted handling for partial failures — encoding the assumption that external calls either fully succeed or throw an exception, ignoring the common case of partial or malformed responses.
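
One way to stop assuming that a call either fully succeeds or throws is to classify the response before using it. This is a minimal sketch under stated assumptions: the REQUIRED_FIELDS schema and the three labels are invented for illustration and do not come from the playbook described above.

```python
REQUIRED_FIELDS = ("id", "status", "amount")  # hypothetical response schema


def classify_response(payload):
    """Classify a decoded service response as "ok", "partial", or "malformed".

    Assumes the caller has already JSON-decoded the payload.
    Returns a (label, missing_fields) tuple so the caller can decide whether
    to retry, degrade gracefully, or fail.
    """
    if not isinstance(payload, dict):
        return "malformed", list(REQUIRED_FIELDS)
    # A field that is absent or null counts as missing.
    missing = [name for name in REQUIRED_FIELDS if payload.get(name) is None]
    if not missing:
        return "ok", []
    if len(missing) == len(REQUIRED_FIELDS):
        return "malformed", missing
    return "partial", missing
```

For example, classify_response({"id": 1}) returns ("partial", ["status", "amount"]), a case that code written around success-or-exception would never see coming.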

Rule of Thumb

Every place in AI-generated code where there is no validation, no guard clause, and no error branch — there is an assumption. The reviewer's job is to name it, decide if it's acceptable, and document it either as a constraint or as a guard that needs to be added.

Lesson 2 Quiz

Assumption Auditing: What the Model Silently Believed
1. What is the central question an assumption audit asks about AI-generated code?
Correct. An assumption audit surfaces implicit preconditions — the beliefs about data, environment, and external systems that the code relies on without documenting.
An assumption audit specifically asks: what does this code silently require to be true? It goes beyond requirements matching to surface undocumented preconditions.
2. What lesson did the Air Canada chatbot case (February 2024) demonstrate for AI system engineering?
Correct. The bot encoded an inference about refund policy that was never written down anywhere — making it invisible to auditors and support teams until a tribunal ruling.
The engineering lesson was specifically about undocumented assumptions: the inferred behavior existed nowhere in the system's documentation, making it unauditable.
3. According to Microsoft's internal playbooks (2023), what did AI-generated service integration code consistently omit?
Correct. Partial or malformed responses are a common real-world case that AI-generated code systematically ignored — an implicit assumption of binary outcomes.
The documented pattern was missing partial failure handling — the assumption that every call is either a full success or a clean exception, ignoring messy real-world partial responses.
4. In the three-pass assumption audit, what does the third pass (reading error handling) most often reveal?
Correct. The absence of error handling is itself an undocumented assumption: "this will always succeed." That's often the most dangerous implicit belief in the code.
Reading error handling — or its absence — reveals assumptions of success. Where there is no error branch, the code assumes the operation cannot fail.
5. The Toyota ETCS case established what principle for code review?
Correct. The expert testimony established that implicit, undocumented timing assumptions were defects — even though the code compiled and ran correctly under nominal conditions.
The Toyota case established the legal and engineering precedent: undocumented assumptions are defects, not just style issues, regardless of syntactic correctness.

Lab 2 — Running an Assumption Audit

Practice the three-pass assumption audit on real-pattern AI-generated service integration code.

Your Objective

You'll apply the three-pass audit method to AI-generated code snippets. For each snippet, identify data shape assumptions, range/encoding assumptions, concurrency assumptions, and external service assumptions.

The assistant will provide code, you audit it using the three-pass method, and it will evaluate your audit and provide the next snippet. Complete at least 3 exchanges.

Start by typing: "Give me an AI-generated code snippet to run a three-pass assumption audit on."
Module 6 · Lesson 3

Documenting Decisions You Didn't Make

When AI wrote the code, the reviewer must retroactively construct the decision record — or reject the PR until one exists.
How do you document a design decision that was made by a statistical process with no intent?

Google DeepMind's 2023 internal engineering guidance for AI-assisted code specifically introduced the concept of retroactive design documentation — a requirement that any PR where more than 50% of the code was AI-generated must include a "Decision Record" attachment. The record had to answer three questions: why this approach rather than alternatives, what constraints were accepted, and what a future engineer would need to know to safely modify this code. The model could not fill in this document; the human author had to write it.

The Architecture Decision Record (ADR) Pattern

Architecture Decision Records — popularized by Michael Nygard's 2011 blog post and widely adopted at Thoughtworks — are short documents that record a significant architectural decision: the context, the decision, the consequences, and the alternatives considered. They were designed to preserve institutional memory when human engineers make complex choices.

The challenge with AI-generated code is that ADRs assume a human made a decision. When the model generated the code, there was no decision — there was a generation. The reviewer's task is to reverse-engineer the decision that would justify the code, assess whether that decision is defensible, and write it down. This is fundamentally different from documenting a decision you made.

Key Insight

Reviewing AI code without writing a retroactive decision record leaves a gap in institutional memory that cannot be reconstructed later. Future engineers will not know whether an implementation detail was intentional, incidental, or a model artifact.

The Volkswagen Emissions Case and Undocumented Intentionality

During the 2015–2016 Volkswagen emissions scandal investigation, regulatory engineers discovered that the defeat device code contained no documentation distinguishing intentional behavior (detecting test cycles) from normal operation. Investigators had to reconstruct intent forensically — examining branch conditions, timing logic, and sensor thresholds to infer what the code was designed to do. The absence of documentation did not make the behavior legal; it made reconstruction expensive and left Volkswagen unable to credibly argue any alternative interpretation.

While the VW case involved intentional fraud, the documentation lesson applies to AI-generated code: undocumented behavior will be reconstructed by others under adversarial conditions. Better to document intent precisely when writing than to leave it to forensic inference.

How to Write a Retroactive Decision Record

A retroactive decision record for AI-generated code should answer four questions explicitly:

  • What does this code do that requires explanation? Identify the non-obvious implementation choices — algorithm selection, data structure choice, concurrency model, error handling strategy.
  • What alternatives could have been used? Name at least one other approach that the model might have generated instead and explain why the chosen approach is preferable for your specific context.
  • What constraints does this code rely on? List the assumptions catalogued in the assumption audit (from Lesson 2) that are being accepted rather than eliminated.
  • What would a future engineer need to know to safely modify this? This is the most practically valuable section — it preserves the context that prevents the next change from breaking an undocumented invariant.
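
The four questions above can be answered inline, as a block comment above the code they describe. The following is a sketch only: the LRUCache class, its caller's needs, and the reasoning in the comment are hypothetical, standing in for whatever a reviewer would reconstruct for a real AI-generated implementation.

```python
from collections import OrderedDict

# Decision record (retroactive; written by the human reviewer)
# 1. Needs explanation: eviction uses least-recently-used order, implemented
#    with OrderedDict.move_to_end and popitem(last=False).
# 2. Alternative considered: functools.lru_cache. Not used here because this
#    hypothetical caller needs to insert values explicitly with put().
# 3. Accepted constraints: single-threaded access only; keys are hashable;
#    capacity is a positive integer (checked in __init__).
# 4. Change safety: get() must keep calling move_to_end, otherwise eviction
#    silently degrades to insertion order.
class LRUCache:
    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict least recently used
```

Item 4 is the kind of note that no test name or type signature would carry: removing one line in get() changes the eviction policy without breaking anything visibly.
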
When the Decision Cannot Be Justified

Sometimes the retroactive decision record process reveals that the AI's implementation choice cannot be defended for the target context. The model used a linked list where the use case demands O(1) indexed access. The model chose a recursive implementation whose call stack could overflow on deep input. The model hardcoded a timeout that is wrong for the production SLA.

In these cases, the correct response is not to write a document defending the indefensible — it is to reject the AI-generated implementation and require a human-authored replacement. The retroactive decision record process is diagnostic: it will surface implementations that were statistically plausible but contextually wrong.
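
The recursion case is easy to demonstrate in Python, where the interpreter enforces a recursion limit. This sketch is an illustration with hypothetical function names: the recursive version is perfectly plausible code, and it fails only when the input is nested more deeply than the limit allows, which is exactly the context a decision record would have to address.

```python
def nesting_depth_recursive(value):
    """Depth of nested lists; raises RecursionError on very deep input."""
    if not isinstance(value, list):
        return 0
    return 1 + max((nesting_depth_recursive(item) for item in value), default=0)


def nesting_depth_iterative(value):
    """Same result using an explicit stack, so depth is bounded by memory."""
    deepest = 0
    stack = [(value, 0)]
    while stack:
        current, depth = stack.pop()
        if isinstance(current, list):
            depth += 1
            deepest = max(deepest, depth)
            stack.extend((item, depth) for item in current)
    return deepest


# Build a deeply nested list without recursion: [] wrapped 5,000 times.
deep = []
for _ in range(5_000):
    deep = [deep]

print(nesting_depth_iterative(deep))  # prints 5001
try:
    nesting_depth_recursive(deep)
except RecursionError:
    print("recursive version cannot handle this input")
```

If the target context guarantees shallow input, the recursive version may be defensible and the record should say so. If it does not, the record cannot be written honestly and the implementation should be replaced.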

Practical Rule

If you cannot write a defensible retroactive decision record for an AI-generated implementation, do not merge it. The inability to justify the decision is evidence that the implementation is wrong for your context, not merely under-documented.

Inline vs. External Documentation

Decision records can live inline (as extended block comments above the function) or externally (as ADR files in a docs/ directory). The choice depends on team convention. What matters is that the record is version-controlled alongside the code — so that when the code changes, the record must be updated. A decision record in a separate wiki page will drift and become misleading; one in the repository will at minimum be visible during code review of future changes.

Lesson 3 Quiz

Documenting Decisions You Didn't Make
1. What did Google DeepMind's 2023 engineering guidance require for PRs where >50% of code was AI-generated?
Correct. The Decision Record had to be human-authored and answer three specific questions the model could not answer.
DeepMind's requirement was a human-authored Decision Record — the model could not write it because the questions required genuine contextual judgment.
2. Why is writing a retroactive decision record for AI-generated code fundamentally different from writing one for human-authored code?
Correct. There was no decision to document — only a statistical generation. The reviewer must construct the decision retroactively and verify it is defensible.
The fundamental difference is that no decision was made — the model generated code without intent. The reviewer must reverse-engineer what decision would justify the implementation.
3. What documentation lesson does the Volkswagen emissions case provide for AI code review?
Correct. Investigators had to reconstruct intent from code alone — an expensive, adversarial process. The lesson: document intent precisely when you can.
The documentation lesson is that undocumented behavior will eventually be reconstructed — and that reconstruction happens under adversarial conditions when you can no longer control the narrative.
4. When the retroactive decision record process reveals that an AI implementation choice cannot be justified, what is the correct response?
Correct. The inability to justify the decision is evidence the implementation is contextually wrong — not merely under-documented. Rejection, not documentation, is the right response.
If you cannot defend the implementation, no amount of documentation fixes it. The retroactive decision record process is diagnostic — it reveals when code must be rejected.
5. Why should decision records be stored version-controlled alongside the code rather than in a separate wiki?
Correct. Co-location with the code means the record appears in the diff when code changes — making it visible to reviewers and requiring a conscious update decision.
The key reason is drift prevention: a wiki page drifts silently when code changes, but a version-controlled record appears in the PR diff and demands attention.

Lab 3 — Writing Retroactive Decision Records

Practice constructing defensible decision records for AI-generated implementations.

Your Objective

Given AI-generated code snippets, you will write a retroactive decision record answering: what non-obvious choices were made, what alternatives exist, what constraints are accepted, and what a future engineer needs to know.

The assistant will provide code, evaluate your decision record, and guide you toward completeness. Complete at least 3 exchanges.

Start by typing: "Give me an AI-generated code snippet so I can write a retroactive decision record for it."
Module 6 · Lesson 4

Building a Documentation Review Checklist for AI Code

A structured checklist converts abstract documentation standards into repeatable reviewer behavior.
What does your team's documentation standard look like when it must apply to code no human designed?

Palantir's 2023 engineering blog post on AI code integration described a documentation review checklist they developed after six months of incidents with AI-assisted PRs. The checklist had three sections: Intent Documentation (does the code explain what it is designed to do and not do), Assumption Documentation (are all implicit preconditions named), and Change Safety (does the documentation give a future engineer enough context to safely modify the code without breaking undocumented invariants). PRs missing any section were returned without review.

Why Checklists Work for Documentation Review

Atul Gawande's 2009 research on surgical checklists — and the subsequent World Health Organization Surgical Safety Checklist adoption — demonstrated that expert practitioners under cognitive load systematically skip steps they know to be important. The same dynamic applies to code review: senior engineers reviewing complex AI-generated code will focus cognitive effort on correctness and security, and documentation gaps will be deprioritized under time pressure.

A documentation checklist forces explicit attention to documentation quality as a separate review pass, not an afterthought. It also creates a common standard across a team — preventing the situation where documentation rigor depends entirely on individual reviewer preference.

The Five-Element AI Documentation Checklist

Based on documented practices from Palantir, Stripe, and Google DeepMind, the following five elements should appear on every AI code documentation review checklist:

1. Intent Statement: Is there a human-authored statement of what this code is designed to do, not just what it does syntactically, including what it deliberately excludes from its scope?
2. Assumption Catalog: Have all implicit preconditions (data shape, ranges, encoding, concurrency model, external service behavior) been named explicitly, and is each either validated at runtime or documented as an accepted constraint?
3. Failure Documentation: Are all failure modes documented, including what exceptions are intentionally thrown, what error states are possible, and what the caller must do when the operation does not succeed?
4. Design Rationale: Is there a documented reason why this implementation approach was chosen over at least one alternative, making the choice reviewable rather than opaque?
5. Change Safety Note: Does the documentation tell a future engineer what they must not change without understanding, that is, what invariants the code relies on that are not enforced by types or tests?

The Knight Capital Group Incident and Documentation Gaps

The 2012 Knight Capital Group incident, in which a software deployment error caused $440 million in losses in 45 minutes, was partially attributable to undocumented legacy code behavior. New code in Knight's SMARS order router repurposed a flag that had previously activated older functionality; no documentation connected the old behavior to the new deployment. Engineers had no way to know from documentation alone that reusing the flag would activate dormant code. Knight Capital did not survive as an independent firm: it needed an emergency investor rescue within days and was acquired the following year.

While this predates LLMs, it illustrates the category of failure: when code behavior is not documented, future modifications operate on incomplete information. In AI-generated code, this risk is compounded — the original implementation had no human author who could be consulted. The documentation checklist is the only defense.

Integrating the Checklist into PR Review Process

Effective integration requires three structural changes to PR review workflow. First, the checklist should be embedded in the PR template — not a separate document, but a section authors must complete before requesting review. Second, reviewers should make a separate documentation review pass before the correctness review pass, not simultaneously. Third, documentation failures should be blocking — the same status as a failing test — not advisory comments that authors can resolve at their discretion.
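
A blocking gate of this kind can be approximated with a small script run before review starts. The sketch below makes several assumptions that are not in the text above: that PR descriptions use Markdown-style "## " headings, and that the required section names are the three listed in REQUIRED_SECTIONS. Both are hypothetical choices for illustration.

```python
import re

# Hypothetical section headings a team might require in its PR template.
REQUIRED_SECTIONS = ("Intent", "Assumptions", "Change Safety")


def missing_sections(pr_description):
    """Return required sections that are absent or empty in the description.

    Assumes each section starts with a line of the form "## <name>".
    """
    # re.split with one capture group yields [preamble, name, body, name, body, ...].
    parts = re.split(r"^## +(.+?)[ \t]*$", pr_description, flags=re.MULTILINE)
    bodies = dict(zip(parts[1::2], parts[2::2]))
    return [name for name in REQUIRED_SECTIONS if not bodies.get(name, "").strip()]


def review_gate(pr_description):
    """Exit-code style result: 0 allows review to start, 1 blocks it."""
    return 1 if missing_sections(pr_description) else 0
```

A section heading with nothing under it counts as missing, which prevents authors from satisfying the gate with an empty template. The script cannot tell whether a human wrote the text; that part of the policy still depends on the reviewer.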

Implementation Note

The most common reason documentation checklists fail is that they are treated as advisory rather than blocking. If a reviewer can approve a PR while noting "documentation could be improved," the checklist will not change behavior. Make it blocking or don't use it.

Calibrating Checklist Rigor to Risk

Not all AI-generated code warrants the same documentation rigor. A configuration helper script that is trivially replaceable carries different risk than a payment processing function or a data retention policy enforcement routine. Teams should define documentation tiers — typically three: low-risk utilities, medium-risk business logic, and high-risk safety/security/compliance-adjacent code — and apply proportionally scaled checklists. High-risk code should require all five elements plus external ADR files; low-risk code might require only an intent statement and assumption catalog.
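
Tiered requirements are simple to express as data. In this sketch the low and high tiers follow the paragraph above; the medium tier's contents and all identifier names are assumptions made for illustration, since the text does not specify them.

```python
ALL_ELEMENTS = (
    "intent_statement",
    "assumption_catalog",
    "failure_documentation",
    "design_rationale",
    "change_safety_note",
)

# Low and high follow the lesson text; the medium tier is an assumption
# made for this sketch.
REQUIRED_BY_TIER = {
    "low": ("intent_statement", "assumption_catalog"),
    "medium": (
        "intent_statement",
        "assumption_catalog",
        "failure_documentation",
        "design_rationale",
    ),
    "high": ALL_ELEMENTS + ("external_adr",),
}


def blocking_gaps(tier, provided):
    """Return the elements required for `tier` that are not in `provided`."""
    if tier not in REQUIRED_BY_TIER:
        raise ValueError(f"unknown risk tier: {tier!r}")
    return [name for name in REQUIRED_BY_TIER[tier] if name not in provided]
```

For instance, blocking_gaps("low", {"intent_statement"}) returns ["assumption_catalog"], while a high-risk PR that supplies all five elements but no ADR file is still blocked on "external_adr".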

The Core Principle

Documentation review for AI-generated code is not about style or thoroughness — it is about ensuring that the codebase contains enough information for the organization to understand, defend, and safely evolve every system component, even after the people who reviewed it have left.

Lesson 4 Quiz

Building a Documentation Review Checklist for AI Code
1. According to Palantir's 2023 engineering standards, what happened to PRs missing any section of their documentation checklist?
Correct. Palantir treated documentation as blocking — PRs missing checklist sections were returned without the code review even beginning.
Palantir's approach was to return incomplete PRs without review — making documentation a hard gate, not a soft recommendation.
2. What does the "Change Safety Note" element of the documentation checklist specifically require?
Correct. The Change Safety Note preserves context that prevents future modifications from breaking undocumented invariants — the most practically valuable documentation element.
The Change Safety Note specifically calls out invariants and dependencies that aren't captured by types or tests — the implicit constraints a modifier must understand.
3. What did the Knight Capital Group 2012 incident demonstrate about undocumented code behavior?
Correct. Reusing an undocumented flag activated dormant code — engineers had no way to know from documentation that this connection existed. $440M in losses followed.
The Knight Capital case illustrated that undocumented behavior makes future modifications dangerous — engineers cannot know what they don't know when documentation is absent.
4. Why should documentation review be a separate pass from correctness review?
Correct. Gawande's checklist research shows experts under cognitive load skip steps they know to be important. A separate pass counters this by making documentation an explicit focus.
The reason is cognitive load: during a complex correctness review, documentation gaps are consistently deprioritized. A dedicated pass forces the attention that combined review doesn't provide.
5. What is the most common reason documentation checklists fail to change reviewer behavior?
Correct. Advisory checklists are routinely overridden under time pressure. Only blocking status — the same as a failing test — reliably changes behavior.
The core failure mode is advisory status. When reviewers can approve a PR while noting documentation gaps as comments, the checklist won't change outcomes.

Lab 4 — Building Your Team's Documentation Checklist

Design and test a documentation review checklist tailored to a specific codebase context.

Your Objective

You'll design a documentation review checklist for a specific context the assistant gives you — either a payment processing service, a data pipeline, or a public API. Your checklist must include all five elements, specify blocking vs. advisory status for each, and calibrate rigor to risk tier.

The assistant will evaluate your checklist against real-world cases and ask you to refine it. Complete at least 3 exchanges.

Start by typing: "Give me a codebase context so I can design an AI documentation review checklist for it."

Module 6 Test

Documentation and Assumption Auditing · 15 Questions · Pass at 80%
1. A language model generates a docstring for a function. What is the docstring most likely to contain?
Correct. Model-generated docstrings are statistically plausible descriptions — not records of actual design decisions or constraints.
Model docstrings reflect statistical prediction, not authentic intent. They typically describe parameters without constraint, rationale, or failure documentation.
2. According to the NYU Copilot study, human-written docstrings included constraint and rationale language at what rate compared to AI-generated ones?
Correct. Human docstrings included normative content at rates ten to twenty times higher than AI-generated docstrings in the same dataset.
The study found a ten-to-twenty-times difference in normative content — a substantial gap, not a marginal one.
3. An assumption audit specifically asks reviewers to identify what?
Correct. An assumption audit catalogs every implicit precondition and requires each to be either validated at runtime or documented as an accepted constraint.
Assumption auditing specifically targets implicit preconditions — the silent beliefs encoded in code about the state of the world, not correctness or performance issues.
4. In the Air Canada chatbot case, what was the primary engineering failure?
Correct. The assumed behavior existed in no system document — making it unauditable, undetectable, and undefendable until a court ruled on it.
The engineering failure was the complete absence of documentation for the model's inferred policy behavior — it could not be audited because it was written nowhere.
5. What does "missing failure documentation" mean in the context of AI-generated code?
Correct. Missing failure documentation means callers cannot determine from documentation alone how to handle the function's error states.
Missing failure documentation is the absence of documented failure modes — what exceptions are intentional, what error states exist, and what callers must do about them.
6. The Toyota ETCS expert testimony established that undocumented timing assumptions were considered what?
Correct. The Toyota case established the legal and engineering precedent: undocumented assumptions in safety-relevant code are defects, not documentation preferences.
Expert witnesses testified that undocumented assumptions constituted defects — regardless of whether the code compiled and ran correctly under nominal conditions.
7. A retroactive decision record for AI-generated code is fundamentally different from a standard ADR because:
Correct. Standard ADRs record a decision that was made. Retroactive records must construct the decision that would justify code that was generated without intent.
The fundamental difference is that no decision existed — the reviewer must reverse-engineer what decision would justify the generation, then verify it's defensible.
8. Microsoft's 2023 internal playbooks found AI-generated service integration code consistently encoded what assumption?
Correct. Binary outcome assumptions — success or clean exception — were the documented pattern, missing the common real-world case of partial or malformed responses.
The documented assumption was binary outcomes: full success or clean exception. Real-world APIs frequently return partial, malformed, or ambiguous responses that this assumption cannot handle.
9. In the three-pass assumption audit, what does the absence of error handling typically indicate?
Correct. Where there is no error branch, the code encodes the assumption of success — often the most dangerous implicit belief because it is the most invisible.
Absent error handling is itself an assumption: "this cannot fail." The third pass specifically looks for this pattern because it reveals the most consequential silent beliefs.
10. Atul Gawande's surgical checklist research is cited in this module to support what claim about code review?
Correct. Gawande's research shows experts under cognitive load skip known-important steps — the same dynamic makes a dedicated documentation review pass necessary for code review.
The research shows that even experts skip important steps under load. A separate documentation pass counters this by removing documentation review from the cognitive competition with correctness review.
11. Which of the following is the most practically valuable element of the five-element documentation checklist?
Correct. The Change Safety Note has the most immediate protective value for future development — it prevents the class of incident where a reasonable-seeming change breaks an undocumented invariant.
While all five elements are important, the Change Safety Note is described as most practically valuable because it directly prevents the modification errors that cause incidents like the Knight Capital case.
12. The Knight Capital Group 2012 incident ($440M loss) illustrates what specific documentation failure?
Correct. The SMARS flag's behavior was undocumented — its connection to dormant code was invisible to engineers making what appeared to be a routine deployment change.
The Knight Capital failure was specifically a documentation gap: undocumented flag behavior made a reasonable-seeming deployment change catastrophic. No one could know what they didn't know.
13. Why must decision records be stored version-controlled alongside code rather than in a separate wiki?
Correct. Records in the repository appear in PR diffs — making them visible during review of changes and requiring a conscious update decision rather than silent obsolescence.
The key reason is drift prevention: wiki-based records become silently obsolete when code changes. Version-controlled records appear in diffs and demand attention during code review.
14. How should documentation checklist rigor be calibrated across different types of AI-generated code?
Correct. Risk-tiered documentation standards apply proportionally scaled rigor — preventing over-documentation of trivial utilities while ensuring comprehensive coverage of high-risk code.
Risk-tiered calibration is the documented approach: low-risk utilities need minimal documentation, high-risk compliance and safety code requires comprehensive coverage including external ADR files.
15. If a reviewer cannot write a defensible retroactive decision record for an AI-generated implementation, the correct response is:
Correct. The retroactive decision record is diagnostic. When it reveals that the implementation cannot be justified, rejection — not additional documentation — is the right response.
The principle is clear: if you cannot write a defensible record, the implementation is wrong for your context. Documentation cannot fix a contextually inappropriate implementation.
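Several of the ideas tested above can be seen together in one small function: the five documentation elements, an assumption that is validated at runtime, an assumption that is documented as an accepted constraint, and an explicit branch for partial or malformed responses instead of a binary success-or-exception model. The following is a minimal sketch under stated assumptions. The function name, the `REQUIRED_FIELDS` tuple, the field names, and the `PartialResponseError` class are all hypothetical and exist only for illustration.

```python
class PartialResponseError(ValueError):
    """Raised when a response arrives but is malformed or incomplete."""


# Hypothetical required fields for an illustrative service response.
REQUIRED_FIELDS = ("id", "amount")


def parse_service_response(payload):
    """Extract the record id and amount from a decoded service response.

    Intent: give callers a validated (id, amount) pair or a clear error.
    Assumptions:
      - Validated at runtime: payload is a dict containing every field
        in REQUIRED_FIELDS.
      - Accepted constraint (documented, not checked): amount is already
        in the unit the caller expects.
    Failures: raises PartialResponseError for a non-dict payload or for
      missing fields. Callers must decide whether to retry or escalate.
    Rationale: partial responses are rejected rather than filled with
      defaults, so a missing amount is never silently treated as zero.
    Change safety: adding a field to REQUIRED_FIELDS makes previously
      accepted responses fail. Check callers before changing it.
    """
    if not isinstance(payload, dict):
        raise PartialResponseError(
            f"expected a dict, got {type(payload).__name__}"
        )
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise PartialResponseError(f"missing fields: {missing}")
    return payload["id"], payload["amount"]
```

The docstring is the part a reviewer would audit: each assumption is either checked in the body or named as an accepted constraint, and the failure section tells callers what to do without requiring them to read the implementation.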