Module 4 · Lesson 1

Anatomy of an AI-Assisted PRD

What product requirements documents are, why they break, and how AI changes the drafting equation.

What makes a PRD effective — and where does AI actually add leverage?

In early 2022, Figma's product team publicly discussed how their PRD process had accumulated what one PM called "the archaeology problem" — requirements documents that were written, then never meaningfully updated, so engineers were often working from specs that had been superseded by three rounds of design iteration. The documents existed; they simply no longer reflected reality. The team began experimenting with using large language models to summarize Figma comment threads, design changelogs, and Slack discussions into living requirement drafts — reducing the gap between what was decided and what was documented.

The insight wasn't that AI would write PRDs from scratch. It was that AI could close the latency between product decisions and written artifacts, making specs useful again rather than ceremonial.

What a PRD Actually Is

A Product Requirements Document is the contract between product thinking and engineering execution. It answers three questions: what is being built, why it is being built (the problem and strategic rationale), and how success is measured (acceptance criteria and metrics). Everything else — wireframes, technical architecture, sprint plans — is downstream of a well-formed PRD.

In practice, PRDs degrade in four predictable ways. They are written too late (after design has already locked). They are too vague (using hedges like "users should be able to easily…" without measurable thresholds). They become stale instantly when requirements change but the document does not. And they accumulate scope without explicit tradeoff decisions, making every feature sound equally important.

Why This Matters

A 2021 study by the Project Management Institute found that 29% of project failures trace to poor requirements gathering. In software product development, the cost of a misunderstood requirement compounds: each later stage of the software development life cycle (SDLC) multiplies the rework cost by roughly 5–10x. A requirement misread at spec time costs minutes to fix; the same misreading discovered in QA costs days.

The Classic PRD Structure

While formats vary by organization, a robust PRD contains six core sections. First, the problem statement: a crisp description of the user pain or business opportunity, grounded in evidence. Second, goals and non-goals: what this release will and explicitly will not do. Third, user stories or job-to-be-done framing: the specific scenarios the product must serve. Fourth, functional requirements: the specific behaviors the system must exhibit. Fifth, acceptance criteria: testable conditions that determine whether each requirement is satisfied. Sixth, open questions: unresolved decisions that must be made before or during development.

Each section serves a different stakeholder. Engineers care most about functional requirements and acceptance criteria. Designers need the problem statement and user stories. Executives need goals and non-goals to assess scope and risk.

Where AI Intervenes in the PRD Lifecycle

AI does not replace the thinking that produces good requirements. It accelerates and stress-tests it. The most productive insertion points are: problem statement sharpening (AI can critique vague framing and suggest more specific language), requirements completeness checks (AI can identify missing edge cases, unstated assumptions, or contradictory requirements), acceptance criteria generation (AI can draft testable criteria from a given user story), and template population (AI can scaffold the structure from a rough brief, leaving humans to fill in the judgment calls).

The critical constraint: unless you supply it, AI has no access to your company's internal data, your users' actual behavior, or your team's unstated strategic context. A PRD produced entirely by AI will be structurally correct but contextually hollow. The practice is collaborative drafting: the PM provides the signal, and the AI provides the structure and the stress test.

Problem Statement: A concise, evidence-backed description of the user need or business opportunity a feature addresses. Good problem statements are falsifiable: they include specific evidence and define who has the problem.

Acceptance Criteria: Testable, binary conditions that determine whether a requirement has been met. Prefer "Given / When / Then" (Gherkin) format to ensure unambiguous verification.

Non-Goals: Explicit statements of what a release will NOT address. Non-goals are as important as goals: they prevent scope creep and set engineering expectations.

Framework to Remember

The "PRFAQ" method, used internally at Amazon since the Bezos era, requires PMs to write the press release and FAQ for a feature before a single line of code is written. When teams at Amazon began using AI to draft first-pass PRFAQs from meeting notes in 2023, they reported that the AI's FAQ section was particularly useful — not for its answers, but for surfacing questions the team hadn't yet articulated. The AI's ignorance was the feature.

Prompting Strategy for PRD Drafting

Effective AI-assisted PRD work requires structured prompting. The most reliable pattern is the "Context → Problem → Constraints → Output Format" structure. You give the AI your product context (what product, what segment, what stage), the specific problem being addressed, any hard constraints (technical, legal, timeline), and the specific section or format you want it to produce.

Generic prompts produce generic output. "Write a PRD for a notification feature" produces a document that could belong to any product. "Write the functional requirements section for a push notification preference center for a B2B SaaS tool used by operations managers, where the primary constraint is that users must be able to manage preferences without IT involvement and changes must propagate within 60 seconds" produces something actually useful.
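The four-part pattern can be kept honest by assembling prompts from named fields rather than free text. The sketch below is illustrative only: the `PrdPrompt` class and its field names are hypothetical, and the values are taken from the notification example above.

```python
from dataclasses import dataclass, field


@dataclass
class PrdPrompt:
    """Hypothetical container for the Context, Problem, Constraints, Output Format parts."""
    context: str
    problem: str
    constraints: list = field(default_factory=list)
    output_format: str = "(not specified)"

    def render(self) -> str:
        # One constraint per line so a missing constraint is easy to spot.
        constraint_lines = "\n".join(f"- {c}" for c in self.constraints) or "- (none stated)"
        return (
            f"Context: {self.context}\n"
            f"Problem: {self.problem}\n"
            f"Constraints:\n{constraint_lines}\n"
            f"Output format: {self.output_format}"
        )


prompt = PrdPrompt(
    context="B2B SaaS tool used by operations managers",
    problem="Users need a push notification preference center",
    constraints=[
        "Users must be able to manage preferences without IT involvement",
        "Changes must propagate within 60 seconds",
    ],
    output_format="Functional requirements section only",
)
print(prompt.render())
```

A prompt that renders with "(none stated)" under Constraints is a signal that the output will probably be generic.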

Lesson 1 Quiz

Anatomy of an AI-Assisted PRD · 5 questions

1. Which of the following best describes the "archaeology problem" Figma identified with their PRD process?

Correct. The "archaeology problem" referred to specs that had been superseded by iteration but never updated, making them historical artifacts rather than living guides.

Not quite. The issue was spec staleness — documents that existed but reflected outdated decisions, creating a dangerous gap between what was decided and what was documented.

2. According to the lesson, what is the PRIMARY leverage point AI offers in the PRD lifecycle?

Correct. Figma's insight — and the core lesson point — is that AI's value is in reducing the gap between when decisions are made and when they are documented, not in replacing human judgment.

Not quite. AI's primary leverage is in reducing documentation latency — the gap between decisions and written specs — not in replacing PM judgment or generating complete documents from nothing.

3. In the "Context → Problem → Constraints → Output Format" prompting pattern, why does specificity in the Constraints field matter most?

Correct. Without specific constraints (technical, legal, timeline, user context), AI produces documents that could apply to any product — structurally sound but contextually useless.

Not quite. Specific constraints anchor the output to your actual situation. Without them, AI produces generic PRD sections that lack the specificity needed to guide real engineering work.

4. Which PRD section is described as especially useful for AI to generate — not for its answers, but for the questions it surfaces?

Correct. The Amazon PRFAQ example illustrated that AI-generated FAQ sections are valuable precisely because AI's ignorance of internal context forces it to ask questions the team hadn't articulated — surfacing hidden assumptions.

Not quite. The Amazon PRFAQ example highlighted the FAQ/Open Questions section. AI's lack of internal context makes it ask questions teams overlook, which is the "AI's ignorance was the feature" insight.

5. A PRD requirement that states "users should be able to easily search for products" is an example of which PRD failure mode?

Correct. "Easily" is unmeasurable. A well-formed requirement would specify latency targets, result accuracy, or input format constraints — conditions an engineer can build to and a QA engineer can test against.

Not quite. This is the vagueness failure mode. Words like "easily" and "should be able to" signal requirements that lack the measurable acceptance criteria needed for engineering to build and QA to verify.

Lab 1: PRD Anatomy & Problem Statement Sharpening

Practice with AI · Spec Writing Assistant

Your Task

You are a PM at a B2B SaaS company. You have a rough idea: "Add a dashboard for team leads to see how their reports are performing on tasks." Your job is to work with the AI assistant to transform that vague idea into a sharpened problem statement and a set of functional requirements.

Start by pasting or typing your rough idea, then ask the AI to help you apply the PRD sections you learned about — problem statement, goals, non-goals, and acceptance criteria. Push for specificity. If the AI gives you vague language, call it out.

Try starting with: "Here's my rough product idea: [paste idea]. Help me write a sharp problem statement using evidence-based framing. Then identify what's missing before we can write functional requirements."

Module 4 · Lesson 2

Writing User Stories and Acceptance Criteria with AI

Translating product intent into testable engineering contracts — and using AI to do it faster and more completely.

How do you turn a vague feature idea into something engineers can build and QA can verify?

In 2023, Linear — the project management tool favored by engineering teams — published internal notes from their product development process describing how they began using GPT-4 to generate "edge case user stories" from their core feature descriptions. Their PM team had noticed a consistent pattern: when writing user stories manually, they captured the happy path and one or two error states. The AI, given the same feature description, would surface between four and nine additional edge cases — race conditions, permission boundary failures, empty states, and offline behavior — that the human team hadn't written down. Linear estimated this cut QA-discovered spec gaps by roughly 35% in the two quarters following adoption.

The User Story Format and Why It Works

The canonical user story format — "As a [role], I want to [action] so that [benefit]" — exists for a reason. It forces the writer to specify who has the need (the role), what behavior they need to perform (the action), and why it matters (the benefit). When any of these three elements is missing, the story becomes untestable and the requirement becomes contested.

The most common failure is collapsing the role to "user" — which tells engineers nothing about access levels, mental models, or frequency of the action. A notification preference story for an enterprise IT administrator implies different constraints than the same story for a frontline sales rep. Same feature, different requirements.

User Story: A structured requirement statement: "As a [role], I want [action] so that [benefit]." Each story should be independently testable and represent a single unit of user value.

Gherkin Format: A structured acceptance criteria format: "Given [precondition], When [action], Then [expected outcome]." Forces unambiguous, binary-testable criteria that QA can execute without interpretation.

Edge Case Story: A user story specifically written to cover non-happy-path scenarios: error states, empty states, permission boundaries, timeout behavior, concurrent access conflicts.

Acceptance Criteria: The Gherkin Advantage

Acceptance criteria written in Gherkin format (Given / When / Then) have a structural advantage: they are binary. Either the "Then" condition is true or it isn't. This eliminates interpretive disagreements between engineering and QA — the most expensive source of late-stage rework in software projects.

Consider: "The search results should appear quickly" vs. "Given a user has typed a query of at least three characters, When the user presses Enter or pauses typing for 300ms, Then search results appear within 500ms, and a loading indicator is displayed if results have not appeared after 100ms." The second version can be tested by a machine. The first requires a human judgment call that will differ from one evaluator to the next.
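To make the "binary" property concrete, here is a minimal sketch of a criterion stored as data with a numeric threshold. The `LatencyCriterion` class is a hypothetical structure, not part of any Gherkin tool, and the observed latencies are made-up measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyCriterion:
    """One Given/When/Then criterion with a numeric threshold (hypothetical structure)."""
    given: str
    when: str
    then: str
    max_latency_ms: int

    def as_gherkin(self) -> str:
        return f"Given {self.given}\nWhen {self.when}\nThen {self.then}"

    def is_met(self, observed_latency_ms: float) -> bool:
        # Binary outcome: the measurement either meets the threshold or it does not.
        return observed_latency_ms <= self.max_latency_ms


search_criterion = LatencyCriterion(
    given="a user has typed a query of at least three characters",
    when="the user presses Enter",
    then="search results appear within 500ms",
    max_latency_ms=500,
)

print(search_criterion.as_gherkin())
print(search_criterion.is_met(420))  # True
print(search_criterion.is_met(650))  # False
```

The vague version ("appear quickly") has no field that could play the role of `max_latency_ms`, which is why it cannot be checked without a judgment call.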

Pattern Alert

When prompting AI to generate acceptance criteria, always specify: (1) the user role, (2) the precondition state (logged in? empty account? first visit?), (3) the input or trigger action, and (4) the system's expected response. Without these four elements, AI will generate plausible-sounding but ambiguous criteria that don't actually constrain engineering behavior.
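One way to enforce the four elements is to refuse to build the prompt until all of them are present. The following is a rough sketch; the key names and the `build_criteria_prompt` helper are hypothetical.

```python
REQUIRED_ELEMENTS = ("role", "precondition", "trigger", "expected_response")


def missing_elements(brief: dict) -> list:
    """Return the required elements that are absent or blank in the brief."""
    return [name for name in REQUIRED_ELEMENTS if not (brief.get(name) or "").strip()]


def build_criteria_prompt(brief: dict) -> str:
    """Build an acceptance-criteria prompt, or fail loudly if an element is missing."""
    missing = missing_elements(brief)
    if missing:
        raise ValueError("Brief is missing: " + ", ".join(missing))
    return (
        "Write Gherkin acceptance criteria for the following.\n"
        f"Role: {brief['role']}\n"
        f"Precondition: {brief['precondition']}\n"
        f"Trigger: {brief['trigger']}\n"
        f"Expected response: {brief['expected_response']}"
    )


incomplete = {"role": "operations manager", "trigger": "clicks Export"}
print(missing_elements(incomplete))  # ['precondition', 'expected_response']
```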

Using AI to Surface Edge Cases

The Linear finding, that AI surfaces four to nine edge cases per feature that humans miss, reflects a fundamental asymmetry: large language models are trained on large volumes of text, including software specifications and bug reports, so they carry a large "vocabulary" of things that go wrong. Human PMs working from memory tend to write the story they care about most and underinvest in failure states, offline behavior, concurrent access, and permission edge cases.

A reliable AI prompt for edge case generation: "Given this user story: [story], list every edge case, error state, empty state, and permission boundary condition that would need its own acceptance criteria. Be exhaustive. Do not filter for likelihood." The "do not filter for likelihood" instruction is critical: without it, AI may suppress improbable but important edge cases such as data corruption on mid-write server failure.

INVEST Criteria

Good user stories meet the INVEST criteria: Independent (can be developed without dependency on another story), Negotiable (not a contract, but a conversation starter), Valuable (delivers user benefit), Estimable (engineers can size it), Small (fits in a sprint), Testable (has verifiable acceptance criteria). AI can evaluate a story against all six criteria and flag which ones it fails — a powerful review mechanism before stories go into sprint planning.
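Whether the review is done by AI or by a person, its result can be recorded as a simple checklist. The sketch below assumes a hypothetical review dictionary with one boolean per INVEST criterion; an unanswered criterion is treated as failed.

```python
INVEST = ("Independent", "Negotiable", "Valuable", "Estimable", "Small", "Testable")


def failed_criteria(review: dict) -> list:
    """List the INVEST criteria that the review marks as failed or leaves unanswered."""
    return [criterion for criterion in INVEST if not review.get(criterion, False)]


# Hypothetical review of a single story.
review = {
    "Independent": True,
    "Negotiable": True,
    "Valuable": True,
    "Estimable": False,  # engineers could not size the story
    "Small": True,
    # "Testable" was never answered, so it counts as failed
}

print(failed_criteria(review))  # ['Estimable', 'Testable']
```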

The Role Specificity Problem

Vague roles produce vague stories. When prompting AI to generate or refine user stories, providing a detailed persona description dramatically improves output quality. Instead of "As a user," specify: "As an operations manager at a mid-market logistics company who accesses the dashboard twice daily, manages a team of 12, and has read-only access to financial data." This level of role specificity allows AI to correctly infer permission constraints, urgency signals, and data access patterns that would otherwise require multiple clarifying rounds.
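A detailed role description is easier to reuse across prompts when it is stored as structured data. This is a minimal sketch; the `PersonaDescription` class and its fields are hypothetical, and the values reproduce the operations manager example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonaDescription:
    """Hypothetical structured role description used as prompt context."""
    role: str       # who the person is and where they work
    usage: str      # how and how often they use the product
    team_size: int  # number of people they manage
    access: str     # what their permissions allow

    def as_story_role(self) -> str:
        # Produces the opening clause of a user story.
        return (
            f"As {self.role} who {self.usage}, "
            f"manages a team of {self.team_size}, and {self.access}"
        )


ops_manager = PersonaDescription(
    role="an operations manager at a mid-market logistics company",
    usage="accesses the dashboard twice daily",
    team_size=12,
    access="has read-only access to financial data",
)
print(ops_manager.as_story_role())
```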

Teams at Spotify, in their 2022 public documentation on squad-level requirements writing, described maintaining "persona cards" — structured role descriptions used as context when prompting AI for story generation. Each card included job title, team size, key workflows, technical proficiency, and primary pain points. Stories generated with persona card context required significantly fewer revision cycles than stories generated from generic role names.

Lesson 2 Quiz

User Stories and Acceptance Criteria · 5 questions

1. What did Linear's 2023 internal experiment find when using GPT-4 to generate edge case user stories?

Correct. Linear found AI surfaced 4–9 edge cases per feature that human teams hadn't written, and estimated a ~35% reduction in QA-discovered spec gaps in the two quarters following adoption.

Not quite. Linear found AI surfaced 4–9 additional edge cases per feature (race conditions, permission boundaries, empty states, offline behavior) and cut QA-discovered spec gaps by roughly 35%.

2. Why does the Gherkin "Given / When / Then" format produce better acceptance criteria than plain-language descriptions?

Correct. Gherkin's structural advantage is binary testability. The "Then" clause is either satisfied or it isn't — no judgment call required, which eliminates the most expensive source of late-stage rework.

Not quite. The advantage is binary testability. Gherkin's "Then" clause is either satisfied or not — unlike plain-language descriptions that require human interpretation and generate disagreements between engineering and QA.

3. In the INVEST framework, what does the "T" (Testable) criterion specifically require?

Correct. "Testable" means the story has acceptance criteria clear enough that anyone evaluating it can determine unambiguously whether it has been satisfied — no subjective judgment required.

Not quite. "Testable" in INVEST means the story has verifiable acceptance criteria — someone can determine objectively whether the requirement is met, without needing subjective interpretation.

4. Why is the instruction "do not filter for likelihood" important when prompting AI to generate edge cases?

Correct. Without this instruction, AI self-filters to only likely edge cases. But improbable scenarios (mid-write server failure, concurrent permission changes) often cause the most severe failures and must be specified even if rare.

Not quite. Without "do not filter for likelihood," AI tends to suppress low-probability but high-impact edge cases. Rare events like data corruption on mid-write server failure are exactly the scenarios that need explicit acceptance criteria.

5. What did Spotify's 2022 public documentation describe as "persona cards" in the context of AI-assisted story writing?

Correct. Spotify's persona cards were structured context documents — job title, team size, workflows, technical proficiency, pain points — provided as context when prompting AI to generate user stories, significantly reducing revision cycles.

Not quite. Persona cards at Spotify were structured role descriptions used as AI context. Stories generated with this rich context required significantly fewer revision cycles than stories generated from generic role names like "user."

Lab 2: User Stories, Gherkin Criteria & Edge Cases

Practice with AI · Story Writing Assistant

Your Task

Choose a feature you know well — or use this one: "Allow users to export their account data as a CSV file." Work with the AI to: (1) write the user story in proper format with a specific role, (2) generate Gherkin acceptance criteria for the happy path, and (3) get an exhaustive list of edge cases. Then check the stories against the INVEST criteria.

Start with: "Feature: [describe feature]. Role: [describe specific user role]. Help me write a properly formatted user story, then generate Gherkin acceptance criteria for the happy path, then list every edge case I need to cover."

Module 4 · Lesson 3

Scope, Tradeoffs, and Non-Goals

Using AI to pressure-test feature scope, surface hidden assumptions, and write non-goals that actually hold.

How do you prevent a PRD from becoming a wishlist — and how does AI help you make that discipline stick?

Basecamp shipped Basecamp 3 in 2015 and later set out its product development philosophy in the 2019 book "Shape Up," which documents a practice called the "circuit breaker": a hard time limit on feature development that forces the team to make explicit scope decisions rather than letting features expand. The book's author, Ryan Singer, then Basecamp's head of strategy, described the enemy as what he called "scope creep by good intention": each feature accreting small additions that individually seemed reasonable but collectively doubled engineering time. The discipline of writing explicit non-goals was central to the process: every shaped pitch included a section called "No-gos," listing features that specifically would NOT be built in this cycle.

Basecamp's insight was that non-goals are not admissions of failure. They are strategic choices made visible — and they make PRDs honest documents rather than aspirational ones.

Why Scope Creep Happens at the Spec Stage

Scope creep most often enters not during development but during requirements writing. The mechanism is additive optimism: as a PM writes a feature spec, they naturally imagine the ideal version of the feature, then the adjacent features that would make it better, then the edge cases that become features themselves. Without a structural forcing function, PRDs grow. And once scope is in a PRD, it is politically difficult to remove — stakeholders treat document presence as implicit commitment.

AI can serve as a scope auditor. By analyzing a PRD draft, AI can flag requirements that appear to have expanded beyond the stated problem scope, identify features that represent second-order additions rather than core solutions, and surface assumptions that suggest the scope is larger than stated.

Non-Goals: Explicit statements of what the current release will NOT do. Non-goals differ from "out of scope" notes: they are deliberate strategic choices, not deferrals, and they should include the reasoning behind the exclusion.

Scope Creep: The gradual expansion of a feature or product scope beyond its original intent, typically without corresponding adjustment to timeline, resources, or success metrics.

Shape Up: Basecamp's internal product methodology (published 2019), which uses fixed time budgets ("appetites") and explicit scope decisions at the spec stage rather than estimation after spec completion.

Writing Non-Goals That Hold

Weak non-goals are vague deferrals: "Advanced analytics will be considered for a future release." Strong non-goals are specific and include the reasoning: "This release will NOT include per-user analytics views. The decision reflects our current single-tenant data architecture; per-user views would require schema changes that are scoped to Q3. Teams requiring individual user data should use the existing CSV export function."

The strong version accomplishes three things: it names the specific excluded feature, explains the constraint that drives the exclusion, and provides an alternative or timeline. Engineering knows what not to build. Stakeholders understand the reasoning. Future PMs know the constraint when revisiting the decision.

When prompting AI to generate non-goals, provide the feature scope and a description of your current technical and resource constraints. Prompt: "Given this feature description: [spec], and these constraints: [list], generate five non-goals with reasoning. Each non-goal should name a specific exclusion and explain why it's excluded in this release."
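The three parts of a strong non-goal (specific exclusion, reasoning, alternative or timeline) can be checked mechanically before a PRD goes to review. The sketch below is one possible structure; the `NonGoal` class is hypothetical and the example text comes from the weak and strong versions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonGoal:
    """Hypothetical record of one non-goal and the parts that make it hold."""
    exclusion: str         # the specific feature that will NOT be built
    reasoning: str = ""    # the constraint that drives the exclusion
    alternative: str = ""  # a workaround or a timeline

    def missing_parts(self) -> list:
        parts = {
            "exclusion": self.exclusion,
            "reasoning": self.reasoning,
            "alternative": self.alternative,
        }
        return [name for name, value in parts.items() if not value.strip()]

    def is_strong(self) -> bool:
        return not self.missing_parts()


weak = NonGoal(exclusion="Advanced analytics will be considered for a future release.")
strong = NonGoal(
    exclusion="This release will NOT include per-user analytics views.",
    reasoning="Per-user views would require schema changes that are scoped to Q3.",
    alternative="Teams requiring individual user data should use the existing CSV export.",
)

print(weak.missing_parts())  # ['reasoning', 'alternative']
print(strong.is_strong())    # True
```

A check like this only confirms that each part is present. Whether the exclusion is specific enough still needs a human read.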

Tradeoff Visibility

One underused AI application in PRD work is explicit tradeoff documentation. Prompt: "For each of the following features in this PRD, what is being traded away in the current implementation choice, and what are the top two alternative approaches?" This forces tradeoffs to be named rather than buried in implementation choices that stakeholders will only discover during code review.

The Appetite Model: Scope as Budget, Not Estimate

Basecamp's Shape Up methodology reframes scope through the concept of "appetite" — a fixed time budget assigned before spec writing begins, not after. Instead of estimating how long a feature will take and then negotiating scope, teams declare how much time they are willing to spend and then scope the feature to fit that budget. This inverts the usual spec dynamics: scope is the variable, time is the constant.

AI can support this model by helping PMs "scope to an appetite." Prompt: "I have a two-week engineering appetite for this feature. Given this initial scope: [scope], which requirements are essential to the core problem, which are enhancements, and which should be non-goals? Prioritize ruthlessly to fit two weeks."

The result is not a watered-down feature — it is a tightly scoped version that delivers the core value. Enhancements become backlog candidates with documented reasoning, not dropped ideas.

Real Pattern · Notion, 2021

When Notion shipped their API in May 2021, their public engineering blog documented that their initial spec had included real-time event webhooks as a core feature. The team made a documented decision to move webhooks to a non-goal for the initial release — the reasoning was that the core jobs-to-be-done (read/write access for integrations) could be satisfied without real-time events, and webhooks would add significant infrastructure complexity for a use case they hadn't yet validated. The non-goal was published in their developer documentation, giving the community a clear signal and a rationale rather than a surprise absence.

AI-Assisted Scope Auditing

Beyond generating non-goals, AI can audit existing PRDs for scope integrity. Useful prompts include: "Read this PRD and identify any requirements that appear to address a different problem than the stated problem statement," "Identify requirements that appear to be second-order enhancements rather than core solutions," and "Flag any requirements where the stated acceptance criteria imply a larger technical investment than the feature description suggests."

These prompts surface the gap between what the PRD says it's doing and what it's actually specifying. This gap — between stated intent and implied scope — is where engineering surprises live.

Lesson 3 Quiz

Scope, Tradeoffs, and Non-Goals · 5 questions

1. What did Basecamp call their practice of excluding specific features from a release spec, as documented in "Shape Up"?

Correct. "Shape Up" documents "No-gos": an explicit section of each pitch naming specific features that would not be built in the current cycle. Ryan Singer framed these as strategic choices made visible, not failures.

Not quite. Basecamp called them "No-gos": a section in every shaped pitch explicitly naming what would NOT be built. The insight: non-goals are strategic choices made visible, not admissions of failure.

2. What distinguishes a strong non-goal from a weak one?

Correct. A strong non-goal names what's excluded, explains the constraint (technical, time, validation), and provides a path forward (alternative approach or future timeline). This gives engineering, stakeholders, and future PMs all the context they need.

Not quite. Strong non-goals are specific: they name the excluded feature, explain the constraint driving the exclusion (architecture, timeline, unvalidated use case), and point to an alternative or future resolution.

3. In Basecamp's Shape Up "appetite" model, what is the key inversion from traditional estimation?

Correct. The appetite model inverts the usual equation: instead of specifying the feature and then estimating time, the team commits to a time budget first and then scopes the feature to fit. Time is constant; scope is the variable.

Not quite. The appetite model inverts estimation: time is the constant (the appetite, declared before spec writing) and scope becomes the variable — features are shaped to fit the time budget, not estimated after the fact.

4. What documented decision did Notion make when shipping their API in May 2021?

Correct. Notion explicitly moved real-time webhooks to a documented non-goal for their API launch, noting that core read/write jobs-to-be-done could be satisfied without them, and that webhooks would add infrastructure complexity for an unvalidated use case.

Not quite. Notion made webhooks an explicitly documented non-goal — published in their developer docs — because the core integration use cases didn't require real-time events, and adding webhooks would have added significant unwarranted infrastructure complexity.

5. Which AI prompt is BEST for performing a scope audit on an existing PRD draft?

Correct. An effective scope audit prompt asks AI to compare requirements against the stated problem and to identify where acceptance criteria imply unexpected technical depth — these are the two main locations where hidden scope accumulates.

Not quite. A scope audit needs AI to compare requirements against the stated problem scope and to surface mismatches between feature descriptions and what the acceptance criteria actually imply. Generic summarization or rewriting prompts miss this entirely.

Lab 3: Scope Auditing and Non-Goal Generation

Practice with AI · Scope & Tradeoffs Assistant

Your Task

Take an existing feature spec (yours or a hypothetical one) and use the AI to: (1) run a scope audit that flags requirements that drift from the stated problem, (2) generate five strong non-goals with reasoning, and (3) apply an "appetite" model — tell the AI you have a fixed time budget and ask it to prioritize ruthlessly to fit.

Try this example: "Feature: In-app notification center for a project management tool. Problem: Users miss important task updates. Two-week engineering appetite."

Start with: "Feature: [describe]. Problem statement: [problem]. Engineering appetite: [time]. First, run a scope audit on this feature. Then generate five non-goals with reasoning. Then tell me what to cut to fit the appetite."

Module 4 · Lesson 4

Iterating, Reviewing, and Handing Off AI-Assisted Specs

How to use AI for PRD review cycles, cross-functional alignment, and handoff artifacts that engineering actually uses.

A spec that lives only in a document is not a spec — how do you make AI-assisted PRDs actually drive alignment?

In late 2023, Atlassian published a case study on how their internal Confluence and Jira teams had integrated AI-assisted spec review into their product development workflow. The practice, which they called "spec diff review," involved feeding a PRD draft into an AI tool alongside the previous version and asking the AI to do three things: surface every meaningful change, flag any new requirements that lacked acceptance criteria, and identify requirements in the new version that potentially conflicted with, or still depended on, requirements removed since the previous version. A review that had previously taken a senior PM half a day now took under thirty minutes. The AI-generated diff also caught three classes of errors that human reviewers had systematically missed: implicit assumptions added through passive voice, acceptance criteria silently weakened between drafts, and dependencies on other systems introduced without explicit callout.
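The mechanical half of a version comparison can be done with a plain text diff before any AI is involved. The following is a minimal sketch using Python's standard `difflib`, not the tooling described in the case study. The requirement lines are hypothetical and assume one requirement per line.

```python
import difflib

# Hypothetical requirement lines from two drafts of the same PRD.
previous_draft = [
    "REQ-1: Users can export account data as CSV.",
    "REQ-2: Export completes within 30 seconds.",
]
current_draft = [
    "REQ-1: Users can export account data as CSV.",
    "REQ-2: Export completes within 60 seconds.",
    "REQ-3: Exports are validated before download.",
]


def changed_lines(old: list, new: list) -> tuple:
    """Return (added, removed) requirement lines between two drafts."""
    added, removed = [], []
    for line in difflib.ndiff(old, new):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " are unchanged; "? " lines are ndiff hints.
    return added, removed


added, removed = changed_lines(previous_draft, current_draft)
print("Added:", added)
print("Removed:", removed)
```

In this made-up data, the REQ-2 change is a weakened criterion (30 seconds became 60) and REQ-3 is a passive-voice requirement with no named actor. The diff isolates the changed lines, and judging what the changes mean is the part left to the AI or human reviewer.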

The PRD Review Problem

Most PRDs go through multiple review cycles before engineering begins. Each cycle is an opportunity for requirements to drift, for assumptions to become implicit, and for cross-functional stakeholders to add requests without removing equivalent scope. The review process, without discipline, is the engine of scope creep.

AI can serve two distinct roles in review cycles. First, as a structural reviewer: checking that every user story has acceptance criteria, that every goal has a corresponding success metric, that every non-goal has reasoning, and that no section references an undefined term or external dependency without callout. Second, as a consistency reviewer: checking that requirements do not contradict each other, that acceptance criteria are consistent with the stated problem, and that technical constraints mentioned in one section are respected in all others.
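Several of the structural checks are simple enough to express as code, which can be useful as a baseline against which to compare an AI reviewer's findings. The sketch below assumes a hypothetical PRD represented as a dictionary; the key names are illustrative, not a standard schema.

```python
def structural_issues(prd: dict) -> list:
    """Return human-readable findings for missing criteria, metrics, or reasoning."""
    issues = []
    for story in prd.get("user_stories", []):
        if not story.get("acceptance_criteria"):
            issues.append(f"User story {story['id']} has no acceptance criteria")
    for goal in prd.get("goals", []):
        if not goal.get("success_metric"):
            issues.append(f"Goal {goal['id']} has no success metric")
    for non_goal in prd.get("non_goals", []):
        if not non_goal.get("reasoning"):
            issues.append(f"Non-goal {non_goal['id']} has no reasoning")
    return issues


# Hypothetical PRD fragment.
prd = {
    "user_stories": [
        {"id": "US-1", "acceptance_criteria": ["Given a logged-in admin, When they click Export, Then a CSV downloads"]},
        {"id": "US-2", "acceptance_criteria": []},
    ],
    "goals": [{"id": "G-1", "success_metric": ""}],
    "non_goals": [{"id": "NG-1", "reasoning": "Requires schema changes scoped to Q3"}],
}

for issue in structural_issues(prd):
    print(issue)
```

Consistency review (contradictions between requirements) does not reduce to presence checks like these, which is where an AI reviewer is more likely to add value.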

Spec Diff Review: AI-assisted comparison of PRD versions to surface meaningful changes, flag new requirements lacking criteria, and identify potential conflicts introduced between drafts.

Handoff Artifact: A derived document produced from the PRD for a specific audience (engineering brief, QA test plan, design annotation, or stakeholder summary), tailored to that audience's information needs.

Passive Voice Assumption: A requirement written in passive voice that implies a system actor or decision without naming it, e.g., "the data will be validated" without specifying where, by what, or to what standard.
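A rough first pass for passive voice assumptions can be automated. The sketch below is a deliberately crude heuristic, assumed for illustration only: it matches a form of "to be" followed by a word ending in "ed" or "en" and treats a later "by <actor>" phrase as naming the actor. It will miss irregular participles such as "sent" or "built", so it supplements a careful read rather than replacing it.

```python
import re

# A form of "to be" followed by a likely past participle (heuristic only).
PASSIVE = re.compile(r"\b(?:is|are|be|been|being)\s+(\w+(?:ed|en))\b", re.IGNORECASE)
# A "by <actor>" phrase suggests the responsible actor is named.
NAMED_ACTOR = re.compile(r"\bby\s+\w+", re.IGNORECASE)


def flag_passive_assumptions(requirements: list[str]) -> list[str]:
    """Return requirements that use passive voice without naming an actor."""
    flagged = []
    for requirement in requirements:
        match = PASSIVE.search(requirement)
        if match and not NAMED_ACTOR.search(requirement[match.end():]):
            flagged.append(requirement)
    return flagged
```

For example, "The data will be validated before submission." is flagged, while "The data will be validated by the API gateway." and "The API gateway validates the payload." are not.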

Cross-Functional Alignment via Derived Artifacts

A single PRD serves multiple audiences, but not equally well. Engineers care about functional requirements and acceptance criteria; they do not need the strategic rationale section. QA engineers need acceptance criteria in testable format; they need the edge case stories in a structure they can directly translate into test cases. Design teams need the user stories and the problem statement; they do not need the technical constraints section. Executive stakeholders need the goals, non-goals, and success metrics; they rarely read the functional requirements in detail.

AI enables rapid production of audience-specific derived artifacts. From a single PRD, you can prompt AI to generate: an engineering brief (functional requirements + acceptance criteria + open questions), a QA test plan skeleton (all acceptance criteria in Gherkin, organized by story), a design brief (problem statement + user stories + key constraints), and an executive summary (goals, non-goals, success metrics, key risks). Each audience gets the signal without the noise.
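The audience-to-section mapping described above can be made explicit so that every derived artifact starts from the same source of truth. In this sketch the section keys, the `AUDIENCE_SECTIONS` table, and the `derive_artifact` helper are assumptions for illustration; it only selects the relevant PRD sections, which would then be passed to the AI for reformatting (for example, into Gherkin for the QA plan).

```python
# Which PRD sections each audience receives (keys are hypothetical section names).
AUDIENCE_SECTIONS = {
    "engineering_brief": ["functional_requirements", "acceptance_criteria", "open_questions"],
    "qa_test_plan": ["acceptance_criteria"],
    "design_brief": ["problem_statement", "user_stories", "key_constraints"],
    "executive_summary": ["goals", "non_goals", "success_metrics", "key_risks"],
}


def derive_artifact(prd: dict[str, str], audience: str) -> str:
    """Concatenate only the sections relevant to the given audience."""
    sections = AUDIENCE_SECTIONS[audience]
    missing = [name for name in sections if name not in prd]
    if missing:
        # Fail loudly rather than hand an audience an incomplete artifact.
        raise KeyError(f"PRD is missing sections for {audience}: {missing}")
    return "\n\n".join(
        f"{name.replace('_', ' ').title()}\n{prd[name]}" for name in sections
    )
```

Keeping the mapping in one table means a change in what an audience needs is a one-line edit, and a missing section is caught before any prompt is sent.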

Prompt Pattern

To generate an engineering brief from a PRD: "Given this PRD: [paste], extract and reformat ONLY: (1) functional requirements as a numbered list, (2) acceptance criteria in Gherkin for each requirement, (3) open questions requiring engineering input, (4) explicit dependencies on other systems or teams. Do not include the problem statement, goals section, or design notes. Audience: senior fullstack engineers familiar with [tech stack]."
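If the same prompt is reused across PRDs, it helps to store it as a template so the wording stays stable and only the inputs change. A minimal sketch follows, assuming a hypothetical helper named `build_engineering_brief_prompt`; sending the prompt to a model is left out because that depends on the tool your team uses.

```python
ENGINEERING_BRIEF_PROMPT = """Given this PRD:
{prd_text}

Extract and reformat ONLY:
(1) functional requirements as a numbered list,
(2) acceptance criteria in Gherkin for each requirement,
(3) open questions requiring engineering input,
(4) explicit dependencies on other systems or teams.
Do not include the problem statement, goals section, or design notes.
Audience: senior fullstack engineers familiar with {tech_stack}."""


def build_engineering_brief_prompt(prd_text: str, tech_stack: str) -> str:
    """Fill the template; refuse to build a prompt around an empty PRD."""
    if not prd_text.strip():
        raise ValueError("prd_text is empty")
    return ENGINEERING_BRIEF_PROMPT.format(
        prd_text=prd_text.strip(), tech_stack=tech_stack
    )
```

Versioning the template alongside the PRD makes it possible to tell later whether a change in the brief came from the spec or from the prompt.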

The Handoff Conversation: PRD to Engineering

Even well-written PRDs produce misunderstandings at handoff. The most reliable mitigation is not a better document — it is a structured conversation that uses the PRD as the starting point rather than the ending point. Before development begins, senior engineers should be asked to read the PRD and produce a list of clarifying questions. This list is diagnostic: the questions engineers ask reveal which requirements are ambiguous or underspecified.

AI can accelerate this process. Prompt: "Read this PRD as a skeptical senior engineer. List every question you would ask the PM before starting development. Organize questions by: (1) ambiguous acceptance criteria, (2) missing technical constraints, (3) implicit dependencies on other systems, (4) requirements that appear to conflict with each other." The output serves as a pre-kickoff checklist: questions the PM can answer before the first engineering meeting, which shortens the kickoff and reduces the number of spec revisions needed afterward.

Real Pattern · GitHub Copilot PRD Process, 2022

In a 2022 internal process documented on their engineering blog, GitHub's Copilot product team described using AI to generate what they called "pre-mortem PRD reviews": prompting the AI to read a completed spec and list scenarios in which the feature would fail to solve the stated problem. The scenarios were not used to kill the feature; they were used to decide which failure modes were accepted risks and which needed additional requirement coverage. This converted an implicit risk-tolerance decision into an explicit, documented one, a practice now used by multiple product teams at GitHub.
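The value of a pre-mortem comes from recording a decision for every scenario. One way to track that is sketched below; the `Disposition` values, the `FailureScenario` class, and the `premortem_gaps` helper are hypothetical names for this illustration, not a description of GitHub's tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    UNTRIAGED = "untriaged"
    ACCEPTED_RISK = "accepted_risk"
    NEEDS_REQUIREMENT = "needs_requirement"


@dataclass
class FailureScenario:
    description: str
    disposition: Disposition = Disposition.UNTRIAGED
    rationale: str = ""  # why the risk is accepted, or which requirement covers it


def premortem_gaps(scenarios: list[FailureScenario]) -> list[str]:
    """Return scenarios whose risk decision is still implicit."""
    gaps = []
    for scenario in scenarios:
        if scenario.disposition is Disposition.UNTRIAGED:
            gaps.append(f"Untriaged: {scenario.description}")
        elif (scenario.disposition is Disposition.ACCEPTED_RISK
              and not scenario.rationale.strip()):
            gaps.append(f"Accepted without rationale: {scenario.description}")
    return gaps
```

An empty result means every AI-generated scenario has either been accepted with a stated reason or routed to a new requirement.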

Managing AI Hallucination Risk in Spec Writing

AI-assisted PRD work carries a specific hallucination risk: AI may generate requirements, acceptance criteria, or technical constraints that sound authoritative but reflect general software patterns rather than your specific system. A generated acceptance criterion referencing a "standard OAuth 2.0 flow" may not match your actual authentication implementation. A generated non-goal citing "current infrastructure limitations" may not accurately reflect your architecture.

The mitigation is systematic: treat all AI-generated PRD content as a first draft requiring domain review, not as a final specification. Establish a review checkpoint where a technical lead and a domain-expert PM sign off on all acceptance criteria before they enter sprint planning. When AI generates technical constraints or architectural references, verify them against your actual system documentation before committing them to the PRD. The AI's job is to scaffold and accelerate; the human's job is to verify and own.
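The review checkpoint can be enforced as a simple gate. The sketch below assumes two sign-off roles named `technical_lead` and `domain_pm`; those identifiers, the `AcceptanceCriterion` class, and the `blocked_criteria` helper are illustrative choices, not part of any specific tool.

```python
from dataclasses import dataclass, field

# Both roles must sign off before a criterion enters sprint planning.
REQUIRED_SIGNOFFS = frozenset({"technical_lead", "domain_pm"})


@dataclass
class AcceptanceCriterion:
    text: str
    signoffs: set[str] = field(default_factory=set)


def blocked_criteria(criteria: list[AcceptanceCriterion]) -> list[AcceptanceCriterion]:
    """Return the criteria still missing at least one required sign-off."""
    return [c for c in criteria if not REQUIRED_SIGNOFFS <= c.signoffs]
```

If `blocked_criteria` returns anything, the PRD is not ready for sprint planning, regardless of how polished the AI-generated text looks.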

Lesson 4 Quiz

Iterating, Reviewing, and Handoff · 5 questions

1. What three classes of errors did Atlassian find that AI spec diff review caught that human reviewers systematically missed?

Correct. Atlassian's case study identified three systematically missed error classes: implicit assumptions introduced via passive voice, acceptance criteria silently weakened between drafts, and new dependencies on other systems introduced without explicit callout.

Not quite. Atlassian found three specific error classes: passive voice assumptions, silently weakened acceptance criteria between drafts, and undeclared system dependencies. These are subtle changes that human reviewers tend to overlook in iterative review cycles.

2. Which of the following is a "passive voice assumption" in a requirement?

Correct. "The data will be validated before submission" is a passive voice assumption — it implies validation will happen but doesn't specify where (client? server?), by what (which service?), or to what standard (which rules?). The actor and mechanism are invisible.

Not quite. "The data will be validated before submission" is the passive voice assumption. It implies validation occurs but hides who validates, where validation happens, and what rules apply — making it impossible for engineering to implement without guessing.

3. What is the purpose of AI-generated "pre-mortem PRD reviews," as practiced by GitHub's Copilot team?

Correct. GitHub's pre-mortem reviews asked AI to generate scenarios where the feature would fail. The outputs were not used to kill features but to identify which failure scenarios were accepted risks versus which needed additional requirements — converting implicit risk decisions into explicit ones.

Not quite. Pre-mortem reviews generate failure scenarios to convert implicit risk tolerance into explicit documentation. The PM reviews the scenarios and either adds requirements to cover them or explicitly accepts the risk — making hidden assumptions visible.

4. When generating audience-specific derived artifacts from a PRD, which content should be included in an engineering brief but NOT in an executive summary?

Correct. The engineering brief contains functional requirements, acceptance criteria in Gherkin, and open engineering questions. Executive summaries contain goals, non-goals, success metrics, and key risks — not detailed functional requirements that would bury the signal executives need.

Not quite. Engineering briefs contain functional requirements, Gherkin acceptance criteria, and open engineering questions. Executive summaries contain goals, non-goals, success metrics, and risks. Mixing these signals for the wrong audience reduces the document's utility.

5. What is the correct risk mitigation for AI hallucination in AI-assisted PRD writing?

Correct. The mitigation is systematic: AI produces first drafts, a technical lead and PM sign off on acceptance criteria before sprint planning, and all technical references (architecture, auth flows, API patterns) are verified against actual system docs.

Not quite. The correct mitigation treats all AI-generated content as a first draft requiring expert review: a technical lead and a domain-expert PM sign off on all acceptance criteria before sprint planning, and technical constraints are verified against actual system documentation rather than assumed from general software patterns.

Lab 4: PRD Review, Pre-Mortem, and Handoff Artifacts

Practice with AI · Spec Review & Handoff Assistant

Your Task

Take a PRD you've written in earlier labs or create a short one now. Use the AI to: (1) run a structural review (every story has criteria, no passive voice assumptions, no undeclared dependencies), (2) generate a pre-mortem failure scenario list, and (3) produce two derived artifacts — an engineering brief and an executive summary — from the same PRD.

Start with: "Here's my PRD: [paste text]. First, run a structural review and flag passive voice assumptions, missing acceptance criteria, and undeclared dependencies. Then give me five pre-mortem failure scenarios. Then generate an engineering brief and a one-paragraph executive summary."

Module 4 Test

Spec and PRD Writing with AI · 15 questions · Pass at 80%

1. Which of the following best describes what a Product Requirements Document is?

Correct.

A PRD is the contract between product thinking and engineering execution — answering what, why, and how success is measured.

2. According to the 2021 Project Management Institute study cited in the module, what percentage of project failures trace to poor requirements gathering?

Correct. The PMI 2021 study found 29% of project failures trace to poor requirements gathering.

The PMI 2021 study found 29% of project failures trace to poor requirements gathering — a significant but frequently underestimated contributor.

3. What is the primary reason AI-generated PRD content must be treated as a first draft rather than a final specification?

Correct. AI lacks access to internal data, actual user behavior, and team-specific context — making its output contextually unreliable without domain expert review.

AI produces structurally sound but contextually hollow PRD content because it has no access to internal data, user behavior, or strategic context — all of which require human input to fill in accurately.

4. In the canonical user story format "As a [role], I want [action] so that [benefit]," what is the most common role-specification failure?

Correct. Generic "user" roles strip the story of all context engineers need to infer permission levels, data access, and interaction frequency.

The most common failure is collapsing the role to "user" — which tells engineers nothing about access levels, workflows, or use patterns that differentiate how the feature should behave.

5. Which Gherkin acceptance criterion is correctly formatted and testable?

Correct. This is properly formatted Gherkin: Given (precondition), When (action), Then (measurable outcome) — binary testable with no interpretation required.

Only the Gherkin "Given / When / Then" format with measurable thresholds is binary testable. "Quickly," "good," and user story format are not acceptance criteria.

6. What key insight drove Spotify's use of "persona cards" when prompting AI to generate user stories?

Correct. Persona cards gave AI the context to infer access levels, urgency signals, and data constraints — producing stories that required significantly fewer revision cycles than generic "user" role stories.

Persona cards worked because they gave AI rich role context — job title, workflows, technical proficiency, pain points — enabling it to correctly infer constraints that generic role labels would miss.

7. Ryan Singer's concept of "scope creep by good intention" (Shape Up) describes which mechanism?

Correct. Each small addition seems reasonable individually, but collectively they compound into major scope expansion — without anyone having explicitly decided to expand the scope.

"Scope creep by good intention" specifically describes individually reasonable additions that collectively double engineering time — the cumulative effect of small yeses that no one explicitly approved as a scope increase.

8. What made Notion's published non-goal for real-time webhooks in their May 2021 API launch an example of good non-goal practice?

Correct. Notion's non-goal was specific, explained its reasoning, connected it to validated use cases, and was published publicly — all elements of a strong non-goal that provides genuine context rather than vague deferral.

Notion's non-goal was strong because it named the excluded feature, explained why (infrastructure complexity, unvalidated use case), noted core jobs-to-be-done could be served without it, and was published for community transparency.

9. In the appetite-based scoping model, what should the AI be instructed to do when given a feature spec and a fixed time budget?

Correct. The appetite model uses time as a constant and scope as the variable — AI should identify what's essential, classify the rest as backlog or non-goals, and preserve the core value in the budget.

In appetite-based scoping, AI should prioritize requirements by essentiality to the core problem, move enhancements to a documented backlog, and generate non-goals for the rest — making time the constant and scope the variable.

10. What are the two distinct roles AI can play in PRD review cycles, according to the lesson?

Correct. AI as structural reviewer checks every story has criteria, every goal has metrics, etc. As consistency reviewer, it checks for internal contradictions and cross-section integrity violations.

AI plays two review roles: structural (completeness — does every story have criteria?) and consistency (integrity — do requirements contradict each other or violate stated constraints?).

11. Which of the following is the correct prompt instruction for maximizing edge case coverage when asking AI to generate user story edge cases?

Correct. "Do not filter for likelihood" is the critical instruction — without it, AI self-filters to common scenarios and omits rare but catastrophic edge cases like mid-write server failures.

The critical instruction is "do not filter for likelihood" — AI will otherwise omit improbable but high-impact edge cases. The goal is exhaustiveness at this stage; the PM can later triage which to specify in detail.

12. The "INVEST" criteria for user stories includes six checks. Which of the following is NOT one of the INVEST criteria?

Correct. INVEST stands for Independent, Negotiable, Valuable, Estimable, Small, Testable. "Validated" is not one of the six criteria.

INVEST = Independent, Negotiable, Valuable, Estimable, Small, Testable. "Validated" is not in the framework — though validating stories with users is a separate good practice.

13. What is a "spec diff review" as implemented by Atlassian's product teams?

Correct. Spec diff review uses AI to compare PRD versions, surfacing what changed, what's now missing criteria, and what may conflict — reducing half-day senior PM reviews to under 30 minutes.

Spec diff review is AI-assisted PRD version comparison — surfacing meaningful changes, flagging new requirements without acceptance criteria, and identifying conflicts between the old and new versions.

14. When asking AI to generate an engineering brief derived from a PRD, which content should be explicitly EXCLUDED from the brief?

Correct. Engineering briefs should contain functional requirements, acceptance criteria, open engineering questions, and dependencies — not the problem statement or strategic rationale that executives and PMs need but engineers don't.

Engineering briefs should exclude problem statements, strategic goals, and design rationale. Engineers need functional requirements, acceptance criteria, open technical questions, and system dependencies — not the strategic context.

15. What is the most accurate description of how AI should be positioned in an organization's PRD writing workflow?

Correct. This is the collaborative model: AI provides structure, speed, and analytical stress-testing; humans provide context, judgment, and accountability. Neither role is optional.

The correct model is collaborative: AI scaffolds drafts, surfaces edge cases, audits consistency, and generates artifacts. Humans provide the contextual knowledge, make tradeoff decisions, and verify technical references — then own the final document.