In early 2022, Figma's product team publicly discussed how their PRD process had accumulated what one PM called "the archaeology problem" — requirements documents that were written, then never meaningfully updated, so engineers were often working from specs that had been superseded by three rounds of design iteration. The documents existed; they simply no longer reflected reality. The team began experimenting with using large language models to summarize Figma comment threads, design changelogs, and Slack discussions into living requirement drafts — reducing the gap between what was decided and what was documented.
The insight wasn't that AI would write PRDs from scratch. It was that AI could close the latency between product decisions and written artifacts, making specs useful again rather than ceremonial.
A Product Requirements Document is the contract between product thinking and engineering execution. It answers three questions: what is being built, why it is being built (the problem and strategic rationale), and how success is measured (acceptance criteria and metrics). Everything else — wireframes, technical architecture, sprint plans — is downstream of a well-formed PRD.
In practice, PRDs degrade in four predictable ways. They are written too late (after design has already locked). They are too vague (using hedges like "users should be able to easily…" without measurable thresholds). They become stale instantly when requirements change but the document does not. And they accumulate scope without explicit tradeoff decisions, making every feature sound equally important.
A 2021 study by the Project Management Institute found that 29% of project failures trace to poor requirements gathering. In software product development, the cost of a misunderstood requirement compounds: each stage of the SDLC multiplies the rework cost by roughly 5–10x. A requirement misread at spec time costs minutes to fix; discovered in QA, it costs days.
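To make the compounding concrete, here is a small Python sketch of the arithmetic. The stage list, the 10-minute base cost, and the `rework_cost` helper are illustrative assumptions, not figures from the study; only the 5x to 10x multiplier range comes from the text above.

```python
# Assumed, simplified stage order; real lifecycles vary by team.
STAGES = ["spec", "design", "implementation", "qa"]


def rework_cost(base_minutes: float, multiplier: float, found_at: str) -> float:
    """Cost of fixing a requirement error discovered at `found_at`,
    assuming the cost multiplies once for every stage after spec."""
    stages_elapsed = STAGES.index(found_at)
    return base_minutes * multiplier ** stages_elapsed


low = rework_cost(10, 5, "qa")    # 10 * 5**3
high = rework_cost(10, 10, "qa")  # 10 * 10**3
print(f"Found in QA: {low / 60:.0f} to {high / 60:.0f} hours of rework")
```

Under these assumptions, a ten-minute fix at spec time grows to somewhere between roughly 21 and 167 hours when the error surfaces in QA, which matches the minutes-versus-days contrast above.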
While formats vary by organization, a robust PRD contains six core sections. First, the problem statement: a crisp description of the user pain or business opportunity, grounded in evidence. Second, goals and non-goals: what this release will and explicitly will not do. Third, user stories or job-to-be-done framing: the specific scenarios the product must serve. Fourth, functional requirements: the specific behaviors the system must exhibit. Fifth, acceptance criteria: testable conditions that determine whether each requirement is satisfied. Sixth, open questions: unresolved decisions that must be made before or during development.
Each section serves a different stakeholder. Engineers care most about functional requirements and acceptance criteria. Designers need the problem statement and user stories. Executives need goals and non-goals to assess scope and risk.
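The six sections listed above can be treated as a simple data structure with a completeness check. What follows is a minimal Python sketch; the class name `PRD`, its field names, and the `missing_sections` helper are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PRD:
    """Hypothetical container for the six core PRD sections."""
    problem_statement: str = ""
    goals_and_non_goals: list[str] = field(default_factory=list)
    user_stories: list[str] = field(default_factory=list)
    functional_requirements: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


draft = PRD(
    problem_statement="Team leads cannot see which reports are blocked on tasks.",
    user_stories=[
        "As a team lead, I want to see overdue tasks so that I can unblock my reports."
    ],
)
print(draft.missing_sections())
```

A check like this only confirms that a section has content. Whether the content is any good still requires review by the PM and the stakeholders each section serves.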
AI does not replace the thinking that produces good requirements. It accelerates and stress-tests it. The most productive insertion points are: problem statement sharpening (AI can critique vague framing and suggest more specific language), requirements completeness checks (AI can identify missing edge cases, unstated assumptions, or contradictory requirements), acceptance criteria generation (AI can draft testable criteria from a given user story), and template population (AI can scaffold the structure from a rough brief, leaving humans to fill in the judgment calls).
The critical constraint: unless you supply it in the prompt, AI has no access to your company's internal data, your users' actual behavior, or your team's unstated strategic context. A PRD produced entirely by AI will be structurally correct but contextually hollow. The practice is collaborative drafting: the PM provides the signal, and the AI provides the structure and the stress test.
The "PRFAQ" method, used internally at Amazon since the Bezos era, requires PMs to write the press release and FAQ for a feature before a single line of code is written. When teams at Amazon began using AI to draft first-pass PRFAQs from meeting notes in 2023, they reported that the AI's FAQ section was particularly useful — not for its answers, but for surfacing questions the team hadn't yet articulated. The AI's ignorance was the feature.
Effective AI-assisted PRD work requires structured prompting. The most reliable pattern is the "Context → Problem → Constraints → Output Format" structure. You give the AI your product context (what product, what segment, what stage), the specific problem being addressed, any hard constraints (technical, legal, timeline), and the specific section or format you want it to produce.
Generic prompts produce generic output. "Write a PRD for a notification feature" produces a document that could belong to any product. "Write the functional requirements section for a push notification preference center for a B2B SaaS tool used by operations managers, where the primary constraint is that users must be able to manage preferences without IT involvement and changes must propagate within 60 seconds" produces something actually useful.
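The "Context → Problem → Constraints → Output Format" pattern can be captured in a small helper so that no element is left out. This is a sketch only; the function name `build_prompt` and its parameters are assumptions made for illustration, and the example values are taken from the notification prompt above.

```python
def build_prompt(context: str, problem: str, constraints: list[str], output_format: str) -> str:
    """Assemble a prompt in Context -> Problem -> Constraints -> Output Format order."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Context: {context}\n"
        f"Problem: {problem}\n"
        f"Constraints:\n{constraint_lines}\n"
        f"Output format: {output_format}"
    )


prompt = build_prompt(
    context="B2B SaaS tool used by operations managers",
    problem="Users need a push notification preference center",
    constraints=[
        "Users must manage preferences without IT involvement",
        "Changes must propagate within 60 seconds",
    ],
    output_format="Functional requirements section only, as a numbered list",
)
print(prompt)
```

Requiring all four arguments is the point of the design: a caller cannot produce the generic "write a PRD for a notification feature" prompt without consciously passing empty values.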
You are a PM at a B2B SaaS company. You have a rough idea: "Add a dashboard for team leads to see how their reports are performing on tasks." Your job is to work with the AI assistant to transform that vague idea into a sharpened problem statement and a set of functional requirements.
Start by pasting or typing your rough idea, then ask the AI to help you apply the PRD sections you learned about — problem statement, goals, non-goals, and acceptance criteria. Push for specificity. If the AI gives you vague language, call it out.
In 2023, Linear — the project management tool favored by engineering teams — published internal notes from their product development process describing how they began using GPT-4 to generate "edge case user stories" from their core feature descriptions. Their PM team had noticed a consistent pattern: when writing user stories manually, they captured the happy path and one or two error states. The AI, given the same feature description, would surface between four and nine additional edge cases — race conditions, permission boundary failures, empty states, and offline behavior — that the human team hadn't written down. Linear estimated this cut QA-discovered spec gaps by roughly 35% in the two quarters following adoption.
The canonical user story format — "As a [role], I want to [action] so that [benefit]" — exists for a reason. It forces the writer to specify who has the need (the role), what behavior they need to perform (the action), and why it matters (the benefit). When any of these three elements is missing, the story becomes untestable and the requirement becomes contested.
The most common failure is collapsing the role to "user" — which tells engineers nothing about access levels, mental models, or frequency of the action. A notification preference story for an enterprise IT administrator implies different constraints than the same story for a frontline sales rep. Same feature, different requirements.
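The three-part story format and the generic-role failure can both be checked mechanically before a story reaches review. The sketch below is one possible approach; the regular expression, the `GENERIC_ROLES` set, and the `review_story` function are illustrative assumptions, not an established tool.

```python
import re

# Pattern for "As a [role], I want to [action] so that [benefit]".
STORY_PATTERN = re.compile(
    r"^As an? (?P<role>.+?), I want to (?P<action>.+?) so that (?P<benefit>.+?)\.?$",
    re.IGNORECASE,
)

# Assumed list of roles that are too generic to guide engineering.
GENERIC_ROLES = {"user", "customer", "person"}


def review_story(story: str) -> list[str]:
    """Return a list of problems found in a user story (empty if none)."""
    match = STORY_PATTERN.match(story.strip())
    if match is None:
        return ["Missing role, action, or benefit in the canonical format."]
    problems = []
    if match.group("role").strip().lower() in GENERIC_ROLES:
        problems.append("Role is generic; name a specific role.")
    return problems


print(review_story("As a user, I want to export my data so that I can analyze it."))
print(review_story(
    "As an enterprise IT administrator, I want to set notification defaults "
    "so that new employees receive only relevant alerts."
))
```

The first story is flagged for its generic role and the second passes. A pattern match of this kind catches only structural omissions; whether the named role is the right one remains a product judgment.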
Acceptance criteria written in Gherkin format (Given / When / Then) have a structural advantage: they are binary. Either the "Then" condition is true or it isn't. This removes most interpretive disagreements between engineering and QA, a costly source of late-stage rework in software projects.
Consider: "The search results should appear quickly" vs. "Given a user has typed a query of at least three characters, When the user presses Enter or waits 300ms, Then search results appear within 500ms with a loading indicator displayed after 100ms." The second version can be tested by a machine. The first requires a human judgment call that will differ between everyone who evaluates it.
When prompting AI to generate acceptance criteria, always specify: (1) the user role, (2) the precondition state (logged in? empty account? first visit?), (3) the input or trigger action, and (4) the system's expected response. Without these four elements, AI will generate plausible-sounding but ambiguous criteria that don't actually constrain engineering behavior.
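The four elements can be modeled as required fields, so that an incomplete criterion is visible before it is sent to the AI or to engineering. This is a minimal sketch; the `AcceptanceCriterion` class and its method names are assumptions made for illustration, and the example reuses the search scenario above in simplified form.

```python
from dataclasses import dataclass


@dataclass
class AcceptanceCriterion:
    role: str          # (1) who is acting
    precondition: str  # (2) state before the trigger
    trigger: str       # (3) input or action
    expected: str      # (4) the system's expected response

    def missing_elements(self) -> list[str]:
        """Names of the four elements that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def to_gherkin(self) -> str:
        """Render the criterion as Given / When / Then lines."""
        return (
            f"Given {self.role} {self.precondition}\n"
            f"When {self.role} {self.trigger}\n"
            f"Then {self.expected}"
        )


search = AcceptanceCriterion(
    role="the user",
    precondition="has typed a query of at least three characters",
    trigger="presses Enter or waits 300ms",
    expected="search results appear within 500ms",
)
print(search.to_gherkin())
```

Calling `missing_elements()` on a criterion with a blank precondition returns `["precondition"]`, which is a useful gate before asking the AI to expand the criterion further.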
The Linear finding, that AI surfaces four to nine edge cases per feature that humans miss, reflects a fundamental asymmetry: language models are trained on very large volumes of text, which includes software specifications and bug reports, so they carry a large "vocabulary" of things that go wrong. Human PMs working from memory tend to write the story they care about most and underinvest in failure states, offline behavior, concurrent access, and permission edge cases.
A reliable AI prompt for edge case generation: "Given this user story: [story], list every edge case, error state, empty state, and permission boundary condition that would need its own acceptance criteria. Be exhaustive. Do not filter for likelihood." The "do not filter for likelihood" instruction is critical: without it, AI tends to omit improbable but important edge cases such as data corruption when a server fails mid-write.
Good user stories meet the INVEST criteria: Independent (can be developed without dependency on another story), Negotiable (not a contract, but a conversation starter), Valuable (delivers user benefit), Estimable (engineers can size it), Small (fits in a sprint), Testable (has verifiable acceptance criteria). AI can evaluate a story against all six criteria and flag which ones it fails — a powerful review mechanism before stories go into sprint planning.
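An INVEST review can be run as a prompt plus a simple tally of the verdicts that come back. The sketch below assumes the verdicts have already been collected into a dictionary (by hand or by parsing the AI's answer); the names `INVEST`, `invest_review_prompt`, and `failed_criteria` are hypothetical.

```python
INVEST = {
    "Independent": "can be developed without depending on another story",
    "Negotiable": "a conversation starter, not a contract",
    "Valuable": "delivers user benefit",
    "Estimable": "engineers can size it",
    "Small": "fits in a sprint",
    "Testable": "has verifiable acceptance criteria",
}


def invest_review_prompt(story: str) -> str:
    """Build a prompt asking for a PASS/FAIL verdict on each criterion."""
    checklist = "\n".join(f"- {name}: {meaning}" for name, meaning in INVEST.items())
    return (
        f"Evaluate this user story against each INVEST criterion:\n{checklist}\n\n"
        f"Story: {story}\n"
        "For each criterion answer PASS or FAIL with a one-sentence reason."
    )


def failed_criteria(verdicts: dict[str, bool]) -> list[str]:
    """Criteria that failed or were never evaluated."""
    return [name for name in INVEST if not verdicts.get(name, False)]


verdicts = {"Independent": True, "Negotiable": True, "Valuable": True,
            "Estimable": False, "Small": True}
print(failed_criteria(verdicts))
```

Treating a missing verdict as a failure is a deliberate choice here: if the AI skips a criterion, the story should not pass by default.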
Vague roles produce vague stories. When prompting AI to generate or refine user stories, providing a detailed persona description dramatically improves output quality. Instead of "As a user," specify: "As an operations manager at a mid-market logistics company who accesses the dashboard twice daily, manages a team of 12, and has read-only access to financial data." This level of role specificity allows AI to correctly infer permission constraints, urgency signals, and data access patterns that would otherwise require multiple clarifying rounds.
Teams at Spotify, in their 2022 public documentation on squad-level requirements writing, described maintaining "persona cards" — structured role descriptions used as context when prompting AI for story generation. Each card included job title, team size, key workflows, technical proficiency, and primary pain points. Stories generated with persona card context required significantly fewer revision cycles than stories generated from generic role names.
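A persona card is easy to represent as a small record that renders itself into prompt context. The sketch below uses the five fields named above; the class name `PersonaCard`, the method `as_prompt_context`, and the example values (including the pain point) are illustrative assumptions, not Spotify's actual format.

```python
from dataclasses import dataclass


@dataclass
class PersonaCard:
    job_title: str
    team_size: int
    key_workflows: list[str]
    technical_proficiency: str
    pain_points: list[str]

    def as_prompt_context(self) -> str:
        """Render the card as context to prepend to a story-generation prompt."""
        return (
            f"Persona: {self.job_title} managing a team of {self.team_size}.\n"
            f"Key workflows: {'; '.join(self.key_workflows)}.\n"
            f"Technical proficiency: {self.technical_proficiency}.\n"
            f"Primary pain points: {'; '.join(self.pain_points)}."
        )


ops_manager = PersonaCard(
    job_title="Operations manager at a mid-market logistics company",
    team_size=12,
    key_workflows=["checks the dashboard twice daily"],
    technical_proficiency="non-technical; read-only access to financial data",
    pain_points=["finds out about delayed work too late to react"],  # hypothetical
)
context = ops_manager.as_prompt_context()
print(context)
```

Keeping cards as structured data rather than free text means the same persona can be reused across every story prompt for a feature without drifting.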
Choose a feature you know well — or use this one: "Allow users to export their account data as a CSV file." Work with the AI to: (1) write the user story in proper format with a specific role, (2) generate Gherkin acceptance criteria for the happy path, and (3) get an exhaustive list of edge cases. Then check the stories against the INVEST criteria.
In 2019, Basecamp published "Shape Up," a book describing the product development process the company had refined while building products such as Basecamp 3 (shipped in 2015). The book documents a practice called "circuit breakers": hard time limits on feature development that force the team to make explicit scope decisions rather than letting features expand. Its author, Ryan Singer, described the enemy as what he called "scope creep by good intention," in which each feature accretes small additions that individually seem reasonable but collectively double engineering time. The discipline of writing explicit non-goals was central to their process: every shaped pitch included a section called "No-gos," listing features that specifically would NOT be built in this cycle.
Basecamp's insight was that non-goals are not admissions of failure. They are strategic choices made visible — and they make PRDs honest documents rather than aspirational ones.
Scope creep most often enters not during development but during requirements writing. The mechanism is additive optimism: as a PM writes a feature spec, they naturally imagine the ideal version of the feature, then the adjacent features that would make it better, then the edge cases that become features themselves. Without a structural forcing function, PRDs grow. And once scope is in a PRD, it is politically difficult to remove — stakeholders treat document presence as implicit commitment.
AI can serve as a scope auditor. By analyzing a PRD draft, AI can flag requirements that appear to have expanded beyond the stated problem scope, identify features that represent second-order additions rather than core solutions, and surface assumptions that suggest the scope is larger than stated.
Weak non-goals are vague deferrals: "Advanced analytics will be considered for a future release." Strong non-goals are specific and include the reasoning: "This release will NOT include per-user analytics views. The decision reflects our current single-tenant data architecture; per-user views would require schema changes that are scoped to Q3. Teams requiring individual user data should use the existing CSV export function."
The strong version accomplishes three things: it names the specific excluded feature, explains the constraint that drives the exclusion, and provides an alternative or timeline. Engineering knows what not to build. Stakeholders understand the reasoning. Future PMs know the constraint when revisiting the decision.
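The three properties of a strong non-goal translate directly into a record with three required parts. This is a minimal sketch; the `NonGoal` class and its `weaknesses` method are hypothetical names, and the two examples mirror the weak and strong versions quoted above.

```python
from dataclasses import dataclass


@dataclass
class NonGoal:
    exclusion: str                # the specific feature that will not be built
    reason: str                   # the constraint that drives the exclusion
    alternative_or_timeline: str  # what to use instead, or when it may return

    def weaknesses(self) -> list[str]:
        """List which of the three elements are missing."""
        labels = {
            "exclusion": "does not name a specific excluded feature",
            "reason": "does not explain the constraint behind the exclusion",
            "alternative_or_timeline": "offers no alternative or timeline",
        }
        return [msg for name, msg in labels.items() if not getattr(self, name).strip()]


strong = NonGoal(
    exclusion="Per-user analytics views",
    reason="Single-tenant data architecture; schema changes are scoped to Q3",
    alternative_or_timeline="Use the existing CSV export function",
)
weak = NonGoal(exclusion="Advanced analytics", reason="", alternative_or_timeline="")
print(strong.weaknesses())
print(weak.weaknesses())
```

The weak example is flagged twice: it gives no reason and no concrete alternative or timeline, which is exactly what makes "considered for a future release" a deferral rather than a decision.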
When prompting AI to generate non-goals, provide the feature scope and a description of your current technical and resource constraints. Prompt: "Given this feature description: [spec], and these constraints: [list], generate five non-goals with reasoning. Each non-goal should name a specific exclusion and explain why it's excluded in this release."
One underused AI application in PRD work is explicit tradeoff documentation. Prompt: "For each of the following features in this PRD, what is being traded away in the current implementation choice, and what are the top two alternative approaches?" This forces tradeoffs to be named rather than buried in implementation choices that stakeholders will only discover during code review.
Basecamp's Shape Up methodology reframes scope through the concept of "appetite" — a fixed time budget assigned before spec writing begins, not after. Instead of estimating how long a feature will take and then negotiating scope, teams declare how much time they are willing to spend and then scope the feature to fit that budget. This inverts the usual spec dynamics: scope is the variable, time is the constant.
AI can support this model by helping PMs "scope to an appetite." Prompt: "I have a two-week engineering appetite for this feature. Given this initial scope: [scope], which requirements are essential to the core problem, which are enhancements, and which should be non-goals? Prioritize ruthlessly to fit two weeks."
The result is not a watered-down feature — it is a tightly scoped version that delivers the core value. Enhancements become backlog candidates with documented reasoning, not dropped ideas.
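The appetite model can be sketched as a fixed budget that requirements are fitted into, essentials first. The code below is an illustration under stated assumptions: the `Requirement` class, the `scope_to_appetite` function, the day estimates, and the essential/enhancement labels are all hypothetical, and a two-week appetite is taken to mean ten working days.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    name: str
    estimate_days: float
    essential: bool  # True if it addresses the core problem


def scope_to_appetite(requirements: list[Requirement], appetite_days: float):
    """Fit requirements into a fixed time budget, essentials first.

    Returns (in_scope, backlog) as lists of requirement names.
    """
    in_scope, backlog = [], []
    remaining = appetite_days
    # sorted() is stable, so the original order is kept within each group.
    for req in sorted(requirements, key=lambda r: not r.essential):
        if req.estimate_days <= remaining:
            in_scope.append(req.name)
            remaining -= req.estimate_days
        else:
            backlog.append(req.name)
    return in_scope, backlog


candidates = [
    Requirement("Email digest", 5, essential=False),
    Requirement("Show unread task updates", 4, essential=True),
    Requirement("Mark update as read", 2, essential=True),
    Requirement("Per-project filters", 3, essential=False),
]
in_scope, backlog = scope_to_appetite(candidates, appetite_days=10)
print(in_scope, backlog)
```

Here the two essentials and the filters fit the budget, while the email digest moves to the backlog. Time stays constant and scope is what changes. Deciding which requirements count as essential remains the PM's call; the code only enforces the budget.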
When Notion shipped their API in May 2021, their public engineering blog documented that their initial spec had included real-time event webhooks as a core feature. The team made a documented decision to move webhooks to a non-goal for the initial release — the reasoning was that the core jobs-to-be-done (read/write access for integrations) could be satisfied without real-time events, and webhooks would add significant infrastructure complexity for a use case they hadn't yet validated. The non-goal was published in their developer documentation, giving the community a clear signal and a rationale rather than a surprise absence.
Beyond generating non-goals, AI can audit existing PRDs for scope integrity. Useful prompts include: "Read this PRD and identify any requirements that appear to address a different problem than the stated problem statement," "Identify requirements that appear to be second-order enhancements rather than core solutions," and "Flag any requirements where the stated acceptance criteria imply a larger technical investment than the feature description suggests."
These prompts surface the gap between what the PRD says it's doing and what it's actually specifying. This gap — between stated intent and implied scope — is where engineering surprises live.
Take an existing feature spec (yours or a hypothetical one) and use the AI to: (1) run a scope audit that flags requirements that drift from the stated problem, (2) generate five strong non-goals with reasoning, and (3) apply an "appetite" model — tell the AI you have a fixed time budget and ask it to prioritize ruthlessly to fit.
Try this example: "Feature: In-app notification center for a project management tool. Problem: Users miss important task updates. Two-week engineering appetite."
In late 2023, Atlassian published a case study on how their internal Confluence and Jira teams had integrated AI-assisted spec review into their product development workflow. The practice they called "spec diff review" involved feeding a PRD draft into an AI tool alongside the previous version, asking the AI to surface every meaningful change, flag any new requirements that lacked acceptance criteria, and identify requirements in the new version that potentially conflicted with, or still depended on, requirements removed since the previous version. What had previously taken a senior PM a half-day review now took under thirty minutes, and the AI-generated diff caught three classes of errors that human reviewers had systematically missed: implicit assumptions added through passive voice, acceptance criteria silently weakened between drafts, and dependencies on other systems introduced without explicit callout.
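The mechanical part of a spec diff, finding what changed between two drafts, can be done with Python's standard `difflib` module before any AI is involved. The sketch below is not Atlassian's tooling; the `changed_lines` helper and the two sample drafts are hypothetical, chosen to show a weakened criterion and a newly introduced dependency.

```python
import difflib

previous = [
    "Search results appear within 500ms.",
    "Export supports CSV.",
]
current = [
    "Search results appear quickly.",
    "Export supports CSV.",
    "Export depends on the billing service.",
]


def changed_lines(old: list[str], new: list[str]) -> tuple[list[str], list[str]]:
    """Return (removed, added) requirement lines between two drafts."""
    removed, added = [], []
    for line in difflib.ndiff(old, new):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


removed, added = changed_lines(previous, current)
print("Removed:", removed)
print("Added:", added)
```

The removed and added lines can then be pasted into a review prompt, so the AI spends its attention on judging each change (is "quickly" a weakened criterion? is the billing dependency called out?) rather than on finding the changes.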
Most PRDs go through multiple review cycles before engineering begins. Each cycle is an opportunity for requirements to drift, for assumptions to become implicit, and for cross-functional stakeholders to add requests without removing equivalent scope. The review process, without discipline, is the engine of scope creep.
AI can serve two distinct roles in review cycles. First, as a structural reviewer: checking that every user story has acceptance criteria, that every goal has a corresponding success metric, that every non-goal has reasoning, and that no section references an undefined term or external dependency without callout. Second, as a consistency reviewer: checking that requirements do not contradict each other, that acceptance criteria are consistent with the stated problem, and that technical constraints mentioned in one section are respected in all others.
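The structural half of this review is rule-based enough to sketch in code. The example below assumes a PRD stored as a dictionary with `user_stories`, `goals`, and `non_goals` lists; that layout, the key names, and the `structural_review` function are illustrative assumptions, as is the sample draft.

```python
def structural_review(prd: dict) -> list[str]:
    """Flag stories without criteria, goals without metrics,
    and non-goals without reasoning."""
    findings = []
    for story in prd.get("user_stories", []):
        if not story.get("acceptance_criteria"):
            findings.append(f"Story has no acceptance criteria: {story['text']}")
    for goal in prd.get("goals", []):
        if not goal.get("success_metric"):
            findings.append(f"Goal has no success metric: {goal['text']}")
    for non_goal in prd.get("non_goals", []):
        if not non_goal.get("reasoning"):
            findings.append(f"Non-goal has no reasoning: {non_goal['text']}")
    return findings


draft = {
    "user_stories": [
        {"text": "Export account data as CSV", "acceptance_criteria": []},
    ],
    "goals": [
        {"text": "Reduce support tickets about data access",
         "success_metric": "Ticket volume for data requests"},
    ],
    "non_goals": [
        {"text": "No PDF export in this release", "reasoning": ""},
    ],
}
for finding in structural_review(draft):
    print(finding)
```

The consistency half (contradictory requirements, criteria that do not match the problem statement) is not covered by checks like these and is where an AI or human reviewer is still needed.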
A single PRD serves multiple audiences, but not equally well. Engineers care about functional requirements and acceptance criteria; they do not need the strategic rationale section. QA engineers need acceptance criteria in testable format; they need the edge case stories in a structure they can directly translate into test cases. Design teams need the user stories and the problem statement; they do not need the technical constraints section. Executive stakeholders need the goals, non-goals, and success metrics; they rarely read the functional requirements in detail.
AI enables rapid production of audience-specific derived artifacts. From a single PRD, you can prompt AI to generate: an engineering brief (functional requirements + acceptance criteria + open questions), a QA test plan skeleton (all acceptance criteria in Gherkin, organized by story), a design brief (problem statement + user stories + key constraints), and an executive summary (goals, non-goals, success metrics, key risks). Each audience gets the signal without the noise.
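The mapping from audience to sections can be written down once and reused for every PRD. This is a minimal sketch: the `AUDIENCE_SECTIONS` table follows the four artifacts listed above, while the section key names, the `derive_artifact` function, and the sample PRD are hypothetical.

```python
AUDIENCE_SECTIONS = {
    "engineering_brief": ["functional_requirements", "acceptance_criteria", "open_questions"],
    "qa_test_plan": ["acceptance_criteria"],
    "design_brief": ["problem_statement", "user_stories", "key_constraints"],
    "executive_summary": ["goals", "non_goals", "success_metrics", "key_risks"],
}


def derive_artifact(prd: dict[str, str], audience: str) -> dict[str, str]:
    """Select only the PRD sections relevant to one audience."""
    return {name: prd[name] for name in AUDIENCE_SECTIONS[audience] if name in prd}


prd = {
    "problem_statement": "Users miss important task updates.",
    "goals": "Surface task updates inside the app.",
    "non_goals": "No email digest in this release.",
    "functional_requirements": "Show unread updates in a notification center.",
    "acceptance_criteria": "Given an unread update, When the user opens the center, Then it is listed first.",
}
print(derive_artifact(prd, "executive_summary"))
```

Selecting sections in code and then asking the AI only to reformat them keeps each derived artifact traceable to the source PRD, which reduces the chance of the AI adding content that the PRD never contained.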
To generate an engineering brief from a PRD: "Given this PRD: [paste], extract and reformat ONLY: (1) functional requirements as a numbered list, (2) acceptance criteria in Gherkin for each requirement, (3) open questions requiring engineering input, (4) explicit dependencies on other systems or teams. Do not include the problem statement, goals section, or design notes. Audience: senior fullstack engineers familiar with [tech stack]."
Even well-written PRDs produce misunderstandings at handoff. The most reliable mitigation is not a better document — it is a structured conversation that uses the PRD as the starting point rather than the ending point. Before development begins, senior engineers should be asked to read the PRD and produce a list of clarifying questions. This list is diagnostic: the questions engineers ask reveal which requirements are ambiguous or underspecified.
AI can accelerate this process. Prompt: "Read this PRD as a skeptical senior engineer. List every question you would ask the PM before starting development. Organize questions by: (1) ambiguous acceptance criteria, (2) missing technical constraints, (3) implicit dependencies on other systems, (4) requirements that appear to conflict with each other." The output serves as a pre-kickoff checklist — questions the PM can answer before the first engineering meeting, dramatically reducing kickoff meeting length and post-kickoff spec revisions.
In 2022, GitHub's Copilot product team described on their engineering blog an internal practice they called "pre-mortem PRD reviews": prompting the AI to read a completed spec and generate a list of scenarios in which the feature would fail to solve the stated problem. The scenarios were not used to kill the feature; they were used to identify which scenarios were accepted risks and which needed additional requirement coverage. This converted an implicit risk tolerance decision into an explicit, documented one, a practice now used by multiple product teams at GitHub.
AI-assisted PRD work carries a specific hallucination risk: AI may generate requirements, acceptance criteria, or technical constraints that sound authoritative but reflect general software patterns rather than your specific system. A generated acceptance criterion referencing a "standard OAuth 2.0 flow" may not match your actual authentication implementation. A generated non-goal citing "current infrastructure limitations" may not accurately reflect your architecture.
The mitigation is systematic: treat all AI-generated PRD content as a first draft requiring domain review, not as a final specification. Establish a review checkpoint where a technical lead and a domain-expert PM sign off on all acceptance criteria before they enter sprint planning. When AI generates technical constraints or architectural references, verify them against your actual system documentation before committing them to the PRD. The AI's job is to scaffold and accelerate; the human's job is to verify and own.
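The review checkpoint can be modeled as a gate that stays closed until both sign-offs are recorded. The sketch below is one hypothetical way to track this; the `DraftCriterion` class, the reviewer role names, and the sample criterion are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed reviewer roles, matching the two sign-offs described above.
REQUIRED_REVIEWERS = {"technical_lead", "domain_pm"}


@dataclass
class DraftCriterion:
    text: str
    approvals: set[str] = field(default_factory=set)

    def approve(self, reviewer_role: str) -> None:
        """Record a sign-off from one reviewer role."""
        self.approvals.add(reviewer_role)

    def ready_for_sprint_planning(self) -> bool:
        """True only when every required reviewer has signed off."""
        return REQUIRED_REVIEWERS <= self.approvals


criterion = DraftCriterion(
    "Given a logged-out user, When they sign in, Then the standard OAuth 2.0 flow completes."
)
criterion.approve("technical_lead")
print(criterion.ready_for_sprint_planning())
```

With only the technical lead's approval the gate stays closed; it opens once the domain-expert PM also signs off. A criterion like the one shown is precisely the kind to verify against the actual authentication implementation before approval.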
Take a PRD you've written in earlier labs or create a short one now. Use the AI to: (1) run a structural review (every story has criteria, no passive voice assumptions, no undeclared dependencies), (2) generate a pre-mortem failure scenario list, and (3) produce two derived artifacts — an engineering brief and an executive summary — from the same PRD.