Claude Cowork — Final Exam

1. What is "prompt drift" as identified in Stripe's documentation workflow post-mortem?

Correct. Drift is organizational change — new terminology, updated voice guides, policy changes — that isn't reflected in the prompts that depended on the old reality.

Not quite. Drift is the mismatch between prompt assumptions and current reality. New products, updated guides, and changed policies all cause it: the prompt was accurate when written, but the world moved on.

2. Salesforce Agentforce's "supervised autonomy" model graduated agent trust by:

Correct. Confidence thresholds per tool created a graduated system: below threshold → human queue; above threshold → auto-execute. Trust was calibrated, not binary.

Not quite. Agentforce used per-tool confidence thresholds: low-confidence actions went to a human queue; high-confidence actions executed automatically. Graduated trust, not all-or-nothing.
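The routing rule in this answer can be sketched in a few lines. This is only an illustration under stated assumptions, not Agentforce's actual implementation: the tool names, the threshold values, and the `route_action` function are all hypothetical, and treating a score exactly at the threshold as passing is a choice made here, not something the lesson specifies.

```python
# Hypothetical per-tool thresholds; the names and values are illustrative assumptions.
THRESHOLDS = {
    "send_refund": 0.95,     # higher-stakes tool, stricter bar
    "update_address": 0.80,  # lower-stakes tool, looser bar
}


def route_action(tool: str, confidence: float) -> str:
    """Return 'auto_execute' or 'human_queue' for a proposed agent action."""
    if tool not in THRESHOLDS:
        # Assumption: a tool with no configured threshold always goes to a human.
        return "human_queue"
    # Assumption: a score exactly at the threshold counts as passing.
    if confidence >= THRESHOLDS[tool]:
        return "auto_execute"
    return "human_queue"


print(route_action("send_refund", 0.90))
print(route_action("update_address", 0.90))
```

The same confidence score produces different outcomes for different tools, which is what makes the trust graduated rather than binary.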

3. The four architectural elements of sustainable oversight are:

Correct. These four elements transform oversight from an attitude into an architecture: a named human, defined stop conditions, independent review, and a traceable record.

Incorrect. The four architectural elements are: named accountability, clear stopping criteria, separation of generation and approval, and audit trail.

4. What is the first step of an effective frame correction, as described in Lesson 4?

Correct. The three-part frame correction structure is: specific observation → explicit re-constraint → first-action reassignment. Beginning with a specific observation tells Claude exactly what is wrong with the current frame.

Not quite. An effective frame correction starts with a specific observation about what's miscalibrated — not a general complaint, but a precise description of what the output is doing versus what it should be doing.

5. Klarna's customer-service agent handled 2.3 million conversations in its first month. The six-month prerequisite to that launch was:

Correct. The efficiency gains came from precision of task boundaries, not model capability. Six months of delegation design preceded the launch.

Not quite. Klarna's team spent six months defining exact task-ownership boundaries — what the agent could resolve autonomously versus what required a human agent.

6. The "bookending" technique recommended by Anthropic for important instructions involves:

Correct. Bookending compensates for recency bias by restating key instructions near the end of the context — close to where Claude actually generates its response.

Not quite. Bookending means stating key instructions at the system prompt start AND briefly restating them just before the user turn, compensating for recency bias in long contexts.
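As a rough sketch of bookending, the helper below puts the key instructions first, the long reference material in the middle, and a brief restatement last, so the reminder sits closest to the user turn. The function name, the parameter names, and the "Reminder:" label are assumptions made for illustration, not a prescribed format.

```python
def build_bookended_prompt(key_instructions: str, reference_material: str, reminder: str) -> str:
    """Assemble a system prompt with key instructions first and a short restatement last."""
    parts = [
        key_instructions.strip(),        # opening bookend: the full instructions
        reference_material.strip(),      # long context sits in the middle
        f"Reminder: {reminder.strip()}"  # closing bookend: a brief restatement
    ]
    return "\n\n".join(parts)


prompt = build_bookended_prompt(
    "Answer only from the attached policy. Cite the section number.",
    "<long policy text goes here>",
    "answer only from the policy and cite the section number.",
)
print(prompt)
```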

7. What is the "oversight paradox" as described in Lesson 4?

Correct. Complacency drift means that improving agent quality actively undermines reviewer vigilance — requiring deliberate effort to stay sharp exactly when things are going well.

Incorrect. The oversight paradox is that a better agent produces fewer errors, which reduces reviewer vigilance, which makes rare-but-serious errors more likely to slip through.

8. Knight Capital Group lost $440 million in 45 minutes in 2012. The company's fate after this incident was:

Correct. Knight Capital agreed to be acquired by Getco LLC within months of the incident, and the merger closed in 2013. A single oversight architecture failure ended the independence of one of the largest equity trading firms in the US.

Incorrect. Knight Capital was sold to Getco LLC and ceased to exist as an independent firm — the company did not survive the $440 million loss caused by the algorithmic failure.

9. According to Zapier's documented workflow practices, where do multi-step AI chains most commonly break?

Correct. Garbage in propagates through every subsequent step. Standardizing the first input had more impact than any downstream prompt improvement.

Not quite. Chains break most often at Step 1. Inconsistent initial inputs create errors that cascade through every downstream step, regardless of how well those prompts are written.

10. Notion's 34% improvement in AI response satisfaction came from:

Correct. Context injection, meaning fetching and prepending relevant prior documents, was the architectural change. The model didn't change; its context did.

Not quite. Notion built a context injection layer: fetching and prepending the three most relevant prior documents per call. The model was unchanged.
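A context injection layer of this kind can be approximated as below. This is a toy sketch, not Notion's implementation: relevance is scored by simple word overlap purely as a stand-in for whatever ranking a real system would use, and the function names and the bracketed section labels are hypothetical.

```python
import re


def _words(text: str) -> set:
    """Lowercased word set, used only for the toy relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def inject_context(query: str, prior_docs: list, k: int = 3) -> str:
    """Prepend the k most relevant prior documents to the request."""
    query_words = _words(query)
    # Assumption: word overlap stands in for a real relevance ranking.
    ranked = sorted(prior_docs, key=lambda d: len(query_words & _words(d)), reverse=True)
    context = "\n\n".join(
        f"[Prior document {i}]\n{doc}" for i, doc in enumerate(ranked[:k], start=1)
    )
    return f"{context}\n\n[Request]\n{query}"


docs = [
    "Q3 roadmap for the mobile app",
    "Lunch menu",
    "Mobile app launch checklist",
    "Roadmap review notes for Q3",
    "Holiday schedule",
]
print(inject_context("Summarize the Q3 mobile app roadmap", docs))
```

The model call itself is untouched; only the text sent to it changes, which mirrors the point that the improvement came from context rather than from the model.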

11. The "two strikes and restart" heuristic is triggered when:

Correct. Two unsuccessful frame corrections indicate the opening message lacked sufficient specification — and the most efficient path is a new, better-specified opening rather than further incremental correction.

Not quite. The two-strikes rule is specifically about frame correction failures: if two frame corrections haven't materially improved calibration, the problem is in the foundation, not fixable through more corrections.
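The heuristic reduces to a simple counter. The sketch below assumes the reviewer tallies frame corrections that failed to materially improve calibration; the function name and the return labels are hypothetical.

```python
MAX_FAILED_CORRECTIONS = 2  # the "two strikes" in the heuristic


def next_move(failed_corrections: int) -> str:
    """Decide whether to try another frame correction or restart the session."""
    if failed_corrections < 0:
        raise ValueError("failed_corrections cannot be negative")
    if failed_corrections >= MAX_FAILED_CORRECTIONS:
        # The foundation is the problem: write a new, better-specified opening.
        return "restart"
    return "correct"


for strikes in range(3):
    print(strikes, next_move(strikes))
```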

12. What does the Notion team's "one-hour reversibility" rule determine?

Correct. If wrong output can't be corrected within an hour, it requires a hard gate. Reversibility within one hour is the threshold for soft versus hard gate classification.

Not quite. The one-hour rule is a gate-type classifier: errors fixable in under an hour → soft gate acceptable; errors that take longer to fix → hard gate required before any action proceeds.
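Expressed as code, the rule is a single comparison. In this sketch the function name and the "soft"/"hard" labels are hypothetical, and classifying a fix time of exactly one hour as a hard gate is a conservative assumption, since the lesson does not say which side of the boundary it falls on.

```python
from datetime import timedelta

REVERSIBILITY_LIMIT = timedelta(hours=1)


def classify_gate(estimated_fix_time: timedelta) -> str:
    """Return 'soft' if a wrong output can be fixed in under an hour, else 'hard'."""
    # Assumption: exactly one hour is treated conservatively as 'hard'.
    if estimated_fix_time < REVERSIBILITY_LIMIT:
        return "soft"
    return "hard"


print(classify_gate(timedelta(minutes=10)))
print(classify_gate(timedelta(hours=3)))
```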

13. What notation did HubSpot's content team use to mark variable inputs in their prompt templates?

Correct. HubSpot used double curly braces — {{topic}}, {{target_persona}} — to mark the variable inputs that change each run without touching stable logic.

Not quite. HubSpot's convention was {{double curly braces}} for variables. Any consistent notation works; the key is that anyone scanning the prompt can instantly identify what to fill in.
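One way to fill such a template programmatically is sketched below. The `{{topic}}` and `{{target_persona}}` names come from the answer above; the `fill_template` helper, its regular expression, and the choice to raise an error on a missing value are assumptions for illustration, not HubSpot's tooling.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template: str, values: dict) -> str:
    """Replace every {{name}} in the template; fail loudly if a value is missing."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError("Missing values for: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


template = "Write a blog outline about {{topic}} for {{target_persona}}."
print(fill_template(template, {"topic": "pricing pages", "target_persona": "a startup CFO"}))
```

Failing on a missing variable, rather than leaving the braces in place, keeps an unfilled placeholder from silently reaching the model.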

14. What does the four-step output review protocol's "goal alignment check" specifically test?

Correct. Goal alignment asks whether the agent answered the actual question asked — agents frequently answer a slightly different, more tractable version of the question posed.

Incorrect. Goal alignment checks whether the output addresses what was actually asked — agents often answer adjacent questions that are easier to answer, not the specific one posed.

15. What structural component of a prompt template is most commonly omitted by teams writing their first reusable prompts?

Correct. Authors know their context so well they forget to write it down. This is the invisible knowledge that breaks prompts when someone else runs them.

Not quite. The context block is the most commonly missing component. Authors assume their knowledge of brand, audience, and constraints is shared, but that knowledge isn't in the prompt.

16. Which of the following best describes an "accountability vacuum" in agent-assisted workflows?

Correct. Accountability vacuums form when everyone assumes someone else reviewed the output. The result is active use of content no human has actually taken ownership of.

Incorrect. An accountability vacuum describes a human organizational failure: agent output is being used but no person has formally taken responsibility for verifying its accuracy.

17. Allen & Overy responded to context drift in Harvey sessions by implementing what protocol?

Correct. The re-anchor protocol treated periodic context restatement as a professional standard — recognizing that the drift problem was structural and required a systemic response.

Not quite. Allen & Overy's response was a re-anchor protocol: associates were trained to periodically restate the deal's key parameters during long sessions to counteract the gradual drift they had documented.

18. Which memory pattern is best for a long research task where early findings must influence a final deliverable written many steps later?

Correct. RAG with summarization compression retrieves semantically relevant early findings at each step while keeping the context budget manageable — the right pattern for multi-step research workflows.

Not quite. RAG with summarization compression is the right pattern: it retrieves relevant early-step findings at each later step without blowing the context budget on full history.

19. HubSpot's 2023 AI content merge requirements specifically addressed the accountability vacuum by:

Correct. Three specific conditions — named owner, revision history entry, verified factual claims — ensured a human was accountable and had genuinely engaged with the content.

Incorrect. HubSpot solved the accountability vacuum through three specific attestations: a named owner, revision history documentation, and verified factual claims.
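The three conditions lend themselves to a pre-merge check. The sketch below is an assumption about how such a gate could be encoded, not HubSpot's actual system: the `Submission` fields, the function names, and the condition labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Submission:
    owner: Optional[str]          # the named human accountable for the content
    revision_note: Optional[str]  # the revision history entry
    claims_verified: bool         # whether factual claims were checked


def unmet_conditions(sub: Submission) -> List[str]:
    """List which of the three merge conditions are not yet satisfied."""
    unmet = []
    if not (sub.owner and sub.owner.strip()):
        unmet.append("named owner")
    if not (sub.revision_note and sub.revision_note.strip()):
        unmet.append("revision history entry")
    if not sub.claims_verified:
        unmet.append("verified factual claims")
    return unmet


def can_merge(sub: Submission) -> bool:
    """A submission merges only when all three conditions hold."""
    return not unmet_conditions(sub)


draft = Submission(owner="", revision_note="Rewrote intro", claims_verified=False)
print(unmet_conditions(draft))
```

Returning the list of unmet conditions, rather than a bare yes or no, tells the submitter exactly which attestation is still outstanding.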

20. The three degradation patterns in long AI sessions described in Lesson 2 are goal drift, constraint erosion, and:

Correct. Persona drift is when Claude stops using the audience calibration established in the opening and reverts to a default professional register — separate from goal drift and constraint erosion.

Not quite. The third degradation pattern is persona drift: Claude reverts from the audience calibration you established to its default register, losing the specific tone and vocabulary level you specified.