When OpenAI released ChatGPT Plugins in May 2023, the first wave of public demos revealed a striking pattern: the same plugin produced wildly different results depending on how users phrased their requests. A travel plugin would return vague suggestions for "plan a trip to Japan" but would produce detailed day-by-day itineraries when asked for "a 7-day Japan itinerary for a first-time visitor with a $150/day budget, prioritizing temples and food markets, formatted as a daily schedule." The underlying AI was identical. The prompt was the variable.
This became one of the clearest public demonstrations that prompt quality is the primary lever users control, and it launched a wave of interest in what would soon be called prompt engineering.
Over the five previous modules you have built five distinct skills. In Module 6 you will practice using all of them together on demanding, multi-part tasks: the kinds of challenges you actually face in school, work, and creative life.
Think of the five layers as a stack. Each layer you add narrows the AI's uncertainty about what you want:
You can know all five layers and still write weak prompts if you apply them in isolation. The challenge is weighing them: sometimes a detailed role matters more than format; sometimes a clear task statement does more work than any amount of context. Expert prompters develop a sense for which layers are load-bearing in a given situation.
In 2023, researchers at Google DeepMind published a study examining which prompt elements most reliably improved output quality on complex reasoning tasks. Their finding: explicit task decomposition (breaking a multi-step task into numbered sub-tasks in the prompt itself) was the single highest-leverage technique across model families. Not role, not format, not tone, but clarity about what steps needed to happen and in what order.
For simple tasks, any one or two layers may be enough. For complex tasks, all five layers matter, and the order in which you present them affects how the model parses your intent. Generally: role → context → task → format → iteration cues.
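The ordering above can be sketched as a small helper that assembles a prompt from whichever layers you supply. This is only an illustration, not a prescribed tool: the name `build_prompt`, its parameters, and the sample wording are all hypothetical.

```python
# Hypothetical helper: joins the layers in the order
# role -> context -> task -> format -> iteration cue.
LAYER_ORDER = ["role", "context", "task", "format", "iteration"]


def build_prompt(**layers: str) -> str:
    """Join the supplied layers in the recommended order, skipping any left out."""
    unknown = set(layers) - set(LAYER_ORDER)
    if unknown:
        raise ValueError(f"Unknown layer(s): {sorted(unknown)}")
    parts = [layers[name].strip() for name in LAYER_ORDER if layers.get(name)]
    return "\n".join(parts)


# Layers can be passed in any order; the helper reorders them.
prompt = build_prompt(
    task="Plan a 7-day Japan itinerary that prioritizes temples and food markets.",
    role="You are a travel planner who specializes in Japan.",
    context="I am a first-time visitor with a $150/day budget.",
    format="Format the answer as a daily schedule.",
    iteration="Ask me one clarifying question if anything is ambiguous.",
)
print(prompt)
```

For a simple task you might pass only `task` and `format`; the helper skips whatever you omit, which mirrors the idea that not every layer is load-bearing every time.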
Expert prompters think in reverse: they start with the output they want and ask "what does the AI need to know to produce this?" That reverse-engineering instinct lets them skip layers that don't add information and deepen layers that carry the most uncertainty.
For example: if you ask for a poem, format is obvious (it's a poem). But role and tone may be everything. If you ask for a legal summary, role and format are critical. If you ask for a creative brainstorm, context and iteration matter most. The skill is knowing which layers to load for the task at hand.
In the four lessons of this module, you will face four distinct challenge types, each requiring a different emphasis across the five layers. You will also learn a simple self-evaluation framework so you can judge your own prompts before you even send them.
By the end of Module 6, you should be able to write prompts for complex real-world tasks on your first attempt, without relying on trial and error to find the right framing.
Your lab assistant will give you a complex task scenario. Your job is to write a single prompt that uses all five layers: Role, Context, Task, Format, and Iteration cue. The assistant will score your attempt and suggest improvements.
Complete at least 3 exchanges to unlock the next lesson.
When Anthropic released Claude 2 in July 2023, early adopters discovered that the model's responses to open-ended creative requests were often technically competent but generically styled: safe, balanced, uninteresting. Writers who learned to revise their prompts by specifying what to avoid as well as what to include found they could unlock strikingly different output. One documented pattern on the LessWrong forums: adding a single negative constraint ("avoid clichés, do not start with a weather description, do not use the word 'journey'") consistently produced more original prose than adding three positive instructions.
The lesson: sometimes subtraction is more powerful than addition in prompt revision.
Most weak AI responses trace back to one of six root causes. Knowing which root cause is at work tells you exactly how to revise:
The AI guessed at what you meant. Fix: restate the task as a specific verb + object + scope.
The AI wrote for a generic reader. Fix: name your exact audience, including their age, expertise level, or goals.
The AI defaulted to formal/cautious. Fix: give a tone instruction with an example of the style you want.
The AI produced a wall of text. Fix: explicitly request bullets, headers, tables, or numbered steps.
The AI answered a broader question than you asked. Fix: add a constraint such as "focus only on X" or "do not include Y."
The AI gave obvious examples. Fix: say "give me non-obvious examples" or "avoid the most common answers."
The single most common prompting mistake is revising by adding more words without diagnosing the actual problem. If the response is too long, adding "be concise" rarely works as well as adding "respond in exactly 3 bullet points." If the tone is wrong, adding "be more interesting" rarely works as well as naming a specific voice, such as "write like a seasoned science journalist explaining to a curious 16-year-old."
Diagnosis first, then targeted revision. This is the difference between iterating in circles and converging on the output you want.
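One way to make the diagnose-then-revise habit concrete is a lookup from root cause to fix. The sketch below is illustrative only: the short labels (such as `vague_task`) are hypothetical names invented here, while the fixes paraphrase the root-cause list above.

```python
# Hypothetical labels for the root causes listed above; the fixes paraphrase the lesson.
REVISION_FIXES = {
    "vague_task": "Restate the task as a specific verb + object + scope.",
    "generic_audience": "Name the exact audience: age, expertise level, or goals.",
    "wrong_tone": "Give a tone instruction with an example of the style you want.",
    "wall_of_text": "Request bullets, headers, tables, or numbered steps.",
    "too_broad": 'Add a constraint such as "focus only on X" or "do not include Y."',
    "obvious_examples": 'Ask for non-obvious examples or say "avoid the most common answers."',
}


def suggest_fix(root_cause: str) -> str:
    """Return the targeted fix for a named root cause."""
    try:
        return REVISION_FIXES[root_cause]
    except KeyError:
        raise ValueError(
            f"Diagnose first: {root_cause!r} is not a known root cause"
        ) from None


print(suggest_fix("wall_of_text"))
```

The lookup fails loudly on an unnamed cause by design: if you cannot say which root cause applies, you are not yet ready to revise.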
"Make it better and more interesting and more detailed please."
"Rewrite this as a numbered list of 5 concrete steps. For each step, add one specific example from a real company. Keep total length under 200 words."
"That's too long. Try again."
"Condense this to a single paragraph of exactly 3 sentences. Keep the main claim and drop all the qualifications."
Build a library of precise revision phrases so you don't fall back on vague adjectives:
After every AI response that isn't quite right, pause and name the root cause before typing your revision. One named root cause + one specific fix is worth more than three vague adjectives.
The lab assistant will show you a weak AI response and a weak revision attempt. Your job is to (1) name the root cause of the problem, and (2) write a better, targeted revision. Complete at least 3 exchanges to unlock Lesson 3.
In September 2023, researchers at Harvard Business School published a study involving 758 consultants at Boston Consulting Group who were given access to GPT-4 for a series of complex business tasks. The consultants who performed best were not those who wrote the longest prompts. They were those who decomposed complex tasks into distinct phases: first asking for an analysis framework, then using that framework to analyze specific data, then asking for recommendations grounded in the analysis. The worst-performing approach was asking a single massive question and expecting one complete answer. The research was widely cited as evidence that sequential, multi-prompt strategies outperform single-shot attempts on complex tasks.
Complex real-world tasks (writing a research paper, planning a project, preparing for a negotiation) have natural phases. Each phase benefits from its own focused prompt. The key is knowing where the natural seams are:
In a multi-prompt workflow, each new prompt should explicitly reference the work done in previous steps. Don't assume the AI remembers what matters to you; re-anchor with a one-sentence summary of where you are:
"We've established that [summary of previous step]. Now I need you to [specific next task]." This one-sentence anchor prevents the AI from drifting back to generic responses and keeps the conversation on track.
Here is how a four-phase approach looks on a concrete task, writing a 1,500-word essay on the renewable energy transition for a high school environmental science class:
Not every task needs four phases. Use a single combined prompt when:
Split into multiple prompts when:
The BCG consultants who used AI most effectively treated it as a capable collaborator on focused sub-tasks, not a machine that could absorb a complex brief and deliver a finished product in one shot. Phase your work. The AI is ready for each phase when you are.
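The re-anchoring sentence from this lesson ("We've established that ... Now I need you to ...") can be sketched as a tiny helper for building follow-up prompts between phases. The function name `anchored_followup` and the sample essay details are hypothetical, chosen only for illustration.

```python
def anchored_followup(previous_summary: str, next_task: str) -> str:
    """Restate where the work stands, then give the next focused task."""
    # Strip stray whitespace and trailing periods so the sentences join cleanly.
    summary = previous_summary.strip().rstrip(".")
    task = next_task.strip().rstrip(".")
    return f"We've established that {summary}. Now I need you to {task}."


followup = anchored_followup(
    "the essay outline has five sections",
    "draft the introduction in about 150 words.",
)
print(followup)
```

Keeping the summary to a single clause is the point: the anchor should remind the model of the one thing that matters from the previous phase, not replay the whole conversation.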
Work through a multi-phase task with your lab assistant. Start with the Frame phase β ask for a structure or outline. Then, in follow-up messages, move through Fill, Review, and Refine. Your assistant will guide you through each phase. Complete at least 3 exchanges to unlock Lesson 4.
In early 2024, OpenAI published its "Prompt Engineering Guide" as part of its developer documentation. One section described what its researchers called the "zero-shot quality audit": before sending any complex prompt, ask yourself whether a new colleague (someone smart but unfamiliar with your context) could read the prompt and know exactly what to do. If they would have to ask clarifying questions, the prompt isn't ready. This heuristic, simple as it sounds, became one of the most referenced prompt quality checks in developer communities throughout 2024. It works because it externalizes your evaluation: you stop checking whether you know what you meant and start checking whether the prompt communicates it to someone (or something) that doesn't.
The CRAFT rubric gives you a five-point checklist you can run on any prompt before sending it. Score each criterion 0–2 (0 = missing, 1 = partial, 2 = complete). A prompt scoring 8–10 is ready. A prompt scoring below 6 needs revision.
| Criterion | Question | Score 0 | Score 1 | Score 2 |
|---|---|---|---|---|
| C: Clarity | Is the task unambiguous? | Multiple interpretations possible | One clear interpretation but scope uncertain | Exactly one interpretation, clear scope |
| R: Role | Is the AI's role specified? | No role given | Vague role ("be an expert") | Specific role with relevant expertise named |
| A: Audience | Is the target audience defined? | No audience mentioned | General audience implied | Specific audience with relevant attributes named |
| F: Format | Is the output format specified? | No format instruction | Vague format ("a list" or "a paragraph") | Specific format with length, structure, or style |
| T: Task Scope | Are the limits of the task clear? | No scope limits; could go anywhere | Partial scope ("focus on X" but no exclusions) | Both inclusion and exclusion criteria stated |
Here is a prompt scored with the CRAFT rubric:
"You are an experienced high school debate coach. Write a 200-word argument for a 9th-grader preparing to argue that social media does more harm than good in a formal school debate. Use 3 bullet points. Do not include statistics β focus on logical reasoning only."
C (Clarity): 2/2. The task is "write a 200-word argument for X position," which has exactly one interpretation.
R (Role): 2/2. "Experienced high school debate coach" is specific and relevant.
A (Audience): 2/2. "9th-grader preparing for a formal school debate" is specific, with context.
F (Format): 2/2. "200 words, 3 bullet points" gives a specific length and structure.
T (Task Scope): 2/2. Inclusion: logical reasoning. Exclusion: no statistics. Both stated.
Total: 10/10. This prompt is ready to send.
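The scoring arithmetic can be sketched in a few lines. The function names and the dictionary keys below are hypothetical. The "borderline" label for totals of 6 or 7 is also an assumption, since the rubric only says that 8–10 is ready and below 6 needs revision.

```python
CRITERIA = ("clarity", "role", "audience", "format", "task_scope")


def craft_total(scores: dict[str, int]) -> int:
    """Sum the five CRAFT scores, checking that each is 0, 1, or 2."""
    for name in CRITERIA:
        if scores.get(name) not in (0, 1, 2):
            raise ValueError(f"{name} must be scored 0, 1, or 2")
    return sum(scores[name] for name in CRITERIA)


def craft_verdict(total: int) -> str:
    """Apply the rubric thresholds: 8-10 is ready, below 6 needs revision."""
    if total >= 8:
        return "ready"
    if total < 6:
        return "needs revision"
    # The rubric does not define 6 or 7; this label is an assumption.
    return "borderline"


# Scores from the debate-coach example above.
debate_prompt_scores = {
    "clarity": 2, "role": 2, "audience": 2, "format": 2, "task_scope": 2,
}
total = craft_total(debate_prompt_scores)
print(total, craft_verdict(total))  # prints "10 ready"
```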
After running the CRAFT rubric, apply the "New Colleague" test from OpenAI's 2024 guide: imagine a smart person with no background on your task reads this prompt. Could they complete the work without asking a single clarifying question? If they would need to ask "who is this for?" or "how long should it be?" or "what format do you want?", your prompt needs one more pass.
These two tools together, CRAFT scoring plus the New Colleague test, give you a reliable pre-send quality check that works across any topic, format, or AI system.
Expert prompters keep a library of their best prompts. When they find a prompt that scores 9–10 on CRAFT and produces excellent results, they save it as a template. The next time they face a similar task, they adapt the template rather than starting from scratch. Over time, this portfolio becomes one of their most valuable tools.
Think of your prompt portfolio as a cookbook: each saved prompt is a tested recipe that you can adapt for new ingredients. The format, role, and scope instructions stay the same; only the specific topic or content changes.
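A saved recipe can be as simple as a string with placeholders. The sketch below adapts the debate-coach prompt from earlier in this lesson using Python's standard `string.Template`. The placeholder names `audience` and `position` and the sample values are hypothetical.

```python
from string import Template

# A saved "recipe": role, format, and scope stay fixed;
# only the audience and the position being argued change.
DEBATE_TEMPLATE = Template(
    "You are an experienced high school debate coach. "
    "Write a 200-word argument for a $audience preparing to argue that $position "
    "in a formal school debate. Use 3 bullet points. "
    "Do not include statistics; focus on logical reasoning only."
)

# substitute() raises KeyError if a placeholder is left unfilled,
# which guards against sending a half-adapted template.
prompt = DEBATE_TEMPLATE.substitute(
    audience="10th-grader",
    position="homework should be optional",
)
print(prompt)
```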
You are not done learning to prompt; you are just beginning to learn it systematically. The CRAFT rubric, the New Colleague test, the five-layer stack, task decomposition, and targeted revision are tools you will use for years. The more you use them, the faster and more instinctive they become.
Write a prompt for a complex task, then score it using the CRAFT rubric (C, R, A, F, T; each 0–2, total out of 10). Share both the prompt and your self-score. Your lab assistant will give you feedback on your scoring accuracy and suggest how to improve lower-scoring criteria. Complete at least 3 exchanges to complete the module.