In early 2023, a group of doctors at Beth Israel Deaconess Medical Center in Boston ran an unusual experiment. They gave GPT-4 a set of real patient cases — the kind of tricky diagnostic puzzles that stump even experienced physicians — and asked the AI to figure out what was wrong.
The first round of results was unimpressive. The AI gave long, hedged, non-committal answers. It listed fifteen possible diagnoses. It said things like "further testing would be warranted." Doctors reading the outputs said it felt like asking a colleague for advice and getting a textbook back.
Then the researchers changed one thing. Instead of asking "What might be wrong with this patient?" they asked: "You are an experienced internal medicine physician. A 58-year-old male presents with these specific symptoms. Rank your top three diagnoses by likelihood and explain your reasoning for each."
The AI's performance jumped dramatically. Same model. Same knowledge. Completely different output. The researchers published their findings, and the medical community started paying close attention to something most people had ignored: the exact wording of a question can change the quality of the answer.
Everyone has heard "ask a better question." It sounds like advice your teacher gives when you ask something lazy. But there's a real, mechanical reason why it matters with AI — and it's different from why it matters with a person.
When you ask a friend a vague question, they fill in the gaps using everything they know about you: your situation, your personality, what you've talked about before. Your friend has context about you specifically. An AI, by default, has none of that. Every conversation starts fresh. It doesn't know if you're a beginner or an expert, if you want a quick answer or a detailed one, if you're asking for yourself or for a school project.
So when you send a vague prompt, the AI usually doesn't ask clarifying questions the way a good friend would. It guesses. And it guesses toward the most common, generic version of whatever you asked. That's why vague questions produce answers that feel like they were written for everyone, and therefore for no one in particular.
AI language models predict what words should come next based on patterns in training data. A vague prompt activates the most common patterns. A specific prompt activates patterns that match your actual situation. You're not just asking differently — you're pointing the model at a different part of its knowledge.
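The idea that a vague prompt lands on the most common pattern can be shown with a deliberately tiny toy. The sketch below is not how a real language model works internally: `TOY_DATA`, its tags, and the `most_likely` function are all made up for illustration, and the "model" simply counts which kind of text most often followed a matching request.

```python
from collections import Counter

# Made-up "training data": (words that appeared in a request, kind of text that followed).
TOY_DATA = [
    ({"climate"}, "general overview"),
    ({"climate"}, "general overview"),
    ({"climate"}, "general overview"),
    ({"climate", "coral"}, "reef science explainer"),
    ({"climate", "coral", "7th-grade"}, "short classroom introduction"),
    ({"climate", "coral", "7th-grade"}, "short classroom introduction"),
]


def most_likely(prompt_words: set) -> str:
    """Return the most frequent continuation among examples that contain every prompt word."""
    matches = Counter(kind for ctx, kind in TOY_DATA if prompt_words <= ctx)
    if not matches:
        return "no matching pattern"
    return matches.most_common(1)[0][0]


print(most_likely({"climate"}))                        # general overview
print(most_likely({"climate", "coral", "7th-grade"}))  # short classroom introduction
```

Adding words to the request shrinks the set of matching examples, so the most frequent answer shifts from the generic one to the one that fits the situation.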
Specificity isn't about using more words. It's about reducing the number of valid interpretations. Consider these two prompts:
Prompt A: "Help me write something about climate change."
Prompt B: "Write a 150-word introduction for a 7th grade science class presentation about how rising ocean temperatures affect coral reefs. Start with a surprising fact."
Prompt A could produce a persuasive essay, a poem, a news article, a lab report, or a speech. It has dozens of valid interpretations. Prompt B has almost one. The AI knows the length, the audience, the topic, the sub-topic, and the structure you want. It has almost no guessing to do.
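One way to see what Prompt B pins down is to write it out as named fields. The sketch below is only an illustration: the `PromptSpec` class and its field names are hypothetical, chosen for this example, and the code builds text without calling any AI system.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """Hypothetical container for the choices Prompt B makes explicit."""
    task: str
    length: str
    audience: str
    topic: str
    structure: str

    def render(self) -> str:
        # Join the fields into one instruction so little is left for the model to guess.
        return (
            f"Write a {self.length} {self.task} for {self.audience} "
            f"about {self.topic}. {self.structure}"
        )


prompt_b = PromptSpec(
    task="introduction",
    length="150-word",
    audience="a 7th grade science class presentation",
    topic="how rising ocean temperatures affect coral reefs",
    structure="Start with a surprising fact.",
)
print(prompt_b.render())
```

Prompt A would leave every one of these fields empty, which is another way of saying the AI has to fill them in for you.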
The four things that do the most work when you add specificity:
Here's where it gets genuinely complicated. If you specify everything, you sometimes miss answers you didn't know to look for. The doctors in Boston got better diagnoses when they added specificity — but they were asking about cases they already had a framework for. What about the cases where the doctor doesn't know what category to put the patient in yet?
Over-specification can create tunnel vision: you tell the AI exactly what kind of answer you want, and it gives you that answer, even when a different kind of answer would have served you better. You asked for three diagnoses ranked by likelihood, so you got three diagnoses ranked by likelihood — and the AI never mentioned that the patient's symptoms didn't actually fit any standard diagnosis well.
There's a real ethical question here that nobody has cleanly solved:
If you specify your prompt very precisely, you get a useful, targeted answer — but you've also narrowed what the AI will tell you. If you ask broadly, you get generic noise — but you might occasionally stumble onto something important you didn't know to ask about. Who is responsible when a highly specified prompt causes the AI to miss the thing that mattered most? The person who wrote the prompt? The AI that followed instructions? The institution that deployed it?
Most people treat AI prompts like search queries: short, keyword-heavy, vague. They get generic answers and conclude the AI is not that useful. You now understand the actual mechanism: a specific prompt steers the model toward the patterns that fit your situation, while a vague one leaves it with the generic default. The next time you see someone complain that AI "doesn't really help," look at the prompt first. There is a good chance it was too vague to produce anything better.
You've been hired to review prompts before they go to an AI system used by a real organization. Your job isn't to be nice about it — it's to find exactly what's missing and fix it. The AI assistant below will challenge your rewrites, ask what you were thinking, and won't accept "it's fine."
Start by dissecting this real-world weak prompt that was submitted to a school district's AI system:
Your task: identify every ambiguity in that prompt, then propose a rewritten version. Be specific about what you changed and why. The AI will interrogate your reasoning.
In June 2022, GitHub made Copilot, an AI tool that writes code alongside programmers, generally available. Early reviews were split. Some developers said it was revolutionary; others called it useless. The gap was so wide that researchers at Google got curious and ran a study.
What they found was striking. Copilot's usefulness wasn't evenly distributed. Senior engineers got dramatically more value from it than junior engineers. Not because senior engineers were better at using AI — but because senior engineers gave it more context without thinking about it. When a senior engineer wrote a comment above their code, it typically said something like: "This function needs to handle edge cases where the input array is empty or contains null values, and it should fail gracefully rather than throwing an exception." A junior engineer would write: "// sort the list."
The AI had exactly the same capability in both cases. But one user handed it a rich description of the problem; the other handed it a three-word label. The AI could only be as helpful as the context it was given. The researchers concluded that learning to give AI good context might actually be more valuable than learning to code itself.
Context is everything the AI doesn't know that it would need to know in order to give you a useful answer. It sounds obvious, but it's easy to miss because we're used to talking to people who share our situation. When you ask a friend "is this a good idea?" they already know what you've been working on, what matters to you, and what your alternatives are. The AI knows none of that.
There are roughly four types of context that matter most: background, goal, constraint, and quality.
Here's something that trips up even experienced AI users: you can't give context you don't realize you have. The senior engineers in the Copilot study weren't consciously thinking "I should give the AI more context." They were just writing comments the way they always did, which happened to be rich and precise because they'd spent years learning to communicate clearly about code.
This means that the better you are at something, the better you can use AI to help with it. And the less you know about a topic, the harder it is to give the AI the context it needs to help you learn — which is a weird and uncomfortable truth. AI is most useful when you already know a lot about what you're asking about.
There's a workaround, though. When you don't know what context to give, you can ask the AI what it needs. Try starting with: "I want to [goal]. Before you answer, ask me any clarifying questions you need to give me a useful response." This flips the problem: instead of guessing what context matters, you let the AI identify the gaps.
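The clarifying-question opener can be kept as a small reusable helper. This is a minimal sketch under stated assumptions: the function name `opening_prompt`, its `known_context` parameter, and the example goal are hypothetical, and the code only builds the text you would send.

```python
from typing import Optional

# The opener from the text, with a slot for the goal.
CLARIFY_OPENER = (
    "I want to {goal}. Before you answer, ask me any clarifying "
    "questions you need to give me a useful response."
)


def opening_prompt(goal: str, known_context: Optional[str] = None) -> str:
    """Send your context if you have it; otherwise ask the AI what it needs."""
    if known_context:
        return f"I want to {goal}. {known_context}"
    return CLARIFY_OPENER.format(goal=goal)


# Hypothetical goal where the user does not yet know what context matters.
print(opening_prompt("plan a week of studying for a biology test"))
```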
"I'm trying to [goal]. My situation is [relevant background]. I've already tried [previous attempts]. What I need is [specific output type]. The main thing I want to avoid is [constraint]." — This single template covers all four context types and will transform most of your AI conversations.
Giving context can backfire in a specific way that researchers call framing bias. When you tell the AI your existing position — "I think the best solution is X, help me explain why" — the AI tends to support your existing position rather than challenge it. You've given it context, but you've also given it a conclusion. It will usually help you build the case for that conclusion rather than evaluate whether the conclusion is actually right.
This is particularly dangerous when you're using AI to research a decision, form an opinion, or evaluate an idea. If you walk in with a position and frame your prompt around it, you may come out with a more confident version of the same position — even if it was wrong from the start.
If AI consistently confirms whatever position you walk in with, it could make people more confident in their beliefs without making those beliefs more accurate. Is this a problem with how people write prompts? Or is it a design problem — should AI systems be built to push back on the user's framing more often, even when it's annoying? Who decides how much a tool should challenge the person using it?
Many news stories about "AI said something wrong" or "AI gave terrible advice" are at least partly context stories. The user gave no context, the wrong context, or context that embedded a flawed assumption. You now have the framework to look at those stories differently: not only as failures of the AI, but also as failures of the conversation.
Below is a real-world scenario where someone used AI and was frustrated with the result. Your job: figure out exactly which context was missing, then propose what the original prompt should have been and why. The AI below will push you to be precise about which type of context (background, goal, constraint, quality) was missing and why it mattered.
Identify the missing context. Rewrite the prompt. Defend your rewrite to the AI below.
In the summer of 2022, the AI image generator Midjourney went public on Discord. Within weeks, a strange new profession emerged: the "prompt engineer." These were people who spent hours — sometimes days — iterating on a single image prompt, trying to generate a specific visual. They were not artists or programmers. They were people who had figured out that the way to get good AI output was not to write a perfect prompt once, but to treat prompting as an iterative process.
One of the early documented cases was a user named Andrei Kovalev, who was trying to generate a specific style for a book cover. His first prompt was four words. His final prompt — the one that produced the image he wanted — was 87 words and included artistic movement references, lighting specifications, color palette constraints, and explicit negative prompts (things to avoid). He had generated and refined over 40 versions to get there.
What Kovalev had discovered — and what the entire early Midjourney community was learning simultaneously — was that prompting is not a transaction, it's a conversation. You don't place an order and receive a product. You make a first attempt, evaluate what came back, figure out what the model misunderstood or defaulted to, and adjust accordingly. The first prompt is just the opening move.
Most people treat AI like a vending machine: put in a prompt, get out an answer, decide if it's good or bad. If it's bad, they either try again with basically the same prompt or give up. Neither of these is iteration.
Real iteration is a diagnostic process. When you get a bad response, you don't just re-ask — you ask: what specifically went wrong? Then you change exactly that thing and nothing else. That way you learn what each element of your prompt actually does.
| Non-iteration (vending machine) | Iteration (diagnostic) |
|---|---|
| "The answer was bad, let me try again." | "The answer was too general. What did I not specify that caused that?" |
| "Let me rephrase the whole thing." | "The format was wrong. Let me add a format instruction and keep everything else." |
| "This AI doesn't understand me." | "The AI gave the most generic answer to my question. What role or audience did I forget to specify?" |
| "Let me try a different AI." | "The AI understood the topic but not my goal. Let me add goal context explicitly." |
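The "change exactly one thing" rule from the table can be checked mechanically if you keep each prompt version as labeled parts. The sketch below assumes a hypothetical representation (a dict from element name to text) and a hypothetical helper, `changed_elements`; the example prompts are invented for illustration.

```python
def changed_elements(previous: dict, revised: dict) -> list:
    """List the prompt elements that differ between two versions."""
    keys = sorted(set(previous) | set(revised))
    return [k for k in keys if previous.get(k) != revised.get(k)]


v1 = {"task": "Summarize this article."}

# Diagnostic iteration: the format was wrong, so only a format instruction is added.
v2 = {"task": "Summarize this article.", "format": "Use three bullet points."}

# Vending-machine retry: everything is rephrased at once.
v3 = {
    "task": "Explain this article.",
    "format": "Write one paragraph.",
    "audience": "Write for a general reader.",
}

print(changed_elements(v1, v2))  # ['format']
print(changed_elements(v1, v3))  # ['audience', 'format', 'task']
```

When only one element changed, any difference in the response can be attributed to it. When three changed at once, you learn nothing about which one mattered.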
Professional prompt engineers — people who use AI for high-stakes work at companies and research institutions — tend to follow a three-move pattern, even if they don't call it that.
This three-move pattern is exactly how AI is used in professional settings — at law firms using AI to draft contracts, at hospitals using AI to summarize patient notes, at newsrooms using AI to research stories. No one sends one prompt and publishes the output. Iteration is not a workaround for bad AI; it's how the tool is actually designed to be used.
There's a risk in iteration that almost nobody talks about: the more you iterate toward a specific output, the more you can end up shaping the answer to match what you already wanted to hear. By the time you've made fifteen adjustments, the AI's output looks a lot like your original idea — polished, articulate, and confident. But you might have just spent an hour getting an AI to agree with you.
Professional fact-checkers and journalists who use AI have noted this pattern. You iterate until the AI produces something that sounds authoritative — but the content was shaped by your own choices about what to correct and what to leave alone. The final product sounds like it came from an objective source. It didn't.
If iteration lets you shape AI output toward your preferred conclusion, and the final result looks authoritative and objective, is that better or worse than writing the conclusion yourself? At least if you wrote it yourself, a reader knows it's your opinion. When an AI writes it after fifteen iterations, it carries a false appearance of objectivity. Does the process of iteration introduce a new kind of deception, even when you're not trying to deceive anyone?
Every major organization using AI right now — hospitals, law firms, government agencies, newsrooms — is wrestling with iteration as a policy question. How many human reviews should happen between AI output and final use? Who is responsible when iterated AI output turns out to be wrong? You understand the mechanism behind these debates, which means you can engage with them at the same level as the adults making those decisions.
You're going to practice the three-move iteration pattern live. The AI below will respond to your prompts the way a real AI might — but it will also tell you what was wrong with your approach. Your goal: start with the broken prompt below, diagnose what failed, make one precise correction at a time, and reach a response that actually matches the target goal. You have to explain each change you make and why.
Send the starting prompt first, then iterate. After each response you get, tell the AI what you're changing and why — then send your revised prompt.
In March 2023, OpenAI quietly published a document called the "GPT-4 System Card," a technical report on how their model behaved. Inside it, they described something they called "jailbreaking patterns": specific prompt structures that caused the model to behave in ways they hadn't intended. The document was meant as a safety warning. But something unexpected happened: its detailed descriptions of which prompt patterns were powerful enough to override the model's training attracted enormous attention from the research community, not to exploit the model, but to understand it.
What emerged from that attention was a clearer picture of how all prompts work. The structural elements that made certain prompts powerful in dangerous ways were the same ones that made other prompts powerful in useful ways. Researchers at Princeton published a paper in late 2023 identifying what they called "high-leverage prompt patterns": structures that reliably got better outputs across many different tasks and many different AI systems.
These weren't tricks or exploits. They were structural patterns that aligned with how the models actually processed language. Knowing them doesn't give you magic words. It gives you a framework for building prompts whose effects you can explain, because you know why each element works.
These patterns show up across professional AI use: in research, in business, in creative fields. The four covered here are Role, Audience, Chain-of-Thought, and Negative Constraint. They work because of how language models process text, not because they're "tricks."
The real skill is combining these patterns — not stacking them randomly, but understanding which elements your specific request needs. A creative writing prompt might need Role + Audience + Negative Constraint. A research task might need Chain-of-Thought + Negative Constraint + a format specification. Here's what a combined prompt looks like in practice:
Weak: "Explain climate change to me."
Strong: "You are a science communicator who specializes in explaining complex topics to middle school students. Explain the greenhouse effect to a 7th grader who already understands what carbon dioxide is but has never heard the term 'greenhouse effect.' Use one concrete analogy. Do not use the word 'atmosphere' — replace it with a simpler term every time. Think through the explanation step by step before writing the final version."
The strong version uses: Role (science communicator), Audience (specific level + prior knowledge), a format specification (one concrete analogy), a Negative Constraint (avoid the word 'atmosphere'), and Chain-of-Thought. Every element is load-bearing: remove any one and the output is likely to degrade.
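To make the "load-bearing" idea concrete, the strong prompt can be stored as labeled elements so that you can drop one and compare the results. This is a sketch under stated assumptions: the labels are one reading of the prompt, the `render` and `without` helpers are hypothetical, and the code only assembles text (it does not measure output quality).

```python
from typing import List, Tuple

Element = Tuple[str, str]  # (pattern label, prompt text)

STRONG_PROMPT: List[Element] = [
    ("Role", "You are a science communicator who specializes in explaining "
             "complex topics to middle school students."),
    ("Audience", "Explain the greenhouse effect to a 7th grader who already "
                 "understands what carbon dioxide is but has never heard the "
                 "term 'greenhouse effect.'"),
    ("Format", "Use one concrete analogy."),
    ("Negative Constraint", "Do not use the word 'atmosphere'; replace it "
                            "with a simpler term every time."),
    ("Chain-of-Thought", "Think through the explanation step by step before "
                         "writing the final version."),
]


def render(elements: List[Element]) -> str:
    """Join the element texts into one prompt string."""
    return " ".join(text for _, text in elements)


def without(elements: List[Element], label: str) -> List[Element]:
    """Return a copy of the prompt with one labeled element removed."""
    return [(name, text) for name, text in elements if name != label]


print(render(STRONG_PROMPT))
print(render(without(STRONG_PROMPT, "Role")))
```

Sending both versions to the same AI system and comparing the answers is a simple way to test whether a given element is doing real work for your request.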
Good prompt patterns make AI more useful — but they don't make it accurate. A perfectly structured prompt can produce a well-formatted, confidently stated, completely wrong answer. The model's fluency is not connected to its correctness. This is the most important limitation to understand.
Institutions that use AI at scale — hospitals using AI for triage notes, government agencies using AI for policy drafts, news organizations using AI for research — have had to build verification systems specifically because better-structured prompts produce more fluent and more convincing outputs, which can make false information harder to catch, not easier.
There's a version of this that affects everyday users too. When you learn to write prompts well, the AI's answers start to sound more authoritative. They're better formatted, more confident, more detailed. That can make you trust them more — at exactly the moment when you should be verifying them more carefully.
If skilled prompting produces more fluent and authoritative-sounding outputs, does learning to prompt well make AI more dangerous for the people around you — not you personally, but people who receive your AI-assisted work and trust it because it sounds good? Is there a responsibility that comes with being good at prompting? What would that responsibility look like?
You understand the four prompt patterns that research has identified as high-leverage. You know why they work mechanically. You know their limits. Most people using AI daily have never thought about any of this — they write prompts the way they'd write a search query and wonder why the results feel shallow. You now have a complete framework: specificity, context, iteration, and structure. That's the full toolkit.
You're designing a prompt for a real use case. A middle school science teacher wants to use AI to generate quiz questions for her students — but every time she tries, she gets either too-easy recall questions or confusingly advanced ones. She's asked you to design a prompt that works.
Your challenge: build a complete prompt using all four patterns — Role, Audience, Chain-of-Thought, and Negative Constraint — for her use case. Then defend every element of your design. The AI below will test each one: "Why that role? What if you'd used a different audience framing? What does that negative constraint rule out?"