In 1876, when Alexander Graham Bell transmitted the first telephone call to his assistant Thomas Watson, the immediate question wasn't philosophical. It was practical: how do you talk to this thing? Early telephone users didn't know whether to shout or whisper, whether to say "hello" or announce their name, whether to wait for a tone or just begin speaking. The Bell Telephone Company had to issue instruction cards. Western Union, which by most accounts had declined to purchase Bell's patent for $100,000, is said to have dismissed the device as having "no commercial possibilities." The gap between a technology's existence and a person's ability to use it effectively turned out to be one of the most consequential variables of the industrial era.
In November 2022, OpenAI released ChatGPT. Within five days it had one million users. Within two months it had an estimated one hundred million, at the time the fastest consumer adoption on record. Unlike the telephone, no instruction card came with it. Users arrived and typed what felt natural: short, vague, hopeful fragments. Many got mediocre results and concluded the tool was overhyped. Others stumbled into a phrasing that produced something startling, and couldn't explain why it worked. The interface was deceptively simple, just a text box, and that simplicity concealed an enormous skill gap between a casual user and an effective one.
This course is about closing that gap. It covers the mechanics of prompting: how to give AI the context, format, tone, and constraints it needs to be genuinely useful. You won't finish this course knowing everything about AI — the field moves too fast for that promise. What you will finish knowing is how to construct a request that gets you something worth reading, how to debug a bad result, and how to iterate toward the output you actually needed. These are durable skills, valid across models and across years.
If you finish every module, here's who you become:
In early 2023, the lawsuit Mata v. Avianca became a landmark case, not for its aviation law, but for what the plaintiff's attorneys submitted to a federal court. New York lawyers Steven Schwartz and Peter LoDuca used ChatGPT to research precedents. The AI produced six citations: plausible case names, plausible docket numbers, plausible judicial language. Every single one was fabricated. The attorneys had asked the model for case citations without specifying that those cases needed to actually exist, without asking it to flag uncertainty, and without instructing it to distinguish verified sources from generated text. Judge P. Kevin Castel fined the attorneys and their firm $5,000 in June 2023. The lawyers' error wasn't using AI. It was not knowing how to talk to it.
The prompt they almost certainly typed was something like: "Find cases supporting our argument that…" That phrasing contains no instruction about source verification, no acknowledgment of the model's hallucination tendency, no request for confidence levels. The tool responded to exactly what it was asked — and what it was asked left the door wide open to confabulation. The lesson isn't that AI is dangerous. The lesson is that the input shapes the output in ways that aren't obvious until you understand what a prompt actually does.
A prompt is any text you send to a language model to elicit a response. That's the mechanical definition. The more useful definition: a prompt is a specification. It specifies what you want, how you want it, what constraints apply, and what context the model needs to respond accurately.
Language models don't "understand" your intent — they predict the most statistically likely continuation of your input based on patterns in their training data. When your input is sparse, the model fills the gaps with its best guess about what you probably meant. Those guesses are often wrong, or right on average but wrong for your specific situation.
Think of it this way: if you call a contractor and say "fix the thing in the bathroom," you'll get a result shaped entirely by whatever that contractor assumes you meant. If you say "replace the wax ring seal on the toilet in the second-floor bathroom — the floor tiles are original 1940s ceramic, don't crack them," you've given a specification. The quality of the specification determines the quality of the work.
A 2023 study by researchers at the Wharton School found that GPT-4's performance on business tasks varied by more than 40 percentage points depending on prompt quality — with identical underlying tasks. The model's capability was constant. The prompt was the variable.
A complete specification has four elements: Task, Context, Format, and Constraints. Not every prompt needs all four, but knowing they exist lets you diagnose why a response missed the mark.
Most first-time users operate almost entirely at the Task level. They type a request that contains a verb and a noun — "write an email," "summarize this," "explain photosynthesis" — and leave the other three elements to chance. The model handles this by averaging: it writes an email in the most common register, summarizes at a generic length, explains photosynthesis at a textbook level.
For many tasks, averaging works fine. You wanted a quick email draft; the generic version is close enough to edit. The problem emerges when you needed something specific: the email had to land with a skeptical CFO, the summary had to be three sentences for a social media post, the explanation had to target an eight-year-old. None of that was in the prompt. The model had no way to know.
The attorneys in Mata v. Avianca operated in this default state. Their task was clear. Their context, format, and constraints — especially the constraint that citations must be verifiable — were absent. The model did exactly what it was asked to do. It just wasn't asked enough.
AI doesn't fail you because it's broken. It produces mediocre output because the prompt was underspecified. Every technique in this course is a method for moving from underspecified to well-specified — without requiring you to write a paragraph of instructions for every request.
You'll encounter the phrase "prompt engineering" online, often attached to the implication that there are secret phrases, or magic words, that unlock better AI performance. Some of this is real: certain phrasings do reliably outperform others, and researchers have documented them. "Let's think step by step" genuinely improves reasoning outputs, as shown in a 2022 paper by Kojima et al., researchers at the University of Tokyo and Google Research. Role assignment ("You are an expert in…") shifts register and often improves domain accuracy.
But these techniques work because they add specification — they give the model more information about what kind of response is expected. They aren't incantations. Understanding why they work makes you able to adapt them, combine them, and invent your own. That's what separates someone who's memorized a few tricks from someone who can talk to AI effectively across any task they encounter.
The four elements above — Task, Context, Format, Constraints — are the underlying structure. Every lesson in this course is an elaboration of one or more of those elements. By the end, you'll be able to decompose any prompt you write and identify exactly where it's underspecified, which means you'll be able to fix it.
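One way to make this decomposition concrete is to treat the four elements as fields and check which ones are empty. The sketch below is illustrative only: `PromptSpec` and `missing_elements` are hypothetical names introduced here, not part of any library or of this course's tooling, and an empty field is only a rough stand-in for "underspecified."

```python
from dataclasses import dataclass, fields


@dataclass
class PromptSpec:
    """Hypothetical container for the four elements of a prompt."""
    task: str = ""
    context: str = ""
    format: str = ""
    constraints: str = ""


def missing_elements(spec: PromptSpec) -> list[str]:
    """Return the names of the elements left blank, in declaration order."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


# A typical first-time prompt: a verb and a noun, nothing else.
bare = PromptSpec(task="Write an email")
print(missing_elements(bare))  # ['context', 'format', 'constraints']
```

A blank field is not automatically a problem, since not every request needs all four elements. The list is a prompt for your own judgment about which gaps matter for this task.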
You'll work with the AI lab assistant to practice identifying what's missing from weak prompts and rewriting them with the four elements: Task, Context, Format, and Constraints. Start by sharing a vague prompt you want to analyze — something like one you might have typed before learning this framework.
Have at least 3 exchanges to complete the lab. Ask the assistant to evaluate your prompts, suggest improvements, and explain which elements were added.
In 2023, a team at Stanford Medicine published a study testing whether GPT-4 could give appropriate dietary advice. When the researchers prompted the model with "What should I eat to be healthy?" they received standard, generic nutritional guidance — accurate but useless for any particular person. When they reprompted with a patient's specific details — age, weight, diabetes diagnosis, current medications, kidney function levels — the model produced advice that closely aligned with what a registered dietitian would recommend for that specific case. The task was identical. The context transformed the output from a pamphlet into something clinically relevant.
This is the central dynamic of context: the model will fill in missing information, just not necessarily with your information. Without context, it fills the gap with the statistical average of everyone who has ever asked that kind of question. The average healthy adult's dietary advice is not the right answer for a 67-year-old with stage 3 chronic kidney disease. The model wasn't wrong in the first case — it just didn't know who it was talking to.
Context isn't one thing. It breaks into three distinct layers, each solving a different problem:
A reasonable worry: if context improves results, should you write a paragraph of background for every prompt? The answer is no, but not because context hurts. Relevant context rarely degrades output quality. The cost is your time.
The practical rule is proportionality. For a quick factual lookup, context adds little value — the model knows what the capital of France is regardless of who you are. For a task where your specific situation shapes the right answer — writing a performance review, diagnosing why a marketing campaign failed, drafting a negotiation email — the context investment pays back immediately in output quality.
The highest-leverage context move for most users is establishing a system context at the start of a conversation: one paragraph that tells the model your role, your project, and your standards. You write it once. It shapes every response that follows. Many power users keep a saved "context block" they paste into new conversations to avoid rewriting it each time.
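For readers who script their prompts, a saved context block can be as simple as a string constant that is prepended to the first message. This is a minimal sketch under stated assumptions: the block's wording, the constant `CONTEXT_BLOCK`, and the helper `with_context` are all hypothetical examples, not a recommended template.

```python
# Hypothetical saved context block covering role, project, and standards.
CONTEXT_BLOCK = (
    "I am a marketing manager at a mid-sized software company. "
    "I am preparing quarterly campaign reports for senior leadership. "
    "I need concise, evidence-based writing with no filler."
)


def with_context(request: str, context_block: str = CONTEXT_BLOCK) -> str:
    """Prepend the saved context block to the first message of a conversation."""
    return f"{context_block}\n\n{request.strip()}"


print(with_context("Draft an outline for the Q3 report."))
```

Keeping the block separate from the request means you can revise your standards in one place and reuse them across conversations.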
A 2024 analysis of 1,000 prompts by AI research group Anthropic found that prompts including user role and intended audience produced outputs rated "highly relevant" by domain experts at nearly twice the rate of prompts without that information — with no other differences in prompt structure.
One of the most commonly omitted pieces of context is audience. When you ask AI to "explain" something, the model defaults to a generic educated adult — roughly high school graduate reading level, no domain expertise assumed. That default is wrong for most real use cases.
If you're writing materials for sixth graders, that default produces text that's too advanced. If you're writing for a room of PhD economists, it produces text that's too basic. Neither case is the model's fault — you didn't tell it who would read the output.
Adding audience specification is one of the fastest, highest-return context additions available: "Explain this to a high school junior who has never taken chemistry" or "Explain this assuming the reader has a graduate degree in economics but no background in machine learning." The model has the range to handle both. You just have to tell it which register to use.
Before sending any substantive prompt, ask yourself: does the model know who I am, what I'm working with, and who this output is for? If the answer to any of the three is no, you're operating at least partly in the default state. Add one sentence for each missing layer. It takes thirty seconds and often improves the output substantially.
In this lab you'll practice the three context layers: who you are, what the situation is, and what success looks like. Start with a bare-bones prompt on any topic — then iteratively add context layers and observe how the response changes.
Have at least 3 exchanges. Ask the assistant to show you how adding each layer changes the output, or request a before/after comparison.
When journalists at The Guardian began using AI tools for research assistance in 2023, their editorial team quickly documented a recurring problem: the AI produced accurate information in formats that made it unusable for journalism. A reporter asking for background on a political story would receive a continuous essay — accurate, well-structured — but formatted for a Wikipedia entry, not for a reporter who needed scannable facts to cross-check against sources. The same tool, prompted to return findings as a numbered list of claims with confidence levels attached, produced material reporters could actually work with. The information was largely identical. The format determined whether the output went straight into the workflow or required manual restructuring first.
This is the underappreciated dimension of prompting. Most guides focus on getting the AI to say the right thing. Fewer address getting it to say the right thing in the right shape. For many professional tasks, the shape is more immediately important — a perfectly accurate answer buried in an unnavigable wall of prose fails the person who needed a quick table they could paste into a slide deck.
Language models can produce output in almost any structure you specify. The most commonly useful formats, and their appropriate contexts:
Length is a format decision, not an afterthought. Language models default to a length they've learned is typical for a given kind of request — roughly one to three paragraphs for most questions. That default is often too long for a quick summary and too short for a detailed analysis.
Specifying length concretely is more effective than using adjectives. "Brief" and "detailed" are interpreted inconsistently. "Exactly three sentences," "under 100 words," "at least 500 words covering all three dimensions" — these produce reliably calibrated outputs. When you use word counts, models generally hit within 10–15% of the target.
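If you want to check whether a response actually landed near a requested word count, a few lines of code will do it. The sketch below assumes a simple whitespace-based word count and a default tolerance of 15%, taken from the upper end of the range mentioned above; the function name `within_word_target` is hypothetical.

```python
def within_word_target(text: str, target: int, tolerance: float = 0.15) -> bool:
    """Return True if the word count of text is within tolerance of target."""
    count = len(text.split())  # whitespace-separated tokens as a rough word count
    return abs(count - target) <= tolerance * target


draft = "word " * 108  # stand-in for a 108-word model response
print(within_word_target(draft, target=100))  # True: 108 is within 15% of 100
print(within_word_target(draft, target=80))   # False: 108 is 35% over 80
```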
A related technique: specify what not to include as a way of controlling length. "No preamble — start with the first recommendation" and "skip the summary at the end" are constraints that trim padding the model would otherwise add by default.
Requesting a table when you're not sure what the AI actually knows is a powerful diagnostic move. Tables require the model to be explicit about every cell — they expose uncertainty and gaps that flowing prose can paper over. If the model can't fill a cell confidently, it often leaves it blank or flags it, which you wouldn't see in narrative form.
The most important format question is: what happens to this output next? If you're going to paste it directly into a Slack message, you want plain prose, because markdown formatting can show up as literal asterisks and pound signs. If it's going into a slide deck, a table or short bullets work better than paragraphs. If a developer is consuming it programmatically, you need valid JSON with a predictable schema.
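When output is consumed programmatically, it helps to verify the reply before passing it on. The following is a minimal sketch using Python's standard `json` module. The key set in `REQUIRED_KEYS` and the helper `parse_model_json` are hypothetical; substitute whatever schema your downstream step actually expects.

```python
import json

# Hypothetical schema: the keys a downstream script expects in the model's reply.
REQUIRED_KEYS = {"title", "summary", "tags"}


def parse_model_json(raw: str, required_keys: set = REQUIRED_KEYS) -> dict:
    """Parse a model reply as JSON and confirm it contains the expected keys.

    Raises ValueError if the reply is not a JSON object or is missing keys.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Reply is valid JSON but not an object.")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"Reply is missing keys: {sorted(missing)}")
    return data


reply = '{"title": "Q3 results", "summary": "Revenue rose.", "tags": ["finance"]}'
print(parse_model_json(reply)["title"])  # Q3 results
```

A failed check is a signal to tighten the format instruction in the prompt, for example by naming the required keys explicitly.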
This is the kind of specification that separates AI workflows from AI experiments. In an experiment, you accept whatever comes back and work around it. In a workflow, you define the output format to match the input requirements of the next step. That definition goes in the prompt, as a format instruction.
The practical addition is one sentence: "Return the output as [format], because [downstream use]." The because clause isn't required, but it helps — telling the model why you need a specific format gives it enough context to handle edge cases you didn't anticipate.
Format is where the gap between "AI works" and "AI integrates into my workflow" lives. The information might be correct in any format. But usability — whether the output actually saves you time or creates reformatting work — is a format problem. Specify it explicitly.
Choose any topic you're genuinely curious about. Ask the assistant for information about it first as prose, then as a table, then as a numbered list. Notice how the format changes what's visible, what's scannable, and what's missing.
Have at least 3 exchanges. You can also ask the assistant to show you the difference between specifying "brief" versus an exact word count.
In 2022, a passenger named Jake Moffatt asked Air Canada's AI customer service chatbot about bereavement fares, the discounted tickets offered to travelers dealing with a family death. The chatbot told him he could buy a full-price ticket, travel, and then apply for a retroactive discount within 90 days. Air Canada's actual policy did not permit retroactive bereavement claims. Moffatt traveled, applied for the refund, was denied, and eventually took the airline to British Columbia's Civil Resolution Tribunal, which handles small claims. In February 2024 the tribunal ruled that Air Canada was bound by what its chatbot told him. The airline was ordered to pay.
The chatbot's failure was a constraints failure. Whoever deployed it had specified what it should do — answer questions about fares, policies, services — but failed to constrain it against generating policy interpretations it wasn't authorized to make. A constraint as simple as "Do not interpret or extrapolate from policy documents — direct users to an agent for clarification" would have prevented the incident. The AI wasn't malfunctioning. It was doing exactly what a helpful assistant does when given no guardrails: it gave its best answer. The best answer was wrong, and costly.
Constraints are the defensive layer of a prompt. They don't specify what you want — they specify what you don't want, what the output must not include, and what limits apply. Their function is to narrow the space of acceptable responses and prevent the model from exercising judgment in domains where you don't want it to.
Without constraints, the model fills every gap with its best guess. That's useful for many things and harmful for others. The model's best guess about an appropriate tone, length, formality level, and scope of coverage will be wrong in predictable ways for predictable types of tasks. Constraints give you control over those dimensions without having to specify every positive instruction in advance.
Constraints fall into recognizable clusters. Most well-specified prompts use at least two or three of these categories simultaneously:
Of all constraint categories, the uncertainty flag is the one most commonly omitted and most consequential when absent. Language models produce confident-sounding text regardless of whether they're certain. Without explicit instruction to flag uncertainty, you can't distinguish between something the model knows well and something it's confabulating with equal fluency.
This was the root failure in both the Mata v. Avianca case from Lesson 1 and the Air Canada case here. Neither prompt included any version of "tell me when you're not sure." Both models produced confident text they had no business being confident about. The fix isn't complicated: "If you're unsure about any part of this response, explicitly say so and explain why." It's one sentence. The attorneys and the airline's deployment team didn't include it.
Adding uncertainty flags doesn't make AI less useful. It makes it more trustworthy. You get the same output, annotated with the model's confidence — which lets you decide where to verify before acting.
A fully specified prompt uses all four elements together: Task + Context + Format + Constraints. Example: "Summarize [pasted article] [Task] for a non-specialist reader who has no background in climate science [Context] in three bullet points under 25 words each [Format]. Flag any claims the article itself presents as disputed, and don't include any information not present in the article [Constraints]." This sounds long — in practice it takes under a minute to write and produces an output you can use immediately.
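The same assembly can be sketched in code for anyone who builds prompts in a script. This is an illustration, not a prescribed method: `build_prompt` is a hypothetical helper that simply joins whichever elements are present, and the "[pasted article]" placeholder from the example is reworded as "the article pasted below."

```python
def build_prompt(task: str, context: str = "", fmt: str = "", constraints: str = "") -> str:
    """Join the non-empty elements into a single prompt, with the Task first."""
    parts = [task, context, fmt, constraints]
    return " ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    task="Summarize the article pasted below",
    context="for a non-specialist reader who has no background in climate science",
    fmt="in three bullet points under 25 words each.",
    constraints=(
        "Flag any claims the article itself presents as disputed, and don't "
        "include any information not present in the article."
    ),
)
print(prompt)
```

Because empty elements are skipped, the same helper works for a quick Task-only request and for a fully specified one.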
A common concern: if you over-constrain a prompt, you limit the model's ability to be creative or to find an angle you didn't anticipate. This is occasionally true for genuinely open-ended creative tasks. For professional and informational tasks, it's almost never true. Constraints for business writing, research, summarization, and analysis virtually always improve output — the model's unconstrained instincts in these domains tend toward padding, hedging, generic openings, and confident confabulation, none of which you want.
The practical test: if you've gotten a response that was technically correct but annoying in some specific way — too long, too informal, started with "Certainly!", included a disclaimer you didn't need — that's a constraints failure. The fix is to add one constraint to your next prompt. Over time, you'll build a personal library of constraints that you apply to specific task types, and prompting will get faster because you're drawing on that library rather than starting from scratch.
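A personal constraint library can live in a notes file, or, if you script your prompts, in a small lookup table. The sketch below assumes a plain dictionary keyed by task type. The task types, the constraint wording, and the names `CONSTRAINT_LIBRARY` and `apply_constraints` are hypothetical examples.

```python
# Hypothetical personal library: task type -> constraints that fixed past annoyances.
CONSTRAINT_LIBRARY = {
    "email": ["No preamble.", "Keep it under 150 words."],
    "summary": [
        "Do not include information that is not in the source.",
        "If you're unsure about any part of this response, explicitly say so and explain why.",
    ],
}


def apply_constraints(prompt: str, task_type: str) -> str:
    """Append the stored constraints for task_type; unknown types pass through unchanged."""
    constraints = CONSTRAINT_LIBRARY.get(task_type, [])
    return " ".join([prompt.strip(), *constraints])


print(apply_constraints("Draft a reply to the vendor.", "email"))
# Draft a reply to the vendor. No preamble. Keep it under 150 words.
```

Each time a response annoys you in a specific way, the fix becomes one more entry in the table rather than something to remember.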
You now have the full four-element framework. Every well-specified prompt is an assembly of Task + Context + Format + Constraints. You don't need all four for every request. You do need to know which ones are missing and what gap that leaves — because the model will fill every gap with its best guess, and you now know how to do better than that.
This lab is the capstone of Module 1. You'll assemble a complete four-element prompt from scratch for a real task you care about, then evaluate and refine it with the assistant's help.
Have at least 3 exchanges. The assistant will score your prompt against the four-element framework and suggest specific constraint additions. Try to add at least two constraint categories by the end of the lab.