Say It Right: Talk to AI · Introduction

The Interface That Changed What Intelligence Means

Every transformative tool arrives with a learning curve nobody warned you about

In 1876, when Alexander Graham Bell transmitted the first telephone call to his assistant Thomas Watson, the immediate question wasn't philosophical — it was practical: how do you talk to this thing? Early telephone users didn't know whether to shout or whisper, whether to say "hello" or announce their name, whether to wait for a tone or just begin speaking. The Bell Telephone Company had to issue instruction cards. Western Union, which had declined to purchase Bell's patent for $100,000, dismissed the device as having "no commercial possibilities." The gap between a technology's existence and a person's ability to use it effectively turned out to be the most consequential variable of the entire industrial era.

In November 2022, OpenAI released ChatGPT. Within five days it had one million users. Within two months, one hundred million — the fastest consumer adoption in recorded history. Unlike the telephone, no instruction card came with it. Users arrived and typed what felt natural: short, vague, hopeful fragments. Many got mediocre results and concluded the tool was overhyped. Others stumbled into a phrasing that produced something startling, and couldn't explain why it worked. The interface was deceptively simple — a text box — and that simplicity concealed an enormous skill gap between a casual user and an effective one.

This course is about closing that gap. It covers the mechanics of prompting: how to give AI the context, format, tone, and constraints it needs to be genuinely useful. You won't finish this course knowing everything about AI — the field moves too fast for that promise. What you will finish knowing is how to construct a request that gets you something worth reading, how to debug a bad result, and how to iterate toward the output you actually needed. These are durable skills, valid across models and across years.

If you finish every module, here's who you become:

You'll understand why a vague prompt produces a vague answer — and exactly what to add to fix it.
You'll know how to give AI context, format, tone, and constraints so it stops guessing what you actually want.
You'll be able to read a bad AI response and diagnose whether the problem was specificity, framing, or missing instructions.
You'll construct prompts that assign AI a role or character, shaping its voice to fit your purpose rather than its default.
You'll recognize the difference between prompts that help AI do good work and prompts that push it toward unreliable outputs.
You'll finish the Big Prompt Challenge having built and iterated a real prompt from scratch — not a classroom exercise, a usable one.
You're becoming someone who treats AI as a tool to be directed, not a lottery to be hoped at.

Say It Right: Talk to AI · Lesson 1

The Magic Words That Aren't Magic

What a prompt actually is, and why "just ask it" is incomplete advice

Why do two people asking the same question get wildly different answers?

In early 2023, the legal firm Mata v. Avianca became a landmark case — not for its aviation law, but for what its attorneys submitted to a federal court. New York lawyers Steven Schwartz and Peter LoDuca used ChatGPT to research precedents. The AI produced six citations: plausible case names, plausible docket numbers, plausible judicial language. Every single one was fabricated. The attorneys had asked the model for case citations without specifying that those cases needed to actually exist, without asking it to flag uncertainty, and without instructing it to distinguish verified sources from generated text. Judge P. Kevin Castel fined the firm $5,000 in June 2023. The lawyers' error wasn't using AI — it was not knowing how to talk to it.

The prompt they almost certainly typed was something like: "Find cases supporting our argument that…" That phrasing contains no instruction about source verification, no acknowledgment of the model's hallucination tendency, no request for confidence levels. The tool responded to exactly what it was asked — and what it was asked left the door wide open to confabulation. The lesson isn't that AI is dangerous. The lesson is that the input shapes the output in ways that aren't obvious until you understand what a prompt actually does.

What Is a Prompt, Precisely?

A prompt is any text you send to a language model to elicit a response. That's the mechanical definition. The more useful definition: a prompt is a specification. It specifies what you want, how you want it, what constraints apply, and what context the model needs to respond accurately.

Language models don't "understand" your intent — they predict the most statistically likely continuation of your input based on patterns in their training data. When your input is sparse, the model fills the gaps with its best guess about what you probably meant. Those guesses are often wrong, or right on average but wrong for your specific situation.

Think of it this way: if you call a contractor and say "fix the thing in the bathroom," you'll get a result shaped entirely by whatever that contractor assumes you meant. If you say "replace the wax ring seal on the toilet in the second-floor bathroom — the floor tiles are original 1940s ceramic, don't crack them," you've given a specification. The quality of the specification determines the quality of the work.

Why This Matters Right Now

A 2023 study by researchers at the Wharton School found that GPT-4's performance on business tasks varied by more than 40 percentage points depending on prompt quality — with identical underlying tasks. The model's capability was constant. The prompt was the variable.

The Four Elements Every Prompt Can Contain

Not every prompt needs all four. But knowing they exist lets you diagnose why a response missed the mark.

TaskWhat you want done. The verb. "Summarize," "write," "compare," "extract," "explain." The more specific the verb, the better.

ContextBackground the model needs to do the task well. Who you are, what the document is about, what audience you're writing for, what constraints exist.

FormatHow you want the output structured. Bullet list, numbered steps, a table, prose paragraphs, code, a JSON object. If you don't specify, the model guesses.

ConstraintsWhat to avoid or limit. "Don't include jargon." "Keep it under 200 words." "Cite only peer-reviewed sources." "Flag anything you're uncertain about."

The Default State: Underspecified

Most first-time users operate almost entirely at the Task level. They type a request that contains a verb and a noun — "write an email," "summarize this," "explain photosynthesis" — and leave the other three elements to chance. The model handles this by averaging: it writes an email in the most common register, summarizes at a generic length, explains photosynthesis at a textbook level.

For many tasks, averaging works fine. You wanted a quick email draft; the generic version is close enough to edit. The problem emerges when you needed something specific: the email had to land with a skeptical CFO, the summary had to be three sentences for a social media post, the explanation had to target an eight-year-old. None of that was in the prompt. The model had no way to know.

The attorneys in Mata v. Avianca operated in this default state. Their task was clear. Their context, format, and constraints — especially the constraint that citations must be verifiable — were absent. The model did exactly what it was asked to do. It just wasn't asked enough.

The Core Principle of This Course

AI doesn't fail you because it's broken. It produces mediocre output because the prompt was underspecified. Every technique in this course is a method for moving from underspecified to well-specified — without requiring you to write a paragraph of instructions for every request.

Prompting Is a Skill, Not a Trick

You'll encounter the phrase "prompt engineering" online, often attached to the implication that there are secret phrases — magic words — that unlock better AI performance. Some of this is real: certain phrasings do reliably outperform others, and researchers have documented them. "Let's think step by step" genuinely improves reasoning outputs, as shown in a 2022 Google Brain paper by Kojima et al. Role assignment ("You are an expert in…") shifts register and often improves domain accuracy.

But these techniques work because they add specification — they give the model more information about what kind of response is expected. They aren't incantations. Understanding why they work makes you able to adapt them, combine them, and invent your own. That's what separates someone who's memorized a few tricks from someone who can talk to AI effectively across any task they encounter.

The four elements above — Task, Context, Format, Constraints — are the underlying structure. Every lesson in this course is an elaboration of one or more of those elements. By the end, you'll be able to decompose any prompt you write and identify exactly where it's underspecified, which means you'll be able to fix it.

Lesson 1 Quiz

The Magic Words That Aren't Magic · 5 questions

1. In the Mata v. Avianca case, what was the attorneys' primary error when using ChatGPT?

Correct. The attorneys' prompt gave no instruction that citations must correspond to real cases, no request to flag uncertainty, and no constraint distinguishing verified sources from generated text.

Not quite. The core issue wasn't which tool or whether they read it — it was that the prompt contained no constraint requiring real, verifiable citations. The model produced what it was asked for.

2. Which of the four prompt elements was most critically absent in the attorneys' ChatGPT query?

Correct. The task ("find cases supporting our argument") was clear. What was missing were constraints — specifically the requirement that any case cited must actually exist and be verifiable.

The task was present — they asked for cases supporting their argument. The missing element was constraints: no instruction to verify sources, flag uncertainty, or limit output to real cases.

3. According to the lesson, why does adding "You are an expert in…" to a prompt often improve results?

Correct. Role assignment works because it provides context — it tells the model what kind of response is expected in terms of vocabulary, depth, and domain. It's a specification tool, not a magic phrase.

There's no hidden expert mode. Role assignment works because it adds specification — it gives the model more information about the register and domain knowledge expected in the response.

4. The Wharton School study mentioned in the lesson found that prompt quality caused GPT-4's performance on business tasks to vary by approximately how much?

Correct. The study found performance varied by more than 40 percentage points on identical tasks — with the model's underlying capability held constant. The prompt was the differentiating variable.

The lesson cites a Wharton study finding performance variation of more than 40 percentage points — a substantial gap attributable entirely to prompt quality, not model capability.

5. What does the lesson mean when it says a language model "averages" when given an underspecified prompt?

Correct. When a prompt lacks context, format, and constraints, the model fills the gap by predicting the most statistically typical response — which is often correct on average but wrong for your specific situation.

The lesson uses "averages" to describe the model defaulting to the most generic, statistically common response — filling in the gaps left by an underspecified prompt with the most likely interpretation, not your actual intent.

Lab 1: Diagnosing the Underspecified Prompt

Practice identifying which of the four elements are missing — then fix it

Your Task

You'll work with the AI lab assistant to practice identifying what's missing from weak prompts and rewriting them with the four elements: Task, Context, Format, and Constraints. Start by sharing a vague prompt you want to analyze — something like one you might have typed before learning this framework.

Have at least 3 exchanges to complete the lab. Ask the assistant to evaluate your prompts, suggest improvements, and explain which elements were added.

Try starting with: "Here's a prompt I want to analyze: 'Write me something about climate change.' What's missing from it?"

AI Lab Assistant

Lab 1

Welcome to Lab 1. I'm here to help you practice the four-element prompt framework: Task, Context, Format, and Constraints. Share any weak or vague prompt — something you might have typed before today — and we'll dissect it together. What have you got?

Say It Right: Talk to AI · Lesson 2

Context Is the Difference Between Generic and Useful

What the model doesn't know about you, it makes up

What do you actually have to tell an AI to get a response that fits your real situation?

In 2023, a team at Stanford Medicine published a study testing whether GPT-4 could give appropriate dietary advice. When the researchers prompted the model with "What should I eat to be healthy?" they received standard, generic nutritional guidance — accurate but useless for any particular person. When they reprompted with a patient's specific details — age, weight, diabetes diagnosis, current medications, kidney function levels — the model produced advice that closely aligned with what a registered dietitian would recommend for that specific case. The task was identical. The context transformed the output from a pamphlet into something clinically relevant.

This is the central dynamic of context: the model will fill in missing information, just not necessarily with your information. Without context, it fills the gap with the statistical average of everyone who has ever asked that kind of question. The average healthy adult's dietary advice is not the right answer for a 67-year-old with stage 3 chronic kidney disease. The model wasn't wrong in the first case — it just didn't know who it was talking to.

The Three Layers of Context

Context isn't one thing. It breaks into three distinct layers, each solving a different problem:

Who you areYour role, expertise level, profession, or relevant background. "I'm a first-year medical student" produces different output than "I'm a practicing cardiologist" — even on the same question.

What the situation isThe document, dataset, project, or problem you're working with. Pasting the actual text you want analyzed, explaining the business problem, describing the audience for the content you need written.

What success looks likeWhat the output will be used for. "This is for a board presentation" versus "this is for my personal notes" versus "this will be published in a newspaper" — each implies different standards, registers, and depth.

How Much Context Is Too Much?

A reasonable worry: if context improves results, should you write a paragraph of background for every prompt? The answer is no — but not because context hurts. More context almost never degrades output quality. The cost is your time.

The practical rule is proportionality. For a quick factual lookup, context adds little value — the model knows what the capital of France is regardless of who you are. For a task where your specific situation shapes the right answer — writing a performance review, diagnosing why a marketing campaign failed, drafting a negotiation email — the context investment pays back immediately in output quality.

The highest-leverage context move for most users is establishing a system context at the start of a conversation: one paragraph that tells the model your role, your project, and your standards. You write it once. It shapes every response that follows. Many power users keep a saved "context block" they paste into new conversations to avoid rewriting it each time.

Documented Pattern

A 2024 analysis of 1,000 prompts by AI research group Anthropic found that prompts including user role and intended audience produced outputs rated "highly relevant" by domain experts at nearly twice the rate of prompts without that information — with no other differences in prompt structure.

The Audience Problem

One of the most commonly omitted pieces of context is audience. When you ask AI to "explain" something, the model defaults to a generic educated adult — roughly high school graduate reading level, no domain expertise assumed. That default is wrong for most real use cases.

If you're writing materials for sixth graders, that default produces text that's too advanced. If you're writing for a room of PhD economists, it produces text that's too basic. Neither case is the model's fault — you didn't tell it who would read the output.

Adding audience specification is one of the fastest, highest-return context additions available: "Explain this to a high school junior who has never taken chemistry" or "Explain this assuming the reader has a graduate degree in economics but no background in machine learning." The model has the range to handle both. You just have to tell it which register to use.

Practical Takeaway

Before sending any substantive prompt, ask yourself: does the model know who I am, what I'm working with, and who this output is for? If all three answers are no, you're operating in the default state. Add one sentence for each missing layer — it takes thirty seconds and often doubles output quality.

Lesson 2 Quiz

Context Is the Difference Between Generic and Useful · 5 questions

1. In the Stanford Medicine study, what changed between the first and second dietary advice prompt?

Correct. The task stayed the same ("give dietary advice"). What changed was context: age, weight, diabetes diagnosis, medications, and kidney function levels. That context shifted the output from generic to clinically relevant.

The task didn't change — both prompts asked for dietary advice. The difference was context: the second prompt provided patient-specific medical details that transformed the output from generic to clinically relevant.

2. Which of the following is an example of "what success looks like" as a context layer?

Correct. "This summary will be read by our board of directors" describes the use case and audience — what success looks like in deployment. The other options represent who you are, situation, and format respectively.

The three context layers are: who you are (role/expertise), what the situation is (the document/problem), and what success looks like (use case/audience). "Board of directors at the annual meeting" is the use-case context — what success looks like.

3. When is adding detailed context LEAST likely to improve output quality?

Correct. For universal factual lookups — "What is the boiling point of water?" — context adds little value. The model knows the answer regardless of your role or situation. Context investment pays off when the right answer depends on who you are.

The lesson's rule is proportionality. Context is most valuable when the right answer depends on your specific situation. For universal facts — things with the same answer for everyone — context adds little value.

4. What does the lesson call the highest-leverage context move for most users?

Correct. A system context block — one paragraph establishing your role, project, and standards at the start of a conversation — shapes every subsequent response without requiring you to repeat context in each individual prompt.

The lesson recommends a system context block at the start of the conversation: a single paragraph covering who you are, your project, and your standards. It's written once and shapes all responses that follow.

5. According to the Anthropic analysis cited in the lesson, prompts that included user role and intended audience produced "highly relevant" outputs at approximately what rate compared to prompts without that information?

Correct. The analysis found that including role and audience context produced "highly relevant" ratings from domain experts at nearly double the rate — with no other structural differences in the prompts.

The Anthropic analysis found that adding user role and intended audience produced outputs rated "highly relevant" at nearly twice the rate of prompts without that context — with all other variables held equal.

Lab 2: Adding Context That Changes the Answer

See the same prompt produce different outputs as you layer in context

Your Task

In this lab you'll practice the three context layers: who you are, what the situation is, and what success looks like. Start with a bare-bones prompt on any topic — then iteratively add context layers and observe how the response changes.

Have at least 3 exchanges. Ask the assistant to show you how adding each layer changes the output, or request a before/after comparison.

Try starting with: "I want to practice adding context. Here's my bare prompt: 'Write an introduction for a report.' Help me layer in all three context types."

AI Lab Assistant

Lab 2

Welcome to Lab 2. We're focusing on context — the three layers that transform generic responses into useful ones: who you are, what the situation is, and what success looks like. Share a bare prompt and we'll build context around it together. What are you working on?

Say It Right: Talk to AI · Lesson 3

Format Is Not Cosmetic

How the shape of a response determines whether it's actually usable

Why does asking for the same information in different structures produce fundamentally different utility?

When journalists at The Guardian began using AI tools for research assistance in 2023, their editorial team quickly documented a recurring problem: the AI produced accurate information in formats that made it unusable for journalism. A reporter asking for background on a political story would receive a continuous essay — accurate, well-structured — but formatted for a Wikipedia entry, not for a reporter who needed scannable facts to cross-check against sources. The same tool, prompted to return findings as a numbered list of claims with confidence levels attached, produced material reporters could actually work with. The information was largely identical. The format determined whether the output went straight into the workflow or required manual restructuring first.

This is the underappreciated dimension of prompting. Most guides focus on getting the AI to say the right thing. Fewer address getting it to say the right thing in the right shape. For many professional tasks, the shape is more immediately important — a perfectly accurate answer buried in an unnavigable wall of prose fails the person who needed a quick table they could paste into a slide deck.

Format Options and When to Use Each

Language models can produce output in almost any structure you specify. The most commonly useful formats, and their appropriate contexts:

Prose paragraphsBest for narratives, explanations that require flow, or outputs that will be read top to bottom. Poor for scanning, comparison, or extraction.

Numbered listsBest for sequential steps, ranked items, or discrete facts. The numbering signals order matters.

Bullet listsBest for unordered discrete items. Fast to scan. Loses nuance and connection between ideas.

TablesBest for comparison across multiple attributes. Requesting a table forces the model to be structured and often surfaces gaps in the data.

Code / structured dataJSON, CSV, XML when the output feeds into another system. Specify the exact schema if you need it.

Headers + sectionsBest for long-form documents you'll navigate rather than read linearly. Specify heading hierarchy if it matters.

Length as a Format Element

Length is a format decision, not an afterthought. Language models default to a length they've learned is typical for a given kind of request — roughly one to three paragraphs for most questions. That default is often too long for a quick summary and too short for a detailed analysis.

Specifying length concretely is more effective than using adjectives. "Brief" and "detailed" are interpreted inconsistently. "Exactly three sentences," "under 100 words," "at least 500 words covering all three dimensions" — these produce reliably calibrated outputs. When you use word counts, models generally hit within 10–15% of the target.

A related technique: specify what not to include as a way of controlling length. "No preamble — start with the first recommendation" and "skip the summary at the end" are constraints that trim padding the model would otherwise add by default.

The Table Trick

Requesting a table when you're not sure what the AI actually knows is a powerful diagnostic move. Tables require the model to be explicit about every cell — they expose uncertainty and gaps that flowing prose can paper over. If the model can't fill a cell confidently, it often leaves it blank or flags it, which you wouldn't see in narrative form.

Matching Format to Downstream Use

The most important format question is: what happens to this output next? If you're going to paste it directly into a Slack message, you want plain prose — markdown formatting will render as asterisks and pound signs. If it's going into a slide deck, a table or short bullets work better than paragraphs. If a developer is consuming it programmatically, you need valid JSON with a predictable schema.

This is the kind of specification that separates AI workflows from AI experiments. In an experiment, you accept whatever comes back and work around it. In a workflow, you define the output format to match the input requirements of the next step. That definition goes in the prompt, as a format instruction.

The practical addition is one sentence: "Return the output as [format], because [downstream use]." The because clause isn't required, but it helps — telling the model why you need a specific format gives it enough context to handle edge cases you didn't anticipate.

Key Insight

Format is where the gap between "AI works" and "AI integrates into my workflow" lives. The information might be correct in any format. But usability — whether the output actually saves you time or creates reformatting work — is a format problem. Specify it explicitly.

Lesson 3 Quiz

Format Is Not Cosmetic · 5 questions

1. In The Guardian journalism example, what specifically changed between the unhelpful and helpful AI outputs?

Correct. The information was largely identical. Reporters asked for a numbered list of claims with confidence levels attached, rather than continuous essay prose. The format change made the same information actually usable in their workflow.

The lesson notes the information was largely identical. What changed was format: journalists asked for numbered claims with confidence levels instead of essay prose — making the same underlying information fit their workflow.

2. Which format is described as the best diagnostic tool for exposing gaps in what the AI actually knows?

Correct. Tables require the model to be explicit about every cell — exposing uncertainty and gaps that narrative prose can paper over. Cells left blank or flagged reveal what the model can't confidently state.

The lesson identifies tables as the diagnostic format. Because tables require explicit content for every cell, they expose gaps and uncertainties that a flowing narrative can smooth over and conceal.

3. Why does the lesson recommend using specific word counts ("under 100 words") rather than adjectives ("brief") for length control?

Correct. "Brief" and "detailed" are interpreted differently across models and contexts. Specific word counts — "under 100 words," "exactly three sentences" — produce outputs calibrated within 10–15% of the target consistently.

The lesson is practical: adjectives like "brief" are interpreted inconsistently, while specific word counts produce reliably calibrated outputs. Models generally hit within 10–15% of a specified word count target.

4. When should you specify JSON or CSV as an output format?

Correct. Structured data formats like JSON and CSV are appropriate when the output feeds into another system — a database, a script, an API. The format must match the downstream input requirements.

JSON and CSV are for machine consumption — when the output feeds into a developer's system, a database, or a script. The format should match the input requirements of the next step in your workflow.

5. What does the lesson identify as "where the gap between 'AI works' and 'AI integrates into my workflow' lives"?

Correct. The lesson's key insight is that format is where workflow integration lives. Accurate information in the wrong format still requires manual restructuring — defeating much of the productivity gain.

The lesson explicitly places this gap in format: the difference between output that plugs directly into your next step and output that requires manual restructuring before it's usable — regardless of whether the information is accurate.

Lab 3: Controlling Output Format

Request the same information in multiple formats — observe what changes

Your Task

Choose any topic you're genuinely curious about. Ask the assistant for information about it first as prose, then as a table, then as a numbered list. Notice how the format changes what's visible, what's scannable, and what's missing.

Have at least 3 exchanges. You can also ask the assistant to show you the difference between specifying "brief" versus an exact word count.

Try starting with: "Give me information about the history of the internet — first as three prose paragraphs, then I'll ask you to restructure it."

AI Lab Assistant

Lab 3

Welcome to Lab 3. We're exploring format — how the shape of a response changes its usability. Pick any topic and I'll show you how the same information looks as prose, a table, a numbered list, or structured data. What topic do you want to use as our test case?

Say It Right: Talk to AI · Lesson 4

Constraints Are How You Say No in Advance

The things you don't want are as important as the things you do

Why does telling an AI what NOT to do often matter more than telling it what to do?

In 2023, Air Canada deployed an AI chatbot to handle customer service queries. A passenger named Jake Moffatt asked it about bereavement fares — discounted tickets for travelers dealing with a family death. The chatbot told him he could buy a full-price ticket, travel, and then apply for a retroactive discount within 90 days. Air Canada's actual policy did not permit retroactive bereavement claims. Moffatt traveled, applied for the refund, was denied, and eventually took the airline to small claims court — which ruled that Air Canada was bound by what its chatbot told him. The airline was ordered to pay.

The chatbot's failure was a constraints failure. Whoever deployed it had specified what it should do — answer questions about fares, policies, services — but failed to constrain it against generating policy interpretations it wasn't authorized to make. A constraint as simple as "Do not interpret or extrapolate from policy documents — direct users to an agent for clarification" would have prevented the incident. The AI wasn't malfunctioning. It was doing exactly what a helpful assistant does when given no guardrails: it gave its best answer. The best answer was wrong, and costly.

What Constraints Do

Constraints are the defensive layer of a prompt. They don't specify what you want — they specify what you don't want, what the output must not include, and what limits apply. Their function is to narrow the space of acceptable responses and prevent the model from exercising judgment in domains where you don't want it to.

Without constraints, the model fills every gap with its best guess. That's useful for many things and harmful for others. The model's best guess about an appropriate tone, length, formality level, and scope of coverage will be wrong in predictable ways for predictable types of tasks. Constraints give you control over those dimensions without having to specify every positive instruction in advance.

The Most Useful Constraint Categories

Constraints fall into recognizable clusters. Most well-specified prompts use at least two or three of these categories simultaneously:

Scope limits"Only discuss events after 2010." "Focus exclusively on the North American market." "Answer only from the provided document — do not draw on outside knowledge."

Uncertainty flags"If you're not certain about a fact, say so." "Flag any claims that might be outdated." "Do not fabricate citations — if you don't have one, say so."

Tone and register limits"Do not use jargon." "Avoid humor — this is a formal document." "Don't use the phrase 'delve into' or 'it's important to note.'"

Authority limits"Do not give medical, legal, or financial advice — direct the user to a professional." "Do not make commitments on behalf of the company."

Format prohibitions"No preamble." "Do not include a summary at the end." "No bullet lists — write in continuous prose."

The Uncertainty Constraint: Often the Most Important

Of all constraint categories, the uncertainty flag is the one most commonly omitted and most consequential when absent. Language models produce confident-sounding text regardless of whether they're certain. Without explicit instruction to flag uncertainty, you can't distinguish between something the model knows well and something it's confabulating with equal fluency.

This was the root failure in both the Mata v. Avianca case from Lesson 1 and the Air Canada case here. Neither prompt included any version of "tell me when you're not sure." Both models produced confident text they had no business being confident about. The fix isn't complicated: "If you're unsure about any part of this response, explicitly say so and explain why." It's one sentence. The attorneys and the airline's deployment team didn't include it.

Adding uncertainty flags doesn't make AI less useful. It makes it more trustworthy. You get the same output, annotated with the model's confidence — which lets you decide where to verify before acting.

The Compound Prompt Pattern

A fully specified prompt uses all four elements together: Task + Context + Format + Constraints. Example: "Summarize [pasted article] [Task] for a non-specialist reader who has no background in climate science [Context] in three bullet points under 25 words each [Format]. Flag any claims the article itself presents as disputed, and don't include any information not present in the article [Constraints]." This sounds long — in practice it takes under a minute to write and produces an output you can use immediately.

When Constraints Feel Restrictive But Aren't

A common concern: if you over-constrain a prompt, you limit the model's ability to be creative or to find an angle you didn't anticipate. This is occasionally true for genuinely open-ended creative tasks. For professional and informational tasks, it's almost never true. Constraints for business writing, research, summarization, and analysis virtually always improve output — the model's unconstrained instincts in these domains tend toward padding, hedging, generic openings, and confident confabulation, none of which you want.

The practical test: if you've gotten a response that was technically correct but annoying in some specific way — too long, too informal, started with "Certainly!", included a disclaimer you didn't need — that's a constraints failure. The fix is to add one constraint to your next prompt. Over time, you'll build a personal library of constraints that you apply to specific task types, and prompting will get faster because you're drawing on that library rather than starting from scratch.

Putting It All Together

You now have the full four-element framework. Every well-specified prompt is an assembly of Task + Context + Format + Constraints. You don't need all four for every request. You do need to know which ones are missing and what gap that leaves — because the model will fill every gap with its best guess, and you now know how to do better than that.

Lesson 4 Quiz

Constraints Are How You Say No in Advance · 5 questions

1. In the Air Canada chatbot case, what type of constraint was missing from the chatbot's system prompt?

Correct. The chatbot needed an authority limit — a constraint preventing it from extrapolating policy interpretations it had no authorization to make. "Do not interpret policy — direct users to an agent" would have prevented the incident.

The missing constraint was an authority limit — something like "Do not interpret or extrapolate from policy documents; direct users to a human agent for clarification." Without it, the chatbot exercised judgment in a domain it wasn't authorized to operate in.

2. What does the lesson identify as the most commonly omitted and most consequential constraint category?

Correct. Uncertainty flags — "if you're unsure about any part of this, say so" — are most frequently omitted and most consequential when absent. Without them, you can't distinguish confident accuracy from confident confabulation.

The lesson explicitly names uncertainty flags as the most commonly omitted and most consequential constraint. Without them, the model produces equally confident text whether it knows something well or is confabulating — and you can't tell the difference.

3. According to the lesson, does adding constraints make AI less useful?

Correct. The lesson distinguishes open-ended creative tasks (where constraints might occasionally limit useful surprise) from professional and informational tasks — where the model's unconstrained defaults tend toward padding, hedging, and confabulation.

The lesson argues that for professional and informational tasks, constraints virtually always improve output. The unconstrained model defaults to padding, generic openings, and confident confabulation — none of which you want in a work product.

4. Which of the following best illustrates a "scope limit" constraint?

Correct. A scope limit narrows the domain the model draws from — in this case restricting it to a specific document rather than its full training data. The other options are an authority limit, uncertainty flag, and format prohibition respectively.

Scope limits define the boundary of source material or topic domain. "Answer only from the provided document" restricts the model's knowledge base to a specific source. The other options are authority limits, uncertainty flags, and format prohibitions.

5. What practical habit does the lesson recommend for building prompting efficiency over time?

Correct. The lesson recommends accumulating a personal library of constraints — noting which ones fixed recurring problems in specific task types. Over time, you apply them from memory rather than diagnosing from scratch each time.

The lesson recommends a personal constraints library: when a response is technically correct but annoying in some specific way, add one constraint and note it. Over time this library covers your common task types and reduces the prompting effort per task.

Lab 4: Building a Constrained Prompt

Assemble all four elements — Task, Context, Format, Constraints — in one prompt

Your Task

This lab is the capstone of Module 1. You'll assemble a complete four-element prompt from scratch for a real task you care about, then evaluate and refine it with the assistant's help.

Have at least 3 exchanges. The assistant will score your prompt against the four-element framework and suggest specific constraint additions. Try to add at least two constraint categories by the end of the lab.

Try starting with: "Help me build a four-element prompt. My task is: [describe something you actually need done]. Let's start by identifying which elements I've included and which are missing."

AI Lab Assistant

Lab 4

Welcome to Lab 4 — the capstone. We're going to build a fully-specified prompt together: Task, Context, Format, and Constraints, all working in concert. Tell me about something you actually need to accomplish — a work task, a personal project, anything real. We'll build the prompt from the ground up and stress-test it.

Module 1 Test

The Magic Words That Aren't Magic · 15 questions · Pass at 80%

1. What is the most precise definition of a "prompt" as used in this module?

Correct. A prompt is a specification — not just a question or command, but a definition of what you want, how you want it, what context applies, and what constraints govern the output.

The module defines a prompt as a specification: it specifies what you want done (task), what background applies (context), how output should be structured (format), and what limits apply (constraints).

2. How did ChatGPT receive one million users in 2022, and what was notable about that rate?

Correct. ChatGPT launched in November 2022 and reached one million users in five days — the fastest consumer adoption on record. Two months later it had 100 million users.

ChatGPT launched in November 2022 and reached one million users in five days — the fastest consumer product adoption in recorded history, reaching 100 million users within two months.

3. Which element of the four-element framework specifies "who you are, what you're working with, and who this output is for"?

Correct. Context covers the three layers: who you are (role/expertise), what the situation is (document/problem), and what success looks like (use case/audience).

Context is the element that covers your role, the material you're working with, and the intended audience or use case. Task is the verb (what to do), Format is the structure, Constraints are the limits.

4. The 2022 Google Brain paper by Kojima et al. documented that which phrase reliably improves reasoning outputs?

Correct. Kojima et al.'s 2022 paper documented that "Let's think step by step" reliably improves reasoning outputs — not because it's magic, but because it adds specification about the expected response process.

The lesson cites a 2022 Google Brain paper by Kojima et al. documenting that "Let's think step by step" genuinely improves reasoning outputs — it works because it specifies the expected reasoning process, not as a magic phrase.

5. In the Stanford Medicine dietary advice study, why did the second prompt produce clinically relevant output while the first did not?

Correct. Same task, same model. The difference was context: the second prompt included patient-specific medical details that shifted the output from generic population advice to individually appropriate guidance.

The task and model were identical. What changed was context: the second prompt provided patient-specific details (age, weight, diabetes diagnosis, medications, kidney function) that transformed generic advice into clinically relevant output.

6. What is the "default state" described in Lesson 1, and why is it a problem?

Correct. The default state is operating at the Task level only — giving the model a verb and noun without context, format, or constraints. The model then averages its way to a generic response that may be correct for nobody in particular.

The default state is Task-only prompting: the model gets a verb and noun but no context about who you are, no format specification, and no constraints. It fills every gap with its best average guess, which is often wrong for your specific situation.

7. Which format is described as best for comparison across multiple attributes?

Correct. Tables are best for comparison across multiple attributes — and as a bonus, they force the model to be explicit about every cell, exposing gaps and uncertainty that prose can conceal.

Tables are described as the best format for comparison across multiple attributes. They also serve as a diagnostic tool — forcing the model to be explicit about every data point and exposing uncertainty that flowing prose can paper over.

8. What constraint category would prevent a customer service chatbot from making commitments on pricing it isn't authorized to make?

Correct. Authority limits prevent the model from operating in domains it has no authorization to act in — "do not make pricing commitments; direct users to a sales representative" is an authority limit.

Authority limits constrain the model's ability to take actions or make commitments it isn't authorized to make. "Do not commit to pricing — direct users to a sales representative" is the relevant constraint category here.

9. The lesson about constraints uses both Mata v. Avianca and Air Canada as examples. What constraint was absent from BOTH cases?

Correct. Both cases involved AI producing confident text in areas of uncertainty or beyond its knowledge. Neither prompt included any instruction to flag uncertainty or acknowledge the limits of the model's knowledge.

Both cases share the missing uncertainty flag. The attorneys and the airline's deployment team both needed some version of "flag anything you're not certain about" — neither included it, and both got confidently wrong outputs as a result.

10. What does "proportionality" mean in the context of adding context to prompts?

Correct. For tasks where the right answer is universal (factual lookups), context adds little value. For tasks where your specific situation determines the right answer (drafting a negotiation email), detailed context pays back immediately.

Proportionality means matching context investment to how situation-dependent the task is. Universal facts need no context. Tasks where the right answer depends on who you are, what you're working with, and what success looks like need substantial context.

11. The Wharton School study found that prompt quality caused AI performance to vary by more than 40 percentage points. What was held constant in that study?

Correct. The tasks were identical across conditions. The model was the same GPT-4. Only prompt quality varied — proving that the performance gap was attributable entirely to how the prompt was written, not to model capability.

The model capability was held constant (GPT-4 throughout) and tasks were identical. Only prompt quality varied. The 40+ percentage point performance gap was therefore entirely attributable to how prompts were written.

12. What is a "system context block" and why is it described as high-leverage?

Correct. A system context block is written once at the conversation's start and applies to every response — making it high-leverage because the upfront investment pays across all subsequent exchanges rather than requiring per-prompt repetition.

A system context block is a practical technique: write one paragraph at the start of a conversation covering who you are, what you're working on, and your output standards. It shapes every response that follows without needing to be repeated in each individual prompt.

13. When the lesson says a model "averages" on an underspecified prompt, what underlying mechanism explains this behavior?

Correct. Language models generate text by predicting the most statistically likely continuation of input. When input is sparse, the prediction defaults to what's most common in training data — the average response for that type of request.

Language models predict the most statistically likely continuation of input based on training data patterns. An underspecified prompt has many possible interpretations — the model defaults to whichever completion is statistically most common, which is the "average" response.

14. Which of the following is the best example of a complete four-element prompt?

Correct. This prompt contains Task (summarize), Context (for a non-specialist reader), Format (three bullets under 25 words each), and Constraints (flag disputed claims only, don't add outside information). All four elements present.

The complete prompt is the one with all four elements: Task (summarize), Context (for a non-specialist reader), Format (three bullets under 25 words each), and Constraints (flag disputed claims). The others are missing one or more elements.

15. What does the lesson recommend doing when you get an AI response that is "technically correct but annoying in some specific way"?

Correct. A technically correct but annoying response is a constraints failure — some undesirable behavior was left unconstrained. The fix is identifying and adding that constraint, then noting it for future use in similar tasks.

Technically correct but annoying responses signal a constraints failure — something was left unconstrained that shouldn't have been. The lesson recommends adding one constraint to the next prompt and noting it for your personal library of task-specific constraints.