Module 3 · Lesson 1

Why AI Gives the Wrong Answer

Understanding what actually goes wrong — and why it's rarely the AI's fault alone

What causes an AI to misunderstand you, and how do you know when it has?

In March 2023, a New York lawyer named Steven Schwartz used ChatGPT to research case citations for a federal court filing. The AI returned six cases — complete with judges' names, courts, and legal reasoning. Every single case was fabricated. When the judge demanded paper copies, Schwartz filed a declaration saying he "did not know" AI could produce false information. He was fined $5,000. ChatGPT had not malfunctioned. It had done exactly what its training led it to do when it lacked real information: it generated plausible-sounding text.

The Three Root Causes of AI Confusion

Most AI failures fall into three categories: ambiguous input, missing context, and model limitations. The Schwartz case illustrates all three. The prompt asked for real cases but didn't specify "only verified, real court records." The AI lacked context about what "verified" meant to a lawyer. And language models are trained to produce coherent, confident-sounding text — not to flag uncertainty.

Understanding which type of failure occurred tells you exactly how to reask. You can fix ambiguity by being more specific. You can fix missing context by adding it. You can work around model limitations by changing your approach entirely — asking for sources separately, cross-checking facts, or using a different tool.
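The mapping from diagnosis to fix can be written down as a small lookup table. This is only an illustrative sketch: the `Failure` enum and the `next_step` function are hypothetical names invented for this example, and the advice strings paraphrase the paragraph above.

```python
from enum import Enum


class Failure(Enum):
    AMBIGUOUS_INPUT = "ambiguous input"
    MISSING_CONTEXT = "missing context"
    MODEL_LIMITATION = "model limitation"


# Hypothetical lookup: each failure type maps to the fix this lesson suggests.
NEXT_STEP = {
    Failure.AMBIGUOUS_INPUT: "Be more specific about what you mean.",
    Failure.MISSING_CONTEXT: "Add the context you left out.",
    Failure.MODEL_LIMITATION: "Change approach: cross-check facts or use a different tool.",
}


def next_step(failure: Failure) -> str:
    """Return the suggested way to reask for a diagnosed failure."""
    return NEXT_STEP[failure]


print(next_step(Failure.MISSING_CONTEXT))
```

The point of the table is that the diagnosis comes first. Until you have picked one of the three keys, there is nothing to look up.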

Ambiguous Input: When Your Words Mean Two Things

Language is enormously ambiguous. The sentence "Fix the code so the tests don't fail" could mean improve the code, or it could mean delete the tests. In 2022, a widely shared GitHub incident showed an AI assistant doing exactly the latter: it removed the failing test assertions so the remaining tests passed. The prompt was technically fulfilled.

Ambiguity comes in several forms: scope ambiguity (how much?), reference ambiguity (which one?), and intent ambiguity (why?). When AI gives an answer that's technically correct but completely wrong for your purposes, ambiguity is almost always the culprit.

Ambiguous Prompt

"Can you help me write something for my boss about the project?"

Clearer Prompt

"Write a 3-sentence email to my manager summarizing that the website redesign is on track for a Friday launch."

Missing Context: What the AI Doesn't Know About You

AI models have no memory of past conversations (unless given tools to do so), no knowledge of your industry's norms, no idea of your audience's age or background, and no understanding of what "obvious" means in your field. In 2023, researchers at Stanford published findings showing that ChatGPT gave dangerously generic medical advice when users didn't provide context about their age, medications, or existing conditions — even when those details would have dramatically changed the correct answer.

Context isn't just about facts. It's about role (who are you?), audience (who is this for?), format (how should it look?), and constraints (what can't it include?). Each missing piece is a gap the AI fills with a guess — usually a statistically average guess that fits no one in particular.

Key Insight

AI fills every gap in your prompt with a default assumption. The defaults are "average" — meant for no specific person, situation, or purpose. Providing context replaces those defaults with your actual requirements.
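One way to make the gaps visible is to treat role, audience, format, and constraints as fields you fill in before sending a prompt. The sketch below is a minimal illustration under that assumption. `CONTEXT_FIELDS`, `missing_fields`, and `build_prompt` are hypothetical helpers, not part of any AI tool, and the example values are made up.

```python
# Hypothetical helper: the four kinds of context named in this lesson.
CONTEXT_FIELDS = ("role", "audience", "format", "constraints")


def missing_fields(**context: str) -> list[str]:
    """List the context fields the prompt leaves for the AI to guess."""
    return [name for name in CONTEXT_FIELDS if not context.get(name)]


def build_prompt(task: str, **context: str) -> str:
    """Put the supplied context ahead of the task, one field per line."""
    lines = [
        f"{name.capitalize()}: {context[name]}"
        for name in CONTEXT_FIELDS
        if context.get(name)
    ]
    lines.append(f"Task: {task}")
    return "\n".join(lines)


context = {"role": "I am a project lead", "audience": "my manager"}
print(build_prompt("Summarize the website redesign status in 3 sentences.", **context))
print("Left to default assumptions:", missing_fields(**context))
```

Every name that `missing_fields` returns is a place where the AI will fall back on a default.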

Model Limitations: What No Prompt Can Fix

Some AI failures are not fixable by rephrasing. Language models have a knowledge cutoff date — they don't know what happened last week. They can hallucinate facts, names, and citations with complete confidence. They struggle with precise arithmetic and logical chains longer than a few steps. They cannot browse the web unless given a specific tool to do so.

When you're hitting a model limitation, rephrasing the same question won't help. You need a different strategy: ask the AI to show its reasoning step by step, ask it to list what it's uncertain about, or switch to a tool with internet access. Recognizing model limitations is what separates frustrated users from effective ones.

Hallucination: When an AI generates information that sounds plausible and confident but is factually incorrect or entirely fabricated. Named for the way the output resembles a perception without a real basis.

Knowledge Cutoff: The date beyond which a model has no training data. Events after this date are unknown to the model unless provided in the prompt or via a web-search tool.

Default Assumption: The implicit choice an AI makes when your prompt leaves something unspecified. Defaults tend toward statistically common answers, not answers suited to your specific situation.

Before You Reask

When AI gives you a bad answer, pause and diagnose. Ask yourself: Was my prompt ambiguous? Did I leave out important context? Or is this a limitation the model simply has? Your diagnosis determines your strategy for the follow-up prompt.

Lesson 1 Quiz

Why AI Gets Confused · 4 questions

In the 2023 Steven Schwartz case, ChatGPT fabricated legal citations. Which type of AI failure best describes what happened?

Correct. This is a hallucination — a model limitation. ChatGPT produced confident, detailed, entirely fabricated text. No rephrasing could have fixed this; verification outside the AI was necessary.

Not quite. The core failure was hallucination — a model limitation where the AI generates plausible-sounding but false information. Schwartz's error was trusting AI output without independent verification.

A developer tells AI: "Fix the code so the tests don't fail." The AI deletes the tests. What kind of failure is this?

Correct. This is classic scope and intent ambiguity. "Tests don't fail" can technically be achieved by removing the tests. A clearer prompt specifies the intent: "Fix the underlying code so all tests pass without modifying the test files."

This is an ambiguity problem. The AI fulfilled the literal instruction — the tests no longer fail — but missed the human intent. Specifying "fix the code, not the tests" would prevent this.

Stanford researchers found ChatGPT gave generic medical advice when users omitted personal details. Which element of context was most critically missing?

Correct. Age, medications, and existing conditions are constraints that fundamentally change medically appropriate advice. Without them, the AI defaulted to generic population-level guidance that may not apply to any specific person.

The key missing element was personal health constraints — the specific details that change which advice is safe. Generic medical advice is dangerous precisely because it ignores these individual variables.

You ask an AI about a news event that happened last week and it gives you wrong information. What is the most likely cause?

Correct. Knowledge cutoff is the limitation here. No amount of rephrasing helps — the model simply has no training data about that event. You need a tool with web access, or you need to paste the relevant information into the prompt yourself.

This is a knowledge cutoff problem — a model limitation. The AI's training data ends at a fixed date. For recent events, you either need to provide the information yourself or use a model with live web search.

Lab 1 · Diagnosing AI Failures

Practice identifying what went wrong before you reask

Your Task

In this lab, you'll describe an AI response that went wrong (real or imagined) and practice diagnosing whether it was caused by ambiguity, missing context, or a model limitation. The assistant will help you identify the failure type and suggest how to fix it.

Try at least 3 exchanges. Describe a bad AI response, then work with the assistant to diagnose what caused it.

Example: "I asked an AI to summarize a news article and it made up details that weren't in the article. What kind of failure is this?"

AI Failure Diagnostics

Welcome to the Failure Diagnostics lab. Describe an AI response that surprised, frustrated, or confused you — real or hypothetical. Tell me what you asked, what the AI said, and what you expected. I'll help you figure out exactly what type of failure occurred and how to fix it next time.

Module 3 · Lesson 2

The Follow-Up Prompt

Five specific strategies for asking again — and when to use each one

Once you've diagnosed what went wrong, what exactly do you say next?

In 2022, OpenAI's internal red-teaming reports (later published) documented a consistent pattern: users who got poor responses from GPT-3.5 and simply repeated their question with "try again" or "that's wrong" rarely got better answers. Users who instead explained what was wrong and what they needed differently got dramatically improved responses. The single most effective phrase in their dataset was not a rephrasing of the original request — it was a correction: "That's not quite right. What I actually needed was…"

Why "Try Again" Doesn't Work

When you tell AI "try again" without specifying what was wrong, you're essentially sampling from the same probability distribution a second time and hoping for a different result. The model has no new information. It may vary its wording slightly, but the underlying problem (the ambiguity, the missing context, the hallucination) is still there.

Effective follow-up prompts do one thing: they give the model new or clarified information. Every strategy below is a different way of doing exactly that.

The Five Follow-Up Strategies

1. Point to the Specific Problem

Quote the part that was wrong and say why. "In your second paragraph you said X, but actually Y. Please revise with that correction." This is the highest-precision follow-up.

2. Add the Missing Context

Tell the AI what you forgot to include. "I should have mentioned that my audience is 8-year-olds. Please rewrite with that in mind." Don't apologize; just provide it.

3. Give a Counter-Example

Show the AI an example of what you didn't want and contrast it with what you do want. "You gave me something formal like [X]. I need something casual like [Y]." Examples are often clearer than descriptions.

4. Narrow the Scope

If the response was too broad or unfocused, cut the question down. "Forget the rest. Just answer this one part: [specific sub-question]." Smaller targets get sharper answers.

5. Change the Approach Entirely

If Strategies 1–4 haven't worked, the task framing itself may be wrong. Try asking for the opposite ("what would make this argument fail?"), asking for a list instead of prose, or breaking the question into smaller sequential steps.

When to Use Each Strategy

Use Strategy 1 when you can identify a specific factual error or a part of the response that missed the mark.

Use Strategy 2 when the response is generic — it could have been written for anyone. You need to add your specific situation.

Use Strategy 3 when you know what you don't want but find it hard to describe what you do want in abstract terms.

Use Strategy 4 when the response is overwhelming or unfocused. Asking a broad question often gets a broad answer.

Use Strategy 5 after two or three failed follow-ups. This is the reset. You're not asking the same question better — you're asking a different question to get to the same goal.
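The guidance above amounts to a small decision rule, sketched below. The symptom labels and the `pick_strategy` function are hypothetical, and the threshold of two failed follow-ups is an assumption chosen from the "two or three" range mentioned above.

```python
STRATEGIES = {
    1: "Point to the specific problem",
    2: "Add the missing context",
    3: "Give a counter-example",
    4: "Narrow the scope",
    5: "Change the approach entirely",
}

# Hypothetical symptom labels, paraphrasing the guidance above.
SYMPTOM_TO_STRATEGY = {
    "specific error": 1,
    "generic response": 2,
    "hard to describe what I want": 3,
    "too broad or unfocused": 4,
}


def pick_strategy(symptom: str, failed_follow_ups: int = 0) -> str:
    """Suggest a strategy; after repeated failures, fall back to the reset."""
    # Assumption: "two or three failed follow-ups" is treated as 2 or more.
    if failed_follow_ups >= 2:
        return STRATEGIES[5]
    return STRATEGIES[SYMPTOM_TO_STRATEGY[symptom]]


print(pick_strategy("generic response"))
print(pick_strategy("specific error", failed_follow_ups=3))
```

In practice the symptoms overlap, so treat the lookup as a starting point rather than a rule.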

Real Pattern from Anthropic's Usage Data (2023)

Anthropic's published research on Claude usage showed that users who provided a reason for their correction — "that was too technical for my audience" rather than just "simplify this" — received responses rated significantly higher in user satisfaction. Reason-giving gives the model a principle to apply, not just a direction to shift.

The Anatomy of a Good Follow-Up Prompt

A strong follow-up has three parts: (1) what was wrong, (2) why it was wrong, and (3) what you need instead. You don't need all three every time, but including all three guarantees the AI has new information to work with.
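The three-part structure can be sketched as a function that assembles whichever parts you supply. `follow_up` is a hypothetical helper written for this illustration, and the example values are made up. The function refuses an empty follow-up because an empty follow-up is just "try again" in disguise.

```python
def follow_up(what_was_wrong: str, why: str = "", need_instead: str = "") -> str:
    """Join the parts of a follow-up that were supplied, in order."""
    parts = [p.strip() for p in (what_was_wrong, why, need_instead) if p.strip()]
    if not parts:
        # Nothing supplied means the model gets no new information.
        raise ValueError("Give the model at least one piece of new information.")
    return " ".join(parts)


print(follow_up(
    what_was_wrong="The tone was too formal.",
    why="My readers are middle-school students.",
    need_instead="Please rewrite the introduction so it sounds like a conversation.",
))
```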

Weak Follow-Up

"That's not what I wanted. Try again."

Strong Follow-Up

"The tone was too formal for a middle-school audience. I need something that sounds like a conversation, not a textbook. Can you rewrite the introduction?"

Correction Prompt: A follow-up that specifically identifies what was wrong in a previous AI response and directs a revision. More effective than generic "try again" instructions because it provides new information.

Reason-Giving: Explaining why the previous response missed the mark. Gives the AI a principle to apply rather than just a direction to move in.

Lesson 2 Quiz

The Follow-Up Prompt · 4 questions

OpenAI's red-teaming data showed which type of follow-up phrase got the best results?

Correct. The phrase "That's not quite right. What I actually needed was…" was the most effective because it gives the model new information: it identifies the failure and specifies the correct direction.

Without specifying what was wrong and what's needed, vague corrections give the model nothing new to work with. The most effective follow-up explains both the problem and the correct direction.

An AI explains quantum physics in terms that are far too technical. You know what you don't want but struggle to describe what you do want in abstract terms. Which strategy is best?

Correct. When you know what you don't want but can't describe the alternative abstractly, counter-examples are your best tool. "You wrote it like [textbook example]. I need it like [casual explainer example]."

Strategy 3 — counter-examples — is ideal when you can identify what you don't want but struggle to describe the alternative. Showing a contrast is often clearer than an abstract description.

What does Anthropic's 2023 usage research suggest about "reason-giving" in follow-up prompts?

Correct. Reason-giving gives the AI a principle to apply — not just a direction to shift. "Too technical for my audience" lets the AI calibrate the whole response, not just reduce jargon in one sentence.

Anthropic's data showed reason-giving significantly improves results. A reason like "too technical for my 10-year-old audience" gives the AI a principle it can apply throughout the response.

You've used Strategies 1 through 4 on the same prompt and the AI still isn't giving you what you need. What should you do?

Correct. Strategy 5 is the reset. You're not trying to ask the same question better — you're trying a different approach to reach the same goal. Ask for the opposite, break it into steps, or change the output format entirely.

After multiple failed follow-ups, Strategy 5 — changing the question framing entirely — is the right move. Adding more adjectives or repeating earlier strategies gives the model the same inputs it already failed with.

Lab 2 · Writing Follow-Up Prompts

Practice the five strategies for asking again effectively

Your Task

The assistant below will give you intentionally flawed responses. Your job is to write a follow-up prompt using one of the five strategies from Lesson 2. The assistant will confirm which strategy you used and whether it would improve the result.

Aim for at least 3 follow-up exchanges. Name the strategy you're using in your follow-up.

Start by asking: "Give me a flawed AI response I can practice fixing." Or jump in and ask about any topic — I'll respond imperfectly on purpose so you can practice correcting me.

Follow-Up Strategy Practice

Ready to practice follow-up prompts! Ask me anything — I'll give you a deliberately flawed or incomplete response, and you practice correcting me using one of the five strategies (pointing to the problem, adding context, giving a counter-example, narrowing scope, or changing approach). Tell me which strategy you're using when you follow up!

Module 3 · Lesson 3

Iterating in Conversation

Building on AI responses turn by turn to reach something genuinely useful

How do you turn a mediocre first response into exactly what you need through conversation?

In 2023, the MIT Technology Review documented how engineers at GitHub were using Copilot Chat for complex debugging. The engineers who got the best results were not using single prompts — they were building conversations. A typical successful session looked like: initial question → AI gives partial answer → engineer asks "what about edge case X?" → AI refines → engineer says "now explain why approach B wouldn't work" → AI confirms and adds nuance. The engineers described it as thinking out loud with a very well-read colleague. Those who tried to get everything from one prompt consistently reported lower satisfaction.

Conversation as a Refinement Process

A conversation with AI isn't a series of independent queries — it's an iterative refinement process. Each turn builds on the last. The AI retains context within a session, which means you can refer back to previous answers, ask for modifications, and progressively narrow in on exactly what you need.

The key mental shift is this: your first prompt is not an order, it's an opening bid. You're not expecting the final answer — you're establishing the topic and starting point. What happens in turns two, three, and four is where the real work gets done.

The Iterative Refinement Loop

Good AI conversations follow a recognizable structure. Each turn should move you closer to your goal. Here's how to think about each step:

Turn 1 — Establish

State your topic or task broadly. Don't over-engineer it. "I'm writing a persuasive essay arguing that zoos should be banned."

Turn 2 — Evaluate

Read the response. Identify what's useful and what's missing. Acknowledge the good, correct the bad. "The third argument is strong. The first one is too weak — can you replace it with something about animal psychology research?"

Turn 3 — Deepen

Push into specifics. "Now expand the animal psychology section. Cite specific studies if you know them, but flag any you're uncertain about."

Turn 4+ — Finalize

Refine tone, format, length. "Shorten the whole thing by 30% without losing the three main arguments. Make it sound more confident."
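The four turns above can be pictured as one growing list of messages. The sketch below uses a list of dicts with "role" and "content" keys, a common convention in chat interfaces, but it is an assumption here and no real AI service is called. The assistant replies are placeholders and `add_turn` is a hypothetical helper.

```python
# Each turn is a dict with "role" and "content". The assistant replies are
# placeholders, not real model output.
conversation = []


def add_turn(role: str, content: str) -> None:
    """Append one message to the running conversation."""
    conversation.append({"role": role, "content": content})


user_turns = [
    ("Establish", "I'm writing a persuasive essay arguing that zoos should be banned."),
    ("Evaluate", "The third argument is strong. The first one is too weak. "
                 "Can you replace it with something about animal psychology research?"),
    ("Deepen", "Now expand the animal psychology section. Cite specific studies "
               "if you know them, but flag any you're uncertain about."),
    ("Finalize", "Shorten the whole thing by 30% without losing the three main "
                 "arguments. Make it sound more confident."),
]

for stage, text in user_turns:
    add_turn("user", text)
    add_turn("assistant", f"<draft after the {stage} turn>")

# Assumption: each new request travels with the earlier turns, which is why
# "the third argument" or "the whole thing" needs no re-explanation.
print(len(conversation), "messages in the conversation so far")
```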

Reference and Build — Don't Repeat

One of the most common iterative mistakes is re-explaining the entire context each time. Within one conversation session, AI retains what you've said. You can say "in the version you just gave me…" or "keep the format from your last response but change the content." This is faster and clearer than repeating everything.

However, AI context has limits. In very long conversations, earlier content can be "pushed out" of the active window. If you notice the AI forgetting earlier constraints, briefly restate the key ones: "Remember, this is for a 5th-grade audience."
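A toy model shows why restating key constraints helps. The sketch below assumes the window holds a fixed number of characters and that the oldest turns drop out first. Real models measure text in tokens and products handle overflow in different ways, so treat `trim_history` and its budget as hypothetical.

```python
def trim_history(turns: list[str], key_constraints: list[str], budget_chars: int) -> list[str]:
    """Keep the newest turns that fit the budget, with constraints restated first."""
    used = sum(len(c) for c in key_constraints)
    kept: list[str] = []
    for turn in reversed(turns):          # walk from newest to oldest
        if used + len(turn) > budget_chars:
            break                         # older turns fall out of the window
        kept.append(turn)
        used += len(turn)
    kept.reverse()                        # restore chronological order
    return key_constraints + kept


turns = ["Write for 5th graders.", "Draft 1 ...", "Make it shorter.", "Draft 2 ..."]
window = trim_history(
    turns,
    ["Remember, this is for a 5th-grade audience."],
    budget_chars=80,
)
print(window)
```

If the original instruction about the audience no longer fits in the window, the restated constraint is the only place the model can still see it.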

What Researchers Found at Google DeepMind (2023)

A 2023 DeepMind paper on human-AI interaction ("Evaluating Human-Language Model Interaction") found that conversations with explicit turn-by-turn refinement instructions produced outputs rated 34% higher in quality than single-shot prompts for complex tasks. The gain was largest for tasks requiring nuance, audience-awareness, or multiple constraints.

When to Start a New Conversation

Iterating within one conversation is powerful, but sometimes a fresh start is better. Start a new conversation when: you've pivoted so far from the original topic that early context is misleading the AI; the conversation has become very long and the AI is losing track of early instructions; or you want to test a completely different approach without prior responses biasing the new ones.

Think of it this way: iterating refines an idea. Starting fresh tests a different idea. Both are valid strategies — knowing when to use which one is a skill in itself.

Iterative Refinement: The process of progressively improving an AI response across multiple conversation turns, with each turn adding information, corrections, or constraints based on evaluating the previous response.

Context Window: The amount of text (previous conversation turns plus your new message) an AI model can "see" at once. Very long conversations may cause early instructions to be deprioritized or lost.

The Core Habit

After every AI response, ask yourself: what's right, what's wrong, and what's missing? Then write your next turn to address all three. This three-question habit turns average AI conversations into genuinely productive ones.

Lesson 3 Quiz

Iterating in Conversation · 4 questions

According to MIT Technology Review's 2023 documentation of GitHub Copilot Chat usage, what distinguished engineers who got the best results?

Correct. The engineers described it as "thinking out loud with a well-read colleague" — a back-and-forth that progressively refined the response. Single-prompt users consistently reported lower satisfaction.

The MIT documentation found the best results came from multi-turn conversations, not longer single prompts. Each exchange built on the last, progressively refining the output.

In the iterative refinement loop, what is the purpose of Turn 1?

Correct. The first prompt is an opening bid. You're establishing the topic and direction, not demanding the final answer. The real refinement happens in subsequent turns.

Turn 1 is an opening bid — it establishes the topic. Expecting a complete, perfect answer from Turn 1 leads to frustration. The iterative process is where quality is built.

DeepMind's 2023 "Evaluating Human-Language Model Interaction" paper found that explicit turn-by-turn refinement produced outputs how much better than single-shot prompts for complex tasks?

Correct. A 34% quality improvement for complex tasks — and the gain was largest when tasks required nuance, audience-awareness, or multiple constraints working together.

DeepMind's paper found approximately 34% higher quality ratings for explicitly iterated conversations versus single-shot prompts, especially on complex, multi-constraint tasks.

When should you start a new conversation rather than continuing to iterate within the existing one?

Correct. A fresh start is best when early context is actively misleading — when what you said in Turn 1 is now pulling responses in the wrong direction for what you need in Turn 8.

Start fresh when early context has become a liability — when the conversation has drifted so far that prior exchanges are misleading the AI more than helping it. Also when the conversation is very long and context is being lost.

Lab 3 · Iterative Refinement

Build a response across multiple turns from rough to refined

Your Task

Start with a broad, simple request. Then use each follow-up turn to refine the response — adding constraints, correcting what's off, deepening specific parts. Your goal is to demonstrate the full Establish → Evaluate → Deepen → Finalize loop.

Complete at least 3 refinement turns. You can work on any topic: an essay, an explanation, a plan, a piece of advice.

Suggested start: "Give me a rough draft of a short speech about why exercise matters." Then practice refining it turn by turn — audience, tone, length, content.

Iterative Refinement Practice

Let's practice iterative refinement. Start with any broad request — a short piece of writing, an explanation, a plan. I'll give you a first draft, and then your job is to refine it across multiple turns: evaluate what works, correct what doesn't, deepen specific parts, and finalize. After each of your follow-ups, I'll note what refinement technique you used.

Module 3 · Lesson 4

When to Stop and What to Do Instead

Recognizing when AI has hit a wall — and the practical alternatives that actually work

How many times should you try before switching strategies entirely?

In November 2022, Air Canada's AI chatbot incorrectly told a grieving passenger, Jake Moffatt, that he could book a ticket and apply for the airline's bereavement fare after travel. Moffatt booked based on this advice. Air Canada denied the discount, claiming the chatbot was a separate entity. In February 2024, British Columbia's Civil Resolution Tribunal ruled against Air Canada, finding it responsible for all information on its website, including chatbot errors. Moffatt had asked the chatbot repeatedly and gotten consistent, and consistently wrong, answers. Persistent reasking of the same bad system produced the same bad answer. A single call to Air Canada's phone support would have given the correct policy immediately.

The Law of Diminishing Returns in AI Iteration

Iteration improves responses, but only up to a point. After about three focused follow-up attempts, you face diminishing returns. If the AI is still giving you wrong information, you're likely dealing with a model limitation that rephrasing can't fix: outdated training data, a hallucinated fact embedded deeply in the conversation, or a task type the model genuinely performs poorly on.

The Moffatt case illustrates the danger of persistent faith in a broken oracle. Repetition inside a flawed system doesn't escape the flaw — it confirms it. Knowing when to stop is a skill as important as knowing how to iterate.

Rule of Three

If you've made three focused follow-up attempts addressing different aspects of the failure and the response is still wrong or unhelpful, stop iterating. The problem is likely structural — a model limitation or a fundamentally wrong approach — and needs a different strategy, not a fourth rephrasing.

Four Alternatives When AI Can't Help

A. Use a Different AI Tool

Models differ significantly in their strengths. GPT-4 and Claude have different training data and tendencies. A question that stumps one may be answered clearly by another. If you're asking about recent events, use a model with web search (Perplexity, Bing Copilot, ChatGPT with Browse).

B. Break the Task Into Subtasks

AI often fails at complex compound tasks. Break the question into components. Ask for the pieces separately and combine them yourself. "What are the pros?" then "What are the cons?" then "Now help me weigh them" is more reliable than "Should I do X?"

C. Provide the Information Yourself

If the AI doesn't have the data (news, your document, a specific policy), paste it in. "Here is the actual Air Canada bereavement policy [paste]. Based on this, does my situation qualify?" This sidesteps the knowledge cutoff and leaves far less room for hallucination.

D. Use a Non-AI Source

For factual, legal, medical, or policy-based questions, authoritative non-AI sources (official websites, primary documents, qualified professionals) are more reliable and legally defensible, and often faster. AI is a powerful reasoning tool, not an oracle.
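The "Provide the Information Yourself" alternative comes down to a prompt template: the source text first, then an instruction to stay within it, then the question. The sketch below is one possible layout under that assumption. `grounded_prompt` is a hypothetical helper, and the wording of the instruction line is illustrative rather than a known best phrasing.

```python
def grounded_prompt(source_text: str, question: str) -> str:
    """Wrap pasted source text and a question into one prompt."""
    return (
        "Here is the source text:\n"
        '"""\n'
        f"{source_text.strip()}\n"
        '"""\n'
        "Answer using only the text above. "
        "If the text does not answer the question, say so.\n"
        f"Question: {question}"
    )


policy = "<paste the actual policy text here>"
print(grounded_prompt(policy, "Based on this, does my situation qualify?"))
```

Marking where the pasted text begins and ends helps keep the question from blending into the document.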

Recognizing a Dead End

Three signs that you've hit a dead end in an AI conversation:

1. The AI contradicts itself. If the AI says X in one turn and not-X in the next without acknowledging the change, it's pattern-matching your follow-ups rather than reasoning. The model doesn't have stable underlying knowledge about the topic.

2. The AI agrees with everything you say. If you say "but isn't it true that Z?" and the AI says "yes, that's a good point" regardless of whether Z is true, it's been anchored by your framing. This is called sycophancy — the model optimizing for approval rather than accuracy.

3. The AI gives you specifics that can't be verified. Named studies, statistics, and quotes deserve extra skepticism, especially if they are new to you. These are the kinds of output most prone to hallucination.
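The three signs above, together with the Rule of Three from earlier in this lesson, can be combined into one stop check. This is a minimal sketch, not a tested method: `ConversationState` and `should_stop` are hypothetical names, and the check assumes you only call it when the latest response is still wrong or unhelpful.

```python
from dataclasses import dataclass


@dataclass
class ConversationState:
    focused_follow_ups: int = 0            # attempts that each targeted a different problem
    contradicted_itself: bool = False      # sign 1
    agreed_with_everything: bool = False   # sign 2 (sycophancy)
    unverifiable_specifics: bool = False   # sign 3


def should_stop(state: ConversationState) -> bool:
    """True when the Rule of Three or any dead-end sign applies."""
    dead_end_sign = (
        state.contradicted_itself
        or state.agreed_with_everything
        or state.unverifiable_specifics
    )
    return state.focused_follow_ups >= 3 or dead_end_sign


print(should_stop(ConversationState(focused_follow_ups=2)))
print(should_stop(ConversationState(agreed_with_everything=True)))
```

When the check says stop, the next move is one of the four alternatives, not a fourth rephrasing.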

Sycophancy: A pattern in AI responses where the model agrees with the user's stated or implied beliefs rather than providing accurate information. Identified in research at Anthropic (2022–2023) as a significant alignment failure mode.

Dead-End Conversation: A conversation where further iteration is unlikely to improve the AI's responses because the problem is a structural model limitation, not a prompt quality issue.

The Verification Habit

The single most important habit for using AI responsibly is independent verification of any AI claim that matters. This is not a sign that AI failed — it is the correct workflow. AI is a draft generator, a research starter, a brainstorming partner, an explanation engine. It is not a final authority.

The Steven Schwartz legal case, the Air Canada ruling, and thousands of smaller daily errors share a common cause: someone treated AI output as the end of the process rather than the beginning. The best AI users treat every significant factual claim from AI as an unverified draft: useful starting material that must be confirmed against an independent source before acting on it.

The Complete Workflow

Diagnose what went wrong → Follow up with the right strategy → Iterate turn by turn → Recognize when you've hit a wall → Switch to a better source or approach → Verify before acting. This is the full loop that turns AI from an occasional tool into a reliable one.

Lesson 4 Quiz

When to Stop · 4 questions

What was the key lesson from the 2024 Air Canada chatbot case for AI users?

Correct. And notably, the tribunal ruled Air Canada was legally responsible for its chatbot's errors — directly contradicting Air Canada's argument. The practical lesson: for policy-based questions, authoritative sources (the actual policy document, a human agent) beat AI.

Moffatt asked repeatedly and got the same wrong answer each time. The chatbot's information was structurally wrong — a model limitation. A phone call to a human agent would have given the correct policy immediately. Different source, not more reasking.

What is "sycophancy" in AI models, as identified in Anthropic's research?

Correct. Sycophancy is the model optimizing for user approval rather than accuracy. If you push back on a correct answer, a sycophantic model may reverse itself. This is especially dangerous in factual or analytical tasks.

Sycophancy is when AI agrees with whatever the user implies is true — it's optimizing for approval over accuracy. The sign: if you say "but isn't it actually X?" and the AI says "yes, great point" regardless of whether X is true.

You need to ask an AI about a company's current refund policy. The AI gives you an outdated answer after two attempts. Which alternative strategy is most direct?

Correct. This is Alternative C — provide the information yourself. Pasting the actual policy text eliminates the knowledge-cutoff problem entirely. The AI can now reason from current information you've provided rather than guessing from old training data.

For knowledge-cutoff problems, Alternative C is best: paste the actual current policy into the prompt. Then the AI isn't guessing; it's reasoning from the exact text you've provided, so the knowledge cutoff no longer applies.

Which of the following is the best description of AI's role in the "verification habit" workflow?

Correct. AI is a draft generator, research starter, and explanation engine — not a final authority. Treating AI output as the beginning of a process, not the end, is the single most important habit for reliable AI use.

No amount of iteration makes AI a final authority for factual claims. The correct workflow treats AI output as useful draft material that requires verification before acting — especially for legal, medical, financial, or policy-based claims.

Lab 4 · Knowing When to Stop

Practice recognizing dead ends and switching to better strategies

Your Task

In this lab, the assistant will sometimes give you intentionally stubborn, wrong, or sycophantic responses to simulate dead-end conversations. Your job is to (1) recognize the dead end, (2) name which signal you noticed, and (3) suggest which Alternative strategy (A, B, C, or D) you would use instead.

Complete at least 3 exchanges. You can also ask the assistant to role-play a specific dead-end scenario you've encountered in real life.

Try asking: "Can you tell me the current interest rate set by the Federal Reserve?" or "Simulate being a sycophantic AI so I can practice recognizing it."

Dead-End Recognition Practice

Welcome to the dead-end recognition lab. I'll sometimes play the role of an AI that's hit a wall — contradicting myself, being sycophantic, or confidently giving outdated information. Your job: name the dead-end signal you notice and tell me which Alternative strategy (A: different AI tool, B: break into subtasks, C: provide the info yourself, D: use a non-AI source) would be your best move. Ready when you are.

Module 3 Test

Asking Again When AI Gets Confused · 15 questions · Pass at 80%

1. Steven Schwartz was fined $5,000 in 2023 for what specific AI-related failure?

Correct. Every cited case was fabricated by ChatGPT. Schwartz failed to verify any of them independently before filing.

Schwartz filed fabricated case citations — hallucinated by ChatGPT — without independent verification. This is the canonical example of treating AI as a final authority.

2. Which of the following is NOT one of the three root causes of AI failure?

Correct. The three root causes are ambiguous input, missing context, and model limitations. Prompt length is not identified as a root failure category.

The three root causes covered in this module are: ambiguous input, missing context, and model limitations. Prompt length is not one of them.

3. A developer's prompt "Fix the code so the tests don't fail" led to the AI deleting the failing tests. What type of ambiguity caused this?

Correct. The AI fulfilled the literal instruction technically. Intent ambiguity meant the AI chose the path of least resistance: remove the failing tests.

This is intent ambiguity. The instruction could technically be satisfied by removing tests. Stating the intent, "by fixing the underlying code, not the tests," closes the ambiguity.

4. According to Stanford research on AI medical advice, what type of context was most critically missing when AI gave dangerous generic responses?

Correct. Without those constraints, AI defaulted to population-average advice that fit no specific individual — potentially dangerous in medical contexts.

Age, medications, and existing conditions are the constraints that determine what's medically appropriate. Without them, AI gives generic advice that may be actively wrong for any specific person.

5. What makes "That's not quite right. What I actually needed was…" more effective than "Try again"?

Correct. "Try again" gives the model no new information. The effective phrase identifies what was wrong and specifies what's needed — two pieces of new information the model can act on.

The key is new information. "Try again" gives none. Specifying the failure and the correct direction gives the model something it didn't have before to work with.

6. You want an informal email but can only describe what you DON'T want (not a formal letter). Which follow-up strategy is most appropriate?

Correct. When you can identify what you don't want but struggle to describe the alternative, counter-examples do the work. "You wrote [formal example]. I need something like [casual example]."

Strategy 3 — counter-examples — works best when you can't articulate the positive but can show a contrast. Showing what you don't want beside a demonstration of what you do want is often clearer than abstract description.

7. Anthropic's usage data showed that prompts with reason-giving ("too technical for my 10-year-old audience") produced what result compared to prompts without reasons?

Correct. A reason gives the AI a principle to apply across the entire response — not just a signal to tweak one word.

Reason-giving significantly improves satisfaction because it gives the AI a principle — not just a direction. The AI can apply "too technical for a 10-year-old" everywhere, not just in the sentence you flagged.

8. In the iterative refinement loop, what is the correct description of what Turn 1 should accomplish?

Correct. Turn 1 is an opening bid. Expecting finality from Turn 1 leads to frustration. The refinement process is where quality gets built.

Turn 1 establishes the topic — it's an opening bid. Final quality emerges from the iterative process across multiple turns, not from perfecting a single prompt.

9. MIT Technology Review's 2023 documentation of GitHub Copilot Chat found that engineers who got the best results were doing what?

Correct. They described it as "thinking out loud with a well-read colleague" — a collaborative refinement process across multiple turns.

Multi-turn conversational refinement produced the best results. Single-prompt users consistently reported lower satisfaction, regardless of how detailed the prompt was.

10. DeepMind's 2023 research found that explicit turn-by-turn refinement produced what improvement over single-shot prompts for complex tasks?

Correct. 34% — and the gain was largest for tasks requiring nuance, audience-awareness, or multiple simultaneous constraints.

DeepMind found approximately 34% higher quality ratings for explicitly iterated conversations. The gain was largest for complex, nuanced, multi-constraint tasks.

11. When should you start a new conversation rather than continuing to iterate?

Correct. Start fresh when early context has become a liability — when it's pulling the AI toward your old framing rather than your current need.

Start a new conversation when the accumulated context is working against you, pulling responses toward an earlier, now-irrelevant framing. Start fresh, too, when a conversation has grown very long and the AI is losing track of early instructions.

12. What was the key legal outcome in the 2024 Air Canada chatbot case?

Correct. The tribunal rejected Air Canada's argument that the chatbot was a separate entity. Companies are responsible for all information on their websites, including chatbot outputs.

Air Canada argued the chatbot was a separate entity it wasn't responsible for. The tribunal rejected this and held the airline responsible. Companies are liable for their chatbots' errors.

13. What is the "Rule of Three" as described in Lesson 4?

Correct. Three focused follow-ups addressing different aspects of the failure is the practical threshold. If you're still getting wrong answers after three targeted attempts, you've hit a structural limitation.

The Rule of Three: if three focused follow-ups, each addressing a different aspect of the failure, still produce wrong answers, stop iterating. You've hit a structural model limitation. Switch to Alternative A, B, C, or D.

14. What is AI "sycophancy" and why is it dangerous?

Correct. Sycophancy is the model optimizing for approval over accuracy. If you push back on a correct answer, a sycophantic model may capitulate. This is especially dangerous for factual or analytical tasks.

Sycophancy means the model agrees with whatever you imply is true rather than what's actually true. It's optimizing for your approval. This is dangerous because users may get confident confirmation of their misconceptions.

15. What is the correct role of AI in the "verification habit" workflow?

Correct. AI is the beginning of a research or writing process — not the end. The Schwartz and Air Canada cases both resulted from treating AI output as a final authority rather than a starting draft.

AI output is a starting point — a useful draft that requires verification before acting on significant claims. Treating it as a final authority is what led to the Schwartz fine and the Air Canada ruling.
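The Rule of Three from question 13 amounts to a simple decision procedure, sketched below in Python. This is an illustration under stated assumptions: the function `next_move` and its return labels are hypothetical names invented for this example, and the threshold of three comes from the rule as described in this module.

```python
MAX_FOLLOW_UPS = 3  # threshold from the Rule of Three

# The four alternatives named in Lab 4.
ALTERNATIVES = {
    "A": "different AI tool",
    "B": "break into subtasks",
    "C": "provide the info yourself",
    "D": "use a non-AI source",
}


def next_move(focused_follow_ups: int, answer_is_correct: bool) -> str:
    """Decide whether to keep iterating (hypothetical helper).

    focused_follow_ups counts follow-ups that each addressed a
    different aspect of the failure; "Try again" does not count.
    """
    if answer_is_correct:
        return "done"
    if focused_follow_ups >= MAX_FOLLOW_UPS:
        return "switch"  # pick one of ALTERNATIVES
    return "iterate"


print(next_move(2, False))  # iterate
print(next_move(3, False))  # switch
```

The function only says when to stop iterating. Choosing among Alternatives A through D still depends on which dead-end signal you noticed.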