Module 5 · Lesson 1

What AI Can and Cannot Find

Understanding the knowledge boundaries that shape every AI research session

Why did reporters using AI assistants in 2023 discover the same stubborn blind spots — and what should every writer know before the first query?

In early 2023, the Reuters Institute for the Study of Journalism published findings from its ongoing Digital News Report project showing that newsrooms experimenting with large language model assistants for background research consistently ran into a shared problem: the models confidently described events up to their training cutoffs but could not retrieve anything more recent and, critically, did not always flag the gap. Reporters working deadline stories on the Silicon Valley Bank collapse in March 2023 found that AI assistants described SVB as a healthy mid-size lender because their training data predated the bank run. The lesson that spread quickly through professional circles was simple but consequential: AI research tools operate from a frozen snapshot of the world, not a live feed.

The Training Cutoff — What It Really Means

Every large language model is trained on a corpus of text assembled up to a specific date. After that date, the model has no awareness of new events unless that information is explicitly injected through retrieval tools, fine-tuning updates, or user-provided context. This boundary is called the training cutoff or knowledge cutoff.

The practical consequences for writers are significant. A model whose cutoff falls in early 2023 cannot describe the outcome of a court case decided in late 2023, cannot cite a scientific study published after that date, and cannot reflect policy changes, leadership transitions, or market shifts that occurred afterward. Asking it to do so risks receiving a confident but outdated — or entirely fabricated — answer.

Importantly, the gap between cutoff and current date grows over time. A model released with a 2023 cutoff that is still in active use in 2025 carries a potential two-year blind spot on any current-events query.

Critical Distinction

A training cutoff is not the same as a release date. Models are often deployed months after their data cutoff, so the gap between "what the model knows" and "today" is always larger than the time since release alone. Before using a model for time-sensitive research, ask it directly about its knowledge cutoff, and where possible confirm the answer against the provider's documentation, since a model's own report of its cutoff can be imprecise.

The Hallucination Problem in Research Contexts

When asked about topics where its training data is thin, contradictory, or absent, an AI model may generate plausible-sounding but factually wrong responses — a behavior researchers call hallucination. For writers, this is most dangerous in three specific research situations: citation retrieval, biographical facts about less-prominent individuals, and technical statistics.

A documented example: in May 2023, a New York federal court case (Mata v. Avianca) came to public attention when a lawyer filed a brief containing citations to court decisions that did not exist — all generated by ChatGPT. The citations had realistic case names, docket numbers, and jurisdictions. When opposing counsel checked, none of the cases could be found. Judge P. Kevin Castel fined the attorneys $5,000 and issued a formal sanction order. The episode made front-page news in legal and journalism circles and became a standard cautionary example for any professional using AI in document-intensive work.

The takeaway for writers: never use an AI-generated citation, statistic, or quotation without independent verification from a primary source. The model's confidence level is not a reliable signal of accuracy.

What AI Research Tools Are Actually Good At

Understanding limitations does not mean abandoning the tools — it means deploying them correctly. AI assistants offer genuine research leverage in specific tasks where the risks of hallucination are lower and the speed gains are substantial.

Background synthesis: For topics well-documented before the training cutoff, AI can compress hours of reading into a useful orientation summary. A journalist covering a company's antitrust history, or a novelist researching Victorian-era textile manufacturing, can use AI to get oriented quickly before diving into primary sources.

Question generation: AI is excellent at helping writers identify what they do not yet know. Prompting a model to "list ten questions I should be able to answer before writing about X" often surfaces angles the writer had not considered.

Source pathway identification: Rather than asking AI to be the source, ask it to identify what types of sources would contain the answer — government databases, academic journals, professional associations, regulatory filings. This turns the model into a research librarian rather than an encyclopedia.

Writer's Framework

Think of AI as a first-pass orienter, not a final-pass verifier. Use it to understand the shape of a topic, identify the right questions, and find where authoritative information lives. Then go to those authoritative sources directly for anything that will appear in your published work.

Training cutoff: The date after which an AI model has no knowledge of world events, unless supplemented by retrieval tools or user-provided context.

Hallucination: An AI output that sounds authoritative but contains fabricated or inaccurate information, most common when the model's training data is thin or absent on a topic.

Retrieval-augmented generation (RAG): A technique that connects an AI model to live or curated databases, allowing it to pull current information rather than relying solely on training data.

Lesson 1 Quiz

What AI Can and Cannot Find — 5 questions

1. What does "training cutoff" mean in the context of AI research tools?

Correct. The training cutoff is the date boundary of the model's knowledge — events after that date are invisible to the model unless injected via retrieval tools or user context.

Not quite. The training cutoff specifically refers to the knowledge boundary in time — the date after which the model's training data does not extend.

2. In the 2023 Mata v. Avianca case, what was the core problem with the AI-generated legal brief?

Correct. ChatGPT generated realistic-looking but entirely fabricated case citations. The attorney filed the brief without checking whether those cases existed, resulting in sanctions from Judge Castel.

The actual problem was fabricated citations — court cases with realistic names and docket numbers that simply did not exist in any legal database.

3. Why is AI confidence level an unreliable signal of accuracy?

Correct. The fluency of AI output — how polished and authoritative it sounds — is a product of language modeling, not a measure of factual accuracy. Hallucinated content reads exactly like accurate content.

Fluency and accuracy are entirely separate properties in AI systems. A model can produce grammatically perfect, confident-sounding text about things that are simply not true.

4. According to the Reuters Institute findings, what happened when reporters used AI to research Silicon Valley Bank in March 2023?

Correct. The SVB collapse was a post-cutoff event for most models in use at the time. AI assistants described the bank in terms consistent with its pre-collapse profile, demonstrating exactly why writers cannot rely on AI for current-events research.

The AI gave outdated information — SVB appeared healthy in the training data, so the model described it that way, unaware of the crisis that was unfolding in real time.

5. Which of the following is the most appropriate use of AI in a research workflow?

Correct. AI excels as a first-pass orienter and question generator. The researcher's job is to pursue those questions through authoritative, verifiable sources — not to treat AI output as the endpoint.

The appropriate role is AI as orienter and question-generator. Any fact, statistic, or citation that will appear in published work must be verified through primary sources independent of the AI's output.

Lab 1: Mapping AI's Knowledge Boundaries

Practice probing an AI assistant to identify what it knows, what it doesn't, and where its cutoff creates risk for writers

Your Task

In this lab you will interrogate the AI assistant about its own knowledge limitations and practice the kinds of boundary-testing queries that professional researchers use before trusting AI output. Ask about training cutoffs, probe a recent event to see how the model handles it, and ask the assistant how you should approach verifying its claims.

Suggested opening: "What is your training cutoff date, and what should I do when I need information about events after that date?" Then follow up with questions about specific research scenarios you face as a writer.

Research Boundary Lab

Welcome to Lab 1. I'm here to help you understand the limits of AI as a research tool — including my own limits. Ask me about my training cutoff, test how I handle recent events, or ask me how you should verify information I provide. Let's make you a sharper AI-assisted researcher.

Module 5 · Lesson 2

Prompting for Research: Precision Queries

How to frame questions that return usable, verifiable information instead of confident generalities

What separates a research prompt that generates insight from one that generates noise — and how did the Associated Press learn to draw that line?

When the Associated Press formalized its AI usage guidelines in the summer of 2023, the document — portions of which were published and widely discussed — included specific guidance on what the news organization called "precision prompting." AP editors had observed that vague, open-ended queries to AI tools produced sweeping but unverifiable summaries, while narrowly scoped queries with explicit constraints returned outputs that were far easier to check and far more likely to be accurate. The guidance emphasized treating AI outputs as leads, not facts, and required that any AI-assisted background material be traced to at least two independent primary sources before being used in a story. The policy became a reference point for other newsrooms developing their own AI workflows throughout 2023 and 2024.

The Anatomy of a Research Prompt

A well-designed research prompt has four components working together: a scope constraint that limits the domain; a time boundary that acknowledges the cutoff and focuses appropriately; a format directive that shapes output for usability; and an uncertainty request that explicitly asks the model to flag what it does not know.

Compare these two prompts: "Tell me about climate policy." vs. "Summarize the major international climate agreements that existed before 2023, identify any aspects you are uncertain about, and list the government or intergovernmental sources where I could verify each point." The first invites a confident sweep of everything. The second constrains the domain, time-bounds the query, requests source pathways, and activates the model's uncertainty flagging.

The second prompt type is slower to write but dramatically more useful for writers who intend to publish. The AP's internal findings matched what researchers at Stanford's Human-Centered AI Institute reported in 2023: specificity in prompts correlates strongly with output reliability across multiple AI platforms.

Prompt Pattern: The Four-Component Research Query

1. Scope: "Focusing only on [specific domain or entity]…"
2. Time boundary: "…based on information available before [year], or noting where recency matters…"
3. Format directive: "…give me a structured summary with [bullet points / numbered claims / a table]…"
4. Uncertainty request: "…and explicitly flag any claims where your confidence is low or where I should verify independently."
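
The four components above can be assembled mechanically. The following is a minimal Python sketch, not a prescribed tool: the function name `build_research_prompt` and its parameters are hypothetical, and the wording of each fragment is copied from the pattern above.

```python
def build_research_prompt(scope: str, before_year: int, output_format: str) -> str:
    """Join the four components of the pattern into a single query string.

    All names here are illustrative assumptions; only the phrasing of the
    four fragments comes from the lesson's prompt pattern.
    """
    components = [
        # 1. Scope constraint
        f"Focusing only on {scope},",
        # 2. Time boundary
        f"based on information available before {before_year}, "
        "or noting where recency matters,",
        # 3. Format directive
        f"give me a structured summary with {output_format},",
        # 4. Uncertainty request
        "and explicitly flag any claims where your confidence is low "
        "or where I should verify independently.",
    ]
    return " ".join(components)


# Hypothetical usage: the topic below is only an example.
print(build_research_prompt(
    scope="international climate agreements",
    before_year=2023,
    output_format="numbered claims",
))
```

Keeping the components as separate fragments makes it easy to see at a glance whether a query is missing one of them before it is sent.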

Layered Questioning: Going Deeper Without Losing Control

Effective AI research is iterative. The first query establishes orientation; subsequent queries drill into specific claims, challenge assumptions, and surface what was glossed over. This technique — sometimes called layered questioning — mirrors how experienced journalists approach source interviews: the first question opens the subject, and each follow-up tightens focus based on what was just said.

In practice: after an initial summary, a strong follow-up might be "You said X in your previous response. What is the basis for that claim, and how confident are you in it?" This forces the model to expose its reasoning. When the model produces hedged language ("it is generally believed," "reportedly," "some sources suggest"), that is a signal to pursue primary verification rather than building further queries on that foundation.

A documented example of layered questioning going wrong: in 2023, The Guardian reported on a case where a researcher asking an AI to elaborate on an initial claim inadvertently anchored subsequent queries to the first hallucinated response. Each follow-up accepted the false premise and built more plausible-sounding detail around it. This "hallucination cascade" is a known failure mode. The main defense is to keep each individual query specific and to verify a claim before building further questions on it, rather than accepting previous AI outputs uncritically.

Asking for Source Types, Not Source Names

One of the most reliable research prompting strategies is to ask AI for the category of sources that would contain authoritative information, rather than asking AI to name specific sources or quote specific texts. Because the model is no longer asked to produce individual citations, this largely sidesteps the risk of fabricated references while still leveraging the model's genuine strength: knowledge of which institutions, databases, and publication venues cover which domains.

For example: "What types of government databases or regulatory filings would contain data on pharmaceutical pricing practices in the United States?" is far safer than "Give me three studies that document pharmaceutical pricing practices." The first prompt produces a map; the second invites fabrication.

Washington Post technology reporter Nitasha Tiku documented this approach in a 2023 article about journalists adapting to AI research tools, noting that the reporters who found AI most useful were those who used it to understand the research landscape rather than to extract specific facts.

Core Principle

Ask AI: "Where should I look?" not "What is the answer?" Use AI to build your research map, then navigate that map using primary sources. This separates the orienting function — where AI is strong — from the verification function — where AI is unreliable.

Precision prompting: Crafting queries with explicit scope constraints, time boundaries, format directives, and uncertainty requests to improve the reliability and usability of AI research outputs.

Hallucination cascade: A failure mode where uncritical follow-up questions build on a hallucinated premise, producing increasingly elaborate but entirely false detail chains.

Layered questioning: An iterative prompting strategy that drills progressively deeper into a topic, challenging AI's claims and exposing uncertain reasoning at each step.

Lesson 2 Quiz

Prompting for Research: Precision Queries — 5 questions

1. What are the four components of a well-designed research prompt as described in this lesson?

Correct. These four components work together to constrain the domain, time-bound the query, shape the output for usability, and activate the model's uncertainty flagging — producing far more reliable research outputs.

The four components covered are: scope constraint, time boundary, format directive, and uncertainty request. Together they dramatically improve the reliability of AI research outputs.

2. What is a "hallucination cascade" and how does it occur?

Correct. This failure mode occurs when a researcher accepts a hallucinated premise and asks for elaboration, causing each subsequent query to construct more detail on a false foundation.

A hallucination cascade is when you build subsequent queries on a hallucinated premise — the model keeps adding plausible-sounding detail to something that was never true to begin with.

3. According to the AP's 2023 guidelines, AI outputs should be treated as what?

Correct. The AP guidelines specified that AI outputs are leads, not facts, and required tracing any AI-assisted material to at least two independent primary sources before use in a story.

The AP required that AI outputs be treated as leads only — each claim had to be traced to at least two independent primary sources before appearing in published work.

4. Why is asking AI for "source types" safer than asking for specific source names?

Correct. Asking for categories — "what type of database would contain this?" — produces a research map without risking fabricated citations. The AI's knowledge of institutional domains is far more reliable than its recall of specific texts.

The key is that asking for source categories avoids asking the model to produce individual citations at all, so you get a research map rather than potentially fabricated references.

5. What signal in AI output should prompt a writer to seek primary verification rather than building further queries on that response?

Correct. Hedged language in AI output is a signal that the model's confidence is low or its training data on that point is thin. That is exactly when you should go to primary sources rather than asking the AI to elaborate.

Hedged language — "reportedly," "generally believed," "some sources suggest" — is the key signal. It means the model's confidence is low and you should not build further on that foundation without primary verification.

Lab 2: Precision Prompting Practice

Build and refine research queries using the four-component framework — then evaluate what you get back

Your Task

Practice building precision research prompts using all four components: scope constraint, time boundary, format directive, and uncertainty request. Try a topic relevant to your writing. Then experiment with asking for source types rather than source names, and use layered follow-up questions to probe what the AI's initial response glossed over.

Try this to start: "I'm researching [your topic]. Using only information available before 2023, give me a structured overview of the key facts, note any areas of uncertainty, and tell me what types of authoritative sources I should consult to verify each major claim."

Precision Prompting Lab

Welcome to Lab 2. Let's practice precision prompting together. Give me a topic you're researching or writing about, and I'll help you build the most effective research query possible — and then we can analyze the output together for reliability signals. What are you working on?

Module 5 · Lesson 3

Verification Workflows for AI-Assisted Research

The professional protocols writers use to confirm what AI found before it goes into print

When The Guardian adopted structured AI verification protocols in 2023, what changed about the errors that reached their editors — and what does that tell writers building their own workflows?

In late 2023, The Guardian's editorial technology team published an internal briefing — later referenced in media industry coverage by Press Gazette — describing a tiered verification protocol the newsroom had implemented for AI-assisted research. The framework divided AI outputs into three categories: background context (could inform but not appear in copy), checkable claims (specific facts requiring one primary-source confirmation), and high-risk claims (statistics, quotes, biographical details requiring two independent confirmations from original sources). The protocol reduced AI-related errors reaching subeditors by a documented margin, because it forced researchers to categorize risk before routing claims to verification. The system's key insight was that not all AI outputs carry the same risk level — and treating them identically was producing inefficiencies in both directions.

A Three-Tier Verification Framework

The Guardian's tiered approach maps cleanly onto a framework any writer can implement. The central move is risk categorization: before verifying anything, you assess how much damage an error in that specific claim would do, and how easily such an error could be detected.

Tier 1 — Background orientation: General context, historical framing, conceptual explanations. These shape your understanding but do not appear verbatim in published work. AI can supply these relatively safely, since errors here are caught before they reach the page.

Tier 2 — Specific checkable claims: Dates, named individuals in roles, organizational descriptions, event sequences. These require one primary-source confirmation — a contemporaneous news report, an official document, a named official statement. AI can surface the claim; you confirm it before using it.

Tier 3 — High-risk claims: Statistics and percentages, direct quotations attributed to named individuals, medical or legal claims, financial figures, biographical facts about living individuals. These require two independent primary-source confirmations from original sources, not secondary summaries. AI should not be the proximate source of any Tier 3 claim in published work.
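
The confirmation counts in the three tiers can be written down as a small lookup. This is a minimal Python sketch under stated assumptions: the `Claim` class, its fields, and the example claims are hypothetical, and counting distinct source labels is only a crude stand-in for judging whether two sources are genuinely independent.

```python
from dataclasses import dataclass, field

# Confirmations required per tier, as described in the framework above.
# Tier 1 needs none because it informs understanding and is not used verbatim.
REQUIRED_CONFIRMATIONS = {1: 0, 2: 1, 3: 2}


@dataclass
class Claim:
    """A claim surfaced by AI, with the primary sources found so far (hypothetical type)."""
    text: str
    tier: int
    sources: list[str] = field(default_factory=list)

    def confirmations_missing(self) -> int:
        # Duplicate source labels are counted once; real independence
        # still has to be judged by the writer.
        found = len(set(self.sources))
        return max(0, REQUIRED_CONFIRMATIONS[self.tier] - found)


# Hypothetical example: a statistic is Tier 3, so one source is not enough.
stat = Claim("Example statistic from an AI summary", tier=3, sources=["agency dataset"])
print(stat.confirmations_missing())
```

A claim is ready to move toward the draft only when `confirmations_missing()` returns zero for its tier.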

The "Original Source" Rule

A primary source is not the same as a reputable publication reporting on a primary source. For Tier 3 claims, trace back to the original: the government database, the peer-reviewed paper, the official transcript, the company filing. Secondary reporting is a pathway to the original — it is not the verification itself.

Building a Personal Verification Workflow

Verification is most efficient when it is built into the research process rather than added at the end. Practically, this means tagging AI-sourced claims during note-taking rather than trying to reconstruct which facts came from AI versus primary sources after the fact.

A simple markup system: during AI-assisted research, mark any claim you intend to use with a colored flag or shorthand notation. [T2] for Tier 2, [T3] for Tier 3. At the end of a research session, you have a clear action list: Tier 2 items each need one primary-source link; Tier 3 items each need two. Nothing moves to your draft until it has the required confirmations.
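
If notes are kept as plain text, the [T2] and [T3] markers can be collected into the action list automatically. The sketch below is one possible approach in Python, assuming one claim per line with the tag written exactly as `[T2]` or `[T3]`; the function name `build_action_list` and the sample notes are hypothetical.

```python
import re

# Matches the shorthand tags from the markup system and captures the claim text.
TAG_PATTERN = re.compile(r"\[T([23])\]\s*(.+)")

# Primary-source confirmations each tier needs before drafting.
REQUIRED = {2: 1, 3: 2}


def build_action_list(notes: str) -> list[tuple[int, str, int]]:
    """Return (tier, claim, confirmations_needed) for every tagged line.

    Untagged lines are ignored. Tier 3 items are listed first because
    they need the most verification work.
    """
    actions = []
    for line in notes.splitlines():
        match = TAG_PATTERN.search(line)
        if match:
            tier = int(match.group(1))
            actions.append((tier, match.group(2).strip(), REQUIRED[tier]))
    return sorted(actions, key=lambda item: -item[0])


# Hypothetical notes from a research session.
sample_notes = """General background on the industry
[T2] Company was founded in 1998
[T3] Revenue grew 40 percent last year
"""
for tier, claim, needed in build_action_list(sample_notes):
    print(f"Tier {tier}: {claim} (needs {needed} primary source(s))")
```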

Science writer Ed Yong, who wrote about science communication practices in The Atlantic in 2020 and has regularly discussed AI research tools in subsequent interviews, has described a similar approach: never let AI-sourced claims "blend in" with verified claims in notes. Keeping the two visibly separate in your notes makes it obvious which claims still need verification before drafting.

Cross-Referencing Tools That Work Alongside AI

Several publicly accessible tools directly support the verification workflow. The key is knowing which tool addresses which type of claim.

Factual claims about statistics: Government statistical agencies (ONS in the UK, BLS and Census Bureau in the US, Eurostat in Europe) publish primary data directly. JSTOR, PubMed, and Google Scholar link to peer-reviewed originals. Always retrieve the actual paper or report rather than trusting a summary.

Corporate and organizational facts: SEC EDGAR (US), Companies House (UK), and equivalent national registries publish official filings. Leadership titles, financial data, and corporate structure questions answered here are primary-source verified.

Quotation verification: The Internet Archive's Wayback Machine captures page content at specific dates, allowing verification that a public statement was actually made and that it says what the AI claims it says. This is especially useful for statements from websites that have since been edited or removed.

Scientific claims: For AI-generated summaries of scientific research, always retrieve the original abstract at minimum, and note the actual sample size, methodology, and year. AI summaries of scientific papers frequently overstate findings, drop confidence intervals, and misattribute results.
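
For the quotation check described above, an archived copy of a page can usually be requested by placing a date in front of the page address. The following Python sketch only builds such an address and makes no network request. It assumes the commonly used `https://web.archive.org/web/<timestamp>/<url>` form, in which the Wayback Machine serves the capture closest to the given date; the function name `wayback_snapshot_url` and the example address are hypothetical.

```python
from datetime import date


def wayback_snapshot_url(page_url: str, on: date) -> str:
    """Build a Wayback Machine address for the capture nearest to a date.

    Assumption: the archive accepts a YYYYMMDD timestamp in this position
    and redirects to the closest available snapshot, if one exists.
    """
    timestamp = on.strftime("%Y%m%d")
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


# Hypothetical example: look for a statement as it appeared around 10 March 2023.
print(wayback_snapshot_url("https://example.com/press", date(2023, 3, 10)))
```

The snapshot that loads carries its own capture date, which should be noted alongside the quotation, because it may differ from the date requested.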

Writer's Protocol

Tag → Tier → Trace. Tag every AI-sourced claim in your notes. Assign each a tier based on risk. Trace each to the required number of original primary sources before it moves to your draft. This three-step habit, applied consistently, is the main defense against AI-assisted factual errors in published work.

Risk categorization: The act of assessing how much damage an error in a specific claim would do and how easily it could be detected, allowing verification effort to be allocated efficiently across an AI research session.

Primary source: An original document, dataset, official statement, or firsthand account, as opposed to a secondary source reporting on or summarizing the original. For verification purposes, the chain must trace back to the primary.

Tag → Tier → Trace: A three-step verification habit: mark AI-sourced claims, assign a risk tier, then trace each to the required number of original primary sources before using in published work.

Lesson 3 Quiz

Verification Workflows for AI-Assisted Research — 5 questions

1. In The Guardian's tiered verification framework, which type of claims require two independent primary-source confirmations?

Correct. Tier 3 — high-risk claims — includes statistics, direct quotations, biographical facts about living individuals, and medical or legal claims. These require two independent confirmations from original primary sources.

The Tier 3 high-risk category requiring two independent confirmations includes statistics, direct quotations, biographical facts about living individuals, and medical or legal claims.

2. What is the key insight behind The Guardian's tiered protocol?

Correct. The system's core insight was that risk categorization before verification routes effort efficiently — over-verifying low-risk orientation claims wastes time; under-verifying high-risk specific claims creates errors.

The key insight was that different AI outputs carry different risk levels, and treating them identically was inefficient. The tiered approach allocates verification effort to where it matters most.

3. For verifying that a public figure actually made a specific statement that has since been removed from a website, which tool is most appropriate?

Correct. The Wayback Machine captures dated snapshots of web pages, allowing verification that a statement existed at a specific time and contained specific language — essential when the original page has since been edited or removed.

The Wayback Machine is the appropriate tool here — it captures dated page snapshots, allowing you to verify that a statement was made and that it says what the AI claims it says.

4. Why does the lesson recommend tagging AI-sourced claims during note-taking rather than after the fact?

Correct. Once AI-sourced claims blend into notes alongside verified facts, the provenance is lost. Tagging during research creates a clear action list — you know exactly which claims still need verification before drafting.

The reason is provenance: after the fact, AI-sourced claims blend indistinguishably with verified facts. Tagging during research creates a clear checklist of what still needs confirmation.

5. When AI provides a summary of a scientific study, what specific problem does the lesson identify with those summaries?

Correct. AI summaries of scientific papers are notoriously unreliable in specific ways: they tend to overstate effect sizes, omit confidence intervals and methodology limitations, and can misattribute findings to the wrong authors or papers entirely.

The specific documented problems are: AI summaries overstate findings, drop confidence intervals, and misattribute results. Always retrieve the original abstract at minimum before using any AI-summarized scientific claim.

Lab 3: The Verification Tier Workout

Practice categorizing AI research outputs by risk tier and building a verification action list

Your Task

Present a set of AI research outputs to the assistant and practice categorizing each as Tier 1 (background), Tier 2 (specific checkable claim), or Tier 3 (high-risk claim requiring two confirmations). Then ask the assistant to help you build a verification action list — identifying which primary-source types would confirm each Tier 2 and Tier 3 claim.

Try this: "Here are five claims from my AI research session: [paste or describe them]. Help me categorize each by risk tier and tell me what type of primary source would verify each Tier 2 and Tier 3 claim."

Verification Tier Lab

Welcome to Lab 3. I'll help you practice the Tag → Tier → Trace workflow. Share some claims from your AI research — real or hypothetical — and we'll work through risk categorization together. For each claim, I'll help you identify its tier and the primary-source type needed to verify it. What claims are you working with?

Module 5 · Lesson 4

Integrating AI Research into the Writing Process

How to weave verified AI-assisted research into prose without losing your editorial judgment — or your voice

How did longform writers at publications like ProPublica and The Atlantic find ways to use AI research tools without letting those tools flatten the distinctive perspectives that make their work worth reading?

ProPublica's investigative team began formally evaluating AI-assisted research workflows in 2023, with findings described in a Nieman Lab article by reporter Craig Silverman in early 2024. The key observation from ProPublica's process was that AI research tools changed the entry point of investigation but not its core method. Reporters used AI to get oriented faster — to understand the regulatory landscape of an industry, to identify which government agencies held relevant documents, to surface terminology that would improve database search strings. But the pivotal moment of every story — the human source interview, the document read, the expert conversation — remained unchanged. The reporters who used AI most effectively treated it as a way to arrive at those pivotal moments better prepared, not as a substitute for them.

The Integration Problem: Research Voice vs. Writer Voice

One risk specific to AI-assisted research is what might be called framing capture: the tendency for AI's summary of a topic to shape not just what a writer knows but how they conceptualize the story. If an AI frames a story about a corporate scandal primarily through a financial lens, a writer who absorbs that framing uncritically may produce analysis that misses the human and cultural dimensions. The AI's framing was determined by what was most prominent in its training data — which reflects what was most published, not necessarily what is most significant.

The countermeasure is deliberate perspective-seeking after the AI research phase. Having absorbed the AI's orientation, explicitly ask: what angle is underrepresented here? Who would disagree with this framing? What human story does this financial summary leave out? These questions — best explored through direct human sources — are the corrective that keeps the writer's judgment in control of the story rather than the model's training data distribution.

Framing Capture — A Documented Risk

In 2023, Columbia Journalism Review published analysis showing that AI-assisted news stories about corporate earnings tended to reproduce the framing priorities of financial wire services — not because reporters chose that framing consciously, but because it dominated the AI's training data. Writers who began with AI orientation before human reporting were more likely to reproduce these framing defaults than those who consulted human sources first.

Using AI Research to Sharpen Questions for Human Sources

The most powerful integration of AI into longform research is not replacement of human sources but preparation for them. An AI session that produces a solid orientation of a regulatory domain, a corporate history, or a scientific debate can dramatically improve the quality of subsequent expert interviews. The writer arrives knowing the vocabulary, understanding the fault lines in the debate, and able to ask second-order questions rather than first-order definitions.

In 2023 coverage of AI tools in journalism, science journalists at publications including STAT News and Wired described adopting this approach. Rather than asking an expert "can you explain what mRNA vaccines do?" (a question that AI can now answer), they could ask "the published literature suggests X, but I've seen criticism along the lines of Y; where do you come down, and why?" The AI handled the encyclopedic background; the human source provided the judgment, nuance, and new information that no model could supply.

Citing AI in Your Work: Emerging Standards

As AI research tools become normalized, publication standards for disclosing AI use in research — distinct from AI use in drafting — are still developing. However, several principles are already clear from guidance issued in 2023 and 2024 by the Society of Professional Journalists, the Authors Guild, and academic style guides including APA and MLA.

First: AI is not a citable source for factual claims in published work. If a fact came from AI, the citation must trace to the primary source that confirms it — the AI session itself cannot be the reference. This is because AI outputs are not stable, reproducible, or independently verifiable in the way that published texts are.

Second: if AI was used in a research process that informed a published piece, disclosure is increasingly expected, particularly in journalism and academic writing. The form of disclosure varies — some publications use an end-note, others a process note — but transparency about AI's role is the emerging norm.

Third: AI-generated quotes or paraphrases attributed to real named individuals are ethically unacceptable in journalism, nonfiction, and academic writing, regardless of how plausible they appear. Any quotation must trace to a documented, verifiable source.

Integration Principle

AI research changes the speed of orientation, not the standard of verification. Use AI to arrive at your primary sources better prepared. Use primary sources — human, documentary, institutional — to verify, deepen, and complicate everything AI gave you. The story is what happens when those two layers meet.

Framing capture: The tendency for AI's framing of a topic to shape a writer's conceptualization of the story, reflecting training data distribution rather than independent editorial judgment.

Second-order questions: Interview questions that presuppose orientation knowledge and probe judgment, nuance, and new information. They become possible once AI research has handled the encyclopedic background.

AI disclosure: The emerging editorial standard of noting in published work when AI tools contributed to the research or drafting process, in a form appropriate to the publication type.

Lesson 4 Quiz

Integrating AI Research into the Writing Process — 5 questions

1. What did ProPublica's reporters identify as the key benefit of AI research tools in their investigative workflow?

Correct. ProPublica's finding was that AI changed the speed and quality of orientation — reporters understood the regulatory landscape, terminology, and document pathways faster — but the core investigative work remained unchanged.

ProPublica found that AI changed the entry point — faster orientation, better preparation — but the pivotal moments of investigation (source interviews, document reads, expert conversations) remained exactly as before.

2. What is "framing capture" in AI-assisted research?

Correct. Framing capture occurs when the writer absorbs AI's dominant framing — typically reflecting what was most published, not most significant — and carries that framing into the story without critically examining it.

Framing capture is the risk that AI's orientation shapes how you think about the story — reflecting what dominated training data rather than your own editorial judgment. The countermeasure is deliberate perspective-seeking from human sources.

3. According to the Society of Professional Journalists and emerging style guides, can AI be cited as a source for a factual claim in published work?

Correct. AI outputs are not stable, reproducible, or independently verifiable in the way published texts are. Factual citations must trace to primary sources — the AI session itself cannot be the reference for any publishable claim.

AI cannot be cited as a source for factual claims. The citation must trace to the verifiable primary source that confirms the fact. AI is a pathway to that source, not a source itself.

4. What are "second-order questions" and why does AI research enable them?

Correct. When AI handles the encyclopedic background — what mRNA vaccines do, how a regulatory agency is structured — the writer can use the expert interview for judgment, nuance, and new information that only a human can supply.

Second-order questions presuppose orientation and go straight to judgment and nuance. AI handling the background knowledge frees the interview time for the things only human sources can give you.

5. What did the Columbia Journalism Review analysis find about AI-assisted news stories about corporate earnings?

Correct. The CJR analysis found framing capture in practice: AI's training data was dominated by financial wire service framing, and reporters who used AI orientation before human reporting were more likely to reproduce that framing — demonstrating that what gets published most shapes how AI orients writers to a topic.

The CJR finding was framing capture in action — financial wire service framing dominated training data, so AI-oriented reporters unconsciously reproduced those priorities rather than developing independent story angles.

Lab 4: Research-to-Draft Integration

Practice using AI research to prepare second-order questions and build a verified research framework for a piece you are writing

Your Task

Choose a topic you are currently writing about — journalism, nonfiction, fiction research, or academic writing. Use the AI assistant to get oriented, then practice: (1) identifying the AI's framing choices and what perspective is underrepresented, (2) generating second-order questions for human sources, and (3) building a disclosure note for how you used AI in your research process.

Try: "I'm writing about [topic]. Give me an orientation summary, then tell me: what framing assumptions are built into your summary, and what perspectives or angles are underrepresented? Finally, help me generate five second-order questions I could use in an expert interview."
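If you run this lab from a script or notebook rather than a chat window, the prompt above can be assembled from a template. The following is a minimal Python sketch; the function name `build_lab_prompt` and its `num_questions` parameter are illustrative assumptions and do not belong to any particular AI tool.

```python
def build_lab_prompt(topic: str, num_questions: int = 5) -> str:
    """Assemble the Lab 4 prompt for a given topic (hypothetical helper)."""
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must not be empty")
    if num_questions < 1:
        raise ValueError("num_questions must be at least 1")
    # The wording mirrors the suggested prompt: orientation first,
    # then framing assumptions, then second-order questions.
    return (
        f"I'm writing about {topic}. "
        "Give me an orientation summary, then tell me: what framing "
        "assumptions are built into your summary, and what perspectives "
        "or angles are underrepresented? "
        f"Finally, help me generate {num_questions} second-order questions "
        "I could use in an expert interview."
    )


if __name__ == "__main__":
    # Example topic is a placeholder; substitute your own.
    print(build_lab_prompt("municipal water privatization"))
```

Keeping the three requests in a fixed order makes it easier to compare sessions on different topics, since each response should contain the same three parts.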

Research Integration Lab

Welcome to Lab 4. In this final lab we'll integrate everything — orientation, framing analysis, second-order questioning, and disclosure. Tell me what you're writing about and I'll give you an orientation, flag my own framing assumptions, and help you generate expert interview questions that only a human source can answer. What's your topic?

Module 5 Test

Research and Fact-Finding with AI — 15 questions — 80% to pass

1. What is the primary reason AI models cannot provide accurate information about events that occurred after their training cutoff?

Correct. The training cutoff is a hard boundary — the model's knowledge is frozen at that date unless augmented by retrieval-augmented generation or user-provided context.

The training cutoff is a fixed boundary. The model's weights encode information from training data up to that date only. Without retrieval tools, there is no mechanism for post-cutoff knowledge.

2. In the Mata v. Avianca case, Judge Castel sanctioned attorneys because:

Correct. The specific issue was fabricated citations — realistic-looking case names, docket numbers, and jurisdictions that turned out to be entirely non-existent. The failure was submitting without checking whether those cases existed.

The sanction was specifically for filing citations to non-existent cases: ChatGPT hallucinated realistic-looking legal citations that appear in no court record.

3. Retrieval-augmented generation (RAG) addresses which specific limitation of standard AI models?

Correct. RAG connects the model to external databases or documents at inference time, allowing it to retrieve current or specialized information rather than relying solely on training data.

RAG specifically addresses the training cutoff problem by connecting the model to external information sources at query time, enabling current-information retrieval.

4. The AP's 2023 AI guidelines described AI outputs as:

Correct. The AP's framework treated AI outputs as leads only — the starting point for investigation, not the endpoint. Any AI-assisted material required tracing to at least two independent primary sources.

The AP required treating AI outputs as leads, not facts, with every claim traced to at least two independent primary sources before use in a story.

5. Which of the following best describes "precision prompting"?

Correct. Precision prompting combines four elements — scope constraint, time boundary, format directive, uncertainty request — to produce outputs that are more constrained, checkable, and reliable for research purposes.

Precision prompting uses four components: scope constraint, time boundary, format directive, and uncertainty request. Together they constrain the query and make the output easier to check and more reliable for research.

6. A hallucination cascade occurs when:

Correct. Each follow-up question that accepts the false premise adds more plausible-sounding false detail. The defense is treating each query as independent and not building on AI's previous outputs without verification.

A hallucination cascade is specifically the compounding effect of building on false premises: each follow-up that accepts the premise adds more false detail and makes the original error harder to detect.

7. The Guardian's tiered verification protocol categorized AI outputs into three tiers. What was the organizational purpose of this tiering?

Correct. The tiering system's purpose was efficiency through risk allocation — not every claim needs the same verification effort, and treating them identically produces both over-checking of low-risk content and under-checking of high-risk content.

The purpose was efficient allocation of verification effort — matching the level of required confirmation to the risk level of the claim, rather than applying identical verification effort to all outputs.

8. For verifying a direct quotation attributed to a named individual where the original web page has been removed, which tool is most appropriate?

Correct. The Wayback Machine captures dated page snapshots, allowing you to verify that a statement existed at a specific time and contained specific language, even when the original source has since been edited or removed.

The Wayback Machine is the right tool for verifying statements from pages that have since changed or been removed — it provides dated captures of the original content.

9. Which of the following is the best summary of how the "Tag → Tier → Trace" workflow is implemented?

Correct. The three steps are sequential: mark AI-sourced claims during the session, assess each for risk level, then verify with the appropriate number of primary sources before the claim enters the draft.

Tag = mark AI-sourced claims in notes during research. Tier = assess each for risk level. Trace = verify each with the required number of original primary sources. This sequence prevents AI-sourced claims from reaching the draft unverified.
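The Tag, Tier, Trace sequence can also be kept as a small data structure in research notes. The following is a minimal Python sketch, not a prescribed implementation: the `Claim` class, the `Tier` names, and the source counts in `REQUIRED_SOURCES` are all illustrative assumptions, so substitute the thresholds your own publication requires.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Tier(Enum):
    """Risk tiers for a claim (names are assumptions for illustration)."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Assumed number of distinct primary sources required per tier.
# These counts are placeholders, not values taken from the lesson.
REQUIRED_SOURCES = {Tier.LOW: 1, Tier.MEDIUM: 2, Tier.HIGH: 3}


@dataclass
class Claim:
    text: str
    ai_sourced: bool = True      # Tag: mark claims that came from an AI session
    tier: Tier = Tier.HIGH       # Tier: default to the strictest level
    sources: List[str] = field(default_factory=list)  # Trace: primary sources

    def is_traced(self) -> bool:
        """True when enough distinct primary sources are recorded."""
        distinct = {s.strip().lower() for s in self.sources if s.strip()}
        return len(distinct) >= REQUIRED_SOURCES[self.tier]


def draft_ready(claims: List[Claim]) -> List[Claim]:
    """Keep claims allowed into the draft; AI-sourced ones must be traced."""
    return [c for c in claims if not c.ai_sourced or c.is_traced()]


if __name__ == "__main__":
    notes = [
        Claim("Background definition of a regulatory term",
              tier=Tier.LOW, sources=["agency glossary"]),
        Claim("Quotation attributed to a named executive",
              tier=Tier.HIGH, sources=["archived press release"]),
    ]
    for claim in draft_ready(notes):
        print(claim.text)
```

In this example only the low-tier claim passes, because the high-tier quotation has one recorded source against an assumed requirement of three. Defaulting new claims to the strictest tier means an unassessed claim cannot slip into the draft by accident.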

10. What did ProPublica's investigative team identify as the unchanged element in their workflow even after adopting AI research tools?

Correct. AI changed the entry point and speed of orientation, but the pivotal investigative moments — source interviews, document analysis, expert conversations — remained entirely unchanged. AI helps you arrive at those moments better prepared.

ProPublica found that AI accelerated orientation but did not change the core investigative work: the human source interviews, document reads, and expert conversations that give investigative journalism its substance.

11. What is "framing capture" and what is the recommended countermeasure?

Correct. The countermeasure is deliberate: after AI orientation, explicitly ask what is underrepresented, who would disagree with this framing, and what human story the summary leaves out. This restores the writer's editorial judgment.

The countermeasure for framing capture is deliberate perspective-seeking: asking what's underrepresented, who disagrees with the AI's framing, and what human angles the summary missed.

12. According to the Columbia Journalism Review analysis referenced in Lesson 4, why did AI-assisted earnings stories tend to reproduce financial wire service framing?

Correct. Training data distribution determines framing defaults. Financial wire services publish enormous volumes of earnings coverage, so that framing dominates AI orientation — not through any editorial choice, but through data volume.

The mechanism was training data dominance: wire service coverage made up so large a share of the training data that it became the AI's default framing lens for any earnings-related query.

13. What makes AI a suitable tool for generating "second-order questions" for expert interviews?

Correct. When AI provides the orientation — vocabulary, fault lines in a debate, background definitions — the expert interview can go straight to judgment, nuance, and new information. The first-order encyclopedic questions are already answered.

Second-order questions are possible because AI handles the encyclopedic background. The writer arrives at the expert interview already oriented — freeing that conversation for things only a human source can provide.

14. What is the current standard for citing AI in published work when AI contributed to the research process?

Correct. The emerging norm is disclosure of AI's role in research via an appropriate note, while maintaining that AI itself cannot be the citation for any factual claim — the citation must trace to the primary source.

Disclosure is the emerging standard — via end-note or process note depending on the publication. But AI itself is never the citation for factual claims; those trace to verifiable primary sources.

15. Which of the following best captures the core principle unifying all four lessons in this module?

Correct. This is the module's central principle: AI accelerates orientation; the verification standard remains unchanged. The writer's job is to bring those two layers together, pairing the speed of AI orientation with the authority of primary-source confirmation.

The core principle is: AI changes the speed of orientation, not the standard of verification. Use AI as a first-pass orienter and question-generator; verify everything through primary sources before it reaches your draft.