Module 4 · Lesson 1

The Checklist Wasn't Born in a Lab

How a surgeon's checklist became the template for every fake-detection system that followed, and why checklists work when gut instinct fails.
If experts keep making the same mistakes, what's the fix: better training, or a better system?

On October 12, 2006, a surgical team at a respected Seattle hospital operated on the wrong knee. The patient, a 65-year-old man named Donald Church, had come in for his right knee. His chart said right knee. His consent form said right knee. The surgeon was experienced, with thousands of procedures behind him. And yet the incision opened on the left leg.

Investigations afterward found no reckless behavior, no negligence in the ordinary sense. The surgeon simply believed he remembered which leg it was. He trusted his expert memory over the written record. The same thing had happened 57 times in American operating rooms that year alone, according to Joint Commission data.

The fix that followed, championed by surgeon and author Atul Gawande after studying aviation safety, was humblingly simple: a printed checklist. Surgeons had to mark the correct limb with a marker before the procedure began. Wrong-site surgeries dropped by over 40% within two years of widespread adoption.

The lesson wasn't that surgeons were bad. The lesson was that expert confidence is one of the most dangerous failure modes a human mind has. A checklist doesn't replace expertise. It catches what expertise misses.

Why Your Brain Is a Bad Fake Detector (on Its Own)

When you look at an AI-generated image and try to decide if it's real, your brain does something that feels like careful analysis but often isn't. It pattern-matches against what feels familiar. It takes shortcuts. And the better the fake, the harder those shortcuts fail, because high-quality AI images are specifically optimized to trigger the "looks normal to me" response.

Researchers at MIT's Media Lab published a study in 2023 showing that human participants correctly identified AI-generated faces only 51.2% of the time, barely better than flipping a coin. Participants who said they were "very confident" in their answers were wrong nearly as often as those who admitted uncertainty. Confidence didn't track accuracy at all.

This is exactly the same failure mode as the wrong-site surgery. The surgeon was confident. The participants were confident. Confidence is not a checklist. Confidence is a feeling.

Confirmation bias: The tendency to look for evidence that supports what you already believe and ignore evidence that contradicts it. When you expect an image to be real, you notice the convincing details and your brain glosses over the weird ones.
Expert overconfidence: The phenomenon where people who know a lot about a subject become worse at catching their own errors, because their expertise makes mistakes feel impossible.

A checklist forces you to slow down and examine specific, predetermined things, not whatever happens to catch your eye. It makes the process systematic instead of impressionistic (based on feel rather than method). That's the difference between guessing and detecting.

What a Good Checklist Actually Does

In 2008, the World Health Organization published the Surgical Safety Checklist that Gawande had helped design. It had 19 items. In a study published in 2009, hospitals that used it saw a 36% reduction in major complications. The checklist wasn't magic; it was forced attention. It made experts check things they'd otherwise assume were fine.

A fake-detection checklist works on the same principle. Instead of looking at an image and thinking "does this feel right?", you run through a fixed set of specific questions. Does the lighting source match across the entire image? Do the reflections in the eyes match each other? Does each visible hand have exactly five fingers? Is the text readable or garbled?

Each question targets a known failure mode: a place where current AI image generators consistently stumble. When you check them one by one, you're not relying on your general impression. You're auditing specific systems, the way an aircraft mechanic checks a plane before flight.

The Aviation Parallel

Commercial pilots use pre-flight checklists even though they've done the same checks thousands of times. Not because they're forgetful, but because the stakes are too high to rely on memory. The checklist isn't a training tool. It's a permanent operational tool. Your fake-detector checklist works the same way: you'll use it every time, not just while you're learning.

Here's the ethical question you should sit with before this lesson ends: If a checklist can reliably catch 80% of AI fakes, does that mean the 20% that pass become "officially real" in the eyes of anyone who uses it? Does having a system give you false confidence in the things the system misses? There's no clean answer. Surgeons with checklists still make mistakes. Checklists don't make you certain; they make you less wrong.
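To make the 80% question concrete, here is a rough worked example in Python. The 80% catch rate is the hypothetical figure from the question above. The other two numbers (30% of the images you see are fake, and 5% of real images get wrongly flagged) are assumptions invented purely for illustration.

```python
def share_of_passed_that_are_fake(catch_rate, fake_share, false_alarm_rate):
    """Among images that pass the checklist, what fraction are still fake?"""
    fakes_that_pass = fake_share * (1 - catch_rate)
    reals_that_pass = (1 - fake_share) * (1 - false_alarm_rate)
    return fakes_that_pass / (fakes_that_pass + reals_that_pass)

# 0.80 comes from the lesson's hypothetical; 0.30 and 0.05 are made-up inputs.
leftover = share_of_passed_that_are_fake(0.80, 0.30, 0.05)
print(f"{leftover:.1%} of images that pass are still fake")
```

Under these assumed numbers, roughly 8 in 100 images that pass are still fake. Passing the checklist lowers the odds that an image is fake; it does not certify the image as real.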

The Architecture of Your Checklist

Over the next three lessons, you're going to build a personal fake-detector checklist from the ground up. Each lesson adds a new layer: physical tells, context tells, and source tells. By the end of this module, you won't be guessing. You'll have a structured, repeatable process.

This lesson establishes the foundation: why checklists beat gut instinct, and what makes a checklist question useful versus useless. A useful checklist question has three properties: it targets a specific known failure mode, it has a clear yes/no answer (not "does this look weird?"), and it applies broadly to many different kinds of images, not just one type.
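The three properties can be written down as a small data structure. This is only a sketch: the class name, field names, and example questions are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class ChecklistQuestion:
    text: str
    failure_mode: str      # the specific known failure it targets ("" if none)
    yes_no: bool           # can it be answered with a clear yes or no?
    applies_broadly: bool  # does it work across many kinds of images?

    def is_useful(self) -> bool:
        # A question earns its place only if all three properties hold.
        return bool(self.failure_mode) and self.yes_no and self.applies_broadly

good = ChecklistQuestion(
    "Do all shadows agree on one light direction?",
    "lighting inconsistency", True, True)
weak = ChecklistQuestion("Does this look weird?", "", False, True)
print(good.is_useful(), weak.is_useful())
```

Writing each candidate question this way forces you to name the failure mode it targets before it goes on the list.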

You Now Know Something Most Adults Don't

Most people look at an AI-generated image and rely entirely on their gut feeling. You now understand that gut feeling is unreliable, why it's unreliable, and that a systematic checklist approach is measurably more accurate. That's not a trivial insight; it's the same insight that reduced surgical deaths in hospitals worldwide. You can apply it every time you encounter an image that might be fake.

The checklist isn't something you download. You're building it yourself, piece by piece, so that you actually understand why each item is on it. By the time you're done, it'll be yours: not a set of rules you memorized, but a set of questions you trust because you know where they came from.

Lesson 1 Quiz

The Case for Checklists

5 questions · Select the best answer for each.
1. The 2006 wrong-site surgery case in Seattle is used at the start of this lesson mainly to illustrate what?
Correct. The case shows that even highly trained experts can fail when relying on confident memory alone, and that a simple checklist can prevent that failure.
Not quite. The point isn't about surgeons vs. AI, or even about written records specifically. It's about what happens when expert confidence substitutes for a systematic process.
2. The MIT Media Lab study from 2023 found that humans identified AI-generated faces correctly about 51% of the time. What does this tell us about using confidence as a guide?
Exactly right. The study showed confident participants were wrong just as often as uncertain ones, which means confidence doesn't track accuracy in this task.
The study specifically showed that confident participants were no more accurate than uncertain ones. Confidence wasn't a useful signal at all.
3. You're looking at an image someone claims is a real photo. Which approach matches the checklist method described in this lesson?
Correct. A systematic checklist of specific questions targeting known failure modes is exactly what this lesson argues for, not impressions, comparisons, or expert opinion.
This approach relies on impression or comparison rather than a systematic process. The lesson argues that impressions fail even for experts.
4. According to the lesson, a useful checklist question must have which three properties? Apply this: which of the following is the WEAKEST checklist question?
Right. "Feels a bit off" is exactly the kind of impressionistic question the lesson says we should replace. It doesn't target a specific failure mode and doesn't have a clear yes/no answer.
Options A, B, and D all target specific, known AI failure modes and have clear yes/no answers. "Feels off" does not; it's the subjective impression that checklists are designed to replace.
5. The lesson raises a genuine ethical concern: if a checklist catches 80% of fakes, do the 20% that pass become "officially real"? What is this concern really about?
Exactly. The concern is that passing a checklist might feel like confirmation of authenticity, when in reality a checklist only reduces uncertainty; it doesn't eliminate it.
The ethical worry isn't about the accuracy percentage or who should use checklists. It's about whether systematizing detection gives us false confidence about the things our system doesn't catch.
Lesson 1 Lab

Design Your First Checklist Questions

Work with an AI collaborator to draft the foundational layer of your personal fake-detector checklist.

Your Role: Checklist Architect

You've learned why checklists beat gut instinct. Now you're going to start building yours. Your collaborator below knows a lot about AI image generation, but they're not going to hand you a checklist. They're going to challenge your thinking until you arrive at questions that actually work.

A good checklist question targets a specific known failure mode, has a clear yes/no answer, and applies broadly. Your job is to propose questions and defend why they belong on the list.

Start by telling your collaborator: what's one thing you already look for when you think an image might be fake? Then ask them whether it qualifies as a good checklist question, and why or why not.
Collaborator: REED
Checklist Design
You're building a checklist, not memorizing one, and there's a real difference. Tell me something you already look for when you suspect an image might be AI-generated. I'll tell you whether it actually holds up as a checklist question, and we can figure out why or why not together.
Module 4 · Lesson 2

Physical Tells: What the Body Gives Away

AI image generators are trained on millions of photographs, but they still don't understand physics. Here's how to catch what physics exposes.
If an AI has seen every photograph ever posted to the internet, why does it still get hands wrong?

In late March 2023, a photograph of Pope Francis wearing a large white Balenciaga-style puffer jacket spread across Twitter, Reddit, and WhatsApp. Within 48 hours it had been seen by an estimated 27 million people. Many of them believed it was real, including, reportedly, several journalists who initially included it in news round-ups before deleting their posts.

The image was generated using Midjourney v5, which had launched earlier that month with dramatically improved photorealism. The creator, a Chicago man named Pablo Xavier, posted it to a Reddit forum as an obvious joke. It did not stay a joke.

What's most instructive about this case is not that the image spread, but why people believed it. They focused on what was convincing: the fabric texture, the lighting on the jacket, the general setting. They didn't look at what wasn't convincing: the Pope's right hand, which has six fingers. The hands in the image are anatomically wrong. The rosary he's holding passes through his palm in a way that defies physics. And the background architecture, when examined closely, contains columns that merge into each other at impossible angles.

Nobody looked. They saw what they expected to see (a celebrity in a funny outfit) and their brain filled in the rest.

Why AI Gets Bodies Wrong, and Why It Always Will (For Now)

AI image generators learn by finding statistical patterns in enormous collections of images. When they generate a hand, they're not drawing a hand based on knowledge of anatomy; they're producing pixels that statistically correlate with "hand" in their training data. Hands are hard because they appear in photographs at wildly different angles, scales, and positions. The statistical pattern for "hand" is messy.

This creates specific, repeatable failure points that you can add to your checklist:

Finger count errors: AI models frequently generate hands with too many or too few fingers, or fingers that merge, split, or bend at impossible angles. This was present in the Pope Francis image and remains one of the most reliable physical tells as of 2024.
Lighting inconsistency: Real photographs have one light source (or a coherent combination). AI images often have subjects lit from a direction that doesn't match the background, or shadows that fall at inconsistent angles across the same scene.
Reflection errors: Reflective surfaces (eyes, glasses, windows, water) should reflect the same environment from a consistent perspective. AI generators often produce reflections that show environments not present in the scene, or that differ between the two eyes of a face.
Object physics violations: Objects that pass through each other, fabric that folds in ways that violate gravity, jewelry that merges with skin, or architecture with features that couldn't exist structurally are all signs that the generator is filling in patterns without understanding physical constraints.

These aren't aesthetic quirks. They're structural failure points caused by how AI generation works. They don't all appear in every image, but checking for them systematically is far more reliable than a general "does this look weird" impression.

The Ear Problem, and What It Tells You About Pattern Matching

Researchers at the University of Copenhagen published a study in December 2022 examining AI-generated faces at scale. One of their most consistent findings: ears. Specifically, the area where the ear connects to the jaw and neck.

In real photographs, this area shows a consistent anatomical structure: the tragus (the small cartilage flap in front of the ear canal), the earlobe, and the neck muscle all align in a way that follows from how the human skull is built. In AI-generated faces, this region is frequently blurred, structurally inconsistent, or shows hair growing in directions that don't follow the curve of the skull.

Why ears? Because ears are rarely the focus of a photograph. Training data contains millions of images where the ear is partially hidden, blurred, turned away, or cut off by the frame. The AI has seen fewer clear examples of ears than of eyes or noses, so its statistical model for "ear" is weaker.

Checklist Items for Physical Tells

Hands: Count the fingers. Check for merging, extra knuckles, or joints that bend backward.
Lighting: Find the light source. Does every shadow in the image agree on its direction?
Eyes: Do both eyes reflect the same thing? Are the reflections consistent with the described scene?
Ears: Is the ear-to-neck connection anatomically coherent? Is hair growing in plausible directions?
Fabric and objects: Does anything pass through anything else? Does fabric fold against gravity?
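The five items above can be run as an ordered audit that never stops early. This is a minimal sketch: the check names, the question wording, and the `audit` function are hypothetical, and the answers still come from a person looking at the image.

```python
PHYSICAL_CHECKS = [
    ("hands", "Does every visible hand have five plausible fingers?"),
    ("lighting", "Do all shadows agree on the light direction?"),
    ("eyes", "Do both eyes show matching reflections?"),
    ("ears", "Is the ear-to-neck connection anatomically coherent?"),
    ("fabric_objects", "Is the image free of objects passing through each other?"),
]

def audit(answers):
    """Run every check in order. `answers` maps a check name to True (passes),
    False (fails), or None / missing (not visible in this image)."""
    failed, skipped = [], []
    for name, _question in PHYSICAL_CHECKS:
        result = answers.get(name)
        if result is None:
            skipped.append(name)
        elif result is False:
            failed.append(name)
    return failed, skipped

failed, skipped = audit(
    {"hands": False, "lighting": True, "eyes": True, "fabric_objects": False})
print("failed:", failed, "not checkable:", skipped)
```

Recording "not checkable" separately matters: an image with no visible hands has not passed the hand check, it has only avoided it.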

Now here's the institutional version of the question you've been thinking about: In March 2023, the Associated Press, one of the world's most respected news agencies, published formal guidance for its journalists on AI-generated images. Their checklist for editors included several of the physical tells above. The AP, with decades of photo-verification experience, decided that a systematic checklist was more reliable than leaving the call to individual editors' judgment. If it works for the AP, it works for you.

The Limitation You Have to Sit With

Here is the uncomfortable truth about physical tells: they are a moving target. The finger-count problem that was extremely reliable in early 2023 had become significantly less reliable by late 2023, as model training improved. Midjourney v6, released in December 2023, produces hands that pass casual inspection most of the time.

This means your checklist needs a maintenance mindset. The principle is stable: AI generators fail at physics because they don't understand physics. But the specific failures shift as models improve. A checklist built in 2023 is not necessarily a checklist that works in 2025.

Ethical Tension

If physical tells become less reliable as AI improves, does publishing detailed checklists help the people trying to detect fakes, or does it mainly help AI developers know what to fix next? There's a real argument that widely-shared detection guides accelerate the arms race. Think about that before you decide how publicly to share your checklist.

You can now look at an image and do something almost no one outside of professional fact-checking teams does: a structured audit of physical plausibility. You check hands, lighting, reflections, ears, fabric physics. You do it in order. You don't stop because the image "looks real overall." That's the difference between a detector and a viewer.

Lesson 2 Quiz

Physical Tells

5 questions · Apply what you've learned to new scenarios.
1. Why did so many people believe the Pope Francis puffer jacket image in March 2023?
Correct. The image spread because people saw what they expected (a celebrity in a funny outfit) and didn't look at the specific failure points like the hand with six fingers or the impossible rosary.
The image actually contained multiple physical tells including a six-fingered hand and architecture errors. The problem wasn't image quality; it was where people chose to look.
2. AI generators struggle with hands because of their training process. Which explanation best describes why?
Right. The statistical pattern for "hand" is weaker than for, say, "eye" because hands appear in more varied orientations in training data. The AI isn't failing at anatomy; it's failing at statistics.
AI generators aren't programmed with anatomical knowledge at all. The hand problem comes from the quality of the statistical patterns the model learned, not from deliberate exclusion or mathematical complexity.
3. You're looking at an AI-suspected portrait. The subject is wearing glasses. Which checklist question is most targeted at a known AI failure mode?
Correct. Reflection consistency is a known AI failure point: generators frequently produce reflections that are inconsistent between lenses or show environments not present in the scene.
Fashion and affordability are not AI failure modes. Lens size symmetry is somewhat useful but less targeted than reflection accuracy, which is one of the most consistent failure points for current AI generators.
4. The lesson says the finger-count problem became less reliable by late 2023 as AI models improved. What does this mean for your checklist?
Exactly right. The underlying reason AI fails at physical tells is stable: it doesn't understand physics. But which specific failures are most visible will shift over time, so the checklist needs maintenance.
The lesson doesn't argue that checklists become useless or that you should abandon one category entirely. It argues that the principle is stable but specific items need to evolve with the technology.
5. The Associated Press issued formal guidance on AI image detection in March 2023 that included physical tells. Why does the lesson mention this institutional example?
Right. The AP example reinforces the core argument from Lesson 1: a systematic checklist is more reliable than individual judgment, even for highly experienced professionals.
The AP example is meant to reinforce the value of systematic checklists over expert judgment, not to restrict checklist use to professionals or credit the AP with inventing the method.
Lesson 2 Lab

Add the Physical Layer to Your Checklist

Build out the physical tells section with your AI collaborator, and defend every item you include.

Your Role: Physical Tells Auditor

You've studied lighting, hands, reflections, ears, and object physics. Now you're going to decide which of these belong in the physical tells section of your checklist, and in what order. Order matters: you want the most reliable, fastest checks first.

Your collaborator will push back on your reasoning. They may argue that some tells you want to include are too vague, or that you're missing a category entirely. You have to defend your choices.

Start by telling your collaborator: which physical tell do you think should be checked FIRST, and why that one before the others?
Collaborator: REED
Physical Tells
You're building the physical tells section of your checklist. The order of your checks matters: you want the fastest and most reliable ones first. So: which physical tell do you put at the top of your list, and why does it earn that position over the others?
Module 4 · Lesson 3

Context Tells: The Image That Couldn't Have Existed

Sometimes an image passes every physical check, and it's still fake. Context is the second layer of your checklist, and it catches what physics misses.
If an image looks physically perfect, can it still be proven fake? What evidence would you even look for?

In March 2023, images began circulating on Telegram channels showing what appeared to be explosions near the Kremlin in Moscow. The photographs were striking: columns of smoke, dramatic lighting, fragments in the air. Some had minimal physical errors. A few passed basic visual inspection by casual observers and were shared widely as evidence of a drone attack.

The open-source intelligence organization Bellingcat, which had built its reputation on systematic verification during the 2014 Ukraine conflict, didn't start with the images themselves. They started with the context. Their investigators asked: if this event happened, what other evidence should exist?

There were no corroborating social media posts from the area. No flight-tracking data showed rerouted aircraft. Moscow traffic cameras, many of which Bellingcat routinely archives, showed no unusual activity near the Kremlin at the timestamps the images supposedly captured. No Russian government official or media outlet reported anything, which would be extraordinary if a genuine explosion had occurred at the seat of government.

The images were assessed as fabricated, not because they failed a physics check, but because the entire context they claimed to document didn't exist. The real world had left no trace of the event those images depicted.

Context as Evidence: What the World Should Show

Physical tells examine what's inside an image. Context tells examine what's outside it: specifically, whether the world the image claims to document left any other evidence of its existence.

This is the method Bellingcat and other open-source intelligence (OSINT, pronounced "OH-sint") organizations use. OSINT is the practice of using publicly available information (social media, satellite imagery, traffic cameras, news archives, flight tracking) to verify or disprove claims about real-world events.

OSINT (Open-Source Intelligence): The practice of collecting and analyzing publicly available information to verify or investigate claims. Used by journalists, researchers, and investigators. Does not require hacking or private access, only careful, systematic analysis of public data.
Corroborating evidence: Independent evidence that supports the same conclusion. A real event typically produces multiple independent records: witnesses, camera footage, official reports, changed travel patterns. A fabricated event typically produces only the fabrication itself.

For your checklist, context tells become a second layer of questions: not "does this image look right?" but "does the world this image claims to show make sense?" You're looking for the absence of corroboration: the dog that didn't bark.

The Metadata Layer, and Why It's Already Gone

Every photograph taken by a digital camera or smartphone embeds invisible data in the image file called EXIF metadata. This data includes the camera model, lens information, date and time, GPS coordinates if enabled, and software used to process the image. For decades, this was a powerful verification tool: you could check whether the recorded GPS coordinates matched the claimed location, or whether the timestamp made sense.

Here is the problem: most social media platforms, including Facebook, Instagram, Twitter/X, and WhatsApp, automatically strip EXIF data from every image uploaded to their servers. This is partly for user privacy, since location data in photos was being used to track people. But the side effect is that by the time you see an image on social media, its metadata is already gone.

AI-generated images typically have no EXIF data at all, or generic placeholder data from the generation software. But the absence of EXIF data doesn't prove an image is fake, because real images routinely lose their metadata too. You can't use "no metadata = fake." You can note it as a weak signal, but it's not definitive.
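For readers curious what "has EXIF data" means at the file level, here is a simplified sketch that checks whether a JPEG's bytes contain an EXIF segment. It uses only the Python standard library and assumes a well-formed JPEG; the function name `has_exif` and the two tiny byte strings are hypothetical test inputs, not real photographs.

```python
import struct

def has_exif(jpeg_bytes: bytes) -> bool:
    """Return True if a JPEG contains an APP1 segment that starts with 'Exif'."""
    if jpeg_bytes[:2] != b"\xff\xd8":          # every JPEG starts with this marker
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:            # malformed header: give up
            break
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:                     # image data begins; headers are over
            break
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length                      # jump to the next segment
    return False

# Two synthetic headers: one with an EXIF segment, one with only a JFIF segment.
with_exif = (b"\xff\xd8\xff\xe1" + struct.pack(">H", 12)
             + b"Exif\x00\x00" + b"\x00" * 4 + b"\xff\xd9")
stripped = b"\xff\xd8\xff\xe0" + struct.pack(">H", 7) + b"JFIF\x00" + b"\xff\xd9"
print(has_exif(with_exif), has_exif(stripped))
```

A `False` result here should be logged as a weak signal only, for the reason given above: platforms strip metadata from real photos too.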

Context Checklist Items

Reverse image search: Does this image appear anywhere older than the claimed event? Does it appear in a different context, with different captions?
Corroboration: If this event happened, what else should exist? News reports? Social media posts from people in the area? Flight data? None of those existing is a strong signal.
Timestamp plausibility: Does the lighting in the image (time of day, season) match the claimed time and location? Is there snow in a city where it was 30°C that week?
Source tracing: Where did this image first appear? Who posted it? Can you find the original account or publication?

Reverse image search, available through Google Images, TinEye, and Yandex Images, is one of the most powerful and underused tools available to anyone. In 2022, Yandex's reverse image search proved particularly effective at identifying AI-generated faces because its index included some of the original training datasets. A face that shows up in multiple contexts under different names is a strong indicator of fabrication or misrepresentation.

When Context and Physics Point in Opposite Directions

Here's a scenario that happens in real investigations: an image passes all physical checks (hands look right, lighting is consistent, reflections match) but the context is impossible. A photograph of a building that allegedly burned down in 2021, in which the building has a sign for a business that wasn't founded until 2023. Or a "street photo" from one country in which a license plate style belongs to a different country entirely.

Context tells can catch fakes that physics misses. And the reverse is true: an image can have a real-world event behind it and still have suspicious physical qualities due to heavy processing or compression.

This is why your checklist needs both layers. They're not redundant; they catch different things. Running physical checks and then context checks gives you two independent sets of evidence. If both point toward fake, your confidence is high. If they point in different directions, that's an important flag: it means something unusual is happening that deserves more investigation.
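The two-layer logic in the paragraph above can be sketched as a small decision function. The function name and the wording of the three outcomes are hypothetical; the inputs are simply how many flags each layer raised.

```python
def combine_layers(physical_flags: int, context_flags: int) -> str:
    """Combine the two independent layers into one of three outcomes."""
    physical_suspicious = physical_flags > 0
    context_suspicious = context_flags > 0
    if physical_suspicious and context_suspicious:
        return "likely fake: both layers raise flags"
    if physical_suspicious != context_suspicious:
        return "layers disagree: investigate further"
    return "no flags found: not proof the image is real"

print(combine_layers(2, 3))
print(combine_layers(0, 2))
print(combine_layers(0, 0))
```

Note that the "no flags" outcome is deliberately worded as an absence of evidence, not as a verdict that the image is authentic.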

You Now Have Something Bellingcat Analysts Use

The context-verification method you just learned (asking what other evidence should exist if this event were real) is the same fundamental approach used by Bellingcat, the BBC's disinformation unit, and the Stanford Internet Observatory. You're not learning this for a test. You're learning it because this is how professionals actually work. Next time you see a dramatic news image, you'll automatically think: "What should the world show if this were real?" That's a different way of seeing.

One more ethical tension before you move on: OSINT tools like reverse image search and location verification give individuals significant power to investigate public claims. But those same tools can be used to track private individuals: finding where someone lives from a background detail in a photo, for example. The skills that make you a better fake-detector also make you capable of serious privacy violations. How you use them is a choice that doesn't have a built-in answer.

Lesson 3 Quiz

Context Tells

5 questions · Think through the context, not just the image.
1. Bellingcat's investigation of the 2023 Kremlin explosion images focused mainly on what?
Exactly right. Bellingcat looked at corroborating real-world evidence (or rather, its complete absence) rather than the image files themselves.
Bellingcat's method was contextual, not pixel-level or metadata-based. They asked what the real world should show if the event had occurred, and found nothing.
2. Most social media platforms strip EXIF metadata from uploaded images. What does this mean for using "no metadata" as a fake-detection signal?
Correct. Since real photos also lose metadata when uploaded to most platforms, the absence of metadata doesn't tell you much. It's noted as a weak signal, not a conclusion.
The lesson specifically warns against using "no metadata = fake" because real photos routinely lose their metadata on social platforms. It's not a definitive indicator.
3. You see a dramatic image claiming to show a flood in a specific city, shared widely on social media. Which context check would be MOST useful as a first step?
Right. Reverse image search is typically the fastest context check: it can immediately show whether this image appeared years ago in a different flood, or under a completely different caption.
Finger count is a physical check, not a context check. Asking a local contact and checking weather history could both be useful but are slower and less decisive than a reverse image search as a first step.
4. An image passes all physical tells: perfect hands, consistent lighting, matching reflections. But when you check context, you find it first appeared on an account created the same day as the post, with no other activity, and no news outlet has reported the event it claims to show. What should you conclude?
Correct. Physical and context checks catch different things. A clean physical check doesn't override suspicious context. They're independent layers, and the context flags here are strong indicators of fabrication.
The lesson explicitly says physical and context tells "catch different things" and are not redundant. Passing physical checks doesn't clear an image if context checks raise serious red flags.
5. The lesson raises the ethical concern that OSINT skills used to detect fakes can also be used to track private individuals. This is an example of what broader problem?
Exactly. This is a classic dual-use problem: the same capability that helps you detect disinformation can be used to violate privacy. The lesson doesn't resolve this tension, because it doesn't have a clean resolution.
The lesson doesn't argue for restricting OSINT to authorities, or that privacy concerns make detection impossible. It raises the dual-use nature of these tools as a genuine ethical tension without a neat answer.
Lesson 3 Lab

Add the Context Layer to Your Checklist

Build the context-verification section, and figure out which questions to ask before you even open a reverse image search.

Your Role: Context Investigator

You now have physical tells in your checklist. The context layer comes next. Context checks are different: they require you to look outside the image and ask what the world should show if this image were real.

Your collaborator will challenge you to think about which context questions are most efficient: the ones that can quickly confirm or rule out an image's authenticity without requiring hours of research.

Start by describing a type of news image you've seen recently (a protest, a disaster, a political event, real or hypothetical). Tell your collaborator: what's the first context question you'd ask about it, and what answer would tell you the most, fastest?
Collaborator: REED
Context Verification
Context checks require a different mindset than physical checks: you're not looking at the image, you're looking at the world around it. Describe a type of news image to me, and tell me what context question you'd ask first. I want to know your reasoning: why that question, and what would a positive or negative answer actually tell you?
Module 4 · Lesson 4

Source Tells and Assembling Your Checklist

The third layer of detection isn't in the image at all; it's in where the image came from. Then we put everything together into a checklist you'll actually use.
If an image is physically perfect and contextually plausible, what's left to check?

In the days following the November 2022 U.S. midterm elections, a set of images circulated showing what appeared to be ballots being destroyed or improperly handled at polling locations in several states. The images had no obvious physical tells. The scenes they depicted were plausible: polling locations look mundane, not fantastical.

Fact-checkers at Reuters and PolitiFact investigated the images not by examining their pixels but by tracing their spread. The images had all first appeared within the same 48-hour window, from a cluster of Twitter accounts with very similar characteristics: created within weeks of the election, few followers, no real posting history, predominantly pushing one political narrative. This pattern, known as coordinated inauthentic behavior, is a source tell.

None of the images could be verified as showing what they claimed. Several were traced to unrelated locations or older unconnected events. But the key initial signal wasn't the images themselves. It was where they came from, when they appeared, and how they spread.

Source tells are the third layer of your checklist and, in some ways, the most powerful, because they work even when the image itself cannot be definitively analyzed.

Reading the Source: What to Look For

A source tell is any signal about the origin or distribution of an image that raises or lowers the probability that it's authentic. Unlike physical tells (which require close examination of the image) or context tells (which require external research), source tells are often visible immediately, before you've even looked at the image carefully.

Coordinated inauthentic behavior: Multiple accounts or sources push the same content in a coordinated way, often to make it appear more widely believed or documented than it actually is. Identified by similar account creation dates, posting patterns, follower counts, and content themes.
Account age and posting history: A newly created account pushing dramatic content with no prior history is a significant source tell. Real people who photograph real events typically have a posting history that predates the event.
Amplification pattern: How quickly an image spread, and through what types of accounts. Authentic viral images typically spread from diverse sources such as news organizations, individuals, and other organizations. Fabricated images often spread through a narrower ecosystem of ideologically aligned accounts.
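The account-level signals above can be turned into a rough screening step. The sketch below is only an illustration: the `Account` fields, the thresholds (60 days, 100 followers, 5 prior posts), and the sample accounts are all invented for this example and are not taken from any platform or fact-checking tool. It simply reports what share of the accounts pushing an image look newly created and nearly empty.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Account:
    # Hypothetical record; real platforms expose different fields.
    handle: str
    created: date
    followers: int
    prior_posts: int


def looks_new_and_empty(acct, event_day, max_age_days=60,
                        max_followers=100, max_posts=5):
    """True if the account is young, small, and has little history.

    The thresholds are illustrative assumptions, not established cutoffs.
    """
    age_days = (event_day - acct.created).days
    return (age_days <= max_age_days
            and acct.followers <= max_followers
            and acct.prior_posts <= max_posts)


def suspicious_share(accounts, event_day):
    """Fraction of spreading accounts that look new and empty."""
    if not accounts:
        return 0.0
    flagged = sum(looks_new_and_empty(a, event_day) for a in accounts)
    return flagged / len(accounts)


# Invented sample data for demonstration only.
spreaders = [
    Account("user_a", date(2022, 10, 20), 12, 0),
    Account("user_b", date(2022, 10, 25), 30, 2),
    Account("user_c", date(2022, 11, 1), 8, 1),
    Account("user_d", date(2015, 3, 2), 4200, 900),
]
# 3 of the 4 hypothetical accounts match, so this prints 0.75
print(suspicious_share(spreaders, date(2022, 11, 10)))
```

A high share is a reason to look harder, not a verdict: a brand-new account can still post a real photograph.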

Source tells don't prove fakery any more than a single physical tell does. A brand-new account can post a real photograph. A verified journalist can share a fake. But source tells give you prior probability: an estimate of how likely, before you look at the image itself, it is to be authentic. You combine that with what you find in physical and context checks to reach an overall assessment.
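One way to picture "combining" a prior with later findings is a Bayesian update in odds form. This is a minimal sketch, not a method the lesson prescribes: the function name `update_probability` and every number in the example are made up, and it assumes each finding is independent of the others, which real evidence rarely satisfies exactly. Each finding is expressed as a likelihood ratio, meaning how much more often you would see it for an authentic image than for a fabricated one.

```python
def update_probability(prior, likelihood_ratios):
    """Update P(authentic) with a list of likelihood ratios.

    prior: starting estimate of P(authentic), e.g. from source tells.
    likelihood_ratios: for each finding,
        P(finding | authentic) / P(finding | fabricated).
        Values below 1 count against authenticity, above 1 count for it.
    Assumes the findings are independent (an idealization).
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)


# Invented numbers: an even prior, then one finding that is twice as
# common among fabricated images (ratio 0.5).
print(update_probability(0.5, [0.5]))
```

The point of the sketch is the shape of the reasoning: the source layer sets the starting estimate, and each later check moves it up or down rather than replacing it.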

Assembling Your Three-Layer Checklist

You've now built the three layers of your fake-detector checklist across this module. Each layer asks different questions and catches different failures. Here is what a complete, assembled checklist looks like, though yours may prioritize items differently based on what you've learned and argued in the labs.

Layer 1 · Physical Tells
Hands: correct number and anatomy of fingers?
Lighting: single consistent source, matching shadows?
Eyes: matching reflections?
Ears: coherent structure?
Fabric/objects: physics-compliant?
Layer 2 · Context Tells
Reverse image search: older appearances or different captions?
Corroboration: news reports, social media from area, official statements?
Timestamp plausibility: lighting, season, geography consistent?
Source tracing: where did it first appear?
Layer 3 · Source Tells
Account age: new account pushing dramatic content?
Posting history: prior activity consistent with a real person?
Amplification: spreading through diverse or narrow ecosystem?
Coordinated pattern: multiple accounts with similar characteristics pushing the same content?
Overall Assessment
How many layers raised flags? Physical only, or all three? The more layers that show problems, the higher the probability of fabrication. But a clean checklist doesn't make an image real; it makes it less suspicious.

Notice what the overall assessment says: a clean checklist doesn't make an image real. This is the most important thing to keep in your head when you use this checklist. You're not certifying authenticity. You're auditing for known failure modes. Passing the audit means you didn't find evidence of fabrication, not that no such evidence exists.
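The idea of an audit that can never return "real" can be made concrete with a small data structure. The sketch below is illustrative only: the `LayerResult` class, the `summarize` function, and the three label strings are choices made for this example, not part of any published tool. Each layer records the flags it raised and the checks it could not complete, and the summary reports which layers had problems.

```python
from dataclasses import dataclass, field


@dataclass
class LayerResult:
    # Hypothetical container for one checklist layer.
    name: str
    flags: list = field(default_factory=list)       # problems found
    unresolved: list = field(default_factory=list)  # checks left open


def summarize(layers):
    """Build an evidence summary; it deliberately has no 'real' outcome."""
    flagged = [layer.name for layer in layers if layer.flags]
    unresolved = [layer.name for layer in layers if layer.unresolved]
    if flagged:
        label = f"flags raised in {len(flagged)} of {len(layers)} layers"
    elif unresolved:
        label = "unverified"
    else:
        label = "no known failure modes found"
    return {
        "label": label,
        "flagged_layers": flagged,
        "unresolved_layers": unresolved,
    }


# Invented example: two clean layers and one flag in the source layer.
result = summarize([
    LayerResult("physical"),
    LayerResult("context"),
    LayerResult("source", flags=["account created day of post"]),
])
print(result["label"])
```

Even the best case here is worded as "no known failure modes found", which keeps the summary honest about what an audit can and cannot show.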

When Your Checklist Disagrees With Itself

In February 2024, a still from a video circulated showing what was claimed to be damage from a strike in a conflict zone. Physical tells: clean. Context: partially corroborated, with some reports matching the location but not all details. Source: the original account had been active for years and had a real posting history, but the image spread unusually fast through a specific political cluster.

No single layer gave a clear answer. Experienced fact-checkers at BBC Verify (the BBC's dedicated disinformation unit, launched in 2023) labeled it "unverified" rather than "fake" or "real." That label is more honest than most.

Your checklist is not a binary machine. It doesn't output TRUE or FALSE. It outputs an evidence summary that you have to interpret. Sometimes the honest answer is "I cannot verify this," and that's a more accurate and useful conclusion than a false certainty in either direction.

Ethical Question: No Clean Answer

If you run your checklist on an image and it comes back "unverified," what's the responsible thing to do with that image? Not sharing it prevents potential misinformation. But not sharing it also means potentially suppressing true information about a real event. Journalists, platforms, and individuals face this choice in real time, daily. There is no consensus answer. Knowing that the choice is genuinely hard is part of what makes you more thoughtful than someone who just shares without thinking.

Here is what three modules of this course plus this final module have built: you can now see images the way a trained investigator sees them. You check for physical impossibilities. You look for absent corroboration. You read the source. You know that your gut feeling is unreliable and you have a method to replace it. You know the method has limits, and being honest about those limits makes you more trustworthy, not less, as a person who shares information. Most people will never build that framework. You have it now. Use it carefully.

Lesson 4 Quiz

Source Tells and the Complete Checklist

5 questions · Put all three layers together.
1. The 2022 election images case was initially flagged mainly because of what type of tell?
Correct. The initial signal was a source tell (coordinated inauthentic behavior from a cluster of similar accounts) before any content analysis was performed.
The lesson says the images had no obvious physical tells, and the key initial signal was the source pattern: new accounts, limited history, pushing one narrative in a coordinated way.
2. Source tells provide "prior probability." What does this mean in the context of your checklist?
Right. Prior probability is your starting estimate of authenticity, before detailed checking. Source tells raise or lower that estimate, which then gets updated by what you find in physical and context checks.
Prior probability means your estimate before investigation, not a ranking of checklist layers or a prior verification. Source tells are one input into a cumulative assessment across all three layers.
3. An image passes all physical checks, has some corroborating context, but comes from a brand-new account with no prior history. BBC Verify labeled a similar case "unverified" rather than "fake" or "real." Why is "unverified" a more responsible label in this situation?
Exactly. When evidence from different layers points in different directions, the honest conclusion is genuine uncertainty, not forced certainty in either direction. "Unverified" is accurate, not evasive.
The lesson explicitly says "unverified" is more honest than false certainty in either direction. It's not a legal protection or a code for "probably fake"; it's an accurate description of a genuinely uncertain evidentiary situation.
4. The lesson says "a clean checklist doesn't make an image real." Apply this: a photograph passes all three layers of your checklist. What is the most accurate conclusion?
Correct. Passing the checklist means you didn't find known failure modes, not that none exist. You're auditing for what you know to look for, which is necessarily incomplete. Less suspicious is not the same as verified real.
The lesson is very clear: "passing the audit means you didn't find evidence of fabrication, not that no such evidence exists." A clean checklist reduces suspicion but doesn't certify authenticity.
5. The lesson asks: if your checklist returns "unverified," is it more responsible to share the image or not share it? This question is left unresolved because:
Right. This is a genuine ethical tension where both options cause potential harm. Not sharing might suppress truth; sharing might spread falsehood. Understanding that the choice is genuinely hard is more useful than a false rule.
The lesson deliberately leaves this open because there's no universally correct answer: both not sharing and sharing carry real risks in different situations. Recognizing that is the point.
Lesson 4 Lab

Complete and Defend Your Checklist

Finalize your three-layer fake-detector checklist, then stress-test it against a scenario your collaborator throws at you.

Your Role: Lead Investigator

You've built three layers: physical tells, context tells, and source tells. Now you're going to present your complete checklist to your collaborator, and they're going to test it with a scenario. You'll need to apply your own checklist in real time and defend your conclusions.

This isn't about getting the "right" answer. It's about showing that you can use your own system under pressure and know its limits honestly.

Start by presenting your finalized checklist: all three layers, in the order you'd actually use them. Your collaborator will then give you a scenario to work through.
Collaborator: REED
Full Checklist Stress Test
This is the final test for your checklist. Present it to me: all three layers, in the sequence you'd actually run them on a suspicious image. Be specific: I want actual questions, not category names. Once I see your checklist, I'll give you a scenario and you'll walk me through your assessment using it. Go ahead.
Module 4 · Final Assessment

Module Test: Build Your Own Fake-Detector Checklist

15 questions · 80% required to pass · Tests reasoning across all four lessons.
1. What is the central argument for using a checklist rather than gut instinct to detect AI-generated images?
Correct. The checklist argument rests on the demonstrated unreliability of confident gut instinct, supported by both the MIT study and the surgical safety literature.
The argument is specifically about the unreliability of expert confidence and the measured improvement from systematic methods, not about speed or complexity.
2. Atul Gawande's surgical safety checklist reduced major complications by 36%. What does this have to do with detecting AI fakes?
Right. The structural parallel is what matters: in both domains, expert confidence fails in predictable ways, and a forced-attention checklist provides a measurable improvement.
The connection isn't about shared questions or researchers; it's about the underlying failure mode (expert confidence) and the shared solution (systematic forced attention).
3. Why do AI image generators consistently struggle with hands?
Correct. The problem is statistical, not anatomical. Hands in photos are highly variable in orientation, so the learned pattern is weak and inconsistent.
AI generators don't use anatomical knowledge or intentional restrictions. The hand problem comes from statistical noise in the training data.
4. In the March 2023 Pope Francis puffer jacket case, which physical tell was clearly visible but widely missed?
Correct. The hand had six fingers and the rosary violated physics, but viewers focused on the convincing jacket texture and missed these tells entirely.
The lesson specifically identifies the six-fingered hand and impossible rosary as the visible tells that people missed because they were focused on the convincing clothing details.
5. Why is the ear-to-neck region a useful place to check in AI-generated portrait images?
Right. Because ears are often partially hidden, blurred, or cut off in photographs, the AI's training data for ear structure is sparser and its statistical model for that region is weaker.
The ear weakness comes from limited clear training examples, not copyright restrictions or mathematical complexity. The statistical model is weaker because the region is rarely photographed clearly.
6. Physical tells can catch fakes that context checks miss, and vice versa. What does this mean for how you should use your checklist?
Correct. The two layers are independent, not redundant. Running both provides two separate evidentiary inputs that together produce a more reliable overall assessment.
The lesson explicitly states the two layers "catch different things" and are not redundant. Running only one layer and stopping is exactly what the checklist is designed to prevent.
7. Bellingcat investigated the 2023 Kremlin explosion images primarily by asking: "If this event happened, what other evidence should exist?" This is an example of which checklist layer?
Right. Bellingcat's approach (looking for independent corroboration in traffic data, flight patterns, news reports) is the core of context-layer checking.
Bellingcat's key question (what should the real world show if this is true?) is a context check, not a physical or source check. They looked outside the images entirely.
8. A photograph shows a city street in heavy snow. The caption claims it was taken in July in Singapore. Which context check catches this?
Correct. Timestamp plausibility checks whether the image's environmental conditions are consistent with the claimed time and location. Singapore doesn't have snow in July, or in any other month.
While other context checks might also be useful here, the most direct and immediate catch is timestamp plausibility: the weather depicted is geographically and seasonally impossible for the claimed time and place.
9. Most social media platforms strip EXIF metadata from uploaded images. For your checklist, this means:
Right. Since real photos also lose metadata routinely on social platforms, absent metadata cannot be used as definitive proof of AI generation. It's a weak signal, not a conclusion.
The lesson explicitly warns against treating absent metadata as proof of fabrication. Both real and fake images end up without metadata on most social platforms.
10. The 2022 U.S. election ballot images were flagged by Reuters and PolitiFact because of what pattern?
Correct. The pattern of coordinated inauthentic behavior (multiple accounts with similar suspicious characteristics all pushing the same content) was the key initial signal.
The initial flag came from source-layer analysis of account characteristics and posting patterns, not from content analysis or official statements.
11. "Coordinated inauthentic behavior" is a source tell. Which scenario below best describes it?
Right. Coordinated inauthentic behavior is characterized by multiple accounts with similar suspicious profiles acting in concert: new, low-activity accounts pushing the same content simultaneously.
A single high-profile account, organic diverse spread, and simple reposts are not patterns of coordinated inauthentic behavior. The key markers are multiple similar accounts acting in concert.
12. Your checklist shows clean physical tells and partially corroborating context, but suspicious source tells. BBC Verify's approach to this kind of case was to label it "unverified." Why is this more honest than calling it "probably fake"?
Correct. When layers point in different directions, the genuinely honest conclusion is uncertainty. "Probably fake" would overstate what the evidence shows. "Unverified" accurately describes the evidentiary state.
This isn't about legal language or politeness. When different checklist layers give conflicting signals, the accurate conclusion is genuine uncertainty, not forced probability in either direction.
13. The lesson discusses how publishing detailed fake-detection guides might help AI developers know what to fix. This is an example of what type of problem?
Right. This is the arms-race dynamic: detection guides help people spot fakes, but they also signal to developers what failures to fix. Both sides improve in response to the other.
The issue described is the arms-race dynamic between detection and generation β€” not copyright, privacy, or defamation.
14. "A clean checklist doesn't make an image real." Apply this principle: which conclusion correctly follows from an image passing all three checklist layers?
Exactly right. A clean checklist means your audit found no evidence of the specific failure modes you were checking for, not that the image is necessarily authentic.
The lesson is explicit: passing the checklist reduces suspicion but doesn't certify authenticity. You're auditing for known failure modes, which is necessarily incomplete.
15. Across all four lessons, the same warning appears repeatedly: your checklist has limits. What is the most responsible use of a checklist, given those limits?
Correct. The checklist produces an evidence summary, not a verdict. Its value is in systematically reducing uncertainty and flagging known failure modes, with honesty about its limits. That's more useful than a false certainty in either direction.
The consistent message across all lessons is that the checklist reduces uncertainty but doesn't eliminate it. Using it to certify or to replace honest uncertainty with false confidence is exactly what the lessons argue against.