On October 12, 2006, a surgical team at a respected Seattle hospital operated on the wrong knee. The patient, a 65-year-old man named Donald Church, had come in for his right knee. His chart said right knee. His consent form said right knee. The surgeon was experienced, with thousands of procedures behind him. And yet the incision was made on the left leg.
Investigations afterward found no reckless behavior, no negligence in the ordinary sense. The surgeon simply believed he remembered which leg it was. He trusted his expert memory over the written record. The same thing had happened 57 times in American operating rooms that year alone, according to Joint Commission data.
The fix that followed, championed by surgeon and author Atul Gawande after studying aviation safety, was humblingly simple: a printed checklist. Surgeons had to mark the correct limb with a marker before the procedure began. Wrong-site surgeries dropped by over 40% within two years of widespread adoption.
The lesson wasn't that surgeons were bad. The lesson was that expert confidence is one of the most dangerous failure modes a human mind has. A checklist doesn't replace expertise. It catches what expertise misses.
When you look at an AI-generated image and try to decide if it's real, your brain does something that feels like careful analysis but often isn't. It pattern-matches against what feels familiar. It takes shortcuts. And the better the fake, the more badly those shortcuts fail, because high-quality AI images are optimized to look like real photographs, which is exactly what triggers the "looks normal to me" response.
Researchers at MIT's Media Lab published a study in 2023 showing that human participants correctly identified AI-generated faces only 51.2% of the time, barely better than flipping a coin. Participants who said they were "very confident" in their answers were wrong nearly as often as those who admitted uncertainty. Confidence didn't track accuracy at all.
This is exactly the same failure mode as the wrong-site surgery. The surgeon was confident. The participants were confident. Confidence is not a checklist. Confidence is a feeling.
A checklist forces you to slow down and examine specific, predetermined things, not whatever happens to catch your eye. It makes the process systematic instead of impressionistic (based on feel rather than method). That's the difference between guessing and detecting.
The World Health Organization's Surgical Safety Checklist, which Gawande helped design, had 19 items. In a study published in 2009, hospitals that used it saw a 36% reduction in major complications. The checklist wasn't magic; it was forced attention. It made experts check things they'd otherwise assume were fine.
A fake-detection checklist works on the same principle. Instead of looking at an image and thinking "does this feel right?", you run through a fixed set of specific questions. Does the lighting source match across the entire image? Do the reflections in the eyes match each other? Does each visible hand have five fingers? Is the text readable or garbled?
Each question targets a known failure mode, a place where current AI image generators consistently stumble. When you check them one by one, you're not relying on your general impression. You're auditing specific systems, the way an aircraft mechanic checks a plane before flight.
Commercial pilots use pre-flight checklists even though they've done the same checks thousands of times. They do it not because they're forgetful, but because the stakes are too high to rely on memory. The checklist isn't a training tool. It's a permanent operational tool. Your fake-detector checklist works the same way: you'll use it every time, not just while you're learning.
Here's the ethical question you should sit with before this lesson ends: if a checklist can reliably catch 80% of AI fakes, does that mean the 20% that pass become "officially real" in the eyes of anyone who uses it? Does having a system give you false confidence in the things the system misses? There's no clean answer. Surgeons with checklists still make mistakes. Checklists don't make you certain; they make you less wrong.
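A small calculation shows why "passed the checklist" is not the same as "real." Every number here is a hypothetical assumption chosen for illustration: suppose half of the images you check are fake, the checklist flags 80% of fakes, and it wrongly flags 10% of real images. The function name is also illustrative.

```python
def share_fake_among_passing(base_rate_fake: float,
                             detection_rate: float,
                             false_flag_rate: float) -> float:
    """Fraction of images that pass the checklist (are NOT flagged) but are fake.

    base_rate_fake:  fraction of checked images that are fake (assumed)
    detection_rate:  fraction of fakes the checklist flags (e.g. 0.8)
    false_flag_rate: fraction of real images wrongly flagged (assumed)
    """
    fake_passing = base_rate_fake * (1 - detection_rate)
    real_passing = (1 - base_rate_fake) * (1 - false_flag_rate)
    return fake_passing / (fake_passing + real_passing)


# Hypothetical inputs: 50% fakes, 80% detection, 10% false flags.
print(round(share_fake_among_passing(0.5, 0.8, 0.1), 3))  # 0.182
```

Under these assumptions, roughly 18% of the images that pass are still fake, and that share grows as fakes make up more of what you're checking.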
Over the next three lessons, you're going to build a personal fake-detector checklist from the ground up. Each lesson adds a new layer: physical tells, context tells, and source tells. By the end of this module, you won't be guessing. You'll have a structured, repeatable process.
This lesson establishes the foundation: why checklists beat gut instinct, and what makes a checklist question useful versus useless. A useful checklist question has three properties: it targets a specific known failure mode, it has a clear yes/no answer (not "does this look weird?"), and it applies broadly to many different kinds of images, not just one type.
Most people look at an AI-generated image and rely entirely on their gut feeling. You now understand that gut feeling is unreliable, why it's unreliable, and that a systematic checklist approach is measurably more accurate. That's not a trivial insight; it's the same insight that reduced surgical deaths in hospitals worldwide. You can apply it every time you encounter an image that might be fake.
The checklist isn't something you download. You're building it yourself, piece by piece, so that you actually understand why each item is on it. By the time you're done, it'll be yours: not a set of rules you memorized, but a set of questions you trust because you know where they came from.
You've learned why checklists beat gut instinct. Now you're going to start building yours. Your collaborator below knows a lot about AI image generation, but they're not going to hand you a checklist. They're going to challenge your thinking until you arrive at questions that actually work.
A good checklist question targets a specific known failure mode, has a clear yes/no answer, and applies broadly. Your job is to propose questions and defend why they belong on the list.
In late March 2023, a photograph of Pope Francis wearing a white Balenciaga-style puffer jacket spread across Twitter, Reddit, and WhatsApp. Within 48 hours it had been seen by an estimated 27 million people. Many of them believed it was real; reportedly, so did several journalists, who initially included it in news round-ups before deleting their posts.
The image was generated using Midjourney v5, which had launched just days earlier with dramatically improved photorealism. The creator, a Chicago man named Pablo Xavier, posted it to a Reddit forum as an obvious joke. It did not stay a joke.
What's most instructive about this case is not that the image spread; it's why people believed it. They focused on what was convincing: the fabric texture, the lighting on the jacket, the general setting. They didn't look at what wasn't convincing: the Pope's right hand, which has six fingers. The hands in the image are anatomically wrong. The rosary he's holding passes through his palm in a way that defies physics. And the background architecture, when examined closely, contains columns that merge into each other at impossible angles.
Nobody looked. They saw what they expected to see β a celebrity in a funny outfit β and their brain filled in the rest.
AI image generators learn by finding statistical patterns in enormous collections of images. When they generate a hand, they're not drawing a hand based on knowledge of anatomy; they're producing pixels that statistically correlate with "hand" in their training data. Hands are hard because they appear in photographs at wildly different angles, scales, and positions. The statistical pattern for "hand" is messy.
This creates specific, repeatable failure points that you can add to your checklist: hands, lighting, eye reflections, and the physics of fabric and objects.
These aren't aesthetic quirks. They're structural failure points caused by how AI generation works. They don't all appear in every image, but checking for them systematically is far more reliable than a general "does this look weird" impression.
Researchers at the University of Copenhagen published a study in December 2022 examining AI-generated faces at scale. One of their most consistent findings: ears. Specifically, the area where the ear connects to the jaw and neck.
In real photographs, this area shows a consistent anatomical structure: the tragus (the small cartilage flap in front of the ear canal), the earlobe, and the neck muscle all align in a way that follows from how the human skull is built. In AI-generated faces, this region is frequently blurred, structurally inconsistent, or shows hair growing in directions that don't follow the curve of the skull.
Why ears? Because ears are rarely the focus of a photograph. Training data contains millions of images where the ear is partially hidden, blurred, turned away, or cut off by the frame. The AI has seen fewer clear examples of ears than of eyes or noses, so its statistical model for "ear" is weaker.
Hands: Count the fingers. Check for merging, extra knuckles, or joints that bend backward.
Lighting: Find the light source. Does every shadow in the image agree on its direction?
Eyes: Do both eyes reflect the same thing? Are the reflections consistent with the rest of the scene?
Ears: Is the ear-to-neck connection anatomically coherent? Is hair growing in plausible directions?
Fabric and objects: Does anything pass through anything else? Does fabric fold against gravity?
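One way to make the "check every item, in order" discipline concrete is to treat the physical tells as a fixed, ordered list of yes/no questions and record every answer. This is a minimal sketch: the `Question` type, the key names, and the question wording are hypothetical paraphrases of the list above, and the `answers` dictionary stands in for your own observations. Each question is phrased so that "yes" means a tell was found, matching the clear yes/no property described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    key: str           # short identifier for the tell
    text: str          # yes/no question; "yes" means the tell was found
    failure_mode: str  # the known generator weakness this question targets


# Hypothetical paraphrase of the physical-tells list, in the order given there.
PHYSICAL_TELLS = [
    Question("hands", "Is a finger count, joint, or merge anatomically wrong?", "hands"),
    Question("lighting", "Do any shadows disagree about the light source?", "lighting"),
    Question("eyes", "Do the two eyes reflect different things?", "eye reflections"),
    Question("ears", "Is the ear-to-neck join or the hair direction implausible?", "ears"),
    Question("physics", "Does anything pass through something else, or fold against gravity?",
             "fabric and object physics"),
]


def run_checklist(questions, answers):
    """Ask every question in order; return (tells found, items not checked).

    `answers` maps a question key to True (tell found) or False (not found).
    Missing keys are reported as unchecked, so a skipped item is never
    silently counted as a pass.
    """
    found, unchecked = [], []
    for q in questions:  # no early exit: every item is examined
        if q.key not in answers:
            unchecked.append(q.key)
        elif answers[q.key]:
            found.append(q.key)
    return found, unchecked


found, unchecked = run_checklist(PHYSICAL_TELLS, {"hands": True, "lighting": False, "eyes": False})
print(found, unchecked)  # ['hands'] ['ears', 'physics']
```

Reporting unchecked items separately is a deliberate choice: an image "looking real overall" should never be able to stand in for a step you skipped.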
Now here's the institutional version of the question you've been thinking about. In March 2023, the Associated Press, one of the world's most respected news agencies, published formal guidance for its journalists on AI-generated images. Its checklist for editors included several of the physical tells above. The AP, with decades of photo-verification experience, decided that a systematic checklist was more reliable than leaving the call to individual editors' judgment. If it works for the AP, it works for you.
Here is the uncomfortable truth about physical tells: they are a moving target. The finger-count problem that was extremely reliable in early 2023 had become significantly less reliable by late 2023, as model training improved. Midjourney v6, released in December 2023, produces hands that pass casual inspection most of the time.
This means your checklist needs a maintenance mindset. The principle is stable: AI generators fail at physics because they don't understand physics. But the specific failures shift as models improve. A checklist built in 2023 is not necessarily a checklist that works in 2025.
If physical tells become less reliable as AI improves, does publishing detailed checklists help the people trying to detect fakes, or does it mainly help AI developers know what to fix next? There's a real argument that widely-shared detection guides accelerate the arms race. Think about that before you decide how publicly to share your checklist.
You can now look at an image and do something almost no one outside of professional fact-checking teams does: a structured audit of physical plausibility. You check hands, lighting, reflections, ears, fabric physics. You do it in order. You don't stop because the image "looks real overall." That's the difference between a detector and a viewer.
You've studied lighting, hands, reflections, ears, and object physics. Now you're going to decide which of these belong in the physical tells section of your checklist β and in what order. Order matters: you want the most reliable, fastest checks first.
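"Most reliable, fastest first" can be read as a sorting rule. A minimal sketch, assuming you give each check a rough reliability (how often it catches a fake when one is present, from 0 to 1) and a rough time cost in seconds. The function name and all the numbers below are hypothetical placeholders, not measurements, and should be revisited as models change.

```python
def order_checks(checks):
    """Sort checks so the most reliable come first; break ties by speed.

    `checks` is a list of (name, reliability, seconds) tuples with
    reliability in [0, 1]. Higher reliability sorts earlier; among equally
    reliable checks, the faster one sorts earlier.
    """
    return sorted(checks, key=lambda c: (-c[1], c[2]))


# Placeholder estimates for illustration only.
checks = [
    ("ears", 0.4, 20),
    ("hands", 0.6, 10),
    ("lighting", 0.6, 30),
    ("eyes", 0.5, 15),
]
print([name for name, _, _ in order_checks(checks)])
# ['hands', 'lighting', 'eyes', 'ears']
```

A strict reliability-then-speed ordering is only one option; ranking by reliability per second instead would move quick, moderately useful checks earlier.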
Your collaborator will push back on your reasoning. They may argue that some tells you want to include are too vague, or that you're missing a category entirely. You have to defend your choices.
In March 2023, images began circulating on Telegram channels showing what appeared to be explosions near the Kremlin in Moscow. The photographs were striking β columns of smoke, dramatic lighting, fragments in the air. Some had minimal physical errors. A few passed basic visual inspection by casual observers and were shared widely as evidence of a drone attack.
The open-source intelligence organization Bellingcat, which had built its reputation on systematic verification during the 2014 Ukraine conflict, didn't start with the images themselves. They started with the context. Their investigators asked: if this event happened, what other evidence should exist?
There were no corroborating social media posts from the area. No flight-tracking data showed rerouted aircraft. Moscow traffic cameras, many of which Bellingcat routinely archives, showed no unusual activity near the Kremlin at the timestamps the images supposedly captured. No Russian government official or media outlet reported anything, which would be extraordinary if a genuine explosion had occurred at the seat of government.
The images were assessed as fabricated, not because they failed a physics check, but because the entire context they claimed to document didn't exist. The real world had left no trace of the event those images depicted.
Physical tells examine what's inside an image. Context tells examine what's outside it: specifically, whether the world the image claims to document left any other evidence of its existence.
This is the method Bellingcat and other open-source intelligence (OSINT, pronounced "OH-sint") organizations use. OSINT is the practice of using publicly available information, such as social media, satellite imagery, traffic cameras, news archives, and flight tracking, to verify or disprove claims about real-world events.
For your checklist, context tells become a second layer of questions: not "does this image look right?" but "does the world this image claims to show make sense?" You're looking for the absence of corroboration: the dog that didn't bark.
Most photographs taken by digital cameras and smartphones embed hidden data in the image file called EXIF metadata. This data can include the camera model, lens information, date and time, GPS coordinates (if location is enabled), and the software used to process the image. For decades, this was a powerful verification tool: you could check whether the recorded GPS coordinates matched the claimed location, or whether the timestamp made sense.
Here is the problem: most social media platforms, including Facebook, Instagram, Twitter/X, and WhatsApp, automatically strip EXIF data from every image uploaded to their servers. This is partly for user privacy, since location data in photos was being used to track people. But the side effect is that by the time you see an image on social media, its metadata is already gone.
AI-generated images typically have no EXIF data at all, or generic placeholder data from the generation software. But the absence of EXIF data doesn't prove an image is fake: real images routinely lose their metadata too. You can't use "no metadata = fake." You can note it as a weak signal, but it's not definitive.
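To see where EXIF lives, here is a minimal, standard-library-only sketch that answers one narrow question for a JPEG: is there an EXIF block at all? It relies on the JPEG file layout, in which EXIF is stored in an APP1 segment (marker bytes `FF E1`) whose data begins with `Exif\0\0`. The function name `has_exif` is hypothetical, the parser is simplified (it scans only the header segments before the image data and does not decode any fields), and, as noted above, a `False` result is only a weak signal. For actually reading camera, time, or GPS fields, a library such as Pillow (`Image.getexif()`) is the usual route.

```python
def has_exif(data: bytes) -> bool:
    """Return True if JPEG bytes contain an APP1 'Exif' segment.

    Simplified: scans only the header segments before the image data
    (the SOS marker). Returns False for non-JPEG input or when no EXIF
    segment is found.
    """
    if not data.startswith(b"\xff\xd8"):       # SOI: every JPEG starts here
        return False
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:                    # each segment starts with 0xFF
            return False
        marker = data[i + 1]
        if marker == 0xFF:                     # padding byte; skip it
            i += 1
            continue
        if marker in (0xDA, 0xD9):             # SOS or EOI: no more metadata
            return False
        # Segment length is big-endian and includes its own two bytes.
        length = int.from_bytes(data[i + 2:i + 4], "big")
        if marker == 0xE1 and data[i + 4:i + 10] == b"Exif\x00\x00":
            return True
        i += 2 + length
    return False


# Tiny synthetic byte strings, not real photos.
with_exif = b"\xff\xd8\xff\xe1" + (8).to_bytes(2, "big") + b"Exif\x00\x00\xff\xd9"
without_exif = (b"\xff\xd8\xff\xe0" + (16).to_bytes(2, "big") + b"JFIF\x00"
                + b"\x00" * 9 + b"\xff\xda" + b"\x00" * 4)
print(has_exif(with_exif), has_exif(without_exif))  # True False
```

For a file on disk you could pass `open(path, "rb").read()`. A photo downloaded from a platform that strips metadata will usually return `False` even when it is genuine, which is exactly why this is a weak signal.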
Reverse image search: Does this image appear anywhere older than the claimed event? Does it appear in a different context, with different captions?
Corroboration: If this event happened, what else should exist? News reports? Social media posts from people in the area? Flight data? None of those existing is a strong signal.
Timestamp plausibility: Does the lighting in the image (time of day, season) match the claimed time and location? Is there snow in a city where it was 30°C that week?
Source tracing: Where did this image first appear? Who posted it? Can you find the original account or publication?
Reverse image search, available through Google Images, TinEye, and Yandex Images, is one of the most powerful and underused tools available to anyone. In 2022, Yandex's reverse image search proved particularly effective at identifying AI-generated faces because its index included some of the original training datasets. A face that shows up in multiple contexts under different names is a strong indicator of fabrication or misrepresentation.
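Reverse image search engines don't publish exactly how they match images, so treat this only as an illustration of the general idea, not as how Google, TinEye, or Yandex work. One well-known technique for near-duplicate matching is the perceptual "average hash": shrink the image to a tiny grayscale grid, mark each pixel as brighter or darker than the grid's mean, and compare the resulting bit patterns. Copies that were resized or brightened end up with nearly identical patterns. The sketch assumes the image has already been reduced to a grid (8×8 is common; 2×2 is used here to keep the example readable), and the function names are illustrative.

```python
def average_hash(gray):
    """Return a bit list: 1 where a pixel is brighter than the grid's mean.

    `gray` is a small grayscale image given as a list of rows of ints (0 to 255).
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a, b):
    """Count the positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


original = [[10, 200], [30, 180]]
brighter = [[p + 40 for p in row] for row in original]  # uniform brightness shift
different = [[200, 10], [180, 30]]                       # mirrored layout

print(hamming(average_hash(original), average_hash(brighter)))   # 0
print(hamming(average_hash(original), average_hash(different)))  # 4
```

A small distance suggests "probably the same picture," which is what lets a search turn up older copies of an image posted under a different caption. Production systems are more sophisticated than this.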
Here's a scenario that happens in real investigations: an image passes all physical checks (hands look right, lighting is consistent, reflections match), but the context is impossible. A photograph of a building that allegedly burned down in 2021, in which the building has a sign for a business that wasn't founded until 2023. Or a "street photo" from one country in which a license plate style belongs to a different country entirely.
Context tells can catch fakes that physics misses. The layers can also disagree the other way: an image can have a real-world event behind it and still have suspicious physical qualities due to heavy processing or compression.
This is why your checklist needs both layers. They're not redundant; they catch different things. Running physical checks and then context checks gives you two independent sets of evidence. If both point toward fake, your confidence is high. If they point in different directions, that's an important flag: it means something unusual is happening that deserves more investigation.
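The decision logic in the paragraph above can be written down as a small function. This is a sketch under simple, hypothetical assumptions: each layer reports one of three labels, "fake" (evidence of fabrication found), "clean" (no evidence found), or "unclear", and the function name and output wording are illustrative.

```python
def combine_layers(physical: str, context: str) -> str:
    """Combine two independent layer results into an overall reading.

    Each argument is 'fake', 'clean', or 'unclear' (hypothetical labels).
    """
    allowed = {"fake", "clean", "unclear"}
    if physical not in allowed or context not in allowed:
        raise ValueError("each layer must be 'fake', 'clean', or 'unclear'")
    if physical == context == "fake":
        return "likely fake: both layers agree"
    if {physical, context} == {"fake", "clean"}:
        return "conflict: investigate further"
    if "fake" in (physical, context):
        return "suspicious: one layer found evidence, the other was inconclusive"
    if physical == context == "clean":
        # Passing both layers means no evidence was found, not proof of authenticity.
        return "no evidence of fabrication found"
    return "unverified"


print(combine_layers("fake", "clean"))  # conflict: investigate further
```

Note that no input produces "real." The best outcome is "no evidence of fabrication found," which is a statement about your audit, not about the image.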
The context-verification method you just learned (asking what other evidence should exist if this event were real) is the same fundamental approach used by Bellingcat, the BBC's disinformation unit, and the Stanford Internet Observatory. You're not learning this for a test. You're learning it because this is how professionals actually work. Next time you see a dramatic news image, you'll automatically think: "What should the world show if this were real?" That's a different way of seeing.
One more ethical tension before you move on: OSINT tools like reverse image search and location verification give individuals significant power to investigate public claims. But those same tools can be used to track private individuals, for example by finding where someone lives from a background detail in a photo. The skills that make you a better fake-detector also make you capable of serious privacy violations. How you use them is a choice that doesn't have a built-in answer.
You now have physical tells in your checklist. The context layer comes next. Context checks are different: they require you to look outside the image and ask what the world should show if this image were real.
Your collaborator will challenge you to think about which context questions are most efficient: the ones that can quickly confirm or rule out an image's authenticity without requiring hours of research.
In the days following the November 2022 U.S. midterm elections, a set of images circulated showing what appeared to be ballots being destroyed or improperly handled at polling locations in several states. The images had no obvious physical tells. The scenes they depicted were plausible: polling locations look mundane, not fantastical.
Fact-checkers at Reuters and PolitiFact traced the images not by examining their pixels but by tracing their spread. The images had all appeared first within the same 48-hour window, from a cluster of accounts on Twitter with very similar characteristics: created within weeks of the election, few followers, no real posting history, predominantly pushing one political narrative. This pattern, known as coordinated inauthentic behavior, is a source tell.
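The account pattern described above can be turned into simple, explicit heuristics. This is a sketch only: the `Account` fields, the function name, and every threshold (account age under 8 weeks, at most 50 followers, at most 20 prior posts, a 48-hour sharing window) are hypothetical choices for illustration. Real coordinated-behavior detection is far more involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Account:
    handle: str
    created: datetime
    followers: int
    prior_posts: int
    first_shared_image: datetime  # when this account first posted the image


def suspicious_accounts(accounts, event, window=timedelta(hours=48),
                        max_age=timedelta(weeks=8), max_followers=50,
                        max_prior_posts=20):
    """Return handles that match every red flag (all thresholds hypothetical).

    Flags: created shortly before `event`, few followers, little posting
    history, and first shared the image within `window` of the earliest
    share among all the accounts given.
    """
    if not accounts:
        return []
    earliest = min(a.first_shared_image for a in accounts)
    flagged = []
    for a in accounts:
        new_account = timedelta(0) <= event - a.created <= max_age
        thin_profile = a.followers <= max_followers and a.prior_posts <= max_prior_posts
        clustered = a.first_shared_image - earliest <= window
        if new_account and thin_profile and clustered:
            flagged.append(a.handle)
    return flagged


election = datetime(2022, 11, 8)
accounts = [
    Account("new1", datetime(2022, 10, 20), 12, 3, datetime(2022, 11, 9, 10)),
    Account("new2", datetime(2022, 10, 25), 5, 1, datetime(2022, 11, 10, 8)),
    Account("veteran", datetime(2015, 1, 1), 4000, 9000, datetime(2022, 11, 9, 12)),
    Account("late", datetime(2022, 10, 30), 10, 2, datetime(2022, 11, 14)),
]
print(suspicious_accounts(accounts, election))  # ['new1', 'new2']
```

A match raises suspicion about how an image spread; it is not proof that the image itself is fake.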
None of the images could be verified as showing what they claimed. Several were traced to unrelated locations or older unconnected events. But the key initial signal wasn't the images themselves. It was where they came from, when they appeared, and how they spread.
Source tells are the third layer of your checklist, and in some ways the most powerful, because they work even when the image itself is impossible to definitively analyze.
A source tell is any signal about the origin or distribution of an image that raises or lowers the probability that it's authentic. Unlike physical tells (which require close examination of the image) or context tells (which require external research), source tells are often visible immediately, before you've even looked at the image carefully.
Source tells don't prove fakery any more than a single physical tell does. A brand-new account can post a real photograph. A verified journalist can share a fake. But source tells give you a prior probability: an estimate, made before you examine the image itself, of how likely it is to be authentic. You combine that with what you find in physical and context checks to reach an overall assessment.
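"Prior probability" has a precise meaning. One standard way to combine a prior with later evidence is Bayes' rule in odds form: convert the prior to odds, multiply by a likelihood ratio for each piece of evidence, and convert back. This sketch assumes the pieces of evidence are independent of each other and uses made-up likelihood ratios purely for illustration. In practice you rarely have real numbers for these, so treat the code as a way to reason about direction and size of updates, not as a calculator for verdicts. The function name is hypothetical.

```python
def update_probability(prior: float, likelihood_ratios) -> float:
    """Apply Bayes' rule in odds form and return the updated probability.

    prior: probability the image is fake before examining it (e.g. from
        source tells); must be strictly between 0 and 1.
    likelihood_ratios: for each piece of evidence,
        P(evidence | fake) / P(evidence | real).
        Values > 1 push toward 'fake'; values < 1 push toward 'real'.
    Assumes the pieces of evidence are independent.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


# Hypothetical: source tells suggest a 20% prior that the image is fake;
# a physical tell (ratio 4) and missing corroboration (ratio 3) are then found.
print(round(update_probability(0.2, [4, 3]), 3))  # 0.75
```

The same arithmetic explains why a strong source signal matters: starting from a different prior, the identical physical and context evidence leads to a very different conclusion.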
You've now built the three layers of your fake-detector checklist across this module. Each layer asks different questions and catches different failures. Here is what a complete, assembled checklist looks like, though yours may prioritize items differently based on what you've learned and argued in the labs.
Notice what the overall assessment says: a clean checklist doesn't make an image real. This is the most important thing to keep in your head when you use this checklist. You're not certifying authenticity. You're auditing for known failure modes. Passing the audit means you didn't find evidence of fabrication, not that no such evidence exists.
In February 2024, a still frame from a video circulated, claimed to show damage from a strike in a conflict zone. Physical tells: clean. Context: partially corroborated; some reports matched the location, but not all details. Source: the original account had been active for years and had a real posting history, but the image spread unusually fast through a specific political cluster.
No single layer gave a clear answer. Experienced fact-checkers at BBC Verify, the BBC's dedicated disinformation unit launched in 2023, labeled it "unverified" rather than "fake" or "real." That label is more honest than most.
Your checklist is not a binary machine. It doesn't output TRUE or FALSE. It outputs an evidence summary that you have to interpret. Sometimes the honest answer is "I cannot verify this," and that's a more accurate and useful conclusion than a false certainty in either direction.
If you run your checklist on an image and it comes back "unverified," what's the responsible thing to do with that image? Not sharing it prevents potential misinformation. But not sharing it also means potentially suppressing true information about a real event. Journalists, platforms, and individuals face this choice in real time, daily. There is no consensus answer. Knowing that the choice is genuinely hard is part of what makes you more thoughtful than someone who just shares without thinking.
Here is what three modules of this course plus this final module have built: you can now see images the way a trained investigator sees them. You check for physical impossibilities. You look for absent corroboration. You read the source. You know that your gut feeling is unreliable and you have a method to replace it. You know the method has limits. You know those limits honestly, which makes you more trustworthy, not less, as a person who shares information. Most people will never build that framework. You have it now. Use it carefully.
You've built three layers: physical tells, context tells, and source tells. Now you're going to present your complete checklist to your collaborator, and they're going to test it with a scenario. You'll need to apply your own checklist in real time and defend your conclusions.
This isn't about getting the "right" answer. It's about showing that you can use your own system under pressure and know its limits honestly.