In March 2023, a photograph began circulating that showed Pope Francis wearing a long white puffer jacket, the kind you'd see on a rapper at a fashion show, not on the leader of the Catholic Church. It was striking, it was shareable, and within hours it had spread to millions of accounts. People laughed. People debated. Then people started asking: wait, is this real?
It wasn't. The image had been created using Midjourney v5, an AI image generator that had launched just weeks earlier. The person who made it, a 31-year-old Chicago construction worker named Pablo Xavier, posted it to a Reddit community called r/midjourney as a casual experiment. He did not expect the internet to treat it as a news photograph.
But here is what matters for you, right now: the image had clues. Not subtle ones. The Pope's left hand, partially visible near the jacket's hem, had too many fingers. The stitching on the jacket repeated in a pattern that doesn't exist in real fabric. The background showed a wall with text that almost spelled real words but dissolved into nonsense on close inspection. Millions of people saw the image and missed every single one of those clues.
You are about to learn how not to be one of those millions.
Here is something that will change how you look at every AI image from now on: hands are statistically hard. To understand why, you need to understand, at least a little, how image generators actually work.
AI image generators like Midjourney, DALL-E, and Stable Diffusion are not drawing pictures the way a human artist draws. They are doing something stranger. They were trained on hundreds of millions of photographs and artworks, and through that training they built a statistical model (a kind of probability map) of what pixels tend to appear next to other pixels in images of the world.
When you type "Pope wearing puffer jacket," the AI samples from that probability map, gradually building an image that looks statistically plausible. But here's the catch: hands appear in images in hundreds of thousands of different configurations. Five fingers. Four. Partially hidden. Clenched. Open. Seen from above. Seen from the side. Held at angles. The sheer variety means the model's probability map for "hand" is messy and uncertain. It produces something that looks hand-shaped, but the details collapse under examination.
The result is the tell-tale sign that researchers and fact-checkers started calling the "six-finger problem" as early as 2022. It isn't always six fingers. Sometimes it's four. Sometimes fingers merge into each other. Sometimes a thumb grows from the wrong side of a palm. But it is almost always something.
Diffusion model: The technology inside most modern AI image generators. It starts with pure visual noise (like static on an old TV) and gradually "denoises" it, step by step, until a coherent image emerges. The process is guided by your text prompt. The final image is statistically plausible but not photographically real.
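For readers who want to see the math, here is the standard textbook formulation (the denoising diffusion setup of Ho et al., 2020). Commercial tools such as Midjourney don't publish their exact methods, and many systems, like Stable Diffusion, run this process in a compressed "latent" space rather than on raw pixels, so treat it as an illustration of the idea rather than a description of any particular product. Noise is added in the forward direction; generation runs in reverse, with a trained network epsilon_theta predicting the noise to remove, guided by the prompt c:

```latex
% Forward process: blend a clean image x_0 with Gaussian noise, more of it at larger t
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)

% Training: teach the network to predict the noise that was added, given the prompt c
\mathcal{L} = \mathbb{E}_{x_0, \epsilon, t}\left[ \lVert \epsilon - \epsilon_\theta(x_t, t, c) \rVert^2 \right]

% Generation: start from pure noise x_T and step backwards, t = T, ..., 1 (z = 0 on the last step)
x_{t-1} = \frac{1}{\sqrt{1 - \beta_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t, c) \right) + \sigma_t z,
\qquad z \sim \mathcal{N}(0, I)
```

The beta_s values are a fixed noise schedule chosen by the model's designers, and sigma_t sets how much fresh randomness each backward step adds. Nothing in these equations checks that a hand has five fingers; the network only learns what noisy images of hands tend to look like.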
Fingers aren't the only tell. Zoom in on fabric in an AI image (a shirt collar, a jacket sleeve, a tablecloth) and you'll notice something strange. The texture looks correct from a distance, but up close it tiles. It repeats. Or worse: it generates a pattern that is almost right but lacks the specific kind of irregularity that real woven fabric has.
Real textiles are made by machines that have tolerances, by yarn that has inconsistencies, by hands that touched the cloth and left microscopic traces. AI images contain none of that physical history. What they contain instead is a statistically averaged version of what fabric looks like: smooth where real fabric is rough, perfectly repeating where real fabric is varied.
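One way to make "it tiles" concrete is autocorrelation: slide a strip of pixels against a shifted copy of itself and measure how well the two match. Below is a minimal Python sketch, assuming you already have one row of grayscale brightness values (0 to 255) taken from a fabric region. The function names, and the idea of reading a correlation near 1.0 as "suspiciously periodic," are illustrative assumptions, not a published forensic threshold. Some real fabrics (knits, printed patterns) are genuinely periodic, so a high score is a reason to look closer, not a verdict.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def strongest_repeat(row, min_lag=2):
    """Return (lag, correlation) for the shift at which the row best matches itself.

    A correlation near 1.0 means the strip repeats almost exactly every `lag` pixels.
    Shifts go up to half the row length, so every comparison uses at least half the data.
    """
    best_lag, best_r = None, -1.0
    for lag in range(min_lag, len(row) // 2 + 1):
        r = pearson(row[:-lag], row[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# A strip that repeats every 5 pixels (like tiled texture) vs. an irregular strip.
tiled = [10, 50, 200, 90, 30] * 12
rng = random.Random(0)
irregular = [rng.randint(0, 255) for _ in range(60)]
print(strongest_repeat(tiled))      # lag is a multiple of 5, correlation about 1.0
print(strongest_repeat(irregular))  # a much weaker best match
```

Real forensic texture analysis usually works on 2D patches and in the frequency domain; a single row keeps the idea visible.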
In the Pope puffer jacket image, fabric researchers who examined the JPEG noticed that the white quilting pattern on the jacket's chest had a bilateral symmetry, almost perfectly mirrored left-to-right, that no real manufactured jacket would have. That symmetry was the model trying to be "balanced," following a visual logic that has nothing to do with how clothing is actually made.
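Mirror symmetry like that can be measured directly: flip a patch left-to-right and see how little changes. A minimal sketch, assuming a grayscale patch stored as a list of rows of 0 to 255 values; `mirror_symmetry` is a hypothetical helper name. A score near 1.0 only says the patch is nearly mirror-symmetric. Deciding whether that is suspicious still takes judgment, since many garments are designed to be roughly symmetric.

```python
def mirror_symmetry(patch):
    """Return a score from 0.0 to 1.0, where 1.0 means every row is its own mirror image.

    patch: list of rows, each a list of grayscale values from 0 to 255.
    Each pixel in the left half is compared with its mirrored partner on the same row.
    """
    total_diff, count = 0, 0
    for row in patch:
        width = len(row)
        for x in range(width // 2):
            total_diff += abs(row[x] - row[width - 1 - x])
            count += 1
    if count == 0:
        return 1.0
    mean_diff = total_diff / count
    return 1.0 - mean_diff / 255


symmetric = [[10, 80, 200, 80, 10], [30, 90, 40, 90, 30]]
lopsided = [[10, 80, 200, 120, 250], [30, 90, 40, 0, 200]]
print(mirror_symmetry(symmetric))           # 1.0
print(round(mirror_symmetry(lopsided), 3))  # 0.471
```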
The same principle applies to skin. Real skin has pores, fine hairs, blemishes, and asymmetry. AI skin, especially on faces, tends toward an uncanny smoothness: not perfect, but smoother than real skin by a margin that trained eyes catch immediately. Dermatologists, unusually, became some of the earliest reliable human detectors of AI-generated portrait photography, because they had spent careers staring at skin at exactly the resolution where AI fails.
You now know something that most adults scrolling past that Pope image did not know: what to look for. Fingers, fabric texture, skin smoothness, and symmetry aren't just details; they're the places where the statistical model's uncertainty shows through. Knowing this makes you a more careful reader of visual information than the majority of the internet.
Pablo Xavier did not create that image to deceive anyone. He shared it in an AI art community as a creative experiment. He later told BuzzFeed News he was surprised by how quickly it spread and was treated as real. He didn't watermark it. He didn't label it as AI. But he also didn't caption it "this is a real photo of the Pope."
So here is the question, and there is no clean answer: Does the creator of an AI image have a responsibility to prevent misuse, even if they never intended misuse?
If you make something that looks real enough to fool millions of people, does it matter that you didn't mean for that to happen? Should AI art platforms automatically add invisible or visible watermarks? And if they did, would that be censorship of a new art form, or a reasonable safety measure? Who decides?
There are smart, reasonable people on every side of this. The question is live right now in legislative bodies in the United States, the European Union, and China. What you think about it matters, because policies being written today will shape what you're allowed to create and what you'll be shown for the rest of your life.
Pause point: If you're reading this in one sitting and want to stop here, you have the core insight for L1: AI images hide their artificiality in specific, learnable places, such as hands, fabric, and skin. The next lesson goes deeper into faces and background artifacts. Come back when you're ready.
In February 2024, Reuters published an investigation into a network of fake expert profiles that had appeared across LinkedIn, research paper co-author lists, and policy organization websites. The profiles featured photorealistic headshots (serious, credible-looking faces) attached to names like Dr. Emily Chen and Professor James Okafor, along with citations to journals that didn't exist.
Every single one of the faces had been generated using This Person Does Not Exist, a website that shows a new photorealistic AI face on every page refresh. It was launched in 2019 by Phillip Wang, a software engineer at Uber, to demonstrate the capabilities of a type of AI architecture called a GAN (Generative Adversarial Network). He did not build it as a fraud tool. But the faces it produced were being used to give false credibility to fabricated expert opinions on climate policy, vaccine safety, and election integrity.
Reuters trained a team of journalists to spot the tells. The checklist they built was specific, learnable, and effective. It is also exactly what you are about to learn.
Human ears are among the most individually distinctive features on the body. No two ears are the same: the precise curve of the helix, the size of the earlobe, and the angle of the tragus (the little flap in front of the ear canal) all vary. In AI-generated faces, ears are frequently malformed in subtle ways: slightly asymmetrical between left and right, missing the inner structure of the concha (the bowl-shaped hollow), or blurring where the ear meets the jawline.
But the most reliable tell, the one that Reuters' team used most often, is earrings. AI models struggle with jewelry that attaches to a specific anatomical point. If a generated face is wearing earrings, look closely: one earring may be slightly higher than the other. One may penetrate the lobe at an angle that's anatomically impossible. Sometimes an earring on one side of a face is simply missing on the other.
Eyes are the other key checkpoint. Real eyes have a catchlight: a tiny reflection of the light source in the room, usually a window or lamp. In real photographs, this catchlight appears in roughly the same relative position in both eyes. In AI images, catchlights sometimes appear in different positions in each eye, are missing entirely from one eye, or are duplicated. The model knows eyes should have catchlights, but doesn't consistently apply the geometry that makes them physically accurate.
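The catchlight comparison can be written as a small geometric check: express each catchlight's position relative to its own pupil, scaled by the iris radius, then compare the two eyes. This is a simplified sketch, assuming someone has already marked the pupil centers, iris radii, and catchlight centers by hand in pixel coordinates. The function names and the 0.3 tolerance are hypothetical. Forensic research on this cue compares the highlights more carefully, and real eyes can differ somewhat when the head is turned or the light source is very close.

```python
import math


def catchlight_offset(pupil, iris_radius, catchlight):
    """Catchlight position relative to the pupil center, in units of iris radius."""
    return ((catchlight[0] - pupil[0]) / iris_radius,
            (catchlight[1] - pupil[1]) / iris_radius)


def catchlights_consistent(left_eye, right_eye, tolerance=0.3):
    """Each eye is (pupil_xy, iris_radius_px, catchlight_xy or None if no catchlight).

    Returns (consistent, distance). A catchlight missing from only one eye counts
    as inconsistent. `tolerance` is an illustrative assumption, not a standard.
    """
    (p_l, r_l, c_l), (p_r, r_r, c_r) = left_eye, right_eye
    if c_l is None or c_r is None:
        return (c_l is None and c_r is None), None
    dx_l, dy_l = catchlight_offset(p_l, r_l, c_l)
    dx_r, dy_r = catchlight_offset(p_r, r_r, c_r)
    distance = math.hypot(dx_l - dx_r, dy_l - dy_r)
    return distance <= tolerance, distance


left = ((100, 100), 10, (104, 97))
right_match = ((160, 101), 10, (164, 98))
right_mismatch = ((160, 101), 10, (156, 104))
right_missing = ((160, 101), 10, None)
print(catchlights_consistent(left, right_match))     # (True, 0.0)
print(catchlights_consistent(left, right_mismatch))  # False, distance about 1.0
print(catchlights_consistent(left, right_missing))   # (False, None)
```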
There's also what researchers informally call the iris smear: in AI-generated faces, the boundary between the iris (the colored part of the eye) and the sclera (the white part) is often too smooth, too perfectly circular. Real irises have slight imperfections at their edge. Real eyes also have visible blood vessels in the whites. AI irises often look printed on, like a perfect disk, rather than grown.
GAN (Generative Adversarial Network): An older AI architecture in which two neural networks compete, one generating images and one trying to detect fakes. The generator gets better by trying to fool the detector. GANs produce highly realistic faces but have specific, consistent failure patterns. Newer diffusion models have different (but also learnable) failure patterns.
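The competition in that definition is usually written as a two-player objective (Goodfellow et al., 2014). Here D(x) is the detector's (the "discriminator's") estimated probability that x is a real photo, and G(z) is the image the generator makes from a random input z:

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

D tries to push this value up by scoring real images high and generated ones low; G tries to push it down by producing images that D scores as real. Practical GANs train with modified versions of this objective, but the tug-of-war is the same.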
Look past the face in an AI-generated headshot and you will almost always find the same thing: the background dissolves. Objects in the background that should be sharp become soft and undefined. A bookshelf behind a "professor" might have books, but the book spines have no titles, or titles that blur into visual noise that almost looks like letters. A plant in the corner might have leaves, but the leaves are too perfect, too symmetrically arranged, and the pot has no drainage holes or soil texture.
Newer models have reduced this problem, but they have not eliminated it. Loosely speaking, it comes from how image generators spend their effort: the model "knows" that the face is the subject, so it concentrates its pixel-probability work on the face. The background gets the statistical average of what backgrounds look like, which is plausible from a distance but collapses under scrutiny.
The most extreme version of this is text. AI image models have long been notoriously bad at rendering legible text, and although some models released in 2023 and 2024 handle short, prominent words far better, text in the background of an image still often breaks down. A newspaper in someone's hand in an AI image will have headlines made of letter-shaped forms that are not actual letters. A coffee cup label might start with recognizable letterforms and then dissolve into glyphs from no real alphabet. A street sign might say something like "RSTED ST" or "AVEN E": almost correct, but not.
This is because text, unlike faces or trees, has an exact correct answer. "STOP" must say exactly S-T-O-P, in that order, in that shape. AI generation thrives on producing plausible approximations, so it fundamentally struggles with anything that requires precision. Text in backgrounds is one of the fastest single checks you can run on a suspected AI image.
You now read faces differently than most people. When you see a headshot (a profile photo, a news source's expert, a product review avatar), you have a mental checklist: ears, earring symmetry, eye catchlights, iris boundaries, background sharpness, any text. This is what professional image forensics analysts are trained to do. You're doing it now.
The Reuters investigation uncovered something important: the fake expert profiles weren't used to sell products or go viral for laughs. They were used to manufacture the appearance of expert consensus on contested scientific questions, making it look like credentialed researchers agreed with claims that real researchers disputed.
This is a specific kind of harm. It doesn't trick you into thinking a celebrity wore a funny jacket. It tricks you (and policymakers, and journalists) into thinking that expert opinion exists where it doesn't. The credibility borrowed by a fake face with a fake PhD is borrowed from every real scientist who has spent decades earning theirs.
Here is the ethical question without a clean answer: If you can generate a convincing fake expert face in five seconds, who is responsible for the harm when it's used to mislead? The person who generated it? The platform that hosts the generator? The organization that used the face? The platform where the fake profile appeared? All of them?
Real legal cases in the US and EU are working through exactly these questions right now. The answers will define what's legal, what's ethical, and what's simply possible for the next generation of image technology, which means for your entire adult life.
In March 2022, days after Russia's full-scale invasion of Ukraine began, a set of photographs circulated online purporting to show Ukrainian soldiers surrendering in Kherson. The images showed groups of men in military fatigues, hands raised, in what appeared to be an urban street setting. Ukrainian and Western fact-checkers at organizations including Bellingcat, the open-source intelligence group founded by Eliot Higgins in 2014, began examining the images within hours.
The analysts who first flagged the images as suspicious didn't cite fingers or text. They cited shadows. In the photographs, shadows cast by the soldiers fell in directions inconsistent with the shadows cast by the buildings behind them. The sun, in other words, appeared to be in two different places at once. One set of shadows pointed roughly northwest. Another pointed east. A single real outdoor scene with a single sun cannot produce shadows pointing in two different directions.
This is one of the clearest signatures of early AI image generation, and it remained a reliable tell even as other artifacts improved. The reason is fundamental: AI models learn that "outdoor scenes have shadows" but do not consistently enforce the physics that all shadows in a scene must share a single light source.
Every real photograph taken outdoors in daylight has exactly one sun. Every real photograph taken indoors has a finite number of light sources (lamps, windows, overhead lights), each producing shadows with consistent, predictable geometry. When you photograph a person standing in front of a building at noon in the Northern Hemisphere (outside the tropics), their shadow points north. The building's shadow points north. The lamppost's shadow points north. Everything in the scene agrees.
AI image generators produce shadows by learning statistical associations: this type of object, in this type of setting, tends to have a shadow that looks roughly like this. But they don't always enforce the rule that all shadows in a scene must be consistent with the same light source. The result is what forensic analysts call light source inconsistency, and once you learn to see it, you cannot unsee it.
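To check for it, pick any two objects in the image and trace the direction of their shadows. Do they point the same way? (Perspective can make shadows that are parallel in the real world appear to converge in a photograph, the way railway tracks do, so look for directions that agree with one light source rather than lines that are perfectly parallel on screen.) Are the shadows sharp or soft? Sharp shadows mean a compact, distant light source like the sun; soft shadows mean a diffuse or multiple light source like an overcast sky. Does the hardness of the shadows match across the image? A scene where one object has hard shadows and another has soft shadows, even though both sit on the same ground under the same light, has no simple physical explanation: it points to a composite or an AI image.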
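Forensic researchers turn the shadow check into geometry. For each shadow, draw a line from a point on the shadow through the matching point on the object that cast it. With a single light source, all of those lines should pass through one common point in the image (the projected position of the light, which may lie outside the frame), and perspective does not break this, which is why it is more reliable than comparing raw angles. Below is a minimal sketch of that idea, assuming the point pairs have already been marked by hand in pixel coordinates; `light_source_consistency` is a hypothetical name, and the least-squares fit is a simplified version of published shadow-constraint methods, not any specific tool.

```python
import math


def light_source_consistency(pairs):
    """Test whether shadow lines agree on a single light source.

    pairs: list of ((shadow_x, shadow_y), (object_x, object_y)) pixel coordinates,
    where the object point is the spot that casts that shadow point.
    Returns (estimated_point, worst_miss): the point closest to all lines in the
    least-squares sense, and the largest distance in pixels from it to any line.
    estimated_point is None when all lines are parallel, which is still consistent
    (the light's image is then infinitely far away).
    """
    if len(pairs) < 3:
        raise ValueError("need at least 3 pairs: any two non-parallel lines meet somewhere")
    lines = []
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (sx, sy), (ox, oy) in pairs:
        dx, dy = ox - sx, oy - sy
        length = math.hypot(dx, dy)
        nx, ny = -dy / length, dx / length  # unit normal of the shadow line
        c = nx * sx + ny * sy               # line equation: nx*x + ny*y = c
        lines.append((nx, ny, c))
        # Accumulate the normal equations for minimizing the sum of squared distances.
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        return None, 0.0
    px = (b1 * a22 - a12 * b2) / det  # Cramer's rule for the 2x2 system
    py = (a11 * b2 - a12 * b1) / det
    worst = max(abs(nx * px + ny * py - c) for nx, ny, c in lines)
    return (px, py), worst


# Three shadows consistent with a light whose image sits at (300, -200), above the frame.
consistent = [((100, 400), (150, 250)), ((500, 420), (450, 265)), ((300, 500), (300, 360))]
# Same scene, but the third object point is moved so its shadow disagrees.
inconsistent = consistent[:2] + [((300, 500), (380, 360))]
print(light_source_consistency(consistent))    # point near (300, -200), worst miss near 0
print(light_source_consistency(inconsistent))  # worst miss of many pixels
```

Three pairs is the minimum because any two non-parallel lines always meet somewhere; it is the third line that can disagree.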
Reflections are even more revealing. Water, glass, polished floors, and metal surfaces all reflect light in ways governed by strict geometry: the angle of incidence equals the angle of reflection, always. AI models often get reflections approximately right but violate the geometry in specific ways: a reflection that doesn't mirror the actual object, a reflection visible in glass that doesn't correspond to anything in the scene, or a puddle that reflects a sky that doesn't match the sky visible above it.
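The "angle in equals angle out" rule has a compact vector form. If light travels in direction d and hits a flat, mirror-like surface whose unit normal is n, it leaves in direction r = d - 2(d·n)n. A minimal 2D Python sketch of that rule follows; the function names are illustrative, and the normal is assumed to have length 1.

```python
import math


def reflect(d, n):
    """Reflect direction d off a flat surface with unit normal n: r = d - 2(d.n)n."""
    dot = d[0] * n[0] + d[1] * n[1]
    return (d[0] - 2 * dot * n[0], d[1] - 2 * dot * n[1])


def angle_to_normal(v, n):
    """Angle in degrees (0 to 90) between vector v and the line of the unit normal n."""
    cos_a = abs(v[0] * n[0] + v[1] * n[1]) / math.hypot(*v)
    return math.degrees(math.acos(min(1.0, cos_a)))


# Light coming down at a slant onto a flat, horizontal puddle (normal points straight up).
incoming = (3.0, -4.0)
normal = (0.0, 1.0)
outgoing = reflect(incoming, normal)
print(outgoing)                           # (3.0, 4.0): mirrored across the surface
print(angle_to_normal(incoming, normal))  # about 36.87 degrees
print(angle_to_normal(outgoing, normal))  # the same angle
```

For a calm puddle this is why a reflected object appears flipped straight down from the point where it meets the water: any reflection that is shifted sideways or tilted away from that line breaks the rule.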
Light source inconsistency: A forensic indicator where shadows, highlights, or reflections in an image are inconsistent with a single coherent light source. In real photographs, all light-related phenomena in a scene share a common physical origin. In AI images, they often don't, because the model approximates rather than simulates physical light.
Look at any AI-generated image of a person and zoom in on the edges: where the person's hair meets the background, where their clothing meets the air around them. In real photographs, these edges carry the physics of the scene: hair backlit by sunlight has a rim of light; a person standing in fog has edges that fade. In AI images, edges are often too clean in some places and too blurry in others, and the inconsistency doesn't follow physical logic.
Bellingcat's analysts developed what they call a "halo check": looking for a faint, slightly off-color halo around subjects in AI images, a ghost of the compositing process that placed the subject into the background. This is especially visible in images where a person appears in a specific environment: a political rally, a crime scene, a foreign country. The person often has a very slightly different color temperature (the warmness or coolness of the light) than the background, and the edge between them shows it.
Hair presents its own specific challenges. Real hair is made of individual strands that respond to light individually, creating complex interactions. AI models generate hair as a texture: it looks like hair from a distance, but at the edges it often loses individual strand definition and becomes a smooth, painted-looking mass. Curly hair is especially difficult for AI to generate convincingly at the edges where it meets the background.
Bellingcat's analysts are professional investigators who help hold governments accountable. They use light source analysis to expose wartime disinformation that affects real military and political decisions. You now have the same basic skill set they use: the physics of light, the geometry of shadows, the logic of reflections. That's not a small thing.
Here is a problem that Bellingcat and other verification organizations talk about openly: every time we publish what we look for, the AI generators get better at not producing that artifact. When enough people knew about the six-finger problem, Midjourney and other generators prioritized training data and model adjustments to produce better hands. Shadow inconsistency was less common in 2024 than it was in 2022, because generators have been improved with that specific criticism in mind.
This is a genuine arms race. Detectors publish what they find. Generators improve. New artifacts emerge. Detectors learn those. Generators improve again.
The ethical question is this: Is it responsible for researchers to publish detailed guides to AI image artifacts, knowing that those guides will be read by people who generate AI images and used to make the generators better at hiding their tracks?
The alternative, keeping detection methods secret, is also problematic. It would mean only intelligence agencies and large tech companies would be able to detect AI images, while ordinary people would have no tools at all. There is no clean answer. The knowledge that helps you is also the knowledge that helps the generators improve.
At the World Economic Forum in Davos, Switzerland, in January 2024, a session on AI and information integrity featured a demonstration that went quietly unreported outside policy circles. Researchers from the MIT Media Lab showed a set of twenty photographs to a panel of professional fact-checkers: journalists, intelligence analysts, and academic researchers whose full-time job was identifying false media.
Ten of the photographs were real. Ten were AI-generated using the then-current generation of image models. The professional fact-checkers correctly identified real versus AI at a rate barely better than random chance: roughly 55% accuracy, where 50% would be pure guessing.
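Why does 55% count as "barely better than random chance"? With only twenty photographs, 55% means 11 correct, and pure guessing reaches 11 or more surprisingly often. A quick check with the binomial distribution, assuming each person judged all twenty images and that a guesser is right half the time (the account above doesn't say exactly how scores were tallied):

```python
from math import comb


def chance_of_at_least(correct, total, p=0.5):
    """Probability that guessing with success rate p gets at least `correct` of `total` right."""
    return sum(comb(total, k) * p**k * (1 - p) ** (total - k) for k in range(correct, total + 1))


# 55% of 20 photographs is 11 correct answers.
print(round(chance_of_at_least(11, 20), 3))  # 0.412
```

Roughly two in five people flipping coins would do at least that well, so a score like this is hard to tell apart from guessing without many more test images.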
The researchers, led by Jevin West of the University of Washington's Calling Bullshit project, were not trying to embarrass the fact-checkers. They were making a specific argument: visual inspection alone is no longer sufficient. The tools you have learned in this module (hands, skin, shadows, edges) are real and useful. But you must also know their limits, because the artifacts they rely on are becoming rarer as generators improve.
When a real camera takes a photograph, it writes invisible data into the image file. This data is called EXIF metadata (Exchangeable Image File Format). It includes the camera model, the lens used, the aperture and shutter speed, the exact date and time (down to the second), and, if the device has location tagging turned on, the GPS coordinates of where the photo was taken.
AI-generated images don't have this data; they have either the wrong data or no data at all. When you examine the EXIF metadata of an AI-generated JPEG, you typically find one of three things: completely absent camera data, generic placeholder data, or metadata that describes a software application (like Photoshop or Midjourney itself) rather than a physical camera.
You can check EXIF data without any special tools. On Windows, right-click the image file and choose Properties, then the Details tab; on a Mac, choose Get Info and look under More Info. Free online metadata viewers, such as exifdata.com, let you drop in any image URL. A photograph that claims to show breaking news but has no camera metadata, or whose metadata shows it was "taken" with no camera at all, is a significant red flag.
Important caveat: social media platforms strip EXIF data during upload (Twitter/X, Facebook, and Instagram all do this by default), so the absence of EXIF data doesn't prove an image is fake. And because EXIF data can be edited with freely available tools, its presence with consistent real-camera information is meaningful evidence of authenticity, but not proof.
EXIF metadata: Invisible technical data embedded in image files by the camera that captured them. Includes camera make/model, date, time, GPS location, and exposure settings. AI image generators produce files with absent, generic, or software-identified metadata rather than real camera data, which makes metadata examination a non-visual forensic technique.
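The metadata check can be scripted. Below is a minimal Python sketch. Reading tags assumes the Pillow imaging library (`Image.open`, `Image.getexif`, and the `ExifTags.TAGS` lookup table are real Pillow APIs); the `judge_metadata` function, its three labels, and the example file name are hypothetical illustrations. Following the caveat about stripped metadata, an empty result is labeled inconclusive rather than fake, and because EXIF can be edited, even a "camera" result is only supporting evidence.

```python
def read_exif(path):
    """Return top-level EXIF tags as a {name: value} dict. Requires Pillow (pip install Pillow)."""
    from PIL import ExifTags, Image  # imported here so judge_metadata works without Pillow

    with Image.open(path) as img:
        exif = img.getexif()
        return {ExifTags.TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}


MEANING = {
    "camera": "real-camera fields present: supporting evidence only, since EXIF can be edited",
    "software-only": "names software but no camera: a red flag for a claimed news photo",
    "absent": "no metadata: inconclusive, because platforms often strip EXIF on upload",
}


def judge_metadata(tags):
    """Classify a {tag_name: value} dict into one of the hypothetical labels in MEANING."""
    make = str(tags.get("Make", "")).strip()
    model = str(tags.get("Model", "")).strip()
    software = str(tags.get("Software", "")).strip()
    if make or model:
        return "camera"
    if software:
        return "software-only"
    return "absent"


for tags in (
    {"Make": "Canon", "Model": "Canon EOS R6", "DateTime": "2024:02:10 14:03:22"},
    {"Software": "Adobe Photoshop 25.0"},
    {},
):
    label = judge_metadata(tags)
    print(label, "->", MEANING[label])

# With Pillow installed and a real file (hypothetical name):
# print(judge_metadata(read_exif("suspect.jpg")))
```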
The single most powerful non-visual tool for verifying an image is reverse image search: feeding the image back into a search engine to find where it has appeared before, and in what context. Google Images, TinEye, and Bing Visual Search all offer this capability for free.
If a photograph of a supposed disaster in Country X appears in a reverse image search as a photo of a different event in Country Y from three years earlier, it's re-used real photography: not AI-generated, but still false in context. If a supposedly new photograph returns no results whatsoever (no other appearances anywhere on the indexed internet), it may be genuinely new, and it deserves more scrutiny, not less. Newly generated AI images often have zero prior appearances.
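Search engines don't publish exactly how they match images, but a classic, simple technique for spotting near-duplicates is a perceptual "average hash": shrink the image to a tiny grid, mark each cell as brighter or darker than the grid's average, and compare the resulting bit patterns. Resizing or a brightness edit barely changes the bits, so a re-used photo still matches. A minimal pure-Python sketch, assuming a grayscale image given as a list of rows at least 8 pixels on each side; the function names are illustrative, and this is not necessarily how Google, TinEye, or Bing work.

```python
def downscale(gray, size=8):
    """Average a grayscale image (list of rows) down to a size x size grid of block means."""
    h, w = len(gray), len(gray[0])
    grid = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        row = []
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(gray, size=8):
    """One bit per grid cell: True if that cell is brighter than the grid's mean."""
    cells = [v for row in downscale(gray, size) for v in row]
    mean = sum(cells) / len(cells)
    return [v > mean for v in cells]


def hamming(a, b):
    """Number of differing bits; a small distance suggests the same underlying picture."""
    return sum(x != y for x, y in zip(a, b))


original = [[r * 8 + c for c in range(16)] for r in range(16)]  # 16x16 gradient, values 0 to 135
brighter = [[v + 20 for v in row] for row in original]          # same picture, brightness edited
inverted = [[255 - v for v in row] for row in original]         # a very different picture
print(hamming(average_hash(original), average_hash(brighter)))  # 0
print(hamming(average_hash(original), average_hash(inverted)))  # 64
```

A fresh AI image has no near-duplicate anywhere, which is why an empty search result is itself worth noting.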
Context verification means asking: does the environment in this image match the claimed location and time? Bellingcat pioneered a technique called geolocation: cross-referencing details in an image (building architecture, street signs, mountain silhouettes, vegetation) with satellite imagery and street-level maps to confirm or refute where a photograph was actually taken. This approach helped verify and debunk hundreds of images from conflict zones in Syria, Ukraine, and Gaza between 2015 and 2024.
The important insight is this: the most powerful verification techniques are not about staring at pixels harder. They are about checking the image against everything else in the world, such as metadata, prior appearances, and geographic reality. An image exists in a context. If the context doesn't add up, the image probably doesn't either.
Check hands, ears, and earring symmetry. Count fingers and look for merged or extra digits.
AI skin is too smooth. Fabric textures tile or repeat. Hair loses strand definition at edges.
Trace two shadows. Do they point the same direction? Do reflections match the visible scene?
Read any text in the image. AI text dissolves into near-letters. Backgrounds lose object detail.
Check EXIF data for real camera information. Missing or software-only metadata is a red flag.
Reverse image search. Geolocation check. Does the environment match the claimed location and time?
In the January 2024 demonstration, professional fact-checkers scored about 55% on visual inspection alone. You now have a six-point framework, visual and non-visual, that mirrors what Bellingcat, Reuters, and MIT Media Lab researchers actually use. The tools exist. Most people don't use them. Now you can.
Here is the hardest practical problem in visual verification: speed. Running a thorough image check (visual inspection, EXIF review, reverse image search, geolocation) takes between five and thirty minutes for someone who knows what they're doing. News cycles move in seconds. By the time a verification is complete, a false image may have been shared by millions.
Some researchers argue for AI-powered detection tools: algorithms trained to identify AI-generated images automatically. These tools exist, but they have known failure modes: they miss some AI images, and they sometimes flag real photographs as fake. A related approach is watermarking at creation time; Google's SynthID, for example, embeds an invisible watermark in images made with Google's own tools so they can be recognized later, but it cannot vouch for images from other generators. Deploying automated detection at scale in content moderation means making millions of decisions automatically, each with potential for error.
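It helps to put rough numbers on "each with potential for error." The figures below are hypothetical, chosen only to illustrate base rates: suppose a detector catches 95% of AI images, wrongly flags 5% of real photos, and 1 in 100 uploaded images is AI-generated. A minimal Python sketch of the arithmetic:

```python
def flag_outcomes(images, ai_share, catch_rate, false_alarm_rate):
    """Expected results of running an automated detector over `images` uploads.

    ai_share: fraction of uploads that are AI-generated.
    catch_rate: fraction of AI images the detector flags.
    false_alarm_rate: fraction of real photos it wrongly flags.
    """
    ai = images * ai_share
    real = images - ai
    caught = ai * catch_rate
    false_flags = real * false_alarm_rate
    flagged = caught + false_flags
    return {
        "flagged": flagged,
        "false_flags": false_flags,
        "missed": ai - caught,
        "precision": caught / flagged,  # share of flags that are actually AI images
    }


# Hypothetical numbers for illustration only, not measurements of any real detector.
r = flag_outcomes(1_000_000, 0.01, 0.95, 0.05)
print(f"flagged as AI: {r['flagged']:,.0f}")                    # 59,000
print(f"real photos wrongly flagged: {r['false_flags']:,.0f}")  # 49,500
print(f"AI images missed: {r['missed']:,.0f}")                  # 500
print(f"flags that are correct: {r['precision']:.1%}")          # 16.1%
```

Under these assumed numbers, most flagged images would be real photos, simply because real photos vastly outnumber fakes. Change the assumptions (a lower false-alarm rate, a higher share of AI images) and the picture shifts, which is the tradeoff at the heart of this debate.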
The ethical question you are left with is this: Is it better to have imperfect automated detection deployed at scale, catching most AI fakes but making some mistakes, or to require human verification, which is more accurate but too slow to stop rapid spread?
Newsrooms, social media platforms, and governments are actively choosing between versions of these options right now. There is no universally correct answer. But the question itself is one that people with power are answering on your behalf, and understanding it as well as you now do means you can hold those decisions to account.
You've been given a suspected AI-generated image to analyze. Your partner is a senior forensics analyst. They won't give you answers, but they'll push you to be more precise, more specific, and more honest about what the evidence actually shows versus what you're assuming.
Start by describing how you would approach inspecting a portrait photograph for AI artifacts. Be specific about what you look for and why. Your partner will challenge your reasoning and ask follow-up questions.
A policy brief has landed on your editor's desk. It cites three "experts" with professional headshots, institutional affiliations, and published opinions. Your editor suspects the experts are fake: AI-generated faces attached to invented credentials. They need you to explain exactly how you'd verify or debunk these profiles.
Your partner, the editor, is not technically trained. They want practical steps, not jargon. And they'll push back if your approach sounds like it requires special software they don't have.
You are giving expert testimony about whether a photograph used in a legal case is authentic. The opposing lawyer is skilled and will challenge every claim you make: "How do you know?" "What exactly makes that impossible?" "Could a real camera produce that?"
Your partner will play the cross-examining lawyer. Explain how light source inconsistency and shadow analysis work as forensic tools, and be prepared to defend your reasoning under pressure.
A senator is deciding whether to support a bill that would require all social media platforms to deploy automated AI image detection at scale, flagging and labeling suspected AI-generated images automatically, with no mandatory human review. You have been brought in as an expert advisor.
The senator has heard arguments on both sides and is not easily impressed. They want to understand the real tradeoffs: speed vs. accuracy, automation vs. human judgment, and who bears the cost of errors. Take a position and defend it.