Module 3 · Lesson 1

How AI Generates Images — and Why Clues Remain

Understanding the machine helps you beat the machine.
Why do AI-generated images still leave behind detectable traces even when they look convincing?

In February 2024, images of Taylor Swift in sexually explicit AI-generated scenarios spread across X (formerly Twitter), reaching tens of millions of views before the platform suspended the main account responsible. Microsoft later acknowledged that its Designer tool had been misused in their creation. Within days, researchers at Hany Farid's lab at UC Berkeley and at MIT had published analyses identifying specific pixel-level artifacts that betrayed each image's synthetic origin — artifacts invisible to most casual viewers but measurable by software. The incident accelerated Congressional calls for legislation on AI-generated imagery and showed precisely why understanding how these images are made is the first step to spotting them.

How Diffusion Models Build an Image

Most modern AI image generators, including Stable Diffusion, DALL-E 3, Midjourney, and Adobe Firefly, use a technique called diffusion. The process starts with pure random noise and progressively refines it into a coherent image over dozens or hundreds of computational steps, with the text prompt steering each step. The model can do this because it was trained on billions of captioned images and learned to associate visual patterns with descriptive words.

This process is powerful, but it is also statistical. The model is not taking a photograph of anything real. It is synthesizing pixel values that are statistically likely to go together given its training data. That statistical process introduces characteristic errors — places where the math produces something that looks slightly wrong to a trained human eye.
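The noise-to-image refinement described above can be sketched as a toy loop. This is only an illustration under loud assumptions: a real diffusion model uses a trained neural network to decide what to remove at each step, whereas here a fixed `target` list stands in for that prediction, and the function names, step size, and eight-pixel "image" are all hypothetical.

```python
import random

def toy_denoise(target, steps=50, seed=0):
    """Toy illustration: refine random noise toward `target` in small steps.

    `target` stands in for what a trained network would predict; no real
    model is involved.
    """
    rng = random.Random(seed)
    # Step 0: pure random noise, one value per "pixel".
    sample = [rng.gauss(0.0, 1.0) for _ in target]
    start = list(sample)
    for step in range(steps):
        # Injected noise shrinks to zero as the final step approaches.
        noise_scale = 1.0 - (step + 1) / steps
        sample = [
            s + 0.2 * (t - s) + 0.05 * noise_scale * rng.gauss(0.0, 1.0)
            for s, t in zip(sample, target)
        ]
    return start, sample

def mean_abs_error(a, b):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

target = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]  # hypothetical 8-pixel "image"
start, final = toy_denoise(target)
print("error before refinement:", mean_abs_error(start, target))
print("error after refinement: ", mean_abs_error(final, target))
```

The point of the sketch is the shape of the process: many small corrections, each slightly noisy, rather than one act of capturing light from a real scene.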

Earlier AI image techniques like Generative Adversarial Networks (GANs), used widely until around 2022, left different artifacts. GAN images often had wavy, unnatural background textures and characteristic blurring around facial hair. Diffusion models solved many of those problems — but introduced new ones involving fine-grained local structure.

Eyes & Pupils: Pupils may be asymmetric, oddly shaped, or show inconsistent light reflections between the left and right eyes.
Hands & Fingers: Fingers may be fused, have extra joints, bend the wrong way, or appear at the wrong scale relative to the palm.
Text in Scene: Signs, shirts, and labels almost always contain garbled, mirrored, or nonsense letters, because the model cannot truly write.
Ears & Hair: Ear shapes are frequently asymmetric or biologically implausible. Fine hair strands often merge, float, or disappear at edges.
Lighting Consistency: Shadows may fall in contradictory directions, or a subject's lighting may not match the background light source.
Background Artifacts: Backgrounds often contain blurry, repeated, or melted-looking elements, especially at edges near the main subject.

Why Hands Are So Hard

Human hands are among the most geometrically complex structures a person can perceive. We are hypersensitive to hand errors because we use our own hands constantly. AI models trained on image data do not have a model of geometry — they synthesize visual patterns statistically. For a hand to look right, hundreds of spatial constraints must be satisfied simultaneously, and current diffusion models frequently fail at this.
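A short piece of arithmetic shows why "hundreds of constraints at once" is so punishing. The sketch below assumes each constraint is satisfied independently with the same probability, which is a simplification made only for illustration; the 99% figure and the constraint counts are hypothetical, not measurements of any model.

```python
def chance_all_satisfied(per_constraint_success, n_constraints):
    """Probability that every constraint holds, assuming they are independent."""
    return per_constraint_success ** n_constraints

# Hypothetical: the generator gets each individual constraint right 99% of the time.
for n in (10, 50, 200):
    print(n, "constraints:", round(chance_all_satisfied(0.99, n), 3))
```

Under these assumptions, a generator that is right 99% of the time on each constraint gets all 200 right only about 13% of the time. Small per-detail error rates compound quickly.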

The "Uncanny Valley" of Pixels

Psychologists use the term uncanny valley to describe the eerie feeling triggered by something that looks almost — but not quite — human. AI images often occupy this territory at the detail level. The face may look entirely convincing at a glance. But zoom in on the teeth (often too uniform or blended together), the ear cartilage, or the stitching on clothing, and the image begins to dissolve into improbable shapes.

Researchers at MIT's Media Lab have shown that people trained to look for these micro-level inconsistencies improve their detection accuracy from around 50% (chance level) to above 70% within a single one-hour training session. The clues are there — the skill is learning to look.

Key Point

No AI image is produced by optics and physics — it is produced by statistics. Statistics can produce very compelling results overall while failing at the fine-grained local structure that physics would never violate. That mismatch is where the clues live.

Glossary
Diffusion model: An AI image-generation technique that begins with random noise and iteratively refines it toward a coherent image guided by a text prompt.
GAN (Generative Adversarial Network): An earlier AI image technique using two competing networks; largely superseded by diffusion models for photorealistic outputs.
Artifact: A visible error or anomaly in an image that results from the generation process rather than real-world optics.
Pixel-level analysis: Examining an image at the individual pixel scale to detect statistical patterns that betray synthetic origin.

Lesson 1 Quiz

How AI images are made — and where clues remain
1. What fundamental technique do most modern AI image generators (DALL-E 3, Midjourney, Stable Diffusion) use?
Correct. Diffusion models start with random noise and iteratively refine it guided by a text prompt. They have largely replaced GANs for photorealistic generation.
Not quite. Modern systems like DALL-E 3 and Midjourney use diffusion models — a technique that begins with pure noise and refines it step by step.
2. Why do AI-generated images almost always contain garbled or nonsense text on signs and clothing?
Correct. Diffusion models generate pixel patterns statistically. While they learn that signs have text-like shapes, they cannot reliably reproduce correctly spelled, grammatically coherent written language.
Not quite. The real reason is that diffusion models work with pixel statistics, not linguistic rules — they approximate what text looks like without understanding letter sequences.
3. In the February 2024 Taylor Swift deepfake incident, what confirmed the images were AI-generated at a technical level?
Correct. Researchers at UC Berkeley and MIT published analyses showing pixel-level artifacts that revealed the synthetic origin — clues invisible to most viewers but detectable by software.
Not quite. Researchers at UC Berkeley and MIT independently identified measurable pixel-level artifacts that technically confirmed the images' AI-generated origin.

Lab 1 — Reading the Generation Artifacts

Discuss pixel-level clues with your AI analysis assistant

Your Mission

You're developing a mental checklist for spotting AI-generated images. Use this lab to explore which visual clues are most reliable, why hands are a consistent problem, and how to explain artifact detection to someone who has never thought about it.

Try asking: "Walk me through what to look for in the hands of an AI-generated image" — or — "Why does lighting consistency matter more than overall image quality?"
Module 3 · Lesson 2

Faces, Bodies, and the Anatomy of Failure

The human form is the hardest thing for AI to get consistently right.
What makes human faces and bodies so difficult for AI to fake — and where exactly do the failures appear?

In March 2022, a video appeared online purporting to show Ukrainian President Volodymyr Zelensky telling Ukrainian soldiers to surrender. The deepfake was identified within hours partly because of visible artifacts: Zelensky's head appeared slightly too large for his body, his skin tone had an unusual uniformity, and the motion of his jaw did not match natural speech patterns. Meta, YouTube, and Twitter all removed the video. Zelensky posted a rebuttal video from his actual location within the hour. Media analysts noted that viewers who knew what to look for — the neck-to-head boundary, head size proportionality, and jaw movement — could spot the fake without any technical tools.

Why Faces Are Both Easy and Hard

Human faces are the most scrutinized objects in our visual environment. Evolution has given us a dedicated brain region — the fusiform face area — for facial recognition. We are extraordinarily sensitive to subtle facial anomalies. This cuts both ways: it makes us good at detecting bad AI faces, but also means AI face generators have been heavily optimized because their failures are immediately noticed.

By 2023, AI-generated faces had become highly convincing at first glance. The website thispersondoesnotexist.com, which displays GAN-generated faces, was regularly fooling internet users. However, specific failure zones remain consistent across models:

Teeth: Individual teeth often merge, have no gaps, or appear as a uniform white mass rather than distinct structures with shadow between them.
Eye Asymmetry: The two eyes often have different iris colors, different pupil sizes, or different catch-light positions, combinations that are rare or physically implausible in a genuine photograph.
Hairline Edges: Where hair meets background, AI models frequently produce a halo of blurred or smeared pixels, especially visible against light backgrounds.
Ear Structure: Ears are biologically precise structures. AI ears frequently have misshapen, asymmetric, or structurally impossible cartilage folds.
Neck / Shoulder Join: The join between head, neck, and clothing is a frequent failure point. Look for unnatural blending, odd proportions, or clothing that doesn't make anatomical sense.
Glasses & Jewelry: Glasses frames often have only one arm, are asymmetric, or pass through facial features. Earrings may not match or may melt into the earlobe.

Body Proportions and Clothing

Beyond the face, AI-generated full-body images consistently fail at anatomical proportions. Limbs may be slightly too long or too short. The relationship between shoulder width and waist, or between torso length and leg length, may fall outside the normal human range without looking obviously wrong to a casual viewer — but close inspection reveals it.

Clothing presents its own failure modes. Patterns such as stripes, plaid, logos, and prints rarely maintain geometric consistency across folds. A plaid shirt's pattern may fail to line up at seams. Buttons may be randomly placed or wrong in number. Zippers may not align with the garment's geometry. These errors are especially reliable detection clues because clothing pattern geometry is physically constrained in ways that a statistical model struggles to replicate.

The Zelensky Test

The 2022 Zelensky deepfake case established a practical principle now used by media verification teams: check the head-to-body size ratio and neck/shoulder boundary first in any suspected face-swap deepfake video. These areas show disproportionate failure because early deepfake systems were trained primarily on faces in isolation, not on the head-body relationship.

Improving With Practice

Research from the MIT Media Lab (2023) and from the University of Waterloo (published in Proceedings of the National Academy of Sciences, 2024) both confirmed that brief, targeted training significantly improves human detection of AI faces. The training that worked best focused on specific anatomical zones — not on general impressions. Participants told to examine teeth, ears, and eye symmetry systematically outperformed those relying on gut feeling, even when gut feeling had been working well before better AI arrived.

Practical Rule

When you suspect an AI-generated face: examine the teeth, both ears separately, the whites of each eye, and the hairline boundary. Four zones, thirty seconds. That systematic check catches the majority of current AI faces that would otherwise fool a casual glance.

Lesson 2 Quiz

Faces, bodies, and anatomy of AI failure
1. What visual clues in the 2022 Zelensky deepfake video allowed analysts to identify it quickly?
Correct. Analysts identified an oversized head relative to the body, unnaturally uniform skin tone, and jaw movement that didn't match natural speech — all failure points of face-swap deepfake technology at that time.
Not quite. The key visual clues were anatomical: head-to-body size ratio, unnaturally uniform skin, and mismatched jaw movement — all detectable without technical tools.
2. Why are clothing patterns (stripes, plaid, logos) especially reliable clues for detecting AI-generated images?
Correct. Real clothing patterns obey geometry — stripes must converge correctly at seams, plaid must align at folds. These physical constraints are hard for statistical models to satisfy consistently, making pattern errors a reliable clue.
Not quite. The reason clothing patterns are reliable clues is that they must obey geometric rules at every fold and seam — rules that a statistical image model cannot reliably satisfy.
3. According to research from MIT and the University of Waterloo, what type of training most improved AI face detection accuracy?
Correct. Both studies found that targeted training on specific anatomical zones — systematically checking teeth, ears, and eye symmetry — outperformed gut-feeling approaches, even when gut feeling had been reliable before better AI existed.
Not quite. Research showed that examining specific anatomical zones (teeth, ears, eye symmetry) systematically produced better results than relying on general impressions or gut feeling.

Lab 2 — Anatomy of AI Face Failures

Practice systematic facial zone analysis with your AI assistant

Your Mission

You are developing a four-zone facial inspection checklist. Use this conversation to build that checklist, test your understanding of why each zone fails, and think through real-world application — like spotting a deepfake political photo in your social media feed.

Try asking: "Build me a four-zone checklist for inspecting a suspicious face image" — or — "Why do AI teeth look wrong even when the rest of the face looks convincing?"
Module 3 · Lesson 3

Context Clues — When the Scene Betrays the Image

Sometimes the surrounding context exposes a fake more clearly than the subject itself.
Beyond the human figure, what environmental and contextual clues reveal that an image is AI-generated?

On May 22, 2023, an AI-generated image of a large explosion near the Pentagon spread across social media and was reported as real by several financial news accounts. U.S. stock markets dipped briefly before the image was debunked within about 30 minutes. Investigators examining the image noted several contextual failures: the surrounding fence structure was geometrically impossible, the smoke plume's interaction with background buildings was physically implausible, and the overall lighting of the explosion did not match the ambient daylight visible in the scene. No real explosion produces perfectly centered, symmetrically balanced smoke clouds, but AI generators, trained partly on action movie stills, tend toward dramatic visual symmetry. The FBI and the Pentagon confirmed no explosion had occurred, and the Arlington Police Department stated they had received no emergency calls.

Environmental Geometry

Real photographs obey the laws of optics and physics. Perspective lines converge correctly. Buildings have straight walls with consistent angles. Roads have proper vanishing points. Shadows fall at angles consistent with a single light source positioned in a physically possible location relative to the time of day.

AI models are trained on images that obeyed these rules — but they do not know the rules. When they synthesize novel scenes, especially complex ones, they frequently produce environments where the geometry is subtly wrong: a building that leans slightly differently on its left and right sides, a sidewalk that curves in an impossible direction, or fences and railings that don't maintain consistent spacing.
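The single-light-source rule above can be turned into a rough numeric check once a person has marked where each object stands and where its shadow ends. The sketch below is a minimal illustration, not a forensic tool. It assumes a distant light source such as the sun and coordinates measured on a flat ground plane (in a raw perspective photo, parallel shadows appear to converge, so points would need correcting first). The function names and the 15-degree tolerance are hypothetical.

```python
import math

def shadow_angle(base, tip):
    """Direction of a shadow in degrees, from the object's base to the shadow tip."""
    return math.degrees(math.atan2(tip[1] - base[1], tip[0] - base[0]))

def max_angle_spread(shadows):
    """Largest pairwise difference between shadow directions, in degrees (0 to 180)."""
    angles = [shadow_angle(base, tip) for base, tip in shadows]
    spread = 0.0
    for i in range(len(angles)):
        for j in range(i + 1, len(angles)):
            diff = abs(angles[i] - angles[j]) % 360.0
            spread = max(spread, min(diff, 360.0 - diff))
    return spread

def shadows_consistent(shadows, tolerance_deg=15.0):
    """True if all marked shadows point in roughly the same direction."""
    return max_angle_spread(shadows) <= tolerance_deg

# Hand-marked (base, tip) pairs in hypothetical ground-plane coordinates.
consistent = [((0, 0), (2, 1)), ((5, 0), (7, 1.1)), ((3, 4), (5, 5))]
inconsistent = consistent + [((8, 2), (6, 3))]  # this shadow points the other way

print(shadows_consistent(consistent))
print(shadows_consistent(inconsistent))
```

A failed check is a reason to look closer rather than proof of fakery: scenes with several lamps or strong reflections can legitimately produce shadows in more than one direction.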

Structural Geometry: Buildings, fences, stairs, and railings must obey perspective. AI scenes often have structures that can't exist physically, such as walls that lean the wrong way or steps of inconsistent height.
Organic Backgrounds: Crowds, trees, foliage, and grass are processing-intensive. AI often produces repeated identical elements or blurry, melted-looking organic textures at the edges of scenes.
Shadow Logic: In a real image lit by a single source such as the sun, every shadow points away from that source. AI images frequently have objects whose shadows point in different directions.
Reflections: Reflections in water, windows, and glasses must match the scene. AI frequently gets reflections wrong, showing objects not present in the scene or reversing them incorrectly.
Smoke & Fire: AI-generated disaster images often have suspiciously perfect or symmetrical smoke or fire, because the model learned from cinematic imagery rather than real incident photography.
Street-Level Details: Street signs, lane markings, license plates, and storefronts all contain text or geometric patterns that AI consistently garbles or renders implausibly.

The Crowd Problem

Large crowds present a severe challenge for AI image generators. Each person in a crowd is a complex object that must be rendered consistently with their neighbors. AI generators frequently resort to repeating or mirroring individual figures across a crowd, producing eerie symmetries that no real crowd photograph would contain. Background crowd members may merge into one another, share limbs, or appear in physically impossible postures relative to the space they occupy.

Fact-checkers at AFP, Reuters, and the Associated Press have all published guidance specifically noting crowd repetition as a high-reliability indicator of AI generation. During the 2024 election cycle, multiple AI-generated crowd images purporting to show political rallies were debunked using this method alone — by zooming into background figures and finding mirrored duplicates.
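The zoom-and-compare method described above can be imitated in miniature. The sketch below assumes someone has already cropped background figures into small grayscale patches of equal size, represented here as nested lists of numbers. It finds only exact copies and exact left-right mirrors. Real AI repeats are rarely pixel-identical, so a practical tool would need a fuzzier comparison. All names and patch values are hypothetical.

```python
def as_tuple(patch):
    """Convert a nested list into a hashable tuple of tuples."""
    return tuple(tuple(row) for row in patch)

def mirror(patch):
    """Flip a patch left to right."""
    return tuple(tuple(reversed(row)) for row in patch)

def find_repeats(patches):
    """Return (first_index, later_index, kind) for exact or mirrored repeats."""
    seen = {}  # patch -> index of its first occurrence
    repeats = []
    for j, raw in enumerate(patches):
        patch = as_tuple(raw)
        if patch in seen:
            repeats.append((seen[patch], j, "copy"))
        elif mirror(patch) in seen:
            repeats.append((seen[mirror(patch)], j, "mirror"))
        else:
            seen[patch] = j
    return repeats

# Hypothetical 2x2 patches standing in for cropped crowd figures.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[2, 1], [4, 3]]  # left-right mirror of a
d = [[1, 2], [3, 4]]  # exact copy of a

print(find_repeats([a, b, c, d]))
# -> [(0, 2, 'mirror'), (0, 3, 'copy')]
```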

The Pentagon Image — What Analysts Found

Post-incident analysis of the 2023 Pentagon explosion image identified: (1) a perimeter fence with structurally impossible geometry, (2) smoke that interacted with background buildings in a physically implausible way, (3) lighting of the explosion inconsistent with the ambient daylight angle, and (4) cinematic symmetry in the smoke column that real explosions do not produce. Each clue alone was ambiguous. Together, they were definitive.

Using Reverse Image Search Alongside Visual Analysis

Visual clue analysis works best when combined with contextual investigation. If an image purports to show a real event at a real location, use Google Street View, satellite imagery, or news archive searches to verify whether the depicted setting matches the claimed location. The Pentagon explosion image failed this check immediately — the Pentagon's actual fence layout is publicly documented and bears no resemblance to what the AI generated.

Tools like Google Reverse Image Search, TinEye, and Yandex Images can surface the oldest known versions of an image, often revealing either that it predates the claimed event or that it has appeared in unrelated contexts before.
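To build intuition for how a recompressed copy of an image can still be matched to its original, here is a sketch of one simple perceptual fingerprint, the "average hash". This section does not say which techniques any particular search engine uses, so treat this purely as an illustration of near-duplicate matching. The pixel grids are hypothetical, and a real implementation would first shrink the image to a small grayscale grid.

```python
def average_hash(gray):
    """Bit string with 1 where a pixel is brighter than the grid's mean, else 0."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return "".join("1" if p > mean else "0" for p in pixels)

def hamming(a, b):
    """Number of positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 4x4 grayscale grids (0 = black, 255 = white).
original = [[10, 20, 200, 210], [15, 25, 205, 215],
            [12, 22, 202, 212], [11, 21, 201, 211]]
recompressed = [[13, 18, 198, 214], [17, 22, 207, 211],
                [10, 25, 200, 215], [14, 19, 203, 208]]  # same picture, small pixel shifts
unrelated = [[200, 10, 200, 10], [10, 200, 10, 200],
             [200, 10, 200, 10], [10, 200, 10, 200]]

print(hamming(average_hash(original), average_hash(recompressed)))  # -> 0
print(hamming(average_hash(original), average_hash(unrelated)))     # -> 8
```

A small distance suggests the same underlying picture despite compression or minor edits, which is what lets an older copy of an image surface even after it has been re-saved many times.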

The Layered Check

For any suspicious scene image: (1) check structural geometry for physical impossibilities, (2) check shadow consistency across multiple objects, (3) zoom into background crowds for repeated figures, (4) read all visible text, (5) search the image in reverse image tools. Five steps. Each step that passes makes the image more credible. Any step that fails shifts the burden of proof onto whoever is sharing it.
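The five-step check above can be recorded as a simple pass/fail list so that no step gets skipped. The sketch below is one hypothetical way to structure it. The class, function, and example notes are illustrative only, and each check is still judged by a person.

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    passed: bool
    note: str = ""

def summarize(results):
    """Apply the rule above: any failed step shifts the burden of proof."""
    failed = [r.name for r in results if not r.passed]
    if failed:
        return "burden of proof on sharer; failed: " + ", ".join(failed)
    return f"more credible: all {len(results)} checks passed"

# Hypothetical findings for one suspicious image.
example = [
    CheckResult("structural geometry", False, "fence posts merge into each other"),
    CheckResult("shadow consistency", True),
    CheckResult("crowd repetition", True, "no crowd in frame"),
    CheckResult("visible text", False, "sign lettering is garbled"),
    CheckResult("reverse image search", True, "no earlier copies found"),
]

print(summarize(example))
```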

Lesson 3 Quiz

Context clues — when the scene betrays the image
1. In the May 2023 Pentagon explosion hoax image, which environmental clue was identified by investigators as especially telling?
Correct. Analysts identified an impossible fence geometry and suspiciously cinematic, symmetrical smoke — both consistent with AI models trained on action movie imagery rather than real incident photography.
Not quite. The key environmental clues were an impossible fence geometry and smoke symmetry typical of cinematic training data — patterns a real explosion would not produce.
2. Why do background crowds in AI-generated images often contain repeated or mirrored figures?
Correct. Crowds are computationally demanding because every person must be consistent with adjacent figures. AI models frequently reuse or mirror patterns rather than generating hundreds of fully unique individuals.
Not quite. The real reason is computational: generating each crowd member uniquely and consistently is very difficult, so models often repeat or mirror figures — a reliable detection clue in both GAN and diffusion outputs.
3. What does "shadow logic" mean as an AI detection clue?
Correct. Real photographs obey a single-light-source rule — every shadow points away from the same source. AI images frequently violate this because the model learns shadow shapes statistically rather than understanding physical light geometry.
Not quite. Shadow logic means checking that all shadows in a scene are consistent with a single light source. AI images frequently have objects with shadows pointing in different directions — a physical impossibility.

Lab 3 — Scene & Context Analysis

Explore environmental clues and the five-step layered check

Your Mission

You're building skills for evaluating entire scenes, not just faces. Use this lab to practice the five-step layered check, explore specific clues like shadow logic and crowd repetition, and think through how you'd evaluate a suspicious political or news image you encounter online.

Try asking: "Walk me through the five-step layered check on a suspicious news image" — or — "How would I use reverse image search alongside visual analysis to debunk a fake disaster photo?"
Module 3 · Lesson 4

Tools, Metadata, and the Arms Race

Software tools help — but knowing their limits is as important as knowing how to use them.
What detection tools exist, what are their real-world accuracy rates, and why will visual clue skills remain essential even as AI improves?

In January 2024, voters in New Hampshire received robocalls featuring an AI-generated voice of President Joe Biden telling Democratic primary voters not to vote. The calls used a cloned voice (an audio deepfake) rather than a manipulated image, but the investigation that followed illustrated the full toolkit available to verification teams. Investigators from the New Hampshire Attorney General's office, working with the FCC and private research teams, used audio spectrogram analysis (the audio equivalent of pixel-level image analysis), metadata from the originating phone numbers, and forensic matching against known Biden voice recordings to confirm the audio was synthetic. They ultimately traced the calls to a political consultant in New Orleans. The case became a landmark example of how metadata trails and technical analysis work together, and of why neither is sufficient on its own.

Software Detection Tools: What's Available

A range of commercial and research tools attempt to detect AI-generated images. The most widely used in 2024–2025 include:

Hive Moderation — a commercial AI content moderation API that includes AI image detection. In independent tests by MIT Technology Review (2023), it achieved approximately 85% accuracy on synthetic images from DALL-E 2 and Stable Diffusion 1.x, but accuracy dropped significantly on more recent model outputs.

AI or Not — a consumer-facing tool built on similar technology. Widely shared on social media; documented accuracy varies from around 70% to 90% depending on the source model.

Google's SynthID — a watermarking system embedded directly into images generated by Google's Imagen model. Rather than detecting artifacts after the fact, SynthID embeds an invisible, machine-readable watermark at generation time. As of 2024, it applies only to Imagen-generated content and cannot detect images from other generators.

C2PA (Coalition for Content Provenance and Authenticity) — a technical standard that embeds cryptographic provenance metadata into images at creation. Adopted by Adobe, Microsoft, Sony, Leica, and others. Images carrying C2PA metadata can be verified through tools like Adobe Content Credentials. The limitation: stripping the metadata removes verification capability, and most existing AI-generated images were created before this standard was widely implemented.
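The provenance limitation noted above (verification is lost once the metadata is stripped) can be seen in a toy model. The sketch below is not the C2PA format. C2PA relies on certificate-based signatures and structured manifests, while this example uses a shared-secret HMAC from Python's standard library purely to keep the code short. The key, function names, and byte strings are hypothetical.

```python
import hashlib
import hmac

SIGNING_KEY = b"hypothetical-demo-key"  # stands in for a real signer's credentials

def sign(image_bytes):
    """Produce a tamper-evident tag for the exact bytes of an image."""
    return hmac.new(SIGNING_KEY, image_bytes, hashlib.sha256).hexdigest()

def verify(image_bytes, tag):
    """Report whether the image still matches the tag it was published with."""
    if tag is None:
        return "unverifiable: no provenance data"
    if hmac.compare_digest(sign(image_bytes), tag):
        return "verified"
    return "mismatch: content changed after signing"

image = b"placeholder image bytes"
tag = sign(image)

print(verify(image, tag))               # intact image, intact tag
print(verify(image + b"edited", tag))   # image altered after signing
print(verify(image, None))              # tag stripped, e.g. by a re-upload
```

The third case is the important one: a missing tag proves nothing either way, which is why provenance data works as positive evidence when present rather than as a detector of fakes.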

What Tools Do Well

Detecting images from known AI generators that use older architectures.
Flagging statistical patterns consistent with the training data used by specific models.
Integrating into platform-level content moderation pipelines.
Verifying provenance when C2PA metadata is present and intact.

What Tools Struggle With

Images from new or custom fine-tuned models the detector wasn't trained on.
Images that have been post-processed (compressed, cropped, filtered).
Images generated by models with explicit adversarial training to evade detection.
Any image where provenance metadata has been stripped.

Metadata: A Partial Window

Most digital image files carry metadata: information about when, where, and how the image was created. EXIF fields can reveal the camera model used, the GPS coordinates of capture, the date and time, and the software used for editing. An AI-generated image saved to disk typically lacks the camera model field or names a software tool rather than camera hardware.

However, metadata is easily stripped or spoofed. Uploading an image to most social platforms strips EXIF data entirely. A sophisticated bad actor can add fake camera metadata to an AI image before sharing it. Metadata is therefore useful as a supporting clue — not as a definitive test.

The most useful metadata check is confirming that stated metadata is internally consistent: does the GPS location match the claimed location? Does the stated time match the lighting in the image? Does the camera model metadata match the shooting style? Inconsistencies are red flags even if no single field definitively proves fakery.
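The consistency questions above can be sketched as a small checker that runs on metadata already extracted from a file. The dictionary layout is an assumption made for illustration: `Model` and `Software` follow standard EXIF tag names, while the `GPS` key holding decimal degrees, the 50 km threshold, and the coordinates are hypothetical. Reading EXIF from an actual file would need an image library, which is left out here.

```python
import math

def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, using the haversine formula."""
    radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def metadata_flags(meta, claimed_lat, claimed_lon, max_km=50.0):
    """Return a list of red flags; an empty list means nothing looked inconsistent."""
    if not meta:
        return ["no metadata present (possibly stripped on upload)"]
    flags = []
    if not meta.get("Model"):
        if meta.get("Software"):
            flags.append("software recorded but no camera model")
        else:
            flags.append("no camera model recorded")
    gps = meta.get("GPS")  # hypothetical key: (latitude, longitude) in decimal degrees
    if gps is not None and km_between(gps[0], gps[1], claimed_lat, claimed_lon) > max_km:
        flags.append("GPS position far from claimed location")
    return flags

# Hypothetical metadata for an image claimed to be taken at (40.0, -75.0).
meta = {"Software": "hypothetical-generator 1.0", "GPS": (48.0, 2.0)}
print(metadata_flags(meta, 40.0, -75.0))
```

Because metadata can be stripped or spoofed, an empty list of flags does not establish that an image is authentic. The checker only surfaces inconsistencies worth a closer look.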

The Arms Race Dynamic

Detection tools improve. AI generators then improve to evade detection. Then detection tools improve again. This has been the consistent pattern since GANs were introduced. Every published, widely used detection method eventually becomes a target for adversarial training. The implication: no single tool is permanently reliable. Human visual skills, combined with contextual verification and multiple tools, remain more robust than any single technological solution.

Building a Verification Stack

Professional verification teams — at Reuters, the Associated Press, AFP Fact Check, and academic groups like the Stanford Internet Observatory — use layered verification rather than any single tool. A robust check combines: (1) visual clue inspection (the skills in this module), (2) one or more AI detection tools for a statistical second opinion, (3) metadata examination, (4) reverse image search, and (5) contextual plausibility checking against known facts about the claimed event.

No item in that stack alone is decisive. Together they produce a probability assessment — and very often, a conclusion strong enough to act on.
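One way to picture how several weak signals add up to a "probability assessment" is to combine them as odds. The sketch below assumes the signals are independent and that each can be summarized as a likelihood ratio (how much more likely that observation is if the image is synthetic than if it is real). Both are simplifying assumptions, and every number is invented for illustration rather than drawn from any study or tool.

```python
def combine(prior_prob, likelihood_ratios):
    """Update a prior probability of 'synthetic' using independent likelihood ratios."""
    odds = prior_prob / (1.0 - prior_prob)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)

# Hypothetical evidence from the five layers of the stack.
signals = {
    "visual clues found": 3.0,
    "detection tool flags image": 4.0,
    "camera metadata missing": 1.5,
    "reverse search finds nothing": 1.0,  # uninformative in this example
    "context implausible": 5.0,
}

print("one signal alone:", combine(0.5, [signals["visual clues found"]]))
print("all five together:", round(combine(0.5, signals.values()), 3))
```

With these made-up numbers, the strongest single signal moves an even prior to at most about 83%, while all five together reach roughly 99%. That is the sense in which no item is decisive alone but the stack can still support a firm conclusion.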

Why Visual Skills Remain Essential

Software detection tools will continue to lag behind the most advanced generation tools — especially custom, privately trained models used by sophisticated actors. Visual clue skills apply to any image, from any generator, without needing to know in advance what tool made it. They are the floor that technical tools build on — not a fallback when tools fail.

Glossary
EXIF data: Exchangeable Image File Format data; metadata embedded in image files by cameras and software, recording technical details of an image's creation.
SynthID: Google DeepMind's watermarking system that embeds invisible, detectable markers into AI-generated images at the point of creation.
C2PA: Coalition for Content Provenance and Authenticity; an industry standard for embedding cryptographic content provenance metadata into digital media.
Adversarial training: As used in this module, training an AI model specifically to evade detection by another AI system; a key driver of the detection/generation arms race.

Lesson 4 Quiz

Tools, metadata, and the arms race
1. What is the key limitation of Google's SynthID watermarking system?
Correct. SynthID watermarks are embedded only at generation time, and only by Google's Imagen. The system has no ability to detect AI-generated images from Midjourney, Stable Diffusion, DALL-E, or other generators.
Not quite. The key limitation is that SynthID can only detect what it watermarked — images from Google's Imagen model. It offers no detection capability for images from other AI generators.
2. Why is image metadata (EXIF data) an unreliable sole basis for confirming an image is real?
Correct. Most social platforms strip EXIF data on upload, and sophisticated bad actors can add fake camera metadata to AI-generated images. Metadata is a useful supporting clue, not a definitive test.
Not quite. Metadata is unreliable as a sole test because platforms strip it on upload and it can be manually added or spoofed — making it a supporting clue at best, not a definitive confirmation of authenticity.
3. What does the 2024 New Hampshire Biden deepfake robocall case illustrate about verification methodology?
Correct. The investigation combined audio spectrogram analysis, phone number metadata, and forensic voice matching — demonstrating that layered verification using multiple methods produces stronger conclusions than any single tool.
Not quite. The New Hampshire case is a landmark example of layered verification — audio analysis, metadata forensics, and voice matching all contributed to the conclusion, illustrating why no single method is sufficient.

Lab 4 — Building Your Verification Stack

Integrate tools, metadata, and visual skills into a complete workflow

Your Mission

You're designing a complete verification workflow for a newsroom, a school, or your own personal use. Use this conversation to put together everything from this module — visual clue inspection, detection tool awareness, metadata checking, and contextual verification — into a practical checklist you could actually use.

Try asking: "Help me design a five-minute verification workflow for a suspicious image I see on social media" — or — "What are the three most reliable things to check if I only have 60 seconds?"

Module 3 — Test

Spot the Clues in Fake Images · 15 questions · Pass at 80%
1. What is the core technique used by DALL-E 3, Midjourney, and Stable Diffusion to generate images?
Correct. Diffusion models — starting from random noise and refining it guided by a text prompt — are the dominant technique in current photorealistic AI image generation.
Diffusion models are the dominant technique — they start from random noise and iteratively refine it, guided by the text prompt, rather than using GANs or any photomontage approach.
2. Why do AI-generated images frequently get hands wrong?
Correct. Human hands are geometrically complex. A correct hand requires satisfying hundreds of spatial constraints simultaneously — something statistical models frequently fail to do.
Hands require so many simultaneous geometric constraints (finger length ratios, knuckle positions, palm scale) that statistical models cannot reliably satisfy them all at once.
3. In the February 2024 Taylor Swift deepfake case, what technical confirmation was provided by researchers?
Correct. Researchers at Hany Farid's lab at UC Berkeley and at MIT published analyses identifying measurable pixel-level artifacts that confirmed the images' synthetic origin.
Researchers at UC Berkeley and MIT independently analyzed the images and identified pixel-level artifacts — measurable by software — that confirmed AI generation.
4. Which of the following is the most reliable zone to inspect first when checking a suspected AI face?
Correct. Research from MIT and the University of Waterloo confirms that systematically examining teeth, ears, and eye symmetry catches the majority of AI-generated faces that would fool a casual glance.
The highest-reliability inspection zones for AI face detection are teeth (often merged), ears (often misshapen), and eye symmetry (pupils, iris color, catch-lights).
5. What visual clue in the 2022 Zelensky deepfake video was noted as a high-reliability indicator of face-swap deepfake technology at the time?
Correct. The head-to-body size ratio was wrong — a characteristic failure of face-swap systems trained on isolated face data rather than full-body imagery. Analysts also noted uniform skin tone and jaw movement mismatches.
The key clue was head-to-body size ratio: Zelensky's head was disproportionately large — a characteristic failure mode of face-swap deepfakes trained on face data in isolation.
6. Why are clothing patterns like plaid and stripes especially reliable detection clues for AI images?
Correct. Plaid, stripes, and logos must obey geometry, converging correctly at seams and folds. Statistical models often fail to satisfy this physical constraint.
Clothing patterns are reliable because they must obey geometry — stripes must converge at seams, plaid must align at folds. These physical constraints are hard for statistical models to satisfy consistently.
7. What did analysis of the 2023 Pentagon explosion hoax image reveal about AI-generated disaster imagery?
Correct. Multiple environmental clues converged: impossible fence geometry, smoke that didn't interact realistically with buildings, inconsistent lighting, and cinematic-style symmetry typical of models trained on action movie imagery.
Analysis revealed impossible fence geometry, implausible smoke-building interaction, inconsistent lighting angles, and smoke symmetry characteristic of cinematic training data — not real explosion photography.
8. What does "shadow logic" refer to as an AI image detection method?
Correct. Real scenes have one sun (or one dominant light source), and all shadows point away from it. AI models learn shadow shapes statistically without modeling this physical constraint, so they frequently produce contradictory shadow directions.
Shadow logic means all shadows must point away from the same light source. AI models learn shadow shapes statistically without understanding physics, often producing objects with shadows pointing in different directions.
9. Why do AI-generated crowd scenes frequently contain repeated or mirrored figures?
Correct. Each person in a crowd must be consistent with their spatial neighbors. Meeting that constraint for hundreds of individuals is difficult, so models frequently reuse or mirror figures.
Generating unique, spatially consistent figures across an entire crowd is computationally demanding. Models frequently reuse or mirror figures as a result — a reliable detection clue in both GAN and diffusion outputs.
10. What is the C2PA standard, and what is its key limitation?
Correct. C2PA is a cryptographic provenance standard adopted by Adobe, Microsoft, Sony, and others. Its limitation: metadata can be stripped, rendering verification impossible, and it doesn't apply to images created before its adoption.
C2PA embeds cryptographic provenance data at creation. The limitation: stripping metadata removes verification capability, and most existing AI-generated images were created before the standard was widely implemented.
11. According to research from MIT Media Lab, what level of detection accuracy can targeted training achieve for AI face detection?
Correct. MIT Media Lab research showed that people trained on specific anatomical zones improved from approximately 50% (chance) to above 70% in a single one-hour session — a meaningful and achievable improvement.
MIT Media Lab research found that one hour of targeted anatomical zone training improved detection from around 50% (chance level) to above 70% — a realistic and meaningful improvement.
12. What makes AI-generated text (on signs, shirts, etc.) such a reliable detection clue?
Correct. Diffusion models approximate what text looks like (letter-shaped pixel patterns) without understanding actual language structure — the result is almost universally garbled, mirrored, or semantically meaningless text in signs and labels.
AI models produce pixel patterns that approximate what text looks like, without any model of actual language — meaning text in AI images is almost universally garbled, wrong, or nonsensical.
13. The 2024 New Hampshire Biden robocall deepfake was identified through what combination of methods?
Correct. Investigators combined three methods: audio spectrogram analysis (the audio equivalent of pixel-level analysis), phone number metadata forensics, and forensic voice matching against authentic recordings.
The case combined audio spectrogram analysis, metadata from originating phone numbers, and forensic voice matching against real Biden recordings — a classic example of layered verification.
14. What is "adversarial training" in the context of AI image detection?
Correct. Adversarial training means optimizing a generator to fool a specific detector. Whenever a detection method becomes publicly known, it can be used as a training target, which drives the perpetual arms race between generation and detection.
Adversarial training means optimizing an AI generator specifically to fool known detectors — the reason no single detection tool remains reliable as AI generators improve.
15. What is the recommended "five-step layered check" for evaluating a suspicious scene image?
Correct. The five-step layered check combines: visual clue inspection, AI detection tools for a statistical second opinion, metadata examination, reverse image search, and contextual plausibility against known facts about the claimed event.
The five-step layered check is: (1) visual clue inspection, (2) AI detection tool, (3) metadata examination, (4) reverse image search, (5) contextual plausibility checking. Each step builds on the others — no single step is definitive alone.
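
The five-step layered check can also be written down as a small record-keeping routine. The version below is a minimal sketch, assuming each step yields one of three outcomes and that a verdict needs at least two steps to agree. The outcome labels, the `assess` function, and the two-flag threshold are assumptions made for this example and are not part of the lesson.

```python
from enum import Enum


class Outcome(Enum):
    CLEAN = "clean"
    SUSPICIOUS = "suspicious"
    INCONCLUSIVE = "inconclusive"


# Step names follow the module's five-step layered check.
STEPS = (
    "visual clue inspection",
    "AI detection tool",
    "metadata examination",
    "reverse image search",
    "contextual plausibility",
)


def assess(results: dict[str, Outcome], min_flags: int = 2) -> str:
    """Summarize step outcomes; min_flags is a hypothetical threshold."""
    missing = [step for step in STEPS if step not in results]
    if missing:
        return "incomplete: still need " + ", ".join(missing)
    flags = [step for step in STEPS if results[step] is Outcome.SUSPICIOUS]
    if len(flags) >= min_flags:
        return "likely synthetic or misattributed: " + ", ".join(flags)
    if flags:
        # A single flag is treated as a prompt to keep checking, not a verdict.
        return "needs more evidence: only flagged by " + flags[0]
    return "no red flags found (not proof of authenticity)"


if __name__ == "__main__":
    results = {step: Outcome.CLEAN for step in STEPS}
    results["visual clue inspection"] = Outcome.SUSPICIOUS
    print(assess(results))
```

The design choice worth noticing is that one suspicious step never produces a verdict on its own, which mirrors the point that no single step is definitive.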