In April 2020, a Belgian climate organization called Extinction Rebellion Belgium released a video. In it, Belgian Prime Minister Sophie Wilmès appeared to link the COVID-19 pandemic directly to the destruction of natural ecosystems — and seemed to call for radical environmental policy changes. The footage looked real. She sounded like herself. Her face moved naturally, her voice carried the right weight.
The video spread quickly. Thousands of people watched it before noticing the small text at the bottom: "This is not real." Extinction Rebellion had deliberately created a deepfake — a video generated by AI — to make a political point about climate inaction. They wanted to show what Wilmès might say if she took the science seriously. Instead, they showed something else entirely: how easily a real person's face and voice can be borrowed without permission.
The Belgian government did not find it funny. Wilmès never said those words. But the damage — the brief, convincing illusion — had already happened for everyone who watched before the disclaimer registered.
The word "deepfake" comes from combining "deep learning" (a type of AI training) with "fake." But the name undersells how sophisticated the technology has become. A deepfake isn't just a badly edited photo or a voiced-over clip. It's a video (or audio recording) in which an AI model has learned to replace, reconstruct, or generate the appearance and voice of a real person so convincingly that a casual viewer can't tell the difference.
One widely used underlying technique is the generative adversarial network, or GAN. Think of it as two AIs competing: one tries to generate a convincing fake face, and the other tries to catch it as fake. They run against each other over many thousands of training rounds until the generator gets so good that the detector can no longer tell the difference. The result is a face, or a voice, that can be placed on anyone, saying anything.
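To make the "two AIs competing" idea concrete, here is a toy sketch in plain Python. It is not how real deepfake systems are built: the "faces" are single numbers near 4.0, the forger has one adjustable value, and the detective has two. Every name and setting here (train_toy_gan, the learning rate, the target of 4.0) is an assumption chosen for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Clamp the input so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train_toy_gan(steps: int = 2000, lr: float = 0.01, seed: int = 0):
    """Train a one-parameter 'forger' against a two-parameter 'detective'.

    Real samples are numbers near 4.0 (an illustrative assumption).
    The generator outputs b + noise and must learn b; the discriminator
    scores a number x as sigmoid(w * x + c), where 1.0 means 'looks real'.
    """
    rng = random.Random(seed)
    b = 0.0          # generator parameter, starts far from the real data
    w, c = 0.0, 0.0  # discriminator parameters

    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)
        fake = b + rng.gauss(0.0, 0.5)

        # Discriminator step: raise its score on real, lower it on fake.
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w += lr * ((1.0 - d_real) * real - d_fake * fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: shift b so the discriminator scores fakes higher.
        d_fake = sigmoid(w * fake + c)
        b += lr * (1.0 - d_fake) * w

    return b, w, c
```

Calling train_toy_gan() returns the forger's learned value b along with the detective's parameters. In this toy setup b starts at 0.0 and is pushed toward the real data as the detective learns what "real" looks like. Real systems follow the same tug-of-war but with millions of parameters and images instead of single numbers.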
By 2019, GAN technology had advanced to the point where videos could be generated in near-real-time. By 2022, consumer apps were making it accessible to anyone with a smartphone. The Belgian video was notable not because the technology was cutting-edge — it wasn't, by that point — but because a well-intentioned organization used it in a public, political context and revealed how blurry the ethics immediately get.
To understand why deepfakes work on human eyes, you need to understand why human eyes are wired to trust faces. Faces are the single most processed input in the human brain. We have entire brain regions dedicated to reading them. We detect the tiniest microexpressions — a flicker of doubt, a slight asymmetry, the way eyes move during speech. This is why fake faces in old movies looked wrong: the movement wasn't right, even if the still frames were fine.
What modern deepfake systems learned to do is model exactly those micro-movements. A trained model for a specific person's face learns not just what they look like, but how their skin moves when they talk, how their eyelids behave, how light falls differently on their features at different angles. It then transfers those patterns onto a target video frame by frame.
The result fools us because it's actually using our own strengths against us. Our brains are pattern-matchers. Deepfakes feed the patterns we expect to see. What they often fail to replicate correctly — and this matters for detection — are things like: background lighting consistency, the exact texture of hair near the ears, the way a person blinks, and the edges where a face meets a neck or hairline. These are your detection footholds.
When checking a suspicious video: look at the hairline, watch the blinking rate (deepfakes often blink too little or at wrong intervals), and check whether the lighting on the face matches the background. These three areas are where most deepfake models still struggle most.
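The blinking check above can be sketched as a small calculation. This assumes some other tool (not shown, and hypothetical here) has already labeled each video frame as "eyes open" or "eyes closed." The low and high thresholds are illustrative assumptions, not established cutoffs; adults at rest commonly blink somewhere around 15 to 20 times a minute, but the rate varies a lot with stress, screen use, and lighting.

```python
def blink_rate_per_minute(eye_open: list[bool], fps: float) -> float:
    """Count open-to-closed transitions and scale them to blinks per minute."""
    if not eye_open or fps <= 0:
        raise ValueError("need at least one frame and a positive frame rate")
    blinks = sum(
        1 for prev, cur in zip(eye_open, eye_open[1:]) if prev and not cur
    )
    minutes = len(eye_open) / fps / 60.0
    return blinks / minutes


def blink_flag(rate: float, low: float = 8.0, high: float = 30.0) -> str:
    """Label a blink rate. The low/high bounds are assumed, not standard."""
    if rate < low:
        return "too few blinks: worth a closer look"
    if rate > high:
        return "too many blinks: worth a closer look"
    return "within the assumed normal range"
```

A flag from a check like this is a reason to look more closely, not proof of anything. Some real people blink rarely on camera, and newer models blink convincingly.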
Extinction Rebellion thought they were making a creative, clearly-labeled protest piece. They put "This is not real" in the video. They genuinely believed in the climate message. And yet: Sophie Wilmès never consented to having her face and voice used. She couldn't veto the video before it went public. Thousands of people saw it before they noticed the disclaimer.
Here's the ethical question this lesson won't answer for you: Does a good cause justify using someone's likeness without permission? What if the video had been about a cause you personally believe in? Does the size of the disclaimer matter? What if the real person is a public figure — does that change the calculation?
These questions don't have clean answers. Different democracies have reached different conclusions. Some countries now have laws specifically about deepfakes. Others don't. Reasonable, thoughtful people disagree about where the line should be. What you now know — that a convincing, political deepfake can be made quickly and cheaply, and that most people won't catch it in time — changes how you think about every video of a public figure you'll watch for the rest of your life.
Many of the people who watched that Belgian video in April 2020 believed they were seeing a real politician saying real things. You now understand the technology that made that possible, the pattern it exploited, and the ethical fault lines it opened. That's a different way of watching, and it's a skill most adults don't have either.
Most deepfakes in the world right now are not political. The largest category, by a wide margin, is non-consensual intimate imagery: real people's faces placed on other people's bodies without permission. In 2019, a study by Deeptrace (the company now known as Sensity AI) estimated that about 96% of deepfake videos online fell into this category, and that the targets were almost exclusively women. This is an ongoing harm affecting real people, not a hypothetical.
Financial fraud is another major category. In early 2024, Hong Kong police reported that an employee at a multinational company had been tricked into transferring the equivalent of about US$25 million after attending what appeared to be a video call with the company's CFO and several colleagues. Every other person on the call was a deepfake: real faces and voices, synthesized by AI. The employee had doubts, but the faces and voices were convincing enough that the transfer went through.
Political deepfakes — like the Belgian video — get the most news coverage but are actually a smaller fraction of total deepfake use. Understanding this matters, because it means the threat isn't primarily about elections. It's happening right now to private individuals and to companies' finance departments.
There are also legitimate uses. Film studios use deepfake-adjacent technology to de-age actors (The Irishman, 2019), to restore deceased performers (Star Wars: Rogue One brought back Peter Cushing, who died in 1994), and to dub films in other languages with lip-synced accuracy. The technology is neither entirely good nor entirely bad — it's a tool, and the ethics live in how and why it's deployed.
A news editor has flagged a video for you. It shows a well-known tech CEO announcing a massive company layoff — a video no major news outlet has confirmed or reported on. Your job is to investigate and advise whether the editor should publish a story based on this footage.
Your lab partner VERA is a fellow analyst — not a teacher. She has opinions, asks hard questions, and won't just tell you the right answer. Work through the investigation with her.
In April 2023, a track titled "Heart on My Sleeve" appeared on Spotify and Apple Music. It featured what sounded unmistakably like Drake and The Weeknd, two of the most recognizable voices in contemporary music, on a song neither of them had recorded, performed, or agreed to release. The track was uploaded by a user called ghostwriter977 and generated millions of streams before being removed. Universal Music Group, which represents both artists, demanded takedowns across every platform.
The song was made using AI voice cloning: software trained on enough recordings of each artist to synthesize new performances in their vocal style. It wasn't a bad imitation. It passed the casual listening test. Months later it was reportedly submitted for Grammy consideration, and the Recording Academy said it was not eligible. Under the Academy's rules, work containing AI-generated material can be considered only when the human contribution is meaningful, and work with no human authorship is ineligible.
The question the music industry has been arguing about ever since: where exactly is the line between influence, imitation, and theft? Artists have always been influenced by other artists. Producers have always approximated sounds. But this was different — and nobody agrees on precisely why.
A voice is more distinctive than most people realize. It carries your pitch range, your resonance (how your throat and chest vibrate), your articulation patterns (the exact way you form consonants), your rhythm, your accent, and thousands of micro-qualities that together make your voice identifiably yours. This is why you can recognize a friend on the phone in one syllable.
Voice cloning works by training an AI on recordings of a target voice — sometimes just a few minutes of audio is enough with modern systems. The AI extracts those micro-qualities and builds a model of the voice. Then, given new text input, it can generate speech in that voice saying anything at all. It's not playing back a recording — it's synthesizing new audio in real time from the learned model.
The earliest versions of this technology, around 2018–2019, required hours of training audio and produced robotic results. By 2023, companies like ElevenLabs were offering voice cloning from as little as one minute of audio, with results that routinely fooled listeners in controlled tests. ElevenLabs' own platform was used to clone the voice of conservative commentator Ben Shapiro in early 2023, producing fake audio that spread on social media before the company added restrictions.
Detection is harder for voice than for video. Humans are better at detecting facial inconsistencies than vocal ones. The brain doesn't have the same dedicated circuits for voice verification that it does for face recognition. This makes voice cloning, in some ways, more dangerous for deception than deepfake video — and it's faster and cheaper to produce.
In March 2019, the CEO of a UK-based energy firm received a phone call from who he believed was his parent company's chief executive in Germany. The voice was familiar — the slight German accent, the cadence, the authority. The caller asked the CEO to urgently transfer €220,000 (approximately $243,000) to a Hungarian supplier. The CEO transferred the funds.
The caller was not his boss. It was an AI voice clone, used in what became the first publicly documented case of voice cloning for financial fraud. The actual fraud was even more layered: when the CEO called back to confirm, the fraudsters had a second cloned call ready. The money moved through multiple accounts and was largely unrecoverable.
The CEO had done nothing obviously wrong. He knew his boss's voice. He asked questions. The answers came back in the right voice. This is the scenario that makes voice cloning genuinely frightening: it attacks the verification method most people rely on without thinking — the recognition of someone's voice as proof they are who they say they are.
Security researchers now recommend that families and organizations establish a code word — a pre-agreed word or phrase that can be requested in any suspicious call and that the clone would not know. This is called an "anti-spoofing protocol." It sounds extreme until you realize it's already standard practice in some financial institutions. If a voice alone is no longer trustworthy verification, something else has to take its place.
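For an organization that wants to check a code word without keeping it written down in plain text, the idea above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not a vetted security design: the function names, the normalization rule (ignore case and extra spaces), and the iteration count are all choices made for this example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def _normalize(code_word: str) -> bytes:
    # Ignore case and extra whitespace so a spoken phrase matches reliably.
    return " ".join(code_word.lower().split()).encode("utf-8")


def enroll_code_word(code_word: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store instead of the code word itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", _normalize(code_word), salt, ITERATIONS)
    return salt, digest


def verify_code_word(spoken: str, salt: bytes, digest: bytes) -> bool:
    """Check what the caller said against the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", _normalize(spoken), salt, ITERATIONS)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(candidate, digest)
```

The code is the easy part. The protocol only works if the code word was agreed in person or over a trusted channel, is never said on a call that has not been challenged first, and is replaced after any suspected exposure.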
Voice cloning has uses that are genuinely beneficial. People who lose their voices to disease can have them preserved: before a surgery, a person can donate recordings, and afterward their synthetic voice speaks for them using their actual sound. The company VocaliD has built voice "banks" for people with ALS and other conditions. This is deeply meaningful work.
Documentary filmmakers use voice synthesis to bring historical figures' words to life in their own voices when recordings don't exist. Audiobook narrators can record a book once and license the voice model rather than re-recording editions in other languages. The technology isn't inherently harmful.
But here's the unresolved ethical question this lesson is leaving with you: Who owns your voice? If someone trains an AI model on public recordings of you — things you willingly made public, like interviews or podcasts — and uses it to generate new speech without your permission, have they done something wrong? You didn't consent, but you also made the recordings public. Does your implicit ownership of your voice extend to a model learned from it? This question is currently being litigated in multiple countries, and courts haven't reached a consensus. Knowing that this question exists — and that it's genuinely unresolved — is part of being a competent person in 2024 and beyond.
You now understand that a voice alone — even a perfectly familiar one — is no longer reliable proof of identity. Most adults still trust voices instinctively. You know why that's no longer safe, and you know the existence of anti-spoofing protocols as an alternative. That's a meaningful edge in a world where this technology is becoming more common, not less.
A school district has asked you to help design an anti-spoofing protocol for their administration. Teachers and staff regularly receive calls from people claiming to be the principal, parents, or the district office — and they need a verification system that works when voice alone can no longer be trusted.
MARCUS is your security consultant partner. He's skeptical of protocols that look good on paper but fail in practice. He'll push back on anything he thinks won't hold up under real conditions.
On March 16, 2022, three weeks into Russia's full-scale invasion of Ukraine, a video began circulating on Telegram and Twitter. In it, Ukrainian President Volodymyr Zelensky appeared to tell Ukrainian soldiers to lay down their weapons and surrender. The video had also been posted to the website of the Ukrainian TV channel Ukraine 24 after what appeared to be a hack, and within hours it had reached millions of viewers. Zelensky's face was recognizable. His voice, though slightly off, delivered the devastating message.
The video was a deepfake. Zelensky quickly posted a video of his own rejecting it and making clear that Ukraine was not laying down its arms. Meta, Twitter, and YouTube removed the fake. But for a period of hours, it was live and spreading during an active war. The purpose wasn't necessarily to convince everyone. Even if only a fraction of viewers believed it, a real soldier seeing it might pause. A confused civilian might repeat it. An adversarial media ecosystem can use doubt as a weapon even if the truth wins eventually.
This was the moment the implications of deepfake technology in conflict zones became impossible to ignore at the level of governments and militaries.
Video has been used as evidence of war crimes since the Nuremberg trials. From footage of concentration camps to footage of the conflicts in Rwanda, Bosnia, and Syria, video has been a critical tool in establishing that documented atrocities actually occurred. Open-source investigation groups like Bellingcat and the Syrian Archive have built entire methodologies around geolocating and verifying footage to establish what happened where and when.
Deepfakes create a specific threat to this process — not because they're so convincing that evidence becomes impossible to verify, but because they give bad actors a new defense: claim that real footage is fake. This is called the "liar's dividend" — the benefit that accrues to people who want to deny accountability as the existence of faking technology makes denial more plausible.
In 2018, Myanmar military officials denied that video evidence of the Rohingya genocide was real, calling footage "staged" and "fabricated." This predates the widespread availability of deepfake tools — but the argument became significantly more viable after those tools became public. When anyone can point to deepfake technology and say "that's probably AI-generated," it becomes harder to use video as unimpeachable evidence in any forum — legal, diplomatic, or public.
For students following international news: every major conflict since 2020 has featured competing claims about whether footage is real or fake. Knowing that the liar's dividend exists is essential context for evaluating those claims. The question to ask is not just "could this be faked?" but "who benefits from claiming it's fake, and what does the forensic evidence actually show?"
Groups such as the MIT Media Lab and Sensity AI, along with academic researchers at UC Berkeley, have developed deepfake detection tools that analyze video at the pixel level, looking for statistical artifacts left by AI generation processes. These aren't things humans can see; they're patterns in the data that reveal the fingerprints of synthetic generation.
Alongside these technical tools, open-source investigators use geolocation — matching landmarks, shadows, satellite imagery, and terrain to verify where footage was actually filmed. Chronolocation — determining when footage was filmed by analyzing shadow angles, weather, and other time-stamped elements — adds another layer. Together these techniques can often establish whether footage is authentic without needing to detect whether it's AI-generated specifically.
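The shadow-angle part of chronolocation rests on simple trigonometry, sketched below. The function names are made up for this example, and the measurements (an object's height, its shadow's length, the compass bearing the shadow points along) are assumed to have been estimated from the footage already. Turning the resulting sun position into a date and time requires comparing it with solar position data for the suspected location, which this sketch does not attempt.

```python
import math


def sun_elevation_deg(object_height: float, shadow_length: float) -> float:
    """Angle of the sun above the horizon, from an upright object's shadow.

    Both lengths must use the same unit and be measured on flat ground.
    """
    if object_height <= 0 or shadow_length <= 0:
        raise ValueError("height and shadow length must be positive")
    return math.degrees(math.atan2(object_height, shadow_length))


def sun_azimuth_deg(shadow_bearing_deg: float) -> float:
    """Compass bearing of the sun: directly opposite the shadow's direction."""
    return (shadow_bearing_deg + 180.0) % 360.0
```

For example, a 2-meter signpost casting a 2-meter shadow puts the sun 45 degrees above the horizon, and a shadow pointing northeast (bearing 45) puts the sun in the southwest (bearing 225). If the claimed time and place could not produce that sun position, the claim about when or where the footage was filmed is in doubt.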
The difficulty is that these techniques require time, expertise, and access to data — and social media moves in hours, not weeks. By the time a forensic conclusion is reached, the footage has already shaped opinion for millions of viewers. The verification infrastructure has not kept pace with the spread speed of the content it's supposed to verify.
Research consistently shows that corrections and fact-checks reach a much smaller audience than the original misinformation — sometimes less than 5% of the original spread. This means even perfect eventual verification doesn't undo the damage of initial belief. The speed of sharing is the core problem, and technology hasn't solved it.
Here is what is at stake at the institutional level: in 2023, the United Nations Office for Disarmament Affairs held consultations on whether deepfake technology should be regulated under international law. The question on the table: does releasing a deepfake during an armed conflict constitute an act of information warfare under international humanitarian law?
No binding resolution was reached. The legal framework for this is still being written by experts in rooms most people never hear about. But those decisions will shape what governments can do, what platforms are required to do, and what constitutes accountability for synthetic media in conflict contexts.
The ethical question you're sitting with: If verified genuine footage of a war crime could be convincingly dismissed as a deepfake by the perpetrators — and if it's possible that synthetic footage could be used to falsely accuse someone of a war crime — how should evidence standards for international accountability adapt? There's no clean answer. Lawyers, technologists, and policy makers are actively working on this. You now understand enough to follow those debates intelligently when they reach public attention.
The liar's dividend — the idea that deepfake technology's existence harms truth even when the technology isn't used — is a concept most people, including most adults following international news, haven't encountered. You have. It changes how you interpret every official denial of footage you'll encounter going forward.
You work for an international human rights documentation group. A 47-second video has arrived showing what appears to be a military strike on a civilian building in an unnamed conflict zone. A government official has already publicly stated the footage is "AI-fabricated." Your team needs to decide whether to include it in an official report submitted to an international tribunal.
PRIYA is your senior analyst. She's done this for eight years. She's seen footage that turned out to be genuine denied as fake, and footage that seemed real turn out to be staged. She doesn't jump to conclusions in either direction.
In January 2019, Gabonese President Ali Bongo Ondimba had been absent from public view for nearly two months following a reported stroke. His government released a New Year's address video to show he was alive and governing. Critics quickly claimed the video was a deepfake, and about a week later a group of soldiers, arguing that the president was unfit to govern and pointing to his odd appearance in the video, attempted a coup.
The coup attempt failed. And here's the twist: most analysts who examined the video concluded it was probably authentic. Bongo looked and moved strangely because he had suffered a real stroke. The military officers either genuinely believed the fake claim or used it as political cover for a power grab they wanted to attempt anyway. Independent analysis eventually satisfied most outside observers that the footage was real.
The Gabon case is the reverse of the usual scenario. Instead of a fake being mistaken for real, something real was mistaken for a fake — and that mistake was used to justify an armed coup attempt against a sitting government. The liar's dividend, applied in reverse by people trying to seize power. Detection, it turns out, isn't just about catching fakes. It's about calibrating correctly in both directions.
Professional deepfake forensic analysts work across several layers simultaneously. You won't have their tools, but understanding the layers gives you a framework for systematic doubt — which is more durable than any single detection trick.
Layer 1: Pixel-level analysis. AI-generated video leaves statistical fingerprints in the data. Compression artifacts look different in AI-generated content than in genuine camera footage. Skin texture at the pixel level has distinct properties when synthesized. These aren't visible to the naked eye but are detectable with tools like FotoForensics and commercial platforms like Sensity. For your own viewing: if something looks unusually smooth — skin too even, texture too uniform — that's a flag.
Layer 2: Temporal inconsistency. Many deepfake systems process video frame by frame and sometimes struggle with consistency across time. Watch for: faces that seem to "swim" or shift slightly between frames, lighting that changes abruptly within a single continuous shot, and hair that behaves differently in consecutive frames. These are often hard to see at normal speed; slowing playback to 0.25x, which most video players support, makes them easier to spot.
Layer 3: Physiological signals. Human faces carry physiological information — pulse rate creates subtle color changes in skin, breathing affects shoulder movement, genuine emotion affects micro-musculature in ways current AI models don't perfectly replicate. Some detection tools analyze these signals. For casual viewing: look for eyes that seem dead or flat, mouths that move without corresponding throat movement, and expressions that reset unnaturally between words.
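The temporal check in Layer 2 can be sketched as a search for sudden jumps in any per-frame measurement. The sketch assumes a hypothetical upstream tool has already produced one number per frame, such as the horizontal position of the face or its average brightness. The factor of 5 is an illustrative assumption, not a calibrated threshold.

```python
from statistics import median


def find_jumps(values: list[float], factor: float = 5.0) -> list[int]:
    """Return indices of frames whose change from the previous frame is
    more than `factor` times the clip's typical (median) change."""
    if len(values) < 3:
        return []
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    typical = max(median(steps), 1e-9)  # avoid a zero threshold on static clips
    return [i + 1 for i, step in enumerate(steps) if step > factor * typical]
```

A flagged frame is a place to pause and step through the video manually. Genuine footage also jumps at edits and camera shakes, so the check only means something within a single continuous shot.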
For voice clones, the detection signals are different because the medium is different. The most reliable human-accessible indicators:
Background consistency: Genuine recordings pick up room acoustics, ambient noise, and environmental sounds. Cloned voices are often generated in a kind of acoustic "clean room" — they sound unusually clear, without the background hiss or reverb that real recordings carry. If a supposed phone call sounds cleaner than a studio recording, that's worth questioning.
Prosody errors: Prosody is the musicality of speech — the rhythm, stress, and intonation that make language sound natural. Current voice models struggle with unusual sentence structures, proper nouns, technical terms, and emotional variation. They tend to apply stress patterns that are statistically average rather than contextually specific. A cloned voice reading a speech it was trained on sounds fine; a cloned voice improvising sounds slightly robotic.
Breath and pause placement: Real speakers breathe. They pause at natural points — before complex ideas, for emphasis, from emotion. AI voice models generate pauses based on punctuation and statistical patterns, which rarely match exactly where a human would breathe. Listen for pauses that feel grammatically correct but emotionally wrong.
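The background-consistency indicator can be roughly sketched by measuring how quiet the quietest moment of a clip is. This assumes the audio has already been loaded as a list of samples between -1.0 and 1.0. The window size and the -80 dBFS cutoff are illustrative assumptions, and real phone apps often apply their own noise suppression, so a low reading is a prompt for more questions rather than a verdict.

```python
import math


def noise_floor_dbfs(samples: list[float], window: int = 100) -> float:
    """Return the RMS level, in dBFS, of the quietest window in the clip.

    Samples are assumed to be floats in the range -1.0 to 1.0.
    """
    if window <= 0 or len(samples) < window:
        raise ValueError("clip is shorter than one window")
    quietest = min(
        math.sqrt(sum(s * s for s in samples[i:i + window]) / window)
        for i in range(0, len(samples) - window + 1, window)
    )
    if quietest <= 0.0:
        return -120.0  # treat digital silence as -120 dBFS
    return max(-120.0, 20.0 * math.log10(quietest))


def sounds_suspiciously_clean(
    samples: list[float], threshold_dbfs: float = -80.0
) -> bool:
    """True if the quietest moment is below the assumed threshold."""
    return noise_floor_dbfs(samples) < threshold_dbfs
```

A recording made in a real room usually has some hiss or hum even when nobody is speaking, so its quietest window sits well above total silence. A clip whose pauses are perfectly silent is the "acoustic clean room" described above.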
No single technical check is permanent. Models improve. Artifacts disappear. The most durable skill is systematic sourcing: asking not "does this look real?" but "where did this come from, who recorded it, who published it first, and what is the chain of custody?" Authentic footage has a traceable origin. Synthetic content often appears without one — or with a traceable origin that, on inspection, goes only as deep as the first person to post it.
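The sourcing questions above amount to walking a chain of reposts back to its start. Here is a minimal sketch of that idea, assuming the investigator has already noted, for each copy of a video, which earlier post it was taken from. The Post record and its field names are hypothetical, invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    post_id: str
    reposted_from: Optional[str] = None  # id of the post this one copied, if known
    recorded_by: Optional[str] = None    # named person or outlet who filmed it


def trace_origin(posts: dict[str, Post], start_id: str) -> tuple[list[str], bool]:
    """Follow repost links back from start_id.

    Returns the chain of post ids (newest first) and whether the earliest
    post we can reach names who recorded the footage.
    """
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = start_id
    # Stop at a missing post, a dead end, or a loop in the repost links.
    while current is not None and current in posts and current not in seen:
        seen.add(current)
        chain.append(current)
        current = posts[current].reposted_from
    has_named_source = bool(chain) and posts[chain[-1]].recorded_by is not None
    return chain, has_named_source
```

A chain that ends at a post with a named recorder gives you someone to contact and a claim to check. A chain that ends at an anonymous first upload is the "goes only as deep as the first person to post it" case, and deserves more caution.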
The technology will keep improving. Every specific artifact mentioned in this lesson — blinking anomalies, hairline blurring, prosody errors — will be reduced in future model generations. Detection checklists have an expiration date. What doesn't expire is a calibrated approach to evidence.
Researchers at the MIT Sloan School of Management found in 2023 that the most effective misinformation resistance doesn't come from fact-checking skills specifically. It comes from what they call accuracy motivation — the habit of asking "is this actually true?" before sharing or acting on media, regardless of whether the content aligns with what you already believe. People who ask this question consistently outperform people who know many specific detection techniques but apply them selectively.
What this means practically: the goal isn't to become someone who can detect deepfakes. The goal is to become someone who automatically treats high-stakes media with calibrated skepticism — including media that confirms your existing beliefs, which is actually the harder case. Deepfakes that confirm what we already think are far more dangerous than deepfakes that contradict it, because our critical systems engage more readily when something challenges us.
The Gabon case, where real footage was called fake, is a reminder that calibration works in both directions. Skepticism isn't the same as denial. The rule isn't "assume everything is fake." The question is "what does the evidence actually support, and am I asking that question as consistently for things I want to believe as for things I don't?"
You've completed a module on deepfakes, voice cloning, synthetic conflict footage, and forensic detection. You know what a GAN is and how it's trained. You know the liar's dividend. You know what prosody errors sound like and why they exist. You know why the Gabon case matters as much as the Zelensky case. Most people who encounter this technology — including most adults — haven't thought through any of this. You have. That's not a reason for arrogance. It's a reason for responsibility — to ask better questions, share more carefully, and slow down when it matters most.
You've been asked to create a one-page guide for a community journalism organization — volunteers who cover local events with no forensic lab access. They need a practical detection framework for suspicious video and audio that doesn't require special tools, that works under time pressure, and that calibrates in both directions (real mistaken for fake, fake mistaken for real).
DASH is a veteran community journalist who has seen every kind of "media literacy guide" and finds most of them useless in the field. He's going to test every part of your framework against real-world conditions. He expects you to defend your choices.