Who Are You Online With AI? · Introduction

Your Digital Self Is Being Read by Machines That Never Forget

This course exists because the rules changed — and almost nobody told you.

In October 2021, a 17-year-old in New Jersey posted a sarcastic tweet during a school lockdown drill — the kind of dark joke teenagers make when they're bored and a little anxious. Within hours, the tweet had been screenshotted, stripped of its context, and shared by thousands of strangers who had no idea it was a joke. Her school called her parents. News stations picked up the story. She received death threats from adults across the country. The post was up for less than six hours before she deleted it. The consequences lasted years. That's the world this course is about — not the world your parents grew up in, and not the world the adults writing internet-safety pamphlets imagined.

Right now, AI systems are being used to scan social media, analyze text for emotional tone, build profiles of people based on their posts, and flag behavior that algorithms decide is suspicious. Schools are buying software that monitors students' online activity. Employers search candidates' social histories going back a decade. The things you say online don't just reach your followers — they feed systems designed to draw conclusions about who you are. Most people have no idea this is happening. You are about to.

This course won't tell you to "stay safe online" or to never post anything you wouldn't want your grandmother to see. That advice was outdated in 2010. What it will do is show you the actual mechanics — why posts go viral against the poster's will, what a digital footprint actually means when AI is reading it, how online identities get stolen or distorted, and what privacy means when nothing is ever truly deleted. By the end, you'll have a framework for making smarter choices — not because someone told you to, but because you understand what's actually going on.

Who Are You Online With AI? · Lesson 1 of 4

The Post Nobody Meant to Go Viral

How ordinary content becomes unstoppable — and what AI is doing to speed that up.

If you didn't intend for something to spread, are you still responsible for where it goes?

On the morning of December 20, 2013, a 30-year-old public-relations director named Justine Sacco boarded a flight from London to Cape Town, South Africa. Before takeoff, she posted a tweet to her 170 followers — a clumsy attempt at ironic humor about AIDS and race that was, by most readings, offensive. Then she put her phone on airplane mode and fell asleep for eleven hours.

While she slept, the internet did not. A writer named Sam Biddle saw the tweet, retweeted it, and added: "Justine Sacco, PR executive, @JustineSacco, is tweeting this from JFK right now." Her original post had been made to 170 people. Within two hours it had been seen by millions. A hashtag — #HasJustineLandedYet — trended globally. Strangers coordinated to find her flight number. Someone drove to the Cape Town airport just to photograph her expression when she turned on her phone and saw what had happened. By the time she landed, she had lost her job. This all occurred between approximately 11 a.m. and 10 p.m. on a single Friday, while she was completely unreachable in the air.

Sacco's case is not unique. It is, however, one of the most precisely documented examples of what journalist Jon Ronson later called a "public shaming event" — and it happened before modern AI content-amplification systems existed. Today, those systems are running. What took the 2013 internet a few hours of human effort now takes algorithms a few minutes. The question this lesson explores is not whether Justine was right or wrong. The question is: how does a post escape the person who made it?

Section 1 — The Mechanics of Going Viral

Most people think content goes viral because it's popular. That's only half true. Content goes viral because platforms are designed to make it go viral — and the design works on emotions, not logic.

Every major social platform uses a system called a recommendation algorithm. Think of it as a machine whose job is to keep you scrolling. It learns what makes you stop and engage — what makes your thumb pause, what makes you tap like, what makes you comment in anger. Then it shows you more of that. Not because it cares about truth or fairness, but because engagement equals time-on-app, and time-on-app equals advertising money.

The emotions that drive the most engagement are not happiness or calm. Research led by William Brady at New York University, published in 2017, found that moral outrage spreads faster than almost any other kind of content. In his team's analysis of tweets about polarizing political topics, each additional moral-emotional word (one that signals anger, disgust, or perceived injustice) increased the rate at which a post was shared by about 20%. A post that makes someone feel righteous, as if they're punishing someone who deserves it, spreads almost automatically.

Justine Sacco's tweet was tailor-made for this, even though she never intended it to be. It combined a recognizable name, a clear "villain" role, a moral transgression, and an irresistible dramatic element — she didn't know. The not-knowing made it a story. And once it was a story, the platform's mechanics did the rest.

Recommendation Algorithm — A set of automated rules that decides what content to show you next, designed to maximize how long you spend on a platform. It favors emotionally activating content regardless of whether that content is accurate or fair.

Moral Outrage — The feeling that someone has violated a shared rule about right and wrong, combined with the urge to punish them. Online, this feeling spreads faster than almost any other type of emotional content.
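To make the recommendation-algorithm definition concrete, here is a toy sketch of a feed ranker. Everything in it is an assumption made for illustration: the Post fields, the weights in engagement_score, and the two sample posts are hypothetical, and no real platform's ranking is this simple. The point it shows is narrow: the ranker only reads engagement signals, so the accuracy flag never affects the order.

```python
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    likes: int
    shares: int
    angry_replies: int
    is_accurate: bool  # known to us, but never read by the ranker below


def engagement_score(post: Post) -> float:
    # Hypothetical weights: reactions that signal stronger emotion count for more.
    return 1.0 * post.likes + 3.0 * post.shares + 5.0 * post.angry_replies


def rank_feed(posts: list[Post]) -> list[Post]:
    # Highest engagement first; accuracy and intent play no part in the sort.
    return sorted(posts, key=engagement_score, reverse=True)


posts = [
    Post("Calm, accurate explainer", likes=40, shares=2, angry_replies=0, is_accurate=True),
    Post("Out-of-context joke", likes=10, shares=15, angry_replies=30, is_accurate=False),
]
feed = rank_feed(posts)
print([p.text for p in feed])
```

With these made-up numbers the joke scores 205 and the explainer 46, so the joke is shown first even though it has fewer likes. Changing the weights changes the order, which is the design decision platforms actually make.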

Why This Matters Now

In 2013, amplification was driven mostly by humans retweeting. Today, AI systems decide within milliseconds whether to push content to thousands of extra people. A post that gets a few angry replies in the first ten minutes can be auto-promoted to hundreds of thousands of users before any human reviewer sees it. The window between "posted" and "viral" is shorter than it has ever been.

Section 2 — Context Collapse: When Your Audience Multiplies

There is a concept in communication theory called context collapse. Here's what it means in plain terms: when you talk to different people in real life, you naturally adjust. You talk differently to your best friend than you do to your teacher, your grandparent, or a stranger on the street. You're not being fake — you're just reading the room. Every human does this. It's called code-switching, and it's a sign of social intelligence, not dishonesty.

Online, that adjustment is almost impossible. When you post on a public platform, you are technically speaking to everyone at once — your close friends, your relatives, people who hate you, people who've never heard of you, journalists, future employers, and AI systems that crawl the internet to build databases. All of those audiences collapse into one. The thing you posted for your three closest friends can be read by your principal within the hour.

Context collapse was popularized by researcher danah boyd (she spells her name in lowercase), who explored it at length in her 2014 book It's Complicated: The Social Lives of Networked Teens. Drawing on years of fieldwork and interviews with teenagers, she found that the most consistent problem wasn't that young people were being reckless. It was that they had no adequate vocabulary for the fact that their intended audience and their actual audience were completely different things.

Justine Sacco was posting to 170 followers, most of whom presumably understood her sense of humor. Her actual audience, by nightfall, was tens of millions of strangers — none of whom had any context for who she was or what she meant. That gap — between who you were talking to and who actually hears you — is context collapse. And AI systems that automatically scrape, index, and redistribute content make that gap permanent.

Context Collapse — When content created for one specific audience is seen by many different audiences simultaneously — including audiences the creator never intended, who may interpret it completely differently.

You Can Now See What Most People Miss

When someone says "just don't post anything offensive," they're treating context collapse as a content problem. It's actually an audience problem. The exact same words that are funny to your friend group can be career-ending to a stranger with no context. Understanding this means you're thinking about communication at a level most adults haven't caught up to yet.

Section 3 — AI and the Amplification Engine

In 2013, the viral spread of Justine Sacco's tweet was driven by human decisions — each retweet was a person choosing to share. What's different now is that AI recommendation systems have inserted themselves into that process, and they don't make choices the way humans do. They optimize.

Here's a concrete example. In 2021, Facebook's own internal research — later leaked by whistleblower Frances Haugen — revealed that the platform's algorithm had been modified in 2018 to prioritize content that received "angry" reactions. The goal was to increase engagement. The result, as Facebook's own researchers documented, was that the algorithm systematically amplified outrage, misinformation, and divisive content because those got more clicks. The AI wasn't trying to make people angry. It just learned that anger kept people on the app, and it did its job.

This matters for understanding your own digital life because the AI systems running these platforms don't know you. They don't know your tone, your irony, your context, or your intent. They know engagement signals. A post you made sarcastically gets treated identically to one made sincerely. A joke that your friends understood gets amplified to strangers who don't get it. The AI is a powerful distribution machine that is completely indifferent to meaning.

By 2023, short-video platforms like TikTok were running recommendation systems so refined that they could identify a user's emotional state from their scrolling patterns and adjust content delivery accordingly — slowing down when a user seemed to pause longer on certain topics, speeding up when they were disengaged. These systems have no concept of "this post might ruin someone's life." They have only the concept of "this post drives engagement."

Ethical Question — No Clean Answer

If a platform's AI system amplifies your post to millions of people who misunderstand it and those people send you death threats — who is responsible? You made the post. The AI did the amplifying. The strangers sent the threats. At what point does the platform bear responsibility for what its own algorithm does to real people? This question is being debated right now in courts and legislatures around the world, and nobody has a settled answer yet.

Section 4 — What "Going Viral" Actually Costs

There is a cultural myth that going viral is good. Sometimes it is. Musicians have been discovered, businesses have been launched, injustices have been exposed because the right content reached the right people at the right time. But the same mechanics that create a lucky break can create a catastrophe — and the person it's happening to rarely knows which one it is until it's over.

Consider what happened to Ghyslain Raza, a 15-year-old Canadian student who in 2002 filmed himself swinging a golf ball retriever like a lightsaber in a school studio and forgot to erase the tape. Classmates found it, digitized it, and posted it online. By 2006 it had been viewed over 900 million times — he became known as "Star Wars Kid," one of the first major internet viral figures. Raza did not benefit from this. He was bullied so severely that his parents pulled him from school and he required psychiatric care. His family sued the classmates who posted the video. In 2013, he spoke publicly about the experience for the first time: "No matter how hard I tried to ignore people telling me to commit suicide, I couldn't help but feel worthless, like my life had no value."

Raza's case predates AI-driven amplification by almost two decades. The video spread through early message boards and manual sharing — slow by today's standards. In a 2023 environment, with algorithmic amplification, that same video would have reached a billion views in days, not years. The human cost — the bullying, the shame, the psychological harm — would have been compressed into weeks instead of years, with no off-ramp.

Knowing this changes how you see every piece of content that "blows up." Behind almost every viral post about a private individual is a real person who did not design their life to be public entertainment. This doesn't mean viral content is always wrong. It means you now have a frame — a way of seeing the person inside the post — that most people scrolling past them don't have.

The Thing to Carry Forward

Virality is not a feature of content. It is a feature of systems — systems designed by companies, optimized by AI, and activated by human emotion. You are not powerless inside those systems, but you are also not in control of them. The gap between what you intended and what actually happens to your post can be enormous. Lesson 2 explores what happens after the damage — and why the internet's memory is longer than you think.

Lesson 1 Quiz

Five questions — testing your reasoning, not just your memory.

1. On December 20, 2013, Justine Sacco's tweet went viral while she was on a flight. Which factor best explains why it spread so quickly?

Correct. The tweet's virality was not accidental — it had the ingredients (named person, moral violation, ironic twist of unavailability) that platform mechanics are designed to amplify.

Not quite. Review Section 1 on what types of content recommendation algorithms favor and why outrage spreads faster than other emotions.

2. A student posts a meme to a private group chat with six friends. One friend screenshots it and posts it publicly. Hundreds of strangers see it and are angry. Which concept from Lesson 1 best describes what happened?

Correct. Context collapse is exactly this — the original audience (six friends with shared context) and the actual audience (hundreds of strangers) are completely different, and the content gets interpreted through the wrong lens.

Look again. The key here is the gap between the intended audience and the actual audience — that's the definition of context collapse, covered in Section 2.

3. According to William Brady's 2017 research, what happens to a social media post for every word that signals moral outrage?

Correct. Brady's research quantified something that feels intuitive once you know it — outrage is contagious online, and it's measurably so.

That's not what Brady found. Reread Section 1 — the specific statistic about moral outrage and sharing rates is there.

4. You post a joke about your school's lunch food. It's meant to be funny to your 50 followers. Instead, it gets shared and reaches 80,000 people, some of whom tag local journalists. Which of the following is the MOST accurate explanation of why this is a problem that "just post nicely" advice doesn't solve?

Correct. This is the key insight of Section 2. Context collapse is an audience structure problem, not a content quality problem. "Post better" doesn't fix it — understanding that your actual audience is always larger and less contextualized than your intended audience is the only real defense.

Think again. The lesson argues that the real problem isn't what you post; it's who ends up seeing it and why they can't understand the context. Reread the "You Can Now See What Most People Miss" callout in Section 2.

5. The 2021 Facebook whistleblower Frances Haugen revealed that the platform's algorithm had been changed in 2018 to prioritize "angry" reactions. What was the stated goal of this change, and what was the documented result?

Correct. This is the core tension of AI-driven amplification — the system optimizes for what it's told to optimize for (engagement), without any understanding of the human consequences of what it's spreading.

Review Section 3. The Haugen example specifically demonstrates what happens when an AI is optimizing for engagement without any concept of harm.

Lab 1 — The Virality Auditor

Your role: investigate why a real post spread — and whether the spread was justified.

Your Assignment

You are a digital media auditor. Your job is to analyze viral events — not to decide if someone was a good or bad person, but to identify the mechanics that made their content spread. The AI you're working with is a fellow analyst — skeptical, direct, and will push back if your reasoning is weak.

Use the case of Justine Sacco (2013) or Ghyslain Raza (2002) — or bring in another case you know about. Identify: which virality mechanics were active, whether context collapse occurred, what role AI/algorithmic amplification played (if any), and whether the spread was proportionate to what the person actually did.

Start by picking a case and telling your analyst which virality mechanic you think was most important in that case — and why you think so. Be specific. The analyst will push back.

Analyst — Virality Lab

Pick a case — Sacco, Raza, or one you know — and tell me which virality mechanic you think was most important. Don't summarize the story. Make an argument. I'll tell you if I think you're right.

Who Are You Online With AI? · Lesson 2 of 4

Nothing Is Ever Really Deleted

Your digital footprint is not what you posted — it's what the internet kept without asking you.

If you deleted something five years ago, and an AI found it today, is that still yours?

In 1998, a Spanish lawyer named Mario Costeja González had his home repossessed and auctioned to pay a social security debt. As required by law, the auction was published in a newspaper, La Vanguardia. The debt was paid. The auction happened. Life moved on. Then the internet arrived. By 2009, if you searched Mario's name on Google, the first result was still that 1998 auction announcement — a fact from his past that he had legally resolved over a decade earlier, now permanently attached to his professional identity.

Mario complained to Google and to the Spanish data protection authority. Google refused to remove the link, arguing it was public information published by a newspaper. In 2014, the Court of Justice of the European Union ruled against Google. The court established what is now known as the "right to be forgotten" — the legal principle that individuals have the right, under certain conditions, to request that search engines remove links to information about them that is outdated, irrelevant, or harmful to their reputation. By 2020, Google had received over 845,000 such removal requests from Europeans.

The United States has no equivalent law. If you are in the US and something embarrassing or damaging was published about you online, it stays — indexed, searchable, and available to any AI system that crawls the web. Mario's case shows that the fight over digital memory is real, legal, and ongoing. It also shows something more immediate: the internet doesn't forget just because you do.

Section 1 — What a Digital Footprint Actually Is

The phrase "digital footprint" gets used so often in internet-safety talks that it's lost most of its meaning. So let's be precise about what it actually includes, because the real list is probably longer than you think.

Your digital footprint has two layers. The first is your active footprint — things you deliberately created: posts, comments, messages you sent, accounts you made, photos you uploaded, reviews you wrote. These feel like choices, and they are. The second layer is your passive footprint — data generated about you without you consciously creating it. This includes: every search you've ever typed, every website you've visited, how long you spent on each page, your location when you accessed an app, what time of day you're most active, what you looked at for more than three seconds, which ads you hovered over without clicking.

The passive footprint is almost always larger than the active one. And it's almost entirely invisible to you. You can't see it. You can't easily delete it. And it's being read by AI systems constantly.

In 2013, researchers at the University of Cambridge published a study showing that from Facebook likes alone, without any other data, machine learning models could predict a person's political views, religion, and sexual orientation with high accuracy. A 2015 follow-up found that personality predictions based on likes could rival the judgments of close friends and family. The likes were passive data. Most people never thought of liking a post as a statement about who they are. The AI read it that way anyway.

Active Footprint — Digital content you deliberately create: posts, comments, accounts, uploaded photos and videos.

Passive Footprint — Data generated about you automatically — search history, location data, time-on-page, scrolling patterns — without you consciously choosing to create it. Usually larger and more revealing than your active footprint.
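The two layers can be sketched as a simple sorting exercise. The event kinds below are taken from the examples in this section, but the names (ACTIVE_KINDS, PASSIVE_KINDS, Event) and the sample evening are hypothetical, chosen only to show how quickly the passive pile outgrows the active one.

```python
from dataclasses import dataclass

# Hypothetical event kinds, based on the examples listed in this section.
ACTIVE_KINDS = {"post", "comment", "message", "account", "photo_upload", "review"}
PASSIVE_KINDS = {"search", "page_visit", "time_on_page", "location", "ad_hover"}


@dataclass
class Event:
    kind: str
    detail: str


def classify(event: Event) -> str:
    # Active: you chose to create it. Passive: generated as a side effect.
    if event.kind in ACTIVE_KINDS:
        return "active"
    if event.kind in PASSIVE_KINDS:
        return "passive"
    return "unknown"


def summarize(events: list[Event]) -> dict[str, int]:
    counts = {"active": 0, "passive": 0, "unknown": 0}
    for event in events:
        counts[classify(event)] += 1
    return counts


# One made-up evening online: a single deliberate post, plus everything around it.
one_evening = [
    Event("post", "photo caption"),
    Event("search", "symptom lookup"),
    Event("page_visit", "news article"),
    Event("time_on_page", "4 minutes"),
    Event("location", "app opened at home"),
    Event("ad_hover", "sneaker ad"),
]
print(summarize(one_evening))
```

In this invented sample, one deliberate post sits next to five records the user never chose to create.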

Pause Point

Before going further: think about one thing you've searched for online in the past week that you'd prefer nobody else to know. That search is stored somewhere. It was processed by an algorithm that used it to build a profile of your interests. This isn't hypothetical — it is the actual business model of most free online services.

Section 2 — The Wayback Machine and the Permanent Archive

In 1996, a nonprofit organization called the Internet Archive began systematically copying and storing public web pages. Automated bots crawl billions of pages (some every few months, many daily) and save snapshots. By 2023, the archive contained over 800 billion web pages. It is accessible to anyone, for free, at web.archive.org, through a tool known as the Wayback Machine.

This means that a post you made on a public forum in 2017, deleted in 2019, may still exist in the Wayback Machine's archive from 2017. Journalists regularly use it to recover deleted statements. So do opposition researchers in political campaigns. So do employers doing background checks. And increasingly, AI systems training on internet data have ingested archived web content — meaning something you deleted years ago may have already been read and processed by AI models that will continue using that training data for years.
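For readers who want to check this themselves, here is a minimal sketch of asking the Internet Archive whether it holds a snapshot of a page. It assumes the Archive's public availability endpoint at archive.org/wayback/available and the JSON shape that endpoint has returned in the past (an archived_snapshots object with a closest entry); the function names and the sample response are illustrative, and the endpoint's behavior may change.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url: str, timestamp: str = "") -> str:
    # timestamp is optional, e.g. "20170601"; the API looks for the closest snapshot.
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response: dict):
    # Returns (snapshot_url, snapshot_timestamp), or None if nothing is archived.
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def lookup(page_url: str, timestamp: str = ""):
    # Performs the real network request; defined here but not called.
    with urlopen(build_query(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))


# Illustrative response in the assumed shape (not real archive data).
sample_response = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20170601000000/http://example.com/",
            "timestamp": "20170601000000",
            "status": "200",
        }
    }
}
print(build_query("example.com", "20170601"))
print(closest_snapshot(sample_response))
```

Calling lookup with a page you once posted on, and a date before you deleted it, is the same check a journalist or background researcher would run.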

In 2019, Kyle Kashuv, a high school student from Parkland, Florida, who had become a prominent gun-rights activist after surviving the 2018 shooting at his school, had his admission to Harvard University rescinded. The reason: private messages from two years earlier, when he was 16, had been screenshotted and shared publicly. The messages contained racial slurs. Kashuv apologized publicly and said he had grown significantly since writing them. Harvard revoked his admission anyway. The institution made its decision about who he was in 2019 based on what he'd written as a 16-year-old in 2017.

This is not a lesson about whether Harvard was right or wrong. That's genuinely complicated. It is a lesson about the gap between the person who wrote something and the record that remains after they've changed.

Ethical Question — No Clean Answer

Kyle Kashuv wrote racist messages when he was 16. He says he changed. Should a permanent digital record define who someone is years later, when they were young and the context has shifted? How long should a record be held against someone? Is there a difference between a public figure and a private person in this regard? There's no agreed answer — but this exact debate is shaping privacy law, college admissions policies, and HR practices right now.

Section 3 — What AI Does With Your Old Data

Here is something that surprises most people: modern AI language models — the kind that power chatbots and writing assistants — were trained on enormous amounts of text scraped from the internet. That includes old forum posts, deleted blog entries, social media content, news articles, and archived web pages. When you "delete" a post, you remove it from the platform's visible interface. But if that post was already scraped by a crawler — by the Internet Archive, by a search engine's index, by an AI training dataset — the deletion didn't reach those copies.

This created a legal crisis in 2023 that is still unresolved. Authors discovered their published books had been included in datasets used to train AI models without permission. In July 2023, Sarah Silverman, Christopher Golden, and Richard Kadrey filed a lawsuit against Meta, arguing that its LLaMA AI model had been trained on illegally obtained copies of their copyrighted work. Similar suits were filed against OpenAI. The underlying question — who owns data once it's online, and what can AI companies do with it — has no settled legal answer.

For you, the practical implication is this: the internet is not a whiteboard. It is closer to a stone carving. You can paint over the carving. You can put a cloth over it. But underneath, the marks remain — and sophisticated tools, including AI systems, can see through paint.

You Can Now See What Most People Miss

Every conversation about "just delete it" assumes that deletion is the end of the story. Knowing what you now know about the Wayback Machine, about AI training data, and about the difference between visible and indexed content, you understand that deletion is often just the beginning of a much longer story. This is information that shapes real decisions being made right now about AI copyright law, data privacy regulations, and whether the US should adopt its own "right to be forgotten."

Section 4 — Managing Your Footprint When Deletion Isn't Enough

If you can't truly erase the past, what can you actually do? This is a real question with real, if imperfect, answers.

The first strategy is proactive publication — the idea that the best way to control what comes up when someone searches your name is to actively create good content that ranks higher than bad content. Search engines show results in order of relevance and authority. A well-maintained public profile, a portfolio of work, a consistent positive presence can push older or negative results down — not eliminate them, but make them less visible.

The second strategy is understanding the legal landscape. In Europe, the "right to be forgotten" established in the 2014 Costeja González ruling means you can formally request that Google and other search engines remove links to outdated personal information. In the US, the Children's Online Privacy Protection Act (COPPA) provides some protections for users under 13. Several US states — California, Virginia, Colorado — have passed their own data privacy laws that give residents more rights to request data deletion. These are imperfect tools, but they exist.

The third strategy is the hardest: accepting that some things persist and managing expectations about future audiences accordingly. This doesn't mean never posting anything personal. It means occasionally asking: if someone who doesn't know me at all sees this five years from now with no context, what might they conclude? That question isn't about being paranoid. It's about the gap between your intended audience and your actual audience — the context collapse you learned about in Lesson 1, now viewed through the lens of time rather than space.

Lesson 2 Quiz

Digital memory — how much do you actually understand about what persists?

1. What was the legal principle established by the 2014 Court of Justice of the European Union ruling in the case of Mario Costeja González vs. Google Spain?

Correct. This is the "right to be forgotten" — a legal tool that exists in Europe but not in the US, which is why the debate around digital memory is so different in the two regions.

Review the Opening Scene. The González case established a specific right about search engine results — not newspapers, not AI companies.

2. Which of the following is an example of a PASSIVE digital footprint — not an active one?

Correct. Time-on-page data is generated automatically by your behavior — you didn't choose to create it. That's what makes it passive, and that's why it's so revealing.

The distinction is between things you consciously created (active) and data generated about your behavior without a deliberate act (passive). Reread Section 1.

3. A University of Cambridge study showed that from Facebook likes alone, AI could predict political views, religion, and personality with high accuracy. What is the most important implication of this finding for how you think about your passive footprint?

Correct. This is the key implication — the gap between what you think you're sharing and what AI infers from your behavior can be enormous. Passive data is often more revealing than active data precisely because people don't guard it.

The point isn't about Facebook specifically or about malice — it's about what passive data reveals. Reread Section 1 on the Cambridge study finding.

4. You delete a post you made on a public forum three years ago. Why might that post still be accessible to someone researching you today?

Correct. The Internet Archive saves snapshots of billions of web pages. If your post was crawled before you deleted it, the archived version persists and is publicly accessible.

The specific answer here is about archiving services, not legal requirements or platform mechanics. Review Section 2 on the Wayback Machine.

5. Kyle Kashuv had his Harvard admission rescinded in 2019 based on messages he wrote in 2017 as a 16-year-old. From the perspective of digital footprint management, which strategy from Section 4 would have been MOST relevant to his situation — and what does it actually involve?

Correct. The messages were private — proactive publication wouldn't have addressed private messages, and legal deletion didn't apply. The most honest framing is the third strategy: private messages can escape their context, and future audiences will not share your past self's frame of reference.

Think carefully about which strategy applies to private messages that were screenshotted and shared. Reread Section 4 and consider which tool is actually relevant to that specific situation.

Lab 2 — The Footprint Investigator

Your role: map a real or hypothetical person's digital footprint and identify what an AI would infer.

Your Assignment

You are a digital footprint investigator. A college admissions office has asked you to assess what a candidate's online presence says about them — not based on their application, but on publicly available data. Your AI partner helps you think through what data would actually be findable and what it would reveal.

The twist: you are investigating yourself (hypothetically) or a fictional 17-year-old who has been online since age 10. What active and passive data would exist? What would AI systems infer from it? Is that inference fair?

Start by listing three types of passive footprint data that would exist for a typical 17-year-old who has been online since age 10. Then tell me: which one do you think reveals the most about who they are — and whether that inference is fair or unfair. Your analyst will challenge your reasoning.

Investigator — Footprint Lab

Ready when you are. Give me the three passive data types, pick the most revealing one, and take a position on whether using it is fair. Don't hedge — I want an actual argument.

Who Are You Online With AI? · Lesson 3 of 4

Who Gets to Decide Who You Are Online?

Identity theft, AI impersonation, and the gap between your real self and your digital profile.

If an AI can write convincingly as you, using your posts as training data — who owns that voice?

In late 2023, actress Scarlett Johansson discovered that her likeness had been used in AI-generated advertisements being run on major social media platforms: ads she had never agreed to, for products she had never endorsed. The ads used deepfake technology to superimpose her face on a spokesperson, then used AI voice cloning to add narration in a voice designed to sound like hers. The ads ran for weeks before being taken down. Similar incidents happened to Tom Hanks, who in September 2023 publicly warned his followers on Instagram: "There's a video out there promoting some dental plan with an AI version of me. I have nothing to do with it."

Then in May 2024, OpenAI released a voice for its ChatGPT assistant called "Sky" — which Johansson said sounded so similar to her voice (she had previously recorded the AI character Samantha in the 2013 film Her) that friends and her own agents contacted her assuming she had agreed to the deal. She had not. She had in fact declined a direct offer from OpenAI's CEO. OpenAI paused the Sky voice while the dispute was ongoing. The legal question — does a person own the right to their voice and likeness in the AI era, even when the AI never literally copied them but only mimicked them — has no settled answer.

These cases involve celebrities because celebrities have public profiles and legal resources. But the underlying technology is available to anyone. By 2023, the cost of training an AI model to mimic a specific person's writing style, voice, or visual appearance had dropped to near zero for anyone with moderate technical skill. The question of who you are online is no longer only about what you post. It's about what AI can construct from what you've posted, and whether you have any say in the result.

Section 1 — The Anatomy of Digital Identity Theft

Most people think of identity theft as stealing a credit card number or a Social Security number. That version still exists. But in the AI era, digital identity theft has added several new forms that are harder to detect and harder to recover from.

The first new form is account takeover: gaining unauthorized access to someone's actual accounts through password theft, phishing (fake login pages designed to capture your credentials), or SIM-swapping (convincing a phone carrier to transfer someone's number to the thief's device, bypassing SMS-based two-factor authentication). Attackers can also go after the platform itself. On July 31, 2020, Graham Ivan Clark, a 17-year-old from Tampa, Florida, was arrested for his role in taking over the Twitter accounts of Barack Obama, Joe Biden, Elon Musk, Bill Gates, and dozens of others simultaneously, running a bitcoin scam that netted over $100,000 in a single afternoon. The attackers got in by phishing Twitter employees over the phone until they reached the company's internal account tools.

The second new form is synthetic identity creation — using publicly available data about you to build a fake version of you. This might mean creating a fake social media account using your photos, writing in your style based on your posts, and using it to say things you never said. In 2022, researchers at Georgetown University documented cases where AI-generated "sock puppet" accounts — fake accounts designed to look like real people — were being used to spread political messages that the real people whose identities were being mimicked would never have supported.

The third form — the newest and least legally defined — is AI voice and likeness cloning. Given fifteen to thirty seconds of someone's recorded voice, widely available AI tools can generate unlimited new audio in that person's voice saying anything. Given enough photos, AI can create video of someone doing things they never did.

SIM-Swapping — A social engineering attack where a criminal convinces a phone carrier to transfer someone's phone number to a device they control, bypassing SMS-based two-factor authentication and gaining access to accounts that use that number.

Deepfake — AI-generated media — video, audio, or images — in which a person's face, voice, or likeness has been digitally replaced or synthesized to make it appear they said or did something they did not.

Section 2 — Your Online Identity Is Already a Profile, Not a Person

Here is a distinction that changes how you think about everything: there is a difference between you and your digital profile. You are a person — complex, changeable, full of context, capable of explaining yourself. Your digital profile is a collection of data points — interpreted by algorithms, indexed by search engines, and read by AI systems that have no capacity to ask you what you meant.

This distinction matters because systems that make decisions about you (hiring algorithms, credit scoring systems, college admissions software, social media safety tools) do not interact with you. They interact with your profile. In 2014, Amazon began building an AI hiring tool intended to automatically screen job applications. By 2015, the company had found that the model had learned to penalize resumes that included the word "women's" (as in "women's chess club") and downgrade graduates of all-women's colleges. Amazon eventually scrapped the tool, and Reuters made the story public in 2018. The AI hadn't been told to discriminate. It had just learned patterns from a decade of Amazon's previous hiring decisions, and those decisions had been made by humans who had, consciously or not, favored male candidates.

Your digital profile will be read by systems like this. Understanding that the profile and the person are different things — and that the profile can be wrong, incomplete, or actively biased — is one of the most practically useful things you'll take from this course.
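To see how a system can pick up bias without ever being told to discriminate, here is a minimal sketch in Python. It is not Amazon's system: the resume keywords, the past decisions in HISTORY, and both function names are invented for illustration. The "model" does only one thing: it scores each keyword by how often past resumes containing it led to a hire.

```python
from collections import defaultdict

# Hypothetical past decisions: (resume keywords, was the person hired?).
# Invented for illustration only; this is not real hiring data.
HISTORY = [
    ({"python", "chess club"}, True),
    ({"java", "robotics"}, True),
    ({"python", "robotics"}, True),
    ({"python", "women's chess club"}, False),
    ({"java", "women's robotics"}, False),
    ({"java", "chess club"}, True),
]


def learn_word_scores(history):
    """Score each keyword by the share of past resumes with it that were hired."""
    hired = defaultdict(int)
    seen = defaultdict(int)
    for keywords, was_hired in history:
        for word in keywords:
            seen[word] += 1
            hired[word] += int(was_hired)
    return {word: hired[word] / seen[word] for word in seen}


def score_resume(keywords, scores, default=0.5):
    """Average the learned scores; keywords never seen before get a neutral default."""
    values = [scores.get(word, default) for word in keywords]
    return sum(values) / len(values)


scores = learn_word_scores(HISTORY)
a = score_resume({"python", "chess club"}, scores)
b = score_resume({"python", "women's chess club"}, scores)
print(round(a, 2), round(b, 2))  # prints: 0.83 0.33
```

Nothing in the code mentions gender. The lower score for the second resume comes entirely from the pattern in the past decisions, which is the same mechanism the lesson describes.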

You Can Now See What Most People Miss

When a system makes a decision about you based on your digital profile, and the decision is wrong, the instinct is to say "the AI made a mistake." The deeper truth is that the AI made a decision based on incomplete data — data that can never capture who you actually are. Knowing this distinction means you can challenge those decisions more effectively, because you know what the system is reading and what it's missing.

Section 3 — When AI Invents a Version of You

In April 2023, a journalist named Kashmir Hill at The New York Times reported on an experiment: she asked ChatGPT to write a biography of herself based on publicly available information about her. The biography contained several invented facts — conferences she had never attended, positions she had never held, articles she had never written. The AI wasn't lying in any meaningful sense. It was doing what language models do: generating plausible-sounding text based on patterns. But those invented facts were presented with the same confident tone as the accurate ones. If someone read that biography without knowing Kashmir Hill personally, they would have no way to know which parts were real.

This phenomenon — AI generating false information about real people stated confidently and specifically — is called hallucination. It's a known limitation of current AI systems. And it creates a practical problem: AI-generated profiles of real people are already appearing on the internet, being indexed by search engines, and potentially feeding back into the training data of future AI models. A hallucinated fact about you, repeated often enough across the internet, could become a durable part of your digital profile — attributed to you, searchable under your name, and very difficult to correct.

In 2023, a Georgia radio host named Mark Walters sued OpenAI after a ChatGPT-generated legal summary falsely accused him of embezzlement. The summary was produced in response to a journalist's question. It named him specifically, described crimes in detail, and was entirely fabricated. This was the first defamation lawsuit filed against an AI company in the US. The case was ongoing as of 2024. The question at its center — whether an AI company is legally responsible for falsehoods its model invents about real people — has no settled precedent.

Ethical Question — No Clean Answer

If an AI model hallucinates false criminal accusations about a real person and those accusations spread online, who is responsible? The company that built the model? The person who asked the question? The platform that hosted the output? The answer determines who you would sue, who would pay, and whether the harm could ever be undone. Courts around the world are working on this right now — without consensus.

Section 4 — What You Can Actually Control

There's a realistic version of control and an unrealistic one. The unrealistic version is: manage everything about your online identity so that no one can ever misrepresent you. That's not possible. The realistic version is: understand what systems are reading about you, reduce unnecessary exposure, correct errors when you find them, and build a strong enough authentic presence that misrepresentations are harder to sustain.

Specifically: review what appears when you search your own name. If there are results you didn't create, know that they exist. In Europe, you can request removal under GDPR. In the US, you can often request removal directly from websites (with varying success) or from data broker services that aggregate personal information. Companies like Spokeo, Whitepages, and BeenVerified collect and sell profiles of private individuals — most allow opt-out requests.

More importantly: the most durable protection for your digital identity is a well-documented, authentic one. Not because it prevents bad actors — it doesn't — but because people who know who you actually are, from your own documented record, are more resistant to being fooled by a fake version of you. A person with no real online presence is easier to impersonate than a person whose real presence is clear and consistent.

And finally: understand that the gap between your digital profile and your actual self is not a flaw in you. It is a structural feature of how these systems work. You are not reducible to your posts. Neither is anyone else.

Lesson 3 Quiz

Identity, AI, and the gap between profile and person.

1. In 2024, Scarlett Johansson objected to OpenAI's "Sky" voice. What made this case legally significant beyond ordinary copyright?

Correct. The case raised new legal territory — existing law covers copying, but mimicry by AI without literal copying had no settled precedent. That's what makes it significant.

Review the Opening Scene. The key issue was mimicry, not copying — and that distinction is exactly what makes this legally new territory.

2. What is a "deepfake," and which of the following is the most accurate description of why deepfakes are a new kind of identity threat compared to earlier forms of impersonation?

Correct. The critical shift is that deepfakes don't require the subject's participation, cooperation, or even presence — only their publicly available voice, image, or data.

Reread the key term definition for deepfake in Section 1, then think about what makes this different from older forms of impersonation.

3. Amazon built an AI hiring tool that learned to penalize resumes mentioning "women's" activities and downgrade all-women's college graduates. The AI was never told to do this. What is the best explanation for why it happened?

Correct. This is a core concept in AI ethics: a model trained on biased historical data will reproduce that bias, even without any explicit instruction to discriminate. "Garbage in, garbage out" — except the garbage was invisible inside years of seemingly normal business decisions.

Review Section 2. The Amazon case illustrates how AI systems can amplify existing biases without any malicious intent — because they learn from historical data, and that data carries human prejudice.

4. AI "hallucination" — generating false but confident-sounding information about real people — creates which of the following specific long-term risks to someone's digital identity?

Correct. The feedback loop is the real danger — hallucinated content gets indexed, the index becomes training data, and the hallucination gets reinforced and amplified across the web.

Review Section 3. The long-term risk is about how false information persists and potentially amplifies through the web's own information ecosystem.

5. The lesson argues that a "well-documented, authentic" online presence is the most durable protection for your digital identity. Which of the following best explains the reasoning behind this claim?

Correct. The logic is about audience resilience, not technical prevention. You can't stop someone from creating a fake version of you, but people who already know your real presence are harder to deceive.

Reread Section 4. The argument isn't about platform protection or SEO — it's about the people who know you and whether they can recognize a fake.

Lab 3 — The Identity Auditor

Your role: assess an AI-generated profile of a fictional person and identify where it diverges from reality.

Your Assignment

You are an identity auditor working with a law firm that handles cases where AI-generated false information has damaged someone's reputation. Your job is to identify the specific mechanisms by which an AI profile diverges from reality — not just that it's wrong, but how and why it gets things wrong.

The fictional subject is Alex Chen, a 19-year-old college student who has been online since age 9. Alex's digital footprint includes: a gaming YouTube channel from ages 10–14 (now private), a Twitter account active 2016–2020 (deleted), current Instagram (private), and three years of Discord server activity (semi-public). An AI was asked to generate a profile of Alex for a job application background check.

Tell me: which part of Alex's digital history do you think an AI background-check tool would be most likely to get wrong — and what kind of wrong would it be (hallucination, outdated data, missing context, or bias in the training data)? Take a specific position. Your analyst will challenge it.

Auditor — Identity Lab

Alex has a complicated history — content created as a child, a deleted account, private current presence, semi-public Discord. Pick the most likely failure point in an AI background check and tell me specifically what kind of error it would produce and why. Make a real argument.

Who Are You Online With AI? · Lesson 4 of 4

Privacy Isn't Dead — But It Works Differently Now

What privacy actually means when data is permanent, AI is reading everything, and the rules are still being written.

If something is technically public, does that make it fair game for any use — including by AI systems you never agreed to?

In January 2020, journalist Kashmir Hill (the same reporter from Lesson 3) published a story in The New York Times that most readers found genuinely unsettling. A company called Clearview AI, founded in 2017, had built a facial recognition database containing over three billion photographs — scraped without permission from Facebook, Venmo, YouTube, and millions of other websites. The database allowed law enforcement clients to upload a photo of an unknown person and receive back a list of results showing that person's social media profiles, along with links to the pages where their photos appeared.

Clearview had sold this service to over 600 law enforcement agencies in the US and abroad, including the FBI and Interpol, before the Times story ran. Every photograph in the database had been technically public — posted voluntarily on social media, visible to anyone who visited those pages. Clearview argued that scraping public photos was legal, just as anyone could manually search someone's public social media. The company's attorney compared it to "a super-Google for faces." Privacy advocates argued that there is a fundamental difference between a photo being visible to the people who encounter it naturally and a company aggregating billions of such photos into a searchable system that could track any individual's movements, relationships, and history.

By 2022, Clearview had been banned from selling its services to private companies in the US, fined over $9 million in the UK, and ordered to delete data on European citizens. The US government continued to use it. The core legal question — whether scraping and aggregating public data crosses a privacy line even if each individual piece of data was technically public — is still being resolved in courts. Clearview did not invent this question. It just made it impossible to ignore.

Section 1 — What Privacy Actually Means Now

The traditional definition of privacy is "the right to be left alone." That definition comes from an 1890 Harvard Law Review article by Samuel Warren and Louis Brandeis, and it was written in response to newspapers publishing society gossip, which was the cutting-edge privacy threat of the time. It's not a bad definition. But it was written for a world where information was scarce and required human effort to collect and distribute.

Privacy scholar Helen Nissenbaum proposed a more useful framework in her 2010 book Privacy in Context. Her concept, called contextual integrity, argues that privacy is not about secrecy; it's about appropriate information flow. Information flows appropriately when it matches the norms of the context in which it was originally shared. A doctor sharing your medical information with another doctor is appropriate. A doctor sharing that same information with your employer is a violation, not because the information was a secret that got out, but because it moved outside the context where it was supposed to stay.

This framework is far more useful for thinking about Clearview. The photos in Clearview's database were public. But they were posted in a specific context — social media profiles, where the expected audience is people who encounter the profile naturally. Aggregating those photos into a facial recognition database for law enforcement fundamentally violates the original context. The information moved outside the norms of the context where it was shared. Under Nissenbaum's framework, that's a privacy violation — even if each individual photo was technically visible.

Contextual Integrity — The principle that privacy is maintained when information flows match the norms of the context in which information was originally shared. A violation occurs when information moves to a context with different norms — even if the original information was "public."
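One way to make contextual integrity concrete is to treat every information flow as a small record and check it against the norms of the context it came from. The Python sketch below is a deliberate simplification, not Nissenbaum's full framework: the Flow fields, the NORMS table, and the is_appropriate function are all assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One movement of information from a sender to a recipient."""
    info_type: str
    sender: str
    recipient: str


# Hypothetical norms: for each context, the flows people expect when they share.
NORMS = {
    "medical visit": {
        Flow("medical record", "doctor", "doctor"),
    },
    "social media profile": {
        Flow("profile photo", "user", "profile visitor"),
    },
}


def is_appropriate(context, flow):
    """A flow is appropriate only if it matches a norm of its original context."""
    return flow in NORMS.get(context, set())


doctor_to_doctor = Flow("medical record", "doctor", "doctor")
doctor_to_employer = Flow("medical record", "doctor", "employer")
photo_to_database = Flow("profile photo", "scraper", "face database")

print(is_appropriate("medical visit", doctor_to_doctor))           # prints: True
print(is_appropriate("medical visit", doctor_to_employer))         # prints: False
print(is_appropriate("social media profile", photo_to_database))   # prints: False
```

Notice that the check never asks whether the information is public or secret. It only asks whether the flow matches what the original context allowed, which is the core of the definition above.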

Why This Matters at an Institutional Level

Contextual integrity is the framework being used by many privacy lawyers, regulators, and technologists to argue for new AI regulations in 2024. The EU's AI Act, passed in 2024, draws partly on this logic to restrict certain uses of biometric data. This is the kind of thinking happening at policy levels right now — and it started as an academic concept developed by one person at NYU.

Section 2 — Surveillance That Doesn't Feel Like Surveillance

There's a version of surveillance most people recognize: cameras in stores, police following someone, a government tapping a phone. And then there's the version that most people experience constantly without recognizing it as surveillance at all.

In 2018, an investigation by the Associated Press found that Google was recording users' location even when they had turned off "Location History" in their account settings. The data was being stored in a separate system called "Web & App Activity," which was enabled by default. Google updated its disclosures after the story, but the underlying practice (collecting data through systems users don't know are running) had been operating for years. In 2022, Google agreed to pay $391.5 million to settle an investigation by 40 state attorneys general over the practice.

In 2019, a report by researchers at Oxford found that the average website contains 7 tracking technologies — third-party scripts that report your behavior back to advertising networks. If you visited 20 websites in a day, your behavior across those sites was likely reported to dozens of separate companies, each adding it to a profile that they sell to advertisers, data brokers, and in some cases, government agencies. None of these sites asked you specifically if this was acceptable. Most mentioned it somewhere in a privacy policy that nobody reads.

For a 12-year-old who has been online since age 7, this means there are likely profiles of your interests, behavioral patterns, emotional responses to content, and social connections held by companies you have never heard of, built over five-plus years, that you have never consented to and cannot easily access or delete.

Ethical Question — No Clean Answer

Most free online services are free because they sell your data or your attention to advertisers. If you use Google Search, Gmail, YouTube, TikTok, or Instagram for free, the business model involves your data. Is this a fair trade? You get access to powerful tools; they get data about you. Some people argue this is a reasonable exchange — you can always pay for alternatives. Others argue that the exchange isn't transparent, that young users can't meaningfully consent, and that the power imbalance between a teenager and a trillion-dollar company makes "consent" meaningless. Where do you land?

Section 3 — What the Rules Currently Say (and Don't)

Privacy law in the US is fragmented, inconsistent, and significantly behind the technology it's supposed to govern. Here's the honest picture.

The most relevant federal law for young users is the Children's Online Privacy Protection Act (COPPA), passed in 1998. It requires websites to obtain parental consent before collecting data from children under 13. This is why you must be 13 to sign up for most social platforms. It also means that if you were under 13 when you signed up — using a fake birthdate, as millions of children do — the platform may claim it had no legal obligation to protect your data, because you technically lied about your age.

Beyond COPPA, federal privacy protections in the US are sparse. The US does not have a comprehensive national privacy law — unlike the EU, which has the General Data Protection Regulation (GDPR), in force since 2018. The GDPR gives EU residents specific rights: to access their data, correct it, delete it, and object to certain uses. California has the California Consumer Privacy Act (CCPA), in force since 2020, which provides similar rights to California residents. Several other states have followed. But if you live in a state without specific privacy legislation, you have far fewer formal rights over your data than someone in Germany or France.

The EU's AI Act, passed by the European Parliament in March 2024, goes further — banning real-time facial recognition in public spaces for most purposes, prohibiting AI systems that exploit psychological vulnerabilities, and requiring transparency disclosures when AI is used to make consequential decisions about people. These regulations do not apply to US companies operating in the US — but they affect US companies operating in Europe, which is most of them.

You Can Now See What Most People Miss

The gap between US and EU privacy law is not abstract — it affects what companies can do with your data right now, depending on where you are. Knowing that this gap exists, that it's the subject of ongoing legislative debate, and that the rules are actively changing means you're reading every story about "AI and privacy" with a frame that most adults don't have. The EU AI Act is not just a European story — it is reshaping how every major tech company builds its products globally.

Section 4 — Agency in a World Where the Rules Are Still Being Written

The most important thing to understand at the end of this course is that you are not a passive subject of these systems. You are a person making choices inside them — choices with real consequences, but also real possibilities.

The practical moves: use privacy-protecting browsers and search engines (Firefox with uBlock Origin, Brave, DuckDuckGo) where you have a choice. Review app permissions on your phone — many apps request access to your microphone, location, and contacts far beyond what their function requires. Regularly search your own name to know what's out there. Use two-factor authentication that doesn't rely solely on SMS. Know what COPPA and CCPA entitle you to. If you're in California, you can formally request that data brokers delete your data — and some advocacy groups run tools to help with that process automatically.
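Why does it matter whether two-factor authentication relies on SMS? Authenticator apps typically use time-based one-time passwords (TOTP, standardized in RFC 6238): your phone and the service share a secret once, and after that each side computes the same short code from the secret and the current time. No text message is sent, so a SIM-swap has nothing to intercept. Below is a minimal sketch in Python using only the standard library. The function name totp is our own, and the secret is the public test value from the RFC, not a real one. Real apps add details this sketch leaves out.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, at_time=None, step=30, digits=6):
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    key = base64.b32decode(secret_b32, casefold=True)
    now = time.time() if at_time is None else at_time
    counter = int(now // step)  # number of 30-second windows since 1970
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation from RFC 4226
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


# Base32 encoding of the ASCII test secret "12345678901234567890" from RFC 6238.
RFC_TEST_SECRET = "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"

print(totp(RFC_TEST_SECRET, at_time=59, digits=8))  # RFC 6238 test vector: 94287082
print(totp(RFC_TEST_SECRET))                        # the code for right now
```

The code changes every 30 seconds and depends only on the shared secret and the clock, which is why moving your phone number to another device does not help an attacker.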

The bigger moves: the people writing the rules that govern how AI handles your data are, right now, regulators, lobbyists, technologists, and academics. Almost none of them are teenagers. But EU privacy law was shaped significantly by Max Schrems, an Austrian law student in his early twenties who filed a complaint against Facebook in 2011. He did it because he was curious and persistent, not because he had special access. That complaint eventually led to the invalidation of the EU-US Privacy Shield agreement, one of the most significant privacy law outcomes of the decade. You are not too young to have an opinion about these rules, or to make noise about them.

You now understand something consequential: the mechanics of virality, the permanence of digital memory, the gap between your profile and your person, and the contested landscape of privacy law. That understanding doesn't give you control over every system. But it means you are navigating these systems with your eyes open — and that is not a small thing.

Lesson 4 Quiz

Privacy, surveillance, and what the rules actually say.

1. Clearview AI's database contained billions of photos scraped from public social media. Under Helen Nissenbaum's concept of "contextual integrity," why does this constitute a privacy violation even though the photos were technically public?

Correct. Contextual integrity is about appropriate information flow — not secrecy. The photos being public doesn't mean they're appropriate for every possible use. Moving them to a fundamentally different context (facial recognition for law enforcement) violates the original sharing norms.

Review Section 1 and the definition of contextual integrity. The violation isn't about how many people see the photos — it's about the context in which they're now being used.

2. In 2018, the Associated Press found that Google was recording user locations even when "Location History" was disabled. What is the most accurate name for this kind of data collection practice?

Correct. This is passive footprint collection — the data was generated automatically by behavior, through a system users didn't know was active, without any deliberate act of sharing on the user's part.

Think about the distinction between active and passive footprints from Lesson 2. The key here is that users didn't know the collection was happening — it wasn't something they chose to do.

3. COPPA (Children's Online Privacy Protection Act) requires websites to get parental consent before collecting data from users under 13. A 12-year-old signs up for Instagram using a fake birthdate saying they're 16. What does this mean for their data rights under COPPA?

Correct. This is one of the real problems with COPPA in practice — the law places the responsibility for age verification on platforms, but platforms argue that when users lie, the obligation shifts. This is an ongoing legal and policy debate.

Review Section 3. The lesson specifically addresses this scenario — what happens to your legal protections when you use a fake birthdate to sign up for a platform.

4. The EU's AI Act, passed in March 2024, bans real-time facial recognition in public spaces for most purposes. An American student visiting Paris uses a facial recognition app on their phone in a public square. Which of the following is most accurate about the legal situation?

Correct. The EU AI Act, like the GDPR before it, applies based on where activity occurs — not where the company or user is from. This is why US tech companies must comply with EU rules when operating in Europe, regardless of their home country.

Review Section 3. EU regulations apply based on territory — not nationality. This is a critical principle that shapes how global tech companies operate.

5. Max Schrems filed a privacy complaint against Facebook in 2011 as a college student. That complaint eventually led to the invalidation of the EU-US Privacy Shield. What does this case most directly illustrate about the relationship between individual action and institutional privacy outcomes?

Correct. The Schrems case is used in Section 4 precisely because it shows that understanding the rules is itself a form of power — you don't need institutional access, just clarity about what the rules say and the persistence to act on that knowledge.

Review Section 4. The point isn't about legal design or cultural values — it's about what individual people with specific knowledge can actually accomplish.

Lab 4 — The Privacy Policy Designer

Your role: design a privacy policy for a fictional AI product — then defend it against hard questions.

Your Assignment

You have been hired to design the privacy policy for a new AI-powered app called "Pulse": a journaling app that analyzes your entries and provides emotional pattern insights, suggesting when you might be stressed or anxious. Pulse is targeted at teenagers aged 13–17. It collects journal text, emotional tone analysis results, usage timestamps, and device location, and it shares anonymized aggregate data with university researchers.

Your analyst is a privacy rights advocate. They will push back on every policy choice you make. Your goal is not to make a perfect policy — it's to think through the real trade-offs and defend the choices you make with specific reasoning.

Start by picking ONE of these policy questions and giving your initial answer: (1) Should Pulse require parental consent for users aged 13–15, or only for children under 13, the minimum COPPA sets? (2) Should journal content ever be shared with parents who request it? (3) What should happen to a user's data if they delete their account? Take a clear position and give your reasoning. The advocate will challenge it.

Advocate — Privacy Lab

Pick one of the three policy questions and give me your answer with real reasoning. I'm not looking for "it depends" — I want an actual position you can defend. I'll push hard on it.

Module Test — Who Are You Online With AI?

15 questions across all four lessons. 80% to pass.

1. On December 20, 2013, Justine Sacco's tweet went viral while she was unreachable on a flight. What was the primary mechanism that allowed this to happen so quickly?

Correct.

Review Lesson 1, Section 1 — the mechanics of virality and what types of content spread fastest.

2. "Context collapse" refers to:

Correct.

Review Lesson 1, Section 2 — danah boyd's work on context collapse and the audience problem.

3. Facebook's internal research (leaked by Frances Haugen in 2021) found that the 2018 algorithm change to prioritize "angry" reactions resulted in:

Correct.

Review Lesson 1, Section 3 on the Haugen revelations and AI amplification.

4. The 2014 EU ruling in the case of Mario Costeja González established:

Correct.

Review Lesson 2, Opening Scene and Section 2.

5. Which of the following is an example of a PASSIVE digital footprint?

Correct.

Review Lesson 2, Section 1. Passive footprint = data generated automatically about your behavior, not deliberately created by you.

6. The Internet Archive (Wayback Machine) is relevant to digital footprint management because:

Correct.

Review Lesson 2, Section 2.

7. Lesson 3 describes SIM-swapping as one way attackers take over someone's accounts. SIM-swapping works by:

Correct.

Review Lesson 3, Section 1 and the key term definition for SIM-swapping.

8. The key distinction between "you" and "your digital profile" that the lesson draws is:

Correct.

Review Lesson 3, Section 2.

9. AI "hallucination" is defined as:

Correct.

Review Lesson 3, Section 3 on AI hallucination and the Kashmir Hill / Mark Walters cases.

10. Helen Nissenbaum's concept of "contextual integrity" argues that privacy is violated when:

Correct.

Review Lesson 4, Section 1 and the key term definition for contextual integrity.

11. Clearview AI's facial recognition database was built by scraping photos from public social media. Under US law at the time, this was:

Correct.

Review Lesson 4, Opening Scene. The Clearview case is significant precisely because its legal status was contested and varied by jurisdiction.

12. A student uses a fake birthdate to sign up for a social media platform at age 12, stating they are 16. Under COPPA, what is the most accurate description of their data rights?

Correct.

Review Lesson 4, Section 3 on COPPA and the age verification gap.

13. A 14-year-old posts a sarcastic comment to a school Discord server with 40 members. A member screenshots it and posts it publicly with no context. It reaches 50,000 people who interpret it literally and angrily. Which TWO concepts from this module best explain the full situation? Choose the answer that names both.

Correct. This scenario combines context collapse (intended audience of 40 vs. actual audience of 50,000 who lack interpretive context) with the moral outrage dynamics that drive rapid amplification.

Think about which two concepts from Lessons 1 and 2 directly explain both the audience problem and the spread mechanism together.

14. Amazon scrapped its AI hiring tool after it was found to systematically penalize female candidates. The lesson uses this case to illustrate:

Correct.

Review Lesson 3, Section 2. The Amazon case is about how AI reproduces historical bias — not about deliberate discrimination or general AI unreliability.

15. Max Schrems filed a complaint against Facebook in 2011 that eventually led to the invalidation of the EU-US Privacy Shield. The lesson uses this case to argue:

Correct. The Schrems case closes the module on a specific, real example of individual knowledge translating into institutional change — the core message of Section 4.

Review Lesson 4, Section 4. The lesson ends with this case for a specific reason — to argue that understanding the rules is itself a form of power accessible to anyone.