Module 3 · Lesson 1

The Data You Didn't Know You Were Giving

Every click, pause, and scroll is a data point. You opted in without reading the fine print.

What does your digital behavior reveal about you — and who is collecting it?

In 2012, the New York Times reported that Target Corporation had assigned every customer a "pregnancy prediction score." The retailer analyzed purchasing patterns — unscented lotion, certain vitamins, cotton balls — to identify shoppers who were likely pregnant, often before they had told family members. One man complained to a Target store manager after his teenage daughter received coupons for baby items. She was, in fact, pregnant. Target had known before her father did.

Passive Data vs. Active Data

Most people understand that they share data when they fill out a form, create an account, or make a purchase. This is active data — you consciously provide it. What most people do not understand is that the majority of data collected about them is passive: generated automatically as a byproduct of using digital services.

Passive data includes how long your mouse hovers over an image before you scroll past it, the exact time of day you unlock your phone, how fast you type, which words you delete before sending a message, and the precise GPS coordinates of every location your phone visits — even when you are not actively using any app.
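To make "how long you hover before scrolling past" concrete, here is a minimal Python sketch of how a page or app could turn timestamped visibility events into per-item dwell times. The `ViewEvent` type, the event names `enter` and `exit`, and the sample data are hypothetical; real analytics libraries define their own schemas.

```python
from dataclasses import dataclass

@dataclass
class ViewEvent:
    item_id: str  # which post, image, or ad
    kind: str     # "enter" when it scrolls into view, "exit" when it leaves
    t_ms: int     # client-side timestamp in milliseconds

def dwell_times(events: list[ViewEvent]) -> dict[str, int]:
    """Total on-screen time per item, in milliseconds."""
    totals: dict[str, int] = {}
    entered: dict[str, int] = {}
    for ev in sorted(events, key=lambda e: e.t_ms):
        if ev.kind == "enter":
            entered[ev.item_id] = ev.t_ms
        elif ev.kind == "exit" and ev.item_id in entered:
            start = entered.pop(ev.item_id)
            totals[ev.item_id] = totals.get(ev.item_id, 0) + ev.t_ms - start
    return totals

events = [
    ViewEvent("post_a", "enter", 0),
    ViewEvent("post_a", "exit", 450),    # scrolled past quickly
    ViewEvent("post_b", "enter", 450),
    ViewEvent("post_b", "exit", 6200),   # lingered for almost six seconds
]
print(dwell_times(events))  # {'post_a': 450, 'post_b': 5750}
```

Nothing in this data was typed or submitted by the user; it is a byproduct of scrolling.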

Documented Scale

When enabled, Google's Location History feature recorded and stored device location every few minutes. In 2018, the Associated Press reported that Google continued storing time-stamped location data even after users explicitly turned "Location History" off, through a separate setting called "Web & App Activity." The AP estimated that the issue affected some 2 billion Android users and hundreds of millions of iPhone users who rely on Google for maps or search.

The Five Categories of Passive Data

Researchers and privacy advocates have identified five broad categories of passive data that are routinely harvested:

Behavioral: click patterns, scroll depth, dwell time. Reveals interests, attention span, emotional reactions.
Locational: GPS, Wi-Fi triangulation, cell tower pings. Reveals home, work, religion, health, relationships.
Temporal: time of activity, sleep patterns, routine breaks. Reveals work schedule, health habits, life events.
Social Graph: who you contact, how often, response time. Reveals relationship strength, influence, mental state.
Device & Network: hardware IDs, IP address, browser fingerprint. Enables re-identification across platforms.
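The locational category can be illustrated with a deliberately crude heuristic: the place a phone most often reports overnight is probably home, and the place it most often reports during weekday office hours is probably work. The sketch below assumes pings have already been reduced to `(weekday, hour, place_id)` tuples with made-up cell IDs; it is an illustration, not any company's actual method.

```python
from collections import Counter

def infer_home_and_work(pings):
    """pings: iterable of (weekday, hour, place_id), weekday 0=Mon ... 6=Sun.

    Home guess: most frequent place between 22:00 and 06:00.
    Work guess: most frequent place on weekdays between 09:00 and 17:00.
    """
    pings = list(pings)
    night = Counter(place for day, hour, place in pings if hour >= 22 or hour < 6)
    office = Counter(place for day, hour, place in pings if day < 5 and 9 <= hour < 17)
    home = night.most_common(1)[0][0] if night else None
    work = office.most_common(1)[0][0] if office else None
    return home, work

pings = [
    (0, 23, "cell_17"), (1, 2, "cell_17"), (2, 1, "cell_17"),
    (1, 10, "cell_42"), (1, 14, "cell_42"), (2, 11, "cell_42"),
    (5, 11, "cell_88"),  # Saturday morning: ignored for the work guess
]
print(infer_home_and_work(pings))  # ('cell_17', 'cell_42')
```

The same counting, applied to a weekly Sunday-morning location or a recurring clinic visit, is how a location trace starts to suggest religion or health.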

The Inference Gap

The raw data points are not the real concern. The concern is inference: what AI systems can deduce from those data points that you never explicitly shared. In 2013, Michal Kosinski (then at the University of Cambridge, now at Stanford) and colleagues published research showing that Facebook "likes" alone could predict, well above chance, a person's intelligence, sexual orientation, political affiliation, religion, and whether their parents separated before they turned 21. The model's only input was the pages each person had liked; none of these traits had to be disclosed on the platform for the prediction to work.

This inference gap — the distance between what you consciously share and what AI can calculate — is the central privacy challenge of our era. It means that traditional privacy protections, which focus on what you disclose, offer far less protection than most people assume.
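Kosinski and colleagues reduced a large user-by-like matrix with singular value decomposition and then fit regression models on the result. The sketch below keeps only the core idea: a logistic regression from binary "liked / did not like" columns to a yes/no trait, trained by plain gradient descent on a made-up four-page matrix. The pages, the trait label, and the hyperparameters are hypothetical, and the SVD step is omitted.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic (log) loss.
    X: rows of 0/1 like indicators; y: 0/1 trait labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Columns: page_0, page_1, page_2, page_3 (hypothetical pages).
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]  # a trait these users never stated on the platform
w, b = train_logistic(X, y)
print(round(predict_proba(w, b, [0, 1, 0, 0]), 2))  # new user who liked only page_1
```

The new user disclosed nothing about the trait; the prediction comes entirely from which pages they liked.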

Key Concept

The digital footprint is not just what you post or submit. It is the full record of every digital interaction, including behavior you consider private or ephemeral, combined with AI's ability to extract meaning from patterns you would never recognize as revealing.

Inference Gap: The difference between what a person actively discloses and what AI systems can calculate or deduce from passive behavioral data.

Passive Data: Data generated automatically as a byproduct of using digital services, without the user consciously providing it.

Re-identification: The process of matching anonymized data to a specific individual using auxiliary information, device identifiers, or behavioral fingerprints.

Lesson 1 Quiz

The Data You Didn't Know You Were Giving · 4 questions

What does the Target pregnancy prediction case primarily illustrate?

Correct. Target inferred pregnancy from purchasing patterns — unscented lotion, cotton balls, certain vitamins — without customers disclosing any health information. This is a textbook example of behavioral inference.

Not quite. The case shows that AI systems can infer highly sensitive personal facts from mundane behavioral patterns that users never intended as disclosures.

According to the 2018 AP investigation, what did Google collect even when users turned "Location History" off?

Correct. The AP found that disabling "Location History" did not stop location collection — a separate system continued recording precise coordinates, affecting billions of users who believed they had opted out.

The AP investigation found Google continued collecting precise GPS data through "Web & App Activity" even after users explicitly disabled Location History — a separate toggle most users didn't know existed.

In Kosinski and colleagues' 2013 research, what could Facebook "likes" alone predict well above chance?

Correct. Kosinski's research demonstrated that seemingly innocuous "likes" on public pages could be used to predict a remarkable range of sensitive personal attributes that users had never explicitly shared.

Kosinski's research was far more expansive — Facebook likes predicted IQ, sexual orientation, political views, religion, and childhood family structure, none of which participants had disclosed.

Which of the following best defines the "inference gap"?

Correct. The inference gap captures why traditional privacy protection — focused on what you disclose — is insufficient in an era where AI can deduce sensitive facts from data you never considered personal.

The inference gap is specifically about what AI systems can calculate versus what users consciously share — the reason traditional disclosure-based privacy frameworks are inadequate.

Lab 1: Mapping Your Passive Data Exposure

Interactive AI discussion · Complete 3 exchanges to finish

What You'll Explore

In this lab you'll have a guided conversation about the passive data trails you generate daily. The AI assistant will help you audit which apps and services are likely collecting which categories of passive data about you.

Start by telling the assistant: which three apps do you use most every day? It will help you map what each one likely knows about you beyond what you've explicitly shared.

AI Lab Assistant

Passive Data Audit

Welcome to Lab 1. I'm here to help you map your passive data footprint — the data generated automatically as you use apps and services, without you actively providing it.

To get started: what are the three apps or digital services you use most on a typical day? I'll walk through what each one is likely collecting about you beyond the obvious.

Module 3 · Lesson 2

The Data Broker Shadow Economy

Thousands of companies are selling detailed profiles of you. You've never heard of most of them.

Who are the real buyers and sellers of your personal data — and what exactly are they trading?

In August 2022, the Federal Trade Commission sued data broker Kochava Inc., alleging that it sold precise geolocation data that could be used to track individuals to sensitive locations including reproductive health clinics, addiction recovery centers, and places of worship. According to the FTC, the data was not anonymized in any meaningful way: it allowed anyone who purchased it to identify specific individuals and follow their movements. Kochava's data came from ordinary smartphone apps that users had installed for unrelated purposes.

The Data Broker Industry

The data broker industry is estimated by some researchers to generate over $200 billion annually. The core business model: collect data from hundreds of sources, aggregate it into individual profiles, and sell access to those profiles. The industry operates almost entirely outside the awareness of the people whose data is being sold.

The major brokers — Acxiom, LexisNexis Risk Solutions, Experian, Equifax, Oracle Data Cloud, and hundreds of smaller operators — maintain profiles on virtually every American adult. Acxiom has publicly stated it holds data on approximately 2.5 billion people globally. These profiles can contain thousands of data points per individual.

Where the Data Comes From

Data brokers aggregate from an enormous range of sources, most of which individuals interact with daily without awareness:

Primary Data Source Categories for Brokers

Mobile app SDKs (location, behavior): 92%
Retail loyalty programs & purchase records: 85%
Website tracking pixels & cookies: 80%
Public records (property, court, voter): 75%
Financial transaction data: 68%
Telecom & ISP usage metadata: 55%
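The aggregation step behind these sources can be sketched as a merge on a shared identifier. This Python example assumes every source record carries an email address and matches on exact normalized email; real identity-resolution systems also use hashed emails, device IDs, names, and addresses. The source names and fields are hypothetical.

```python
from collections import defaultdict

def normalize_email(email: str) -> str:
    return email.strip().lower()

def build_profiles(records):
    """records: iterable of (source, email, {attribute: value}).
    Returns {email: {attribute: [(value, source), ...]}}, keeping provenance."""
    profiles = defaultdict(lambda: defaultdict(list))
    for source, email, attrs in records:
        person = profiles[normalize_email(email)]
        for name, value in attrs.items():
            person[name].append((value, source))
    return {email: dict(attrs) for email, attrs in profiles.items()}

records = [
    ("loyalty_program", " Jane@Example.com", {"zip": "55401", "last_purchase": "unscented lotion"}),
    ("app_sdk", "jane@example.com", {"overnight_location": "cell_17"}),
    ("public_records", "JANE@example.com", {"owns_home": True}),
]
profiles = build_profiles(records)
print(sorted(profiles["jane@example.com"]))
# ['last_purchase', 'overnight_location', 'owns_home', 'zip']
```

Three sources that each know one mundane fact become a single profile the moment they share a key.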

The Buyer Side: Who Purchases These Profiles

The buyers are as diverse as the sources. Insurance companies use broker data to inform risk scoring. Employers use it in background checks. Political campaigns use it for voter targeting. Banks use it for credit decisions. Landlords use it to screen tenants. Law enforcement agencies in the United States have used broker data to bypass the warrant requirements that would apply to direct surveillance — a practice confirmed by a 2023 Senate investigation led by Senator Ron Wyden.

Senate Investigation Finding · 2023

A Senate investigation released in 2023 found that the Defense Intelligence Agency, IRS, CBP, and other federal agencies had purchased location data from commercial data brokers without obtaining warrants — effectively circumventing Fourth Amendment protections by purchasing what they could not legally compel. The report named specific agencies and the broker companies they contracted with.

The Anonymization Myth

Data brokers routinely claim their data is "anonymized" or "de-identified." Research has consistently shown this claim is largely false when applied to location and behavioral data. A landmark 2013 study by de Montjoye et al. in the journal Scientific Reports found that just four spatiotemporal data points (location at four different times, at the resolution of mobile-phone antennas and hours) were sufficient to uniquely identify 95% of individuals in a dataset of 1.5 million people, even though the data carried no names. The uniqueness of human movement patterns makes true anonymization of location data practically impossible at the individual level.
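De Montjoye and colleagues measured "unicity": the share of people whose trace is the only one in the dataset containing p randomly chosen points from it. The Python sketch below computes that quantity on a tiny hypothetical dataset of `(antenna, hour)` points; the study itself used 15 months of mobile-phone records for 1.5 million people.

```python
import random

def unicity(traces, p, seed=0):
    """Share of users whose p randomly chosen points match no one else's trace.

    traces: {user_id: set of (antenna_id, hour) points}
    Users with fewer than p points are skipped.
    """
    rng = random.Random(seed)
    eligible = [u for u, pts in traces.items() if len(pts) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        sample = set(rng.sample(sorted(traces[user]), p))
        matches = [u for u, pts in traces.items() if sample <= pts]
        if matches == [user]:  # only this user's trace contains all p points
            unique += 1
    return unique / len(eligible)

traces = {
    "a": {("t1", 8), ("t2", 18)},
    "b": {("t1", 8), ("t3", 18)},
    "c": {("t1", 8), ("t2", 18), ("t4", 23)},
}
print(unicity(traces, 3))  # 1.0
```

With p = 3 only user "c" is eligible, and the visit to antenna `t4` singles them out. User "a" can never be singled out at p = 2, because "c"'s trace contains all of "a"'s points; in real data such full overlaps become rare as p grows.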

Data Broker: A company that collects personal data from multiple sources, aggregates it into profiles, and sells access to those profiles to third parties, without direct relationships with the individuals profiled.

SDK (Software Development Kit): A library of code that developers embed in their apps. Some third-party SDKs collect and transmit user data, often location and behavioral data, to outside companies as a revenue mechanism for app developers.

De-identification: The removal of directly identifying information (name, SSN) from a dataset. Research shows this provides limited protection when behavioral or location data remains, as individuals can usually be re-identified from patterns alone.
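Re-identification is often nothing more than a database join. In this Python sketch, a "de-identified" location dataset with names removed is matched against a hypothetical auxiliary source that still carries names (a social profile or public record) on two quasi-identifiers: the area where a person sleeps and the area where they work. The field names and records are invented for illustration.

```python
def link_records(deidentified, auxiliary, keys):
    """Attach names to de-identified rows whose quasi-identifiers match exactly
    one named row in `auxiliary`. Ambiguous matches are left unlinked."""
    index = {}
    for row in auxiliary:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    linked = []
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            linked.append((candidates[0], row))
    return linked

deidentified = [
    {"home_area": "cell_17", "work_area": "cell_42", "visited": "clinic_9"},
    {"home_area": "cell_20", "work_area": "cell_42", "visited": "gym_3"},
]
auxiliary = [
    {"name": "J. Doe", "home_area": "cell_17", "work_area": "cell_42"},
    {"name": "A. Roe", "home_area": "cell_20", "work_area": "cell_42"},
    {"name": "B. Poe", "home_area": "cell_20", "work_area": "cell_42"},
]
for name, row in link_records(deidentified, auxiliary, ("home_area", "work_area")):
    print(name, "visited", row["visited"])  # J. Doe visited clinic_9
```

The second row stays anonymous only because two named people share its home and work areas; one more data point would likely separate them.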

Lesson 2 Quiz

The Data Broker Shadow Economy · 4 questions

What did the FTC allege in its 2022 lawsuit against Kochava?

Correct. The FTC alleged Kochava sold location data precise enough to track individuals to reproductive health clinics, addiction recovery centers, and places of worship — collected from ordinary smartphone apps.

The FTC sued Kochava for selling precise location data that enabled identification and tracking to sensitive locations — the data came from ordinary apps users had installed for unrelated purposes.

According to the 2013 study by de Montjoye et al. in Scientific Reports, how many location data points are sufficient to uniquely identify 95% of individuals in a large dataset?

Correct. The de Montjoye study demonstrated that just four location-time pairs were sufficient to uniquely re-identify 95% of individuals — illustrating why location data anonymization is largely ineffective.

The striking finding was that just four location-time data points could re-identify 95% of individuals. Human movement patterns are unique enough that minimal data is sufficient for identification.

What did the 2023 Senate investigation reveal about federal agency use of data broker data?

Correct. The Senate investigation named specific agencies — including the Defense Intelligence Agency and IRS — that purchased commercial location data to bypass warrant requirements that would apply to direct surveillance.

The Senate investigation found multiple agencies including the DIA and IRS purchased commercial location data without warrants — a legal workaround to Fourth Amendment protections that apply to direct government surveillance.

What is the primary mechanism by which data brokers collect location data from mobile apps?

Correct. SDKs are code libraries embedded in apps — often weather, flashlight, or game apps — that collect and transmit location and behavioral data to broker companies as a revenue source for developers who offer their apps for free.

The main mechanism is SDKs — code embedded in ordinary apps that silently collects and transmits location data to data brokers. This is how an app about weather can generate location data sold to brokers.

Lab 2: The Data Broker Profile

Interactive AI discussion · Complete 3 exchanges to finish

What You'll Explore

Data brokers compile profiles with thousands of data points. In this lab you'll explore what a broker profile might realistically contain, where each element comes from, and how to begin reducing your exposure through opt-out processes.

Ask the assistant: "What would a data broker profile about me likely contain, and where would each piece of information come from?" Then follow up with questions about opt-out rights or specific data categories that concern you.

AI Lab Assistant

Data Broker Profiles

Welcome to Lab 2. I'm your guide on the data broker ecosystem — the largely invisible industry that profiles virtually every adult in the US.

I can walk you through what a broker profile likely contains about you, where each data category originates, and what rights you have to access or delete that information. What aspect of data broker profiles would you like to explore first?

Module 3 · Lesson 3

Social Media's Hidden Data Architecture

The platform you see is the interface. The data architecture underneath is the product.

What does a social platform actually collect — and how does it use data from people who never signed up?

During the 2018 Cambridge Analytica congressional hearings, Senator Orrin Hatch asked Mark Zuckerberg how Facebook sustains a business model where users pay nothing. Zuckerberg paused and replied: "Senator, we run ads." The exchange became iconic — but it obscured the more important answer. Facebook does not merely run ads. It builds psychographic profiles, infers emotional states from post timing and word choice, tracks users across the web via embedded Like and Share buttons, and maintains shadow profiles on people who have never created a Facebook account.

What Social Platforms Actually Collect

The visible layer of social media data — posts, photos, likes, follows — is only a fraction of what platforms collect. The larger and more valuable dataset consists of behavioral signals generated during use:

Dwell time: How long each piece of content stayed on your screen before you scrolled past it. Facebook has filed patents on using this as a sentiment signal.
Abandoned drafts: In 2013, Facebook researchers published a study of "self-censorship": status updates and comments that users began typing and then deleted without posting. The researchers said they logged only that text had been entered, not its content.
Device signals: Battery level, nearby Bluetooth devices, Wi-Fi networks, and whether the phone is being charged (suggesting home vs. away).
Cross-site tracking: The Facebook Pixel, embedded on millions of third-party websites, reports page visits back to Facebook, including visits to medical symptom checkers, legal information sites, and financial services.

Documented Case · The Pixel & Health Data

In 2022, The Markup investigated hospital websites and found that 33 of the top 100 US hospitals had installed the Facebook Pixel on patient-facing web pages. The Pixel transmitted data to Facebook including URL strings that revealed specific conditions patients were searching for — such as "My Scheduled Medications" or "Depression Bipolar Support" — along with identifying information. This occurred without patient knowledge or HIPAA authorization.

Shadow Profiles: Tracking Non-Users

Perhaps the least understood element of social platform data collection is the practice of building profiles on people who have never created an account. Critics call these records "shadow profiles"; Facebook has not embraced the term, but in 2018 it acknowledged collecting data about people who do not have accounts. The mechanism: when Facebook users upload their phone contacts, Facebook receives the phone numbers and email addresses of non-users. When those non-users visit a website with a Facebook Pixel, Facebook can potentially link the visit to that record via device identifiers, building behavioral histories for people who have explicitly chosen not to use the platform.
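The contact-upload half of that mechanism can be sketched in a few lines of Python. Everything here is an assumption for illustration, not Facebook's implementation: phone numbers are normalized to digits, hashed with SHA-256 as a matching key, and any key that belongs to no registered account is collected into a record of who has that person in their address book.

```python
import hashlib
from collections import defaultdict

def normalize_phone(raw: str) -> str:
    """Keep digits only, so '+1 (612) 555-0100' and '16125550100' match."""
    return "".join(ch for ch in raw if ch.isdigit())

def contact_key(raw: str) -> str:
    """Hypothetical matching key: SHA-256 of the normalized number."""
    return hashlib.sha256(normalize_phone(raw).encode("utf-8")).hexdigest()

def build_shadow_profiles(uploads: dict[str, list[str]], account_keys: set[str]) -> dict[str, set[str]]:
    """For every uploaded number that belongs to no registered account,
    record which users have it in their address book."""
    shadow: dict[str, set[str]] = defaultdict(set)
    for uploader, numbers in uploads.items():
        for raw in numbers:
            key = contact_key(raw)
            if key not in account_keys:
                shadow[key].add(uploader)
    return dict(shadow)

uploads = {
    "alice": ["+1 (612) 555-0100", "+1 612 555 0199"],
    "bob": ["16125550100"],
}
account_keys = {contact_key("1-612-555-0199")}  # this number has an account
shadow = build_shadow_profiles(uploads, account_keys)
print(len(shadow), sorted(shadow[contact_key("+1 612-555-0100")]))  # 1 ['alice', 'bob']
```

Hashing does not anonymize the number: anyone who hashes the same phone number gets the same key, which is exactly why hashed identifiers still work for matching.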

In 2022, the Irish Data Protection Commission fined Meta €265 million under GDPR after data scraped through Facebook's contact-import and search features, including phone numbers linked to hundreds of millions of accounts, was published online.

The Emotional Contagion Experiment

In 2014, Facebook researchers published a study in the Proceedings of the National Academy of Sciences revealing that the company had run an experiment on 689,003 users without their knowledge. For one week in 2012, researchers reduced the share of either positive or negative posts in users' News Feeds to test whether emotional states could spread through social media. Users shown fewer positive posts wrote slightly fewer positive words themselves, and vice versa; the effect was small but statistically detectable. The controversy was not only ethical: it revealed that Facebook's ranking systems could measurably shift what users expressed, and that the company was willing to run such tests without notification.

2013

Self-censorship study: Facebook researchers publish an analysis of status updates and comments that users typed but never posted.

2014

Emotional contagion: Facebook publishes study confirming it manipulated 689,003 users' feeds to alter emotional states without consent.

2018

Cambridge Analytica: Data on up to 87 million users harvested via a third-party quiz app and sold to a political consulting firm. A $5B FTC penalty against Facebook followed in 2019.

2022

Hospital Pixel scandal: The Markup finds Facebook Pixel on major hospital websites, transmitting patient health information without authorization.

2023

TikTok Congressional hearing: CEO Shou Chew is questioned over access to US user data by China-based ByteDance employees; TikTok says it has spent about $1.5B on "Project Texas" to isolate US data.

Shadow Profile: A data profile compiled by a platform about a person who has not created an account, built from contact uploads by existing users, pixel tracking, and device identifiers.

Tracking Pixel: A 1×1 invisible image or JavaScript snippet embedded on third-party websites that reports user behavior, including page visits and form entries, back to the platform that operates it.
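A pixel leaks sensitive pages because the request it fires typically carries the address of the page it sits on. This Python sketch builds such a request and decodes it the way the tracker's server would. The endpoint `tracker.example` and the parameter names `id`, `ev`, and `dl` are hypothetical stand-ins.

```python
from urllib.parse import parse_qs, urlencode, urlparse

TRACKER = "https://tracker.example/px.gif"  # hypothetical endpoint

def pixel_request(page_url: str, browser_id: str, event: str = "PageView") -> str:
    """URL the 1x1 image would fetch; the page address rides along as a parameter."""
    return TRACKER + "?" + urlencode({"id": browser_id, "ev": event, "dl": page_url})

def what_tracker_sees(request_url: str) -> dict[str, str]:
    """Decode the query string the way the tracker's server would."""
    query = parse_qs(urlparse(request_url).query)
    return {name: values[0] for name, values in query.items()}

page = "https://hospital.example/conditions/depression-bipolar-support"
req = pixel_request(page, "browser-123")
print(what_tracker_sees(req)["dl"])  # https://hospital.example/conditions/depression-bipolar-support
```

The visitor does not have to click anything: loading the page is enough for the full URL, and the browser identifier attached to it, to reach the third party.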

Lesson 3 Quiz

Social Media's Hidden Data Architecture · 4 questions

What did The Markup's 2022 investigation find about hospital websites and Facebook Pixel?

Correct. The Markup found that pixels on hospital pages transmitted URL strings revealing specific health conditions patients were researching — such as depression support pages — along with identifying device data, to Facebook.

The Markup found that 33 of the top 100 US hospitals had the Facebook Pixel on patient-facing pages, and the pixel transmitted specific health-related URL strings and device identifiers to Facebook without HIPAA authorization.

What was the Facebook "emotional contagion" study, and why was it controversial?

Correct. The 2014 study revealed both that Facebook could measurably alter users' emotional states through feed manipulation, and that the company was willing to conduct such experiments without notifying participants.

The emotional contagion study covertly reduced the share of positive or negative posts in 689,003 users' feeds and reported small but measurable shifts in the emotional tone of what those users then posted, all without their knowledge or consent.

How does Facebook build "shadow profiles" on people who have never created a Facebook account?

Correct. When Facebook users upload contacts, Facebook receives non-users' phone numbers and emails. When those non-users visit Pixel-equipped websites, Facebook links the visit to the shadow profile — building behavioral histories for people who never signed up.

Shadow profiles are built from contact uploads (giving Facebook non-user phone numbers/emails) combined with Pixel tracking on third-party sites — linked via device identifiers — building profiles on people who have actively chosen not to have an account.

Which data category did Facebook specifically acknowledge studying in 2013 — revealing how deeply behavioral monitoring extends?

Correct. Facebook researchers in 2013 acknowledged studying "self-censorship" — text typed into the status box and then deleted without posting. This reveals that even digital behavior you abandon generates data.

In 2013, Facebook researchers acknowledged studying self-censorship: status updates and comments users typed and then deleted without posting. Even thoughts you decide not to share become data points, illustrating the full depth of behavioral monitoring.

Lab 3: Social Platform Data Audit

Interactive AI discussion · Complete 3 exchanges to finish

What You'll Explore

Social platforms provide "Download Your Data" tools that reveal only a fraction of what they actually hold. In this lab you'll explore the gap between what platforms show you and what they actually know — and examine specific tracking mechanisms in the platforms you use.

Tell the assistant which social platforms you use, then ask: "What is the platform probably inferring about me that I haven't explicitly shared?" Explore shadow profiles, tracking pixels, or any aspect of platform data collection that interests you.

AI Lab Assistant

Social Platform Tracking

Welcome to Lab 3. Social media platforms are extraordinarily sophisticated data collection systems — and the data you can download about yourself represents only a small window into what they actually hold.

I can walk you through the hidden data architecture of specific platforms — what they infer from behavioral signals, how tracking pixels extend their reach across the web, and how the "shadow profile" phenomenon works in practice. Which platform would you like to start with?

Module 3 · Lesson 4

AI's Ability to Reconstruct You From Fragments

You don't have to share everything. AI only needs enough pieces to reconstruct the rest.

How do modern AI systems synthesize fragmented data into complete behavioral profiles — and what can they infer that you would never volunteer?

In 2021, researchers at MIT and Harvard published findings that smartphone accelerometer data alone — the sensor that detects phone movement — could be used to identify individuals with high accuracy. The accelerometer requires no permission to access on most phones. It cannot hear you, see you, or track your location. Yet the subtle, unique way each person holds and moves a phone creates a fingerprint distinctive enough to re-identify them. The data no one thought to protect was enough.

The Mosaic Effect

Intelligence agencies have long understood a principle called the mosaic effect: individual pieces of information that appear innocuous can be combined to reveal sensitive conclusions that none of the pieces would suggest alone. This principle now governs commercial AI profiling. A single data point — your ZIP code, your search query, your music listening time at 2am — reveals little. Three hundred data points, aggregated across sources and processed by machine learning models trained on millions of similar profiles, can predict your health status, financial distress, relationship stability, and likelihood of making specific decisions.
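One way to see why many weak signals outweigh any single one is Bayesian updating with likelihood ratios: posterior odds equal the prior odds multiplied by each signal's likelihood ratio. The Python sketch below assumes the signals are conditionally independent (a naive-Bayes simplification that real data rarely satisfies) and uses made-up numbers: a 2% base rate and signals that are each three times more common among people who have the sensitive trait.

```python
def combine_signals(prior: float, likelihood_ratios: list[float]) -> float:
    """Posterior probability after multiplying prior odds by each likelihood ratio.
    Assumes the signals are conditionally independent given the trait."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

for n in (1, 3, 5):
    print(f"{n} signal(s): {combine_signals(0.02, [3.0] * n):.2f}")
# 1 signal(s): 0.06
# 3 signal(s): 0.36
# 5 signal(s): 0.83
```

No single fragment moves the estimate much, but five modest ones turn a 2% guess into a better-than-even bet.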

Documented Case · Spotify & Mental State Inference

Spotify was granted a patent in 2021 describing technology to analyze users' speech patterns, ambient sounds, and emotional tone of voice to infer their "emotional state, mental health state, or well-being." The system would use microphone access during voice searches to classify users and adjust recommendations. Spotify confirmed the patent but said it was not currently deployed. The significance: the technical capability to infer mental health state from ambient audio was considered routine enough to patent.

What AI Can Infer From Each Data Type

Research from multiple institutions has documented what modern machine learning models can infer from specific data categories that users typically consider innocuous:

Typing Speed & Patterns: cognitive load, emotional state, possible neurological conditions. Keystrokes as health indicators have been documented in Parkinson's research (MIT, 2021).
Sleep Schedule (from phone use): depression risk, relationship stress, work instability. Irregular late-night phone use correlates with mental health outcomes.
Music Listening Patterns: personality (Big Five), mood state, cultural identity. Spotify's own research confirmed personality prediction from playlists.
Scroll Speed & Pause Points: attention span, emotional triggers, topic sensitivity. Platforms use this to refine emotional engagement models.
Purchase Timing & Category Mix: life events (divorce, illness, job loss), stress level. This is the Target pregnancy prediction model applied more broadly.
Search Query Timing: mental health crises, major decisions, relationship status. A health search at 3 a.m. carries a very different profile than the same search during the day.
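The sleep-schedule inference needs no special sensor. Assuming a day's phone-unlock times are available, the longest stretch without an unlock is a rough guess at when the person slept. The Python sketch below treats the day as circular so the gap can run past midnight; the unlock times are invented.

```python
def longest_idle_gap(unlock_minutes: list[int]) -> tuple[int, int]:
    """unlock_minutes: minutes after midnight (0-1439) of each unlock in one day.
    Returns (start, end) of the longest gap between consecutive unlocks."""
    times = sorted(set(unlock_minutes))
    if not times:
        raise ValueError("need at least one unlock time")
    best, best_len = (times[0], times[0]), -1
    for i, start in enumerate(times):
        end = times[(i + 1) % len(times)]      # wraps from the last unlock to the first
        gap = (end - start) % 1440 or 1440     # a lone unlock leaves a full-day gap
        if gap > best_len:
            best, best_len = (start, end), gap
    return best

def hhmm(minutes: int) -> str:
    return f"{minutes // 60:02d}:{minutes % 60:02d}"

unlocks = [425, 750, 1080, 1420, 50]  # 07:05, 12:30, 18:00, 23:40, 00:50
start, end = longest_idle_gap(unlocks)
print(hhmm(start), "to", hhmm(end))  # 00:50 to 07:05
```

Tracked over weeks, shifts in this window are the kind of "irregular late-night phone use" signal described above.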

The Aggregation Problem in Practice

The 2018 Cambridge Analytica scandal illustrated what aggregated behavioral profiling can produce at scale. About 270,000 people installed a personality-quiz app that collected their survey answers along with Facebook data about them and their friends; through it, Cambridge Analytica obtained data on up to 87 million people, most of them in the US. The firm claimed the resulting profiles could identify persuadable voters and serve them targeted content designed to shift political behavior. The models relied not on what users said they believed, but on behavioral signals, chiefly the pages they had liked, to classify them into psychographic segments.

Regulators followed up. In 2018, the UK Information Commissioner's Office fined Facebook £500,000, the maximum then available under UK law. In 2019, Facebook agreed to a $5 billion FTC penalty, the largest privacy penalty in US history at the time.

The Core Insight of This Module

Your digital footprint is not primarily what you post or submit. It is the accumulated record of behavioral signals — movement, timing, hesitation, attention — that modern AI systems can synthesize into predictions and inferences far more revealing than anything you would consciously share. The footprint cannot be eliminated through careful posting. It requires structural understanding of where data is generated and who has access to what.

Three Things That Reduce Exposure

Privacy research converges on three structural interventions that meaningfully reduce digital footprint exposure — not to zero, but to a level where inference becomes less precise:

1. App permission audits. Regularly review which apps have access to location, microphone, contacts, and motion sensors. Revoke any permission that is not actively required for the app's core function. In 2021, Apple's App Tracking Transparency framework began requiring iOS apps to ask permission before tracking users across other companies' apps and websites. Early analytics data from Flurry indicated that about 96% of US users declined when asked, suggesting that when given a genuine choice, most people prefer not to be tracked.

2. Browser fingerprint reduction. Standard incognito mode does not prevent fingerprinting. Firefox with enhanced tracking protection, or the Tor Browser for sensitive searches, reduces the amount of behavioral data available for cross-site correlation.

3. Data broker opt-outs. While cumbersome, submitting opt-out requests to major brokers and people-search sites (Acxiom, Spokeo, WhitePages, Intelius, BeenVerified) can remove the most accessible aggregated profiles. Paid services such as DeleteMe automate this at scale.
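The permission audit in step 1 amounts to a set difference between what each app has been granted and what its core function needs. This Python sketch assumes you have written both lists down by hand from your phone's settings; the apps and permission names are hypothetical, and deciding what counts as "needed" remains your judgment call.

```python
def excess_permissions(granted: dict[str, set[str]],
                       needed: dict[str, set[str]]) -> dict[str, list[str]]:
    """List, per app, the permissions it holds beyond what its core function needs."""
    report: dict[str, list[str]] = {}
    for app, perms in granted.items():
        extra = perms - needed.get(app, set())
        if extra:
            report[app] = sorted(extra)
    return report

granted = {
    "flashlight": {"location", "contacts"},
    "maps": {"location"},
    "weather": {"location", "motion"},
}
needed = {  # your own judgment of what each app's core function requires
    "flashlight": set(),
    "maps": {"location"},
    "weather": {"location"},
}
print(excess_permissions(granted, needed))
# {'flashlight': ['contacts', 'location'], 'weather': ['motion']}
```

Every entry in the report is a candidate for revocation in the phone's settings.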

Mosaic Effect: The phenomenon whereby individually innocuous data points, when combined, reveal sensitive conclusions that none of the pieces would suggest in isolation.

Psychographic Profiling: The use of behavioral and attitudinal data to classify individuals by personality type, values, and psychological traits, as distinct from demographic profiling.

Browser Fingerprinting: A tracking technique that identifies users by the unique combination of browser settings, installed fonts, screen resolution, and hardware characteristics, without using cookies.
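The cookie-free part of that definition is easy to demonstrate: serialize a handful of browser attributes in a fixed order and hash them, and the same browser produces the same identifier on every site that runs the script. The attribute names and values below are hypothetical, and real fingerprinting scripts read many more signals (canvas rendering and audio processing among them).

```python
import hashlib
import json

def fingerprint(attrs: dict) -> str:
    """Stable short identifier derived only from reported attributes.
    sort_keys=True makes the result independent of dictionary key order."""
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0",
    "screen": "2560x1440",
    "timezone": "America/Chicago",
    "fonts": ["DejaVu Sans", "Noto Serif", "Ubuntu Mono"],
}
same_browser_other_site = dict(reversed(list(browser.items())))  # same values, new key order
print(fingerprint(browser) == fingerprint(same_browser_other_site))  # True

browser_with_new_font = {**browser, "fonts": browser["fonts"] + ["Fira Code"]}
print(fingerprint(browser) == fingerprint(browser_with_new_font))  # False
```

The second comparison shows that an exact hash breaks when a single attribute changes, which is one reason practical fingerprinting tends to combine several signals and tolerate small differences.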

Lesson 4 Quiz

AI's Ability to Reconstruct You From Fragments · 4 questions

What did MIT/Harvard researchers find smartphone accelerometer data could do in 2021?

Correct. The accelerometer — which requires no special permission — captures each person's unique way of holding and moving a phone, creating a behavioral fingerprint sufficient for re-identification.

MIT/Harvard researchers found accelerometer data — the motion sensor that needs no permission — could re-identify individuals from their unique movement patterns, illustrating how seemingly irrelevant data becomes a tracking vector.

What does the "mosaic effect" mean in the context of AI profiling?

Correct. The mosaic effect is why limiting what you post is insufficient protection — AI systems synthesize fragments from many sources, each harmless alone, into profiles that are collectively very revealing.

The mosaic effect refers to the aggregation of individually innocuous data points into sensitive conclusions — the reason that "I have nothing to hide" reasoning misunderstands how modern data profiling works.

What happened when Apple's App Tracking Transparency framework required apps to request explicit permission for cross-app tracking in 2021?

Correct. The 96% opt-out rate demonstrated that most users prefer not to be tracked when given a genuine, clearly presented choice — suggesting that pervasive tracking persists primarily because the choice is buried, not because users approve of it.

96% of US users denied tracking permission when ATT required apps to ask explicitly. This demonstrates that widespread tracking exists not because users consent enthusiastically, but because meaningful choice was previously absent.

What did the Cambridge Analytica case specifically demonstrate about behavioral profiling?

Correct. Cambridge Analytica classified users for targeting from behavioral signals, chiefly the pages they had liked, rather than from stated beliefs. The UK ICO investigation and the FTC's $5B penalty against Facebook underscored the scale of the data harvesting.

Cambridge Analytica used behavioral Facebook data to profile up to 87 million people psychographically, demonstrating that behavioral signals, not stated beliefs, are the real input for modern influence operations.

Lab 4: Mosaic Analysis — Your Data Fragments

Interactive AI discussion · Complete 3 exchanges to finish

What You'll Explore

In this final lab, you'll apply the mosaic effect to your own digital footprint. By listing apparently innocuous data points about your digital behavior, you'll explore what an AI system could infer from combining them — and what that means for how you think about your data.

Start by describing three or four mundane aspects of your digital routine — when you typically use your phone, what apps you use most, roughly what you search for, what music you listen to. Ask the assistant what could be inferred from combining those signals. Then explore what you can do about it.

AI Lab Assistant

Mosaic Effect Analysis

Welcome to the final lab of Module 3. We're going to apply the mosaic effect — the idea that individually innocuous data fragments combine to reveal sensitive conclusions — directly to your digital routine.

Tell me three or four mundane things about your digital behavior: roughly when you use your phone, what apps are on your home screen, what kinds of things you search for, what music you play. I'll walk through what an AI profiling system could realistically infer from combining those signals — and what it would mean for your privacy.

Module 3 Test

Your Digital Footprint Is Bigger Than You Know · 15 questions · Pass at 80%

1. What is "passive data" in the context of digital footprints?

Correct. Passive data is automatically generated by digital behavior — location pings, scroll patterns, typing speed — without users actively providing it.

Passive data is generated automatically as a byproduct of using services — click timing, scroll behavior, location pings — without any conscious user decision to share it.

2. In the Target pregnancy prediction case, what type of data was analyzed to infer pregnancy?

Correct. Target's model identified pregnancy-correlated purchase patterns in mundane items — not medical data — demonstrating how behavioral data enables sensitive health inference.

Target inferred pregnancy from retail purchase patterns (unscented lotion, cotton balls, certain vitamin supplements), not from medical data. This illustrates the inference gap between what was shared and what was concluded.

3. The 2013 study by de Montjoye et al., published in Scientific Reports, showed that location data could re-identify 95% of individuals using how many data points?

Correct. Just four location-time pairs were sufficient to uniquely identify 95% of individuals in a 1.5 million person dataset, demonstrating that location data anonymization is practically ineffective.

The de Montjoye study found just four spatiotemporal data points were sufficient — showing that human movement patterns are unique enough that minimal data enables re-identification despite "anonymization."

4. Which federal agencies did the 2023 Senate investigation identify as purchasing commercial location data without warrants?

Correct. The Senate investigation named the DIA, IRS, CBP and others as purchasing commercial location data — using commercial broker purchases to bypass the warrant requirements that apply to direct government surveillance.

The Senate investigation named the Defense Intelligence Agency, IRS, CBP, and other agencies as purchasing commercial location data — effectively using broker purchases as a workaround to Fourth Amendment warrant requirements.

5. What is a "shadow profile" on Facebook?

Correct. Shadow profiles are built from contact uploads by existing users (giving Facebook non-users' phone numbers) and Pixel tracking across third-party sites, linked via device identifiers — creating behavioral histories for non-account holders.

Shadow profiles are compiled on people who never signed up — built from contact data uploaded by their connections and Pixel tracking when they visit other websites. Facebook admitted this in European regulatory proceedings.

6. What did the FTC allege in its suit against data broker Kochava?

Correct. The FTC alleged Kochava sold location data precise enough to identify individuals visiting reproductive health clinics, addiction recovery centers, and religious institutions — collected via ordinary smartphone apps.

The FTC sued Kochava for selling location data enabling tracking to sensitive locations — health clinics, recovery centers, places of worship — with the data sourced from ordinary apps users installed for unrelated purposes.

7. The Facebook "emotional contagion" study, published in 2014, revealed which capability?

Correct. The study covertly manipulated 689,003 users' News Feeds and confirmed that emotional states transferred through content — demonstrating both the capability and Facebook's willingness to use it without notification.

The emotional contagion study showed that Facebook's algorithms could measurably shift users' emotional states by altering feed content — conducted covertly on 689,003 users without their knowledge or consent.

8. What is the significance of the "inference gap" for privacy protection?

Correct. The inference gap undermines disclosure-based privacy protections — the legal and social assumption that privacy means controlling what you share. When AI can deduce what you haven't shared, that framework no longer provides meaningful protection.

The inference gap means traditional privacy protection — built on controlling what you disclose — is insufficient. AI can calculate sensitive conclusions from data points users never thought of as personal disclosures.

9. According to the 2013 research by Michal Kosinski and colleagues at the University of Cambridge, what could Facebook "likes" alone predict?

Correct. Kosinski's research demonstrated that behavioral signals as simple as public page likes could predict a remarkably broad range of sensitive personal attributes — none of which participants had explicitly shared.

Kosinski found Facebook likes could predict IQ, sexual orientation, political views, religion, and childhood parental divorce — all from publicly visible like patterns, none explicitly shared by participants.

10. What is the primary mechanism data brokers use to collect location data from mobile users?

Correct. SDKs — code libraries embedded in apps like weather, flashlight, or games — silently collect and transmit location data to broker companies, funding free apps without user awareness of the data transaction.

SDKs embedded in ordinary apps are the primary collection mechanism — code that harvests location data in the background and transmits it to brokers, enabling developers to offer apps for free by monetizing user data.

11. What did The Markup's 2022 investigation find about Facebook Pixel on hospital websites?

Correct. Of the top 100 US hospitals, 33 had Facebook Pixel on patient-facing pages, sending Facebook URLs that could reveal health conditions along with identifying device data, without patient knowledge or HIPAA authorization.

The Markup found Facebook Pixel on patient-facing pages of 33 of the top 100 US hospitals, transmitting URL strings that revealed specific health conditions patients were researching, along with identifying device data.

12. In the context of this module, what does "re-identification" mean?

Correct. Re-identification is the process of linking "anonymized" data back to a specific person — made possible by behavioral fingerprints, device characteristics, and movement patterns that are unique enough to serve as identifiers.

Re-identification is matching anonymized data to specific individuals — possible because behavioral and location patterns are unique enough to serve as fingerprints even after names and direct identifiers are removed.

13. What was the significance of Apple's App Tracking Transparency result in 2021?

Correct. The 96% opt-out rate demonstrated that widespread tracking had persisted not because users approved of it, but because they were never genuinely asked — the choice was buried in terms of service rather than presented clearly.

96% of US users declined tracking when ATT made the request explicit and clear. This shows that pervasive tracking exists not because of user consent, but because consent mechanisms were designed to obscure the choice.

14. What is "browser fingerprinting" and why does incognito mode fail to prevent it?

Correct. Incognito mode discards cookies when the session ends, but it does nothing to hide the unique combination of browser settings, fonts, screen resolution, and hardware characteristics that fingerprinting uses for identification. Sites can read these characteristics from request headers and page scripts on every visit.

Incognito mode stops cookies from persisting after you close the window, but fingerprinting doesn't rely on cookies. It identifies you from the unique combination of device and browser characteristics that every site can read, and incognito mode doesn't change those characteristics.

15. The Cambridge Analytica case demonstrated that Facebook behavioral data could be used to do which of the following?

Correct. Cambridge Analytica used behavioral Facebook signals to build psychographic profiles of up to 87 million Facebook users, most of them American. The $5 billion FTC penalty against Facebook and the UK ICO investigation confirmed the scale. Critically, the model used behavioral data, not self-reported political views.

Cambridge Analytica built psychographic profiles of up to 87 million Facebook users, mostly American, from behavioral data rather than self-reported beliefs. The model used behavioral signals such as page likes to classify and target users for political influence.