In 2012, the New York Times reported that Target Corporation had assigned every customer a "pregnancy prediction score." The retailer analyzed purchasing patterns — unscented lotion, certain vitamins, cotton balls — to identify shoppers who were likely pregnant, often before they had told family members. One man complained to a Target store manager after his teenage daughter received coupons for baby items. She was, in fact, pregnant. Target had known before her father did.
Most people understand that they share data when they fill out a form, create an account, or make a purchase. This is active data — you consciously provide it. What most people do not understand is that the majority of data collected about them is passive: generated automatically as a byproduct of using digital services.
Passive data includes how long your mouse hovers over an image before you scroll past it, the exact time of day you unlock your phone, how fast you type, which words you delete before sending a message, and the precise GPS coordinates of every location your phone visits — even when you are not actively using any app.
Google's Location History feature, enabled by default on Android devices until 2019, recorded and stored GPS coordinates every few minutes. In 2018, the Associated Press documented that Google continued collecting precise location data even when users explicitly turned "Location History" off — through a separate system called "Web & App Activity." This affected approximately 2 billion Android users and hundreds of millions of iPhone users with Google apps installed.
Researchers and privacy advocates have identified five broad categories of passive data that are routinely harvested:
The raw data points are not the real concern. The concern is inference: what AI systems can deduce from those data points that you never explicitly shared. Michal Kosinski, now at Stanford, published research in 2013 demonstrating that Facebook "likes" alone could predict, with accuracy well above chance, a person's intelligence, sexual orientation, political affiliation, religion, and whether their parents separated before they turned 21. None of these attributes was stated in the likes themselves.
This inference gap — the distance between what you consciously share and what AI can calculate — is the central privacy challenge of our era. It means that traditional privacy protections, which focus on what you disclose, offer far less protection than most people assume.
The digital footprint is not just what you post or submit. It is the full record of every digital interaction, including behavior you consider private or ephemeral, combined with AI's ability to extract meaning from patterns you would never recognize as revealing.
In this lab you'll have a guided conversation about the passive data trails you generate daily. The AI assistant will help you audit which apps and services are likely collecting which categories of passive data about you.
In 2022, the Federal Trade Commission sued data broker Kochava Inc. for selling precise geolocation data that could be used to track individuals to sensitive locations including reproductive health clinics, addiction recovery centers, and places of worship. According to the complaint, the data was not anonymized in any meaningful way: it allowed anyone who purchased it to identify specific individuals and follow their movements. Kochava's data came from ordinary smartphone apps that users had installed for unrelated purposes.
The data broker industry is estimated by some researchers to generate over $200 billion annually. The core business model: collect data from hundreds of sources, aggregate it into individual profiles, and sell access to those profiles. The industry operates almost entirely outside the awareness of the people whose data is being sold.
The major brokers — Acxiom, LexisNexis Risk Solutions, Experian, Equifax, Oracle Data Cloud, and hundreds of smaller operators — maintain profiles on virtually every American adult. Acxiom has publicly stated it holds data on approximately 2.5 billion people globally. These profiles can contain thousands of data points per individual.
Data brokers aggregate from an enormous range of sources, most of which individuals interact with daily without awareness:
The buyers are as diverse as the sources. Insurance companies use broker data to inform risk scoring. Employers use it in background checks. Political campaigns use it for voter targeting. Banks use it for credit decisions. Landlords use it to screen tenants. Law enforcement agencies in the United States have used broker data to bypass the warrant requirements that would apply to direct surveillance — a practice confirmed by a 2023 Senate investigation led by Senator Ron Wyden.
A Senate investigation released in 2023 found that the Defense Intelligence Agency, IRS, CBP, and other federal agencies had purchased location data from commercial data brokers without obtaining warrants — effectively circumventing Fourth Amendment protections by purchasing what they could not legally compel. The report named specific agencies and the broker companies they contracted with.
Data brokers routinely claim their data is "anonymized" or "de-identified." Research has consistently shown this claim is largely false when applied to location and behavioral data. A landmark 2013 study by de Montjoye et al., published in Scientific Reports (a journal from the publisher of Nature), found that just four spatiotemporal data points, meaning a person's location at four different times, were sufficient to uniquely identify 95% of individuals in a dataset of 1.5 million people, even after anonymization. The uniqueness of human movement patterns makes true anonymization of location data practically impossible at the individual level.
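The uniqueness argument can be illustrated with a small simulation. The sketch below is not the method or data of de Montjoye et al.; it generates synthetic traces, and the function names (`make_traces`, `unicity`), the 50 places, and the 24 hourly slots are all assumptions chosen for illustration. It measures how often a handful of known (place, hour) points matches exactly one person.

```python
import random

def make_traces(n_users, n_points, n_places=50, n_hours=24, seed=0):
    """Generate synthetic traces: each user gets a set of (place, hour) points."""
    rng = random.Random(seed)
    traces = {}
    for user in range(n_users):
        points = set()
        while len(points) < n_points:
            points.add((rng.randrange(n_places), rng.randrange(n_hours)))
        traces[user] = points
    return traces

def unicity(traces, k, seed=1):
    """Fraction of users pinned down uniquely by k points drawn from their own trace."""
    rng = random.Random(seed)
    unique = 0
    for user, points in traces.items():
        # Pretend an observer knows k of this user's points.
        known = set(rng.sample(sorted(points), k))
        # Find every trace in the dataset that contains all k known points.
        matches = [u for u, other in traces.items() if known <= other]
        if matches == [user]:
            unique += 1
    return unique / len(traces)

traces = make_traces(n_users=500, n_points=30)
for k in (1, 2, 3, 4):
    print(k, round(unicity(traces, k), 3))
```

In this toy setting a single point almost never identifies anyone, while four points almost always do. Real mobility data is far less random than these synthetic traces, so the exact fractions here say nothing about any real dataset; the sketch only shows why removing names does not remove identifiability.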
Data brokers compile profiles with thousands of data points. In this lab you'll explore what a broker profile might realistically contain, where each element comes from, and how to begin reducing your exposure through opt-out processes.
During the 2018 Cambridge Analytica congressional hearings, Senator Orrin Hatch asked Mark Zuckerberg how Facebook sustains a business model where users pay nothing. Zuckerberg paused and replied: "Senator, we run ads." The exchange became iconic — but it obscured the more important answer. Facebook does not merely run ads. It builds psychographic profiles, infers emotional states from post timing and word choice, tracks users across the web via embedded Like and Share buttons, and maintains shadow profiles on people who have never created a Facebook account.
The visible layer of social media data — posts, photos, likes, follows — is only a fraction of what platforms collect. The larger and more valuable dataset consists of behavioral signals generated during use:
In 2022, The Markup investigated hospital websites and found that 33 of the top 100 US hospitals had installed the Facebook Pixel on patient-facing web pages. The Pixel transmitted data to Facebook including URL strings that revealed specific conditions patients were searching for — such as "My Scheduled Medications" or "Depression Bipolar Support" — along with identifying information. This occurred without patient knowledge or HIPAA authorization.
Perhaps the least understood element of social platform data collection is the practice of building profiles on people who have never created an account. Facebook has admitted to this practice — internally called "shadow profiles" — in European regulatory proceedings. The mechanism: when Facebook users upload their phone contacts, Facebook receives the phone numbers and email addresses of non-users. When those non-users visit any website with a Facebook Pixel, Facebook can link the visit to the shadow profile via device identifiers, building behavioral histories for people who have explicitly chosen not to use the platform.
In 2019, the Irish Data Protection Commission opened an investigation into Facebook's shadow profile practices under GDPR. Separately, in November 2022 the commission fined Meta, Facebook's parent company, €265 million over a data-scraping incident that exposed personal data from hundreds of millions of accounts.
In 2014, Facebook published research in the Proceedings of the National Academy of Sciences revealing that it had conducted a study on 689,003 users without their knowledge. Researchers had altered the emotional content of users' News Feeds — showing some users more positive posts, others more negative — to test whether emotional states could be transferred through social media. The study confirmed they could. The controversy was not only ethical; it revealed that Facebook's algorithmic systems were powerful enough to measurably alter users' emotional states, and that the company was willing to do so without notification.
Social platforms provide "Download Your Data" tools that reveal only a fraction of what they actually hold. In this lab you'll explore the gap between what platforms show you and what they actually know — and examine specific tracking mechanisms in the platforms you use.
In 2021, researchers at MIT and Harvard published findings that smartphone accelerometer data alone — the sensor that detects phone movement — could be used to identify individuals with high accuracy. The accelerometer requires no permission to access on most phones. It cannot hear you, see you, or track your location. Yet the subtle, unique way each person holds and moves a phone creates a fingerprint distinctive enough to re-identify them. The data no one thought to protect was enough.
Intelligence agencies have long understood a principle called the mosaic effect: individual pieces of information that appear innocuous can be combined to reveal sensitive conclusions that none of the pieces would suggest alone. This principle now governs commercial AI profiling. A single data point — your ZIP code, your search query, your music listening time at 2am — reveals little. Three hundred data points, aggregated across sources and processed by machine learning models trained on millions of similar profiles, can predict your health status, financial distress, relationship stability, and likelihood of making specific decisions.
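A minimal sketch of the mosaic effect, assuming a synthetic population: each attribute on its own leaves a person hidden in a large crowd, but the combination quickly isolates individuals. The attribute names and the number of distinct values assigned to each (`ATTRIBUTES`) are hypothetical, and real attributes are neither uniformly distributed nor independent.

```python
import random
from collections import Counter

# Hypothetical attributes and how many distinct values each is assumed to take.
ATTRIBUTES = {
    "zip_code": 100,
    "birth_year": 60,
    "phone_model": 20,
    "late_night_listener": 2,
}

def make_population(n, seed=0):
    """Create n synthetic people with a random value for every attribute."""
    rng = random.Random(seed)
    return [
        {name: rng.randrange(cardinality) for name, cardinality in ATTRIBUTES.items()}
        for _ in range(n)
    ]

def share_unique(population, fields):
    """Fraction of people whose combination of `fields` is shared with nobody else."""
    keys = [tuple(person[f] for f in fields) for person in population]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(population)

population = make_population(10_000)
fields = []
for name in ATTRIBUTES:
    fields.append(name)
    print(f"{len(fields)} attribute(s): {share_unique(population, fields):.1%} unique")
```

Adding an attribute can only split existing groups further, so the share of unique people never goes down as fields are combined. That monotonic narrowing is the mosaic effect in miniature: no single field is sensitive, yet the joined record points at one person.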
Spotify was granted a patent in 2021 describing technology to analyze users' speech patterns, ambient sounds, and emotional tone of voice to infer their "emotional state, mental health state, or well-being." The system would use microphone access during voice searches to classify users and adjust recommendations. Spotify confirmed the patent but said it was not currently deployed. The significance: the technical capability to infer mental health state from ambient audio was considered routine enough to patent.
Research from multiple institutions has documented what modern machine learning models can infer from specific data categories that users typically consider innocuous:
The 2018 Cambridge Analytica scandal illustrated what aggregated behavioral profiling can produce at scale. About 270,000 users took a psychographic survey through a Facebook app, and because the app also harvested data from their friends, Cambridge Analytica obtained data on up to 87 million Facebook users, most of them in the United States. The profiles were used to identify persuadable voters and serve them targeted content designed to shift political behavior. The models relied not on what users said they believed, but on behavioral signals, such as which pages they paused on and which friends' posts they engaged with, to classify them into psychographic segments.
In 2018, Federal Election Commission filings and a UK Information Commissioner's Office investigation confirmed the data practices. The ICO issued the maximum fine then available under UK law, £500,000. In 2019, Facebook agreed to pay a $5 billion FTC penalty, the largest privacy penalty in US history at the time.
Your digital footprint is not primarily what you post or submit. It is the accumulated record of behavioral signals — movement, timing, hesitation, attention — that modern AI systems can synthesize into predictions and inferences far more revealing than anything you would consciously share. The footprint cannot be eliminated through careful posting. It requires structural understanding of where data is generated and who has access to what.
Privacy research converges on three structural interventions that meaningfully reduce digital footprint exposure — not to zero, but to a level where inference becomes less precise:
1. App permission audits. Regularly review which apps have access to location, microphone, contacts, and motion sensors. Revoke any permission that is not actively required for the app's core function. In 2021, the iOS App Tracking Transparency framework began requiring apps to request permission before cross-app tracking, and early analytics data indicated that roughly 96% of US users declined when asked, suggesting that when given a genuine choice, most people prefer not to be tracked.
2. Browser fingerprint reduction. Standard incognito mode does not prevent fingerprinting. Firefox with enhanced tracking protection, or the Tor Browser for sensitive searches, reduces the amount of behavioral data available for cross-site correlation.
3. Data broker opt-outs. While cumbersome, submitting opt-out requests to major brokers (Acxiom, Spokeo, WhitePages, Intelius, BeenVerified) removes the most accessible aggregated profiles. Paid removal services such as DeleteMe automate this at scale.
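The second intervention rests on a point that is easy to miss: a fingerprint is computed from attributes the browser reports, not from anything stored on the device. The sketch below is a simplified illustration, not how any particular tracker works. The `fingerprint` function and every attribute value in it are hypothetical.

```python
import hashlib
import json

def fingerprint(attributes):
    """Hash a dict of browser-visible attributes into a short, stable identifier."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical values a page script could read without setting any cookie.
normal_session = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "screen": "2560x1440",
    "timezone": "America/Chicago",
    "language": "en-US",
    "fonts": ["DejaVu Sans", "Fira Code", "Noto Serif"],
}

# A private window clears cookies and history, but these attributes stay the same.
private_session = dict(normal_session)

# A browser that reports standardized values makes many users look alike.
standardized_session = dict(normal_session, screen="1400x900", timezone="UTC", fonts=[])

print(fingerprint(normal_session) == fingerprint(private_session))       # True
print(fingerprint(normal_session) == fingerprint(standardized_session))  # False
```

The first comparison shows why incognito mode alone does not help: nothing the fingerprint depends on has changed. The second shows the general idea behind fingerprint-resistant browsers, which is to report values shared by as many other users as possible so that the resulting identifier no longer singles anyone out.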
In this final lab, you'll apply the mosaic effect to your own digital footprint. By listing apparently innocuous data points about your digital behavior, you'll explore what an AI system could infer from combining them — and what that means for how you think about your data.