Module 3 · Lesson 1

The Architecture of Invisibility

How data collection became so routine that consent became an afterthought — and what that costs us.

When did you last read what you agreed to?

In May 2018, on the day the GDPR took effect, Max Schrems and his organization noyb filed complaints against Facebook, Google, Instagram, and WhatsApp within hours of midnight. His argument was precise: these platforms relied on "forced consent," meaning you could not use the service at all unless you accepted data collection. The option to say no did not exist in any meaningful sense. Within twelve hours of the regulation's start, the companies faced complaints that, by noyb's estimate, could carry up to €7.6 billion in fines.

What Schrems had named was not a technical violation. It was an architecture — a system designed so that the question of consent could never actually be answered in the negative.

What "Consent" Actually Requires

In law and ethics, valid consent has four components: it must be informed (you know what you're agreeing to), voluntary (refusal must be a real option), specific (blanket consent to "anything we might do" doesn't count), and revocable (you can withdraw it). Digital platforms have historically undermined all four simultaneously.
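One way to make the four components concrete is to treat them as checks on a consent request. The sketch below is illustrative only: `ConsentRequest`, its fields, and `failed_components` are hypothetical names, not part of any law or library, and real legal tests are far less mechanical than four boolean checks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConsentRequest:
    """Hypothetical description of how a service asks for consent."""
    purposes_disclosed: bool    # informed: the user is told what the data is for
    can_refuse_and_use: bool    # voluntary: refusing does not lock the user out
    purpose: str                # specific: one named purpose, not "anything"
    withdrawal_available: bool  # revocable: the user can withdraw later

# Purposes too broad to count as specific (an assumption for this sketch).
BLANKET_PURPOSES = {"", "any", "anything", "all purposes"}

def failed_components(req: ConsentRequest) -> list[str]:
    """Return the names of the consent components the request fails."""
    failures = []
    if not req.purposes_disclosed:
        failures.append("informed")
    if not req.can_refuse_and_use:
        failures.append("voluntary")
    if req.purpose.strip().lower() in BLANKET_PURPOSES:
        failures.append("specific")
    if not req.withdrawal_available:
        failures.append("revocable")
    return failures

# "Forced consent": everything is disclosed, but refusal means no service.
forced = ConsentRequest(True, False, "ad personalization", True)
print(failed_components(forced))  # ['voluntary']
```

Under these assumptions, the "forced consent" pattern fails exactly one check, the voluntary one, even though the request is otherwise well formed.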

Research by Aleecia McDonald and Lorrie Faith Cranor at Carnegie Mellon, published in 2008 and widely reported in 2012, estimated that reading every privacy policy an average US internet user encounters in a year would take roughly 76 work days. The policies are long by design. Length works as a barrier: a policy nobody can realistically read produces agreement without informed consent.

AI systems compound this problem. Where a 2005 website might collect your email address and browsing history, a 2024 AI platform may train on the text of the prompts you submit, use your conversations to improve future models, and retain inferences about your beliefs, health status, and relationships, none of which you explicitly disclosed.

Documented Case — Google Photos (2022)

In 2022, Google's privacy policy update quietly extended the right to use profile photos in AI training. Users who had uploaded family photos years earlier had not consented to this secondary use. No notification was sent; the update was embedded in routine policy language. Consumer advocacy groups in the EU filed formal complaints under GDPR Article 6 (lawfulness of processing), arguing the original consent could not retroactively extend to AI training purposes not yet disclosed at the time of upload.

The Layered Consent Problem

Modern data collection rarely happens in a single step. Your fitness app shares data with its analytics provider, which sells aggregate data to a health insurer's data broker, which sells enriched profiles to an employer background-check service. At each handoff, the original consent — "allow this app to track your steps" — is stretched further from its original meaning.
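The stretching of consent across handoffs can be sketched as a purpose check at each step. This is a minimal illustration: the recipients and purposes mirror the fitness-app example above, while `CONSENTED_PURPOSES` and `uncovered_handoffs` are hypothetical names introduced here.

```python
# What the user actually agreed to when installing the app.
CONSENTED_PURPOSES = {"step tracking"}

# (recipient, purpose) for each handoff in the chain; all entries are illustrative.
handoffs = [
    ("fitness app", "step tracking"),
    ("analytics provider", "usage analytics"),
    ("insurer's data broker", "risk profiling"),
    ("background-check service", "employment screening"),
]

def uncovered_handoffs(handoffs, consented):
    """Return the handoffs whose purpose the original consent never covered."""
    return [(recipient, purpose) for recipient, purpose in handoffs
            if purpose not in consented]

for recipient, purpose in uncovered_handoffs(handoffs, CONSENTED_PURPOSES):
    print(f"{recipient}: '{purpose}' was never consented to")
```

In this toy chain only the first step is covered; each later recipient relies on a consent that named a different purpose.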

AI systems sit at the end of many such chains. A large language model trained on scraped internet data contains, implicitly, the words of bloggers who never imagined being training data, forum posts from people who later deleted their accounts, and medical questions asked anonymously on patient support sites. The scrape may have been legal; the consent was never sought.

The Common Crawl dataset — used in training GPT-3, LLaMA, and many other major models — contains approximately 3.1 trillion tokens scraped from the open web. No individual whose writing contributed to it gave consent for this purpose. Common Crawl itself is a nonprofit that makes no commercial use of the data, but the companies that train billion-dollar commercial models on it did not obtain separate consent from content creators.

Key Terms

Informed Consent: Agreement given with full knowledge of what is being consented to, including secondary uses.

Dark Patterns: UI design choices that nudge users toward agreeing to data collection by making refusal difficult, confusing, or costly.

Secondary Use: Using data for a purpose different from the one for which it was originally collected.

GDPR Article 7: EU legal standard requiring that consent be freely given, specific, informed, and unambiguous, and that withdrawal be as easy as giving consent.

Why This Is Especially Hard for AI

Traditional data collection is transactional: a company takes X and uses X for purpose Y. AI training is transformative. A model trained on your data does not keep it as retrievable records; it distills patterns from it into weights. You cannot be "deleted" from a trained model the way you can be deleted from a database. This makes the right to erasure (GDPR Article 17) technically complex and, in many cases, practically impossible without retraining the entire model.

When Adobe updated its Terms of Service in June 2024 to include language allowing the company to access users' work via "automated techniques" to improve its AI products, users and professional photographers interpreted this as consent to train AI on their creative work. Adobe denied this interpretation, but the episode illustrated how ambiguous policy language — even unintentionally — can create a consent gap between what companies say they do and what users believe they are agreeing to.

The problem isn't that companies are necessarily acting in bad faith. It's that the existing architecture of consent — long policies, bundled agreements, retroactive updates — was designed for a world of simpler data transactions. AI broke that architecture, and we haven't built a replacement yet.

Lesson 1 Quiz

The Architecture of Invisibility · 5 questions

1. According to Max Schrems's 2018 GDPR complaints, what specific consent problem did Facebook and Google exhibit?

Correct. Schrems argued the platforms offered no genuine opt-out — acceptance was bundled with service access, which GDPR defines as invalid consent.

Incorrect. The issue was not the absence of a policy but the structural impossibility of refusing — a concept Schrems called "forced consent."

2. Which of the following is NOT one of the four components of valid consent identified in the lesson?

Correct. The four components are: informed, voluntary, specific, and revocable. Permanence is the opposite of valid consent — revocability is required.

Incorrect. Review the four components: informed, voluntary, specific, and revocable. "Permanent" consent is actually invalid by definition.

3. What made the 2022 Google Photos policy update ethically problematic from a consent standpoint?

Correct. This is the "secondary use" problem — data collected for one purpose (photo storage) was retroactively repurposed for AI training without new, specific consent.

Incorrect. The core issue was secondary use: using old data for a new purpose (AI training) that the original consent did not cover.

4. Why does the Common Crawl dataset raise consent concerns for AI development?

Correct. Legality and consent are different things. Scraping public web pages may be legal, but the authors of those pages were never asked whether they wanted to contribute to commercial AI training.

Incorrect. The lesson distinguishes legality from consent: the scraping may be legal, but consent from individual authors was never sought or given.

5. Why does AI training make the "right to erasure" (GDPR Article 17) particularly difficult to fulfill?

Correct. This is the fundamental technical tension: GDPR assumes data is stored in a deletable form, but AI training transforms data into model weights where individual contributions cannot be surgically removed.

Incorrect. The issue is technical: trained models encode patterns from data into weights, not raw records. You can't delete one person's contribution the way you'd delete a row in a database.

Lab 1 — Consent Audit

Analyze real consent architecture · AI Ethics M3

Your Task: Anatomy of a Consent Failure

In this lab you'll interrogate the consent frameworks of real AI platforms by discussing specific cases with the AI. Your goal is to develop precise language for describing when and why consent fails in AI data collection — and what valid consent would look like.

Work through at least three exchanges. Identify a specific consent problem, explain which component of valid consent it violates, and propose a concrete design fix.

Start here: Describe what "forced consent" means in the context of a major AI platform's data policy, and explain which of the four consent components it violates most directly.


Module 3 · Lesson 2

Surveillance by Inference

AI systems don't just collect what you share — they infer what you didn't.

If a system predicts your political views from your grocery purchases, did you consent to being politically profiled?

In 2012, the New York Times reported that a Minneapolis-area father had walked into a Target store to complain that his teenage daughter was receiving coupons for cribs and maternity clothing. He demanded an apology from the manager. When the manager phoned a few days later to apologize again, the father apologized himself: his daughter was pregnant, and she hadn't told him. Target's statistical model had inferred her pregnancy from purchasing patterns (unscented lotion, calcium supplements, cotton balls) before she disclosed it to her own family.

Target's model assigned each customer a "pregnancy prediction score." It had been built on purchase data customers gave for an entirely different purpose: to get discounts on items they already wanted to buy. No one agreed to be pregnancy-scored.

The Gap Between Data Given and Knowledge Extracted

There is a fundamental difference between what you disclose and what a system infers from that disclosure. You might share your location so an app can show local weather. The app — or its advertising partners — can then infer your commute pattern, your workplace, your religious attendance on Sunday mornings, your visits to medical facilities, and your approximate income bracket. None of these were shared; all were inferred.
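How little is needed to cross from disclosed data to inferred attributes can be shown with a toy example. The sketch below assumes a hypothetical list of location pings and a deliberately crude rule (the most frequent place in a time window); the place names, the ping format, and `most_common_place` are all invented for illustration.

```python
from collections import Counter

# (hour_of_day, weekday with 0=Mon..6=Sun, place): hypothetical pings that a
# weather app might log. Nothing here is labeled "home" or "work" by the user.
pings = [
    (2, 0, "Elm St"), (3, 1, "Elm St"), (1, 2, "Elm St"),
    (10, 0, "Office Park"), (14, 1, "Office Park"), (11, 2, "Office Park"),
    (10, 6, "Chapel Rd"), (10, 6, "Chapel Rd"),
    (15, 3, "Clinic Ave"),
]

def most_common_place(pings, keep):
    """Most frequent place among pings whose (hour, weekday) pass `keep`."""
    counts = Counter(place for hour, day, place in pings if keep(hour, day))
    return counts.most_common(1)[0][0] if counts else None

inferred = {
    "home": most_common_place(pings, lambda h, d: h < 6),
    "workplace": most_common_place(pings, lambda h, d: d < 5 and 9 <= h < 17),
    "sunday_morning": most_common_place(pings, lambda h, d: d == 6 and 8 <= h < 12),
}
print(inferred)
```

The user shared coordinates for a forecast; three filters and a counter turn them into a home address, an employer, and a likely place of worship.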

AI dramatically scales this inference gap. Classical statistics could identify pregnancy from purchasing data with reasonable accuracy. Modern machine learning can infer sexual orientation from facial photographs (a 2017 Stanford study achieved 81% accuracy for men from a single photo), political affiliation from social media likes, and mental health status from typing patterns — none of which require you to disclose these attributes directly.

The consent problem is layered: you consented to share X; the system inferred Y from X without telling you; the inferred Y was then used for decisions affecting you. At what point in this chain was your consent sought for Y?

Documented Case — Facebook Emotional Contagion Study (2014)

In 2014, Facebook published research in the Proceedings of the National Academy of Sciences showing that it had secretly manipulated the emotional tone of approximately 700,000 users' News Feeds for one week in 2012 to study emotional contagion. Users had not consented to participate in a psychological experiment. Facebook argued the manipulation was covered by its data use policy. Cornell University's IRB (which had partial involvement) later acknowledged the study raised serious ethical concerns. The US Senate Commerce Committee opened an inquiry. Facebook's defense — "it was in the terms of service" — became a landmark example of how platform consent frameworks can be stretched far beyond their intended scope.

Sensitive Attribute Inference and Protected Classes

Legal protections around protected characteristics — race, religion, health status, sexual orientation — mean very little if those attributes can be inferred from non-protected data and then acted upon. A lender cannot legally ask about your health. But if an AI model trained on purchasing data can infer chronic illness from pharmacy shopping patterns and use that signal — even implicitly — in a credit decision, the legal protection is bypassed through inference.

This problem has a name: attribute inference attacks. In research published in 2008, Arvind Narayanan and Vitaly Shmatikov at the University of Texas at Austin showed that anonymized Netflix ratings, shared voluntarily for movie recommendations, could be matched against public IMDb reviews to re-identify users and expose viewing histories that hinted at their political views and sexual orientation. The data Netflix users shared was innocuous; what was extracted was not.
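The linkage step behind this kind of de-anonymization can be sketched in a few lines. This is a simplified illustration, not the published method: it scores candidates only by exact rating matches on made-up data, and the record IDs, film titles, and `best_match` helper are hypothetical.

```python
# "Anonymized" release: names removed, rating histories kept.
anonymized = {
    "user_17": {"Film A": 5, "Film B": 1, "Film C": 4, "Film D": 2},
    "user_42": {"Film A": 3, "Film B": 4, "Film E": 5},
}

# Reviews one person posted publicly under their real name.
public_reviews = {"Film A": 5, "Film C": 4}

def overlap(public, record):
    """Count titles where the public rating equals the anonymized rating."""
    return sum(1 for title, rating in public.items() if record.get(title) == rating)

def best_match(public, anonymized):
    """Return the anonymized ID that agrees with the public profile most often."""
    scores = {anon_id: overlap(public, record) for anon_id, record in anonymized.items()}
    return max(scores, key=scores.get), scores

match, scores = best_match(public_reviews, anonymized)
# Everything in the matched record that the person never posted publicly.
newly_exposed = sorted(set(anonymized[match]) - set(public_reviews))
print(match, newly_exposed)
```

Once a record is linked, the ratings the person chose not to publish become attributable to them, which is where the sensitive inferences come from.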

Regulators are beginning to respond. The EU AI Act (2024) requires "high-risk" AI systems to document the data they process and the inferences they produce. But enforcement against inference-based discrimination is still nascent: regulators can audit what data is collected, but auditing what is inferred is far harder.

Key Terms

Inference Gap: The difference between information you explicitly share and the attributes a system derives from that information without your knowledge.

Attribute Inference Attack: The use of innocuous disclosed data to reconstruct sensitive personal attributes that were never shared.

Proxy Discrimination: Using a legally permissible variable (e.g., zip code) that correlates with a protected class to achieve discriminatory outcomes without technically using the protected attribute.

Contextual Integrity: Helen Nissenbaum's concept that privacy is violated when information flows outside the context in which it was originally shared, even if the information was "public."
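Proxy discrimination, as defined in the key terms, can be demonstrated with a decision rule that never reads the protected attribute. The data below is synthetic and deliberately skewed, and the zip labels, group labels, and function names are hypothetical.

```python
# Synthetic applicants: "group" stands in for a protected class, and zip code
# happens to correlate with it. The decision rule never looks at "group".
applicants = [
    {"zip": "A", "group": "x"}, {"zip": "A", "group": "x"},
    {"zip": "A", "group": "x"}, {"zip": "B", "group": "x"},
    {"zip": "A", "group": "y"}, {"zip": "B", "group": "y"},
    {"zip": "B", "group": "y"}, {"zip": "B", "group": "y"},
]

def approve(applicant):
    """Approve based on zip code only."""
    return applicant["zip"] == "A"

def approval_rates(applicants):
    """Approval rate per group, computed after the fact for auditing."""
    totals, approved = {}, {}
    for a in applicants:
        g = a["group"]
        totals[g] = totals.get(g, 0) + 1
        approved[g] = approved.get(g, 0) + approve(a)
    return {g: approved[g] / totals[g] for g in totals}

print(approval_rates(applicants))  # {'x': 0.75, 'y': 0.25}
```

The rule is facially neutral, yet the outcome gap between groups is the same as if the protected attribute had been used directly, which is why audits need to look at outcomes and not only at inputs.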

Contextual Integrity as a Framework

Helen Nissenbaum, a philosopher at Cornell Tech, proposed that the right question isn't "was this information public?" but "does this use match the norms of the context in which it was shared?" Your medical information shared with a doctor flows appropriately to other treating physicians but not to employers. Your grocery purchases shared for discounts flow appropriately to inventory management but not to pregnancy scoring.

AI systems routinely violate contextual integrity by aggregating data from many contexts — medical, commercial, social — into a single model that serves entirely different purposes. The information was shared in specific contexts with specific implicit norms; the aggregation breaks all of them simultaneously.

The 2023 FTC report on commercial surveillance cited contextual integrity violations as a primary harm of the data broker industry, noting that AI-powered profiling companies now combine data from over 5,000 distinct sources to build individual profiles. Each source had its own consent context; none foresaw the profile.

Lesson 2 Quiz

Surveillance by Inference · 5 questions

1. What made Target's 2012 pregnancy prediction model an ethical consent issue, not just a technical one?

Correct. The ethical issue is the inference gap — data shared for one purpose (coupons) was transformed into sensitive personal profiling (pregnancy status) without consent for that secondary use.

Incorrect. The issue is the inference gap: customers consented to share purchase data for discounts, not to be scored on whether they were pregnant. No medical records were involved.

2. What did the Netflix de-anonymization research demonstrate about user data?

Correct. This is the attribute inference attack: innocuous data (ratings) plus public data (IMDb) enabled researchers to infer political views and sexual orientation and re-identify supposedly anonymous users.

Incorrect. The study showed that "anonymized" data isn't safe when combined with other sources — movie ratings could be used to infer sexual orientation and political views of specific individuals.

3. Helen Nissenbaum's concept of "contextual integrity" holds that privacy violations occur when:

Correct. Contextual integrity is about context-matching: medical data shared with a doctor flows appropriately to other physicians, not to employers. The context, not just the data, defines appropriate flow.

Incorrect. Contextual integrity focuses on whether data flows match the norms of the original sharing context — not on encryption, payment, or retention duration.

4. What was Facebook's primary defense when confronted about its 2014 emotional contagion experiment conducted without explicit user consent?

Correct. Facebook's "terms of service" defense became a landmark example of how consent frameworks can be stretched — claiming that broad platform policies substitute for specific informed consent to psychological experimentation.

Incorrect. Facebook argued the study was covered by users' acceptance of its data use policy. This "it's in the ToS" defense became a widely criticized example of consent framework abuse.

5. "Proxy discrimination" in AI systems refers to:

Correct. Proxy discrimination uses a "safe" variable (like zip code or purchase history) that correlates with race, religion, or other protected classes to discriminate while technically avoiding the prohibited attribute.

Incorrect. Proxy discrimination means using a non-protected variable (like zip code) that correlates with a protected class (like race) to achieve discriminatory outcomes without technically violating anti-discrimination law.

Lab 2 — Inference Ethics

Mapping inference gaps in real AI systems · AI Ethics M3

Your Task: Trace the Inference Chain

In this lab you'll practice identifying inference gaps — where AI systems derive sensitive attributes from innocuous disclosed data. Choose a real AI application (recommendation systems, credit scoring, health apps, ad targeting) and map what is disclosed, what is inferred, and what consent gap exists.

Complete at least three exchanges. Your analysis should name: (1) the data disclosed, (2) the attribute inferred, (3) the purpose the inference serves, and (4) whether contextual integrity is violated.

Start here: Pick a real AI system you use or know about. What data do users give it, and what might it infer that users don't realize?


Module 3 · Lesson 3

The Right to Be Forgotten vs. The Memory of Machines

Legal frameworks built for databases collide with the architecture of trained neural networks.

Can you delete yourself from a model that has already learned from you?

On March 31, 2023, Italy's data protection authority — the Garante — ordered OpenAI to immediately stop processing Italian users' data and temporarily blocked ChatGPT in the country. The Garante cited multiple violations: no age verification, no legal basis for mass data collection for training, and the impossibility of correcting inaccurate personal information that the model had, in effect, memorized. ChatGPT had been producing false biographical claims about real Italian citizens — information it could not simply "correct" by updating a database record, because the incorrect information was baked into model weights.

OpenAI's response included new privacy controls and an opt-out for Italian users. The service was restored in April. But the episode crystallized a structural problem no engineering patch could fully solve: a trained language model is not a database. It does not store facts; it stores patterns. Removing a fact from a pattern requires changing the pattern — which means, at minimum, retraining.

Machine Unlearning: The Technical Problem

The GDPR's "right to erasure" (Article 17) and the California Consumer Privacy Act's "right to delete" were written with databases in mind. In a relational database, a DELETE command removes a row. In a trained neural network, there is no equivalent operation. The network's knowledge is distributed across billions of parameters; no single parameter encodes a single person's data.
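The contrast between deleting a row and deleting what a model learned from it can be shown with a toy setup. The sketch below uses Python's built-in `sqlite3` module and a one-parameter least-squares fit as a stand-in for a trained model; the table, names, and numbers are invented for illustration.

```python
import sqlite3

def fit_slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age REAL, income REAL)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("ana", 30, 40), ("ben", 40, 50), ("cho", 50, 90)],
)

def load():
    return conn.execute("SELECT age, income FROM people").fetchall()

# "Training": the whole model is this single number.
trained_slope = fit_slope(load())

# Erasure request: removing the database row is one statement.
conn.execute("DELETE FROM people WHERE name = ?", ("cho",))
rows_left = len(load())

# The already-trained weight is untouched by the DELETE...
slope_after_delete = trained_slope
# ...and only changes if the model is fit again without that person.
retrained_slope = fit_slope(load())

print(rows_left, slope_after_delete, retrained_slope)  # 2 2.5 1.0
```

With one parameter and three rows, retraining is trivial. The same gap between the database state and the model state is what becomes expensive at the scale of billions of parameters.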

The emerging field of machine unlearning attempts to solve this. Researchers at Google, Stanford, and elsewhere have developed techniques to reduce a model's reliance on specific training examples — retraining from a checkpoint, gradient ascent on "forgotten" data, and influence function methods that identify which parameters were most affected by particular training examples. But none of these methods offer the clean equivalence of deletion that GDPR assumes, and all are computationally expensive.
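For contrast, it helps to see a case where unlearning is exact. The sketch below is not one of the approximate methods named above. It is a simpler technique that works only because a one-dimensional least-squares fit through the origin depends on two running sums, so one example's contribution can be subtracted out. The class name and data are hypothetical.

```python
class UnlearnableLine:
    """Least-squares line through the origin, stored as running sums."""

    def __init__(self):
        self.sxx = 0.0  # sum of x * x over learned examples
        self.sxy = 0.0  # sum of x * y over learned examples

    def learn(self, x, y):
        self.sxx += x * x
        self.sxy += x * y

    def unlearn(self, x, y):
        # Exact removal: subtract this example's contribution to each sum.
        self.sxx -= x * x
        self.sxy -= x * y

    @property
    def slope(self):
        return self.sxy / self.sxx

model = UnlearnableLine()
for x, y in [(1, 2), (2, 4), (3, 9)]:
    model.learn(x, y)
model.unlearn(3, 9)

# Reference: a fresh model that never saw (3, 9).
retrained = UnlearnableLine()
for x, y in [(1, 2), (2, 4)]:
    retrained.learn(x, y)

print(model.slope, retrained.slope)  # 2.0 2.0
```

Deep networks have no comparable additive summary of their training data, which is one way to see why the methods listed above can only approximate this behavior.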

A 2023 paper from researchers at the University of Washington and Stanford found that even after applying machine unlearning techniques, models retained measurable residual knowledge about "deleted" individuals at rates between 3% and 28% depending on the method — far from the complete erasure a data subject might expect when exercising a legal right.

Documented Case — Clearview AI (2020–2023)

Clearview AI scraped over 30 billion facial images from social media platforms and built a facial recognition model sold to law enforcement. When Vermont's Attorney General filed suit in 2020 and Illinois sought to enforce its Biometric Information Privacy Act, Clearview faced a fundamental problem: even if it deleted a person's photos from its database, the neural network trained on those photos had already learned that person's facial geometry. The model's "memory" of a face persists even if the training image is deleted. Clearview settled multiple suits, agreeing to stop selling to private companies in the US, but conceded it could not "un-train" its model on specific individuals. Data protection regulators in Australia and the UK ordered the deletion of collected data; Clearview complied with database deletions while acknowledging the model itself could not be similarly purged.

Memorization in Large Language Models

In 2023, researchers at Google DeepMind and collaborating institutions published a study demonstrating that ChatGPT (GPT-3.5 Turbo) could be prompted to reproduce verbatim training data — including personal information — through a technique as simple as asking the model to repeat a word indefinitely. The model would eventually "diverge" into memorized text, producing names, email addresses, phone numbers, and private content that had appeared in its training set.

This memorization problem has direct consent implications. People whose personal information appeared in web pages that were scraped into training data never consented to that information being permanently encoded into a model that could reproduce it on request. The information isn't stored in a deletable database; it's encoded in weights and can be elicited by anyone with API access.
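A basic way to test whether an output reproduces training text is to look for long token spans shared with a reference corpus. The sketch below is a toy version of that idea using whitespace tokens and set intersection; the function names, the span length, and the sample strings are assumptions made for illustration, and work at real scale needs far more efficient indexing.

```python
def ngrams(tokens, n):
    """All contiguous n-token spans in a token list, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_spans(output, training_docs, n=5):
    """Return the n-token spans of `output` that appear in any training document."""
    seen = set()
    for doc in training_docs:
        seen |= ngrams(doc.split(), n)
    return ngrams(output.split(), n) & seen

# Hypothetical training document and a "diverged" model output.
training_docs = ["contact jane doe at jane@example.com or call 555-0100 after five"]
leaky_output = "poem poem poem contact jane doe at jane@example.com or call 555-0100"
clean_output = "poem poem poem poem poem poem poem"

print(len(verbatim_spans(leaky_output, training_docs)))  # 4
print(len(verbatim_spans(clean_output, training_docs)))  # 0
```

A nonzero count here means the output shares at least one five-token run with the corpus; longer spans make coincidental matches less likely.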

The FTC opened an inquiry into OpenAI in July 2023 partly on these grounds, asking for documentation of what personal data the models were trained on and what steps were taken to prevent harmful outputs of personal information. OpenAI produced over a thousand pages of documentation.

Key Terms

Right to Erasure: GDPR Article 17 right for individuals to request deletion of their personal data; technically complex to fulfill for AI models.

Machine Unlearning: Techniques to reduce a trained model's reliance on specific training examples without full retraining; no current method guarantees complete deletion.

Memorization: A model's tendency to reproduce specific training examples verbatim, particularly for rare or repeated content, making personal data potentially extractable.

Differential Privacy: A mathematical framework for training models in a way that limits how much any individual training example influences the output, reducing memorization risk.

Regulatory Responses and Their Limits

The EU AI Act (in force since August 2024, with most obligations applying from August 2026) requires providers of general-purpose AI models to publish summaries of training data and to comply with copyright and data protection law, including erasure requests. But the Act stops short of specifying how erasure must be technically achieved, delegating that to future guidance from the European AI Office.

In the United States, the FTC has signaled through its "Algorithmic Accountability" framework that companies may be required to delete not just training data but the models trained on illegally collected data, a position sometimes called "algorithmic disgorgement." In a 2021 settlement with Everalbum, a photo-sharing app that had trained facial recognition models without consent, the FTC required deletion of both the training images and the models built from them. This was the first time a US regulator required model deletion as a remedy.

Machine unlearning, differential privacy during training, and federated learning (training on data that never leaves users' devices) are the primary technical approaches being developed to make consent-compatible AI training possible at scale. None is yet a complete solution.
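The core idea of differential privacy can be shown without a neural network. Differentially private training is considerably more involved; the sketch below instead uses the simpler Laplace mechanism on a count query, so that the answer barely depends on whether any one person is in the data. The record format, function names, and epsilon value are assumptions for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng):
    """Noisy count. Adding or removing one person changes the true count by
    at most 1 (sensitivity 1), so the noise scale is 1 / epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

rng = random.Random(0)  # seeded only so the sketch is repeatable
records = [{"has_condition": i % 4 == 0} for i in range(100)]  # synthetic data
without_one = records[1:]  # the same dataset minus the first person

has_condition = lambda r: r["has_condition"]
print(private_count(records, has_condition, epsilon=1.0, rng=rng))
print(private_count(without_one, has_condition, epsilon=1.0, rng=rng))
```

The two printed values differ mostly because of the noise, not because of the missing person, which is the property that limits what can be learned about any individual. Smaller epsilon values add more noise and give stronger protection at the cost of accuracy.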

Lesson 3 Quiz

The Right to Be Forgotten vs. The Memory of Machines · 5 questions

1. Why did Italy's Garante block ChatGPT in March 2023?

Correct. The Garante's action combined multiple violations: no legal basis for training data collection, absent age verification, and a technically novel problem — false information about real people baked into model weights, not correctable like a database entry.

Incorrect. The Garante cited the absence of legal basis for mass training data collection, no age verification, and the inability to correct false personal information encoded in model weights.

2. What did the 2023 Google DeepMind memorization study demonstrate about ChatGPT?

Correct. The study showed that asking the model to repeat a word indefinitely caused it to "diverge" into memorized training content, demonstrating that personal information in training data can be extracted by anyone with API access.

Incorrect. The study showed that simple adversarial prompting (like asking the model to repeat a word indefinitely) could cause it to reproduce verbatim training content, including real people's personal information.

3. Why was the Clearview AI case significant for the right to erasure?

Correct. This is the core tension: database deletion ≠ model unlearning. Clearview complied with database deletions but acknowledged the trained model's "memory" of individuals' faces could not be purged by the same mechanism.

Incorrect. The significance was that Clearview acknowledged its trained model could not be purged of specific individuals' data the way a database can be — database deletion and model unlearning are fundamentally different operations.

4. What was historically significant about the FTC's settlement with Everalbum?

Correct. Algorithmic disgorgement — deleting the model itself, not just the data — set a new regulatory precedent. It signals that illegally collected training data can taint and require deletion of the models built from it.

Incorrect. The significance was the concept of "algorithmic disgorgement" — the FTC required deletion of both the training images and the models trained on them, the first time this remedy was applied by a US regulator.

5. Differential privacy as a training technique is designed primarily to:

Correct. Differential privacy adds calibrated noise to the training process so that the model's outputs are similar whether or not any individual's data was included — reducing memorization and making inference attacks harder.

Incorrect. Differential privacy limits the influence of any single training example on the model, making memorization and extraction of individual data points statistically unlikely — it doesn't encrypt weights or delete data automatically.

Lab 3 — Machine Memory

Exploring the limits of erasure in AI systems · AI Ethics M3

Your Task: Design a Consent-Compatible Training Pipeline

The right to erasure collides with the architecture of trained neural networks. In this lab, you'll explore what a technically realistic consent framework for AI training might look like — and where current approaches fall short. Consider: machine unlearning, differential privacy, federated learning, and data minimization.

Complete at least three exchanges. Identify a specific type of AI system, propose a technical approach to making it more consent-compatible, and evaluate its limitations honestly.

Start here: If you were building a facial recognition system and needed to comply with a user's right to erasure, what technical approach would you use — and what would its limitations be?


Module 3 · Lesson 4

Building Consent That Actually Works

From checkbox compliance to genuine autonomy — what meaningful consent for AI requires.

What would it take to design a consent framework that works for systems that learn, infer, and remember?

In April 2021, Apple released iOS 14.5 with a feature called App Tracking Transparency. Every app that wanted to track users across other apps or websites was now required to display a prompt: "Allow [App] to track your activity across other companies' apps and websites?" The choices were binary: "Ask App Not to Track" or "Allow."

The result was dramatic. Within months, industry measurements found that 85% of US users chose not to be tracked when given a clear, friction-free choice. Facebook's parent Meta reported a $10 billion revenue reduction in 2022 that it attributed substantially to ATT. The lesson was blunt: when consent is genuinely voluntary and clearly explained, most people decline. The earlier consent architecture had been built on the assumption that genuine refusal would be rare.

What Genuine Consent Architecture Looks Like

The Apple ATT example shows that the design of a consent interface is not neutral. Deliberately confusing opt-out flows, pre-ticked consent boxes, and consent bundled with service access all produce artificially high consent rates. Real consent architecture has several identifiable features:

Granularity: Users can consent to some uses and refuse others — not all-or-nothing. Spotify's privacy settings allow separate choices for personalization, third-party sharing, and research use. This is closer to valid specific consent than a single "I agree."

Timing: Consent is sought before data collection, not retrospectively. Amazon's Alexa Skills now require developers to obtain user consent before accessing voice history — a requirement that was not present in Alexa's initial consent framework.

Plain language: The UK's Information Commissioner's Office has issued guidance requiring that consent requests be "as prominent as possible and separate from other terms." Ireland's DPC fined WhatsApp €225 million in 2021 partly because its privacy notice was insufficiently clear about the legal basis for processing — a case directly about whether users could understand what they were agreeing to.

Genuine revocability: Withdrawal of consent must be as easy as giving it. The GDPR's Article 7(3) makes this explicit, but many platforms still bury the opt-out in settings menus that take multiple steps to reach.
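The features above translate fairly directly into a small data model. The sketch below is one possible shape, not a description of any platform's implementation: the purpose names, `ConsentSettings`, and `collect` are hypothetical.

```python
# Granularity: each purpose is a separate choice (names are illustrative).
PURPOSES = ("personalization", "third_party_sharing", "research")

class ConsentSettings:
    def __init__(self):
        self.granted = set()  # nothing is pre-ticked

    def grant(self, purpose):
        if purpose not in PURPOSES:
            raise ValueError(f"unknown purpose: {purpose}")
        self.granted.add(purpose)

    def withdraw(self, purpose):
        # Revocability: withdrawing is a single call, the same effort as granting.
        self.granted.discard(purpose)

def collect(settings, purpose, value, store):
    """Timing: consent is checked before anything is stored."""
    if purpose not in settings.granted:
        return False
    store.append((purpose, value))
    return True

settings, store = ConsentSettings(), []
settings.grant("personalization")
collect(settings, "personalization", "liked: jazz", store)   # stored
collect(settings, "research", "liked: jazz", store)          # refused
settings.withdraw("personalization")
collect(settings, "personalization", "liked: blues", store)  # refused
print(store)  # [('personalization', 'liked: jazz')]
```

Plain language has no code equivalent, but the other three features show up as design decisions: an empty default, per-purpose keys, and a `withdraw` call no harder to reach than `grant`.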

Documented Case: Mozilla Firefox and the "Privacy-Preserving Ads" Controversy (2024)

In 2024, Mozilla added a feature to Firefox called "Privacy-Preserving Attribution" (PPA), an API that allowed websites to measure ad conversions without tracking users across sites. The feature was enabled by default for all users without explicit notification. Privacy advocates criticized the move: even if the technology was more privacy-preserving than alternatives, enrolling users in an ad measurement system without opt-in consent violated the principle that consent should be active, not passive. Mozilla subsequently acknowledged the issue and added clearer disclosure. The episode illustrated that good technical intentions do not substitute for a consent process: even a privacy improvement can be a consent violation if deployed without transparency.

Dynamic Consent Models for AI

Traditional consent is a one-time event. AI's ongoing learning creates a need for dynamic consent — frameworks where users can review, adjust, and withdraw consent as AI systems evolve and their data is used in new ways. Several implementations exist at scale:

The UK Biobank, which collects genetic and health data from 500,000 participants, uses a dynamic consent model where participants can log in to a portal and update their consent preferences — specifying which research uses they approve, which they withdraw, and receiving notifications when new uses are proposed. This is considered a gold standard in biomedical research.

Google's "My Ad Center" (launched 2022) allows users to see and adjust what topics their ad profile includes, turn off personalization by category, and review what data Google infers about them. While imperfect — it doesn't allow full opt-out from all inference — it is a meaningful step toward granular dynamic consent at consumer scale.

The IEEE's Ethically Aligned Design framework (v2, 2019) recommends that AI systems provide "consent dashboards" giving users visibility into what data is held, what has been inferred, what decisions were made based on that data, and granular controls for each use. No major AI platform yet fully implements this, but it provides a normative target.
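The dashboard idea described above can be sketched as a small data model. This is a hypothetical illustration, not the IEEE's specification or any platform's API: the DashboardEntry and DynamicConsent classes and all field names are assumptions made for the example. It shows the two properties that separate dynamic consent from a one-time form: each new use is disclosed and stays unapproved until the user acts, and approval can be withdrawn per use at any time.

```python
from dataclasses import dataclass, field


@dataclass
class DashboardEntry:
    """What one user would see for one data use (hypothetical fields)."""
    use: str
    data_held: list      # what data is held
    inferences: list     # what has been inferred from it
    decisions: list      # what decisions were based on it
    approved: bool = False


@dataclass
class DynamicConsent:
    user_id: str
    entries: dict = field(default_factory=dict)
    notifications: list = field(default_factory=list)

    def propose_use(self, use, data_held, inferences=(), decisions=()):
        """Disclose a new use; it stays unapproved until the user decides."""
        self.entries[use] = DashboardEntry(
            use, list(data_held), list(inferences), list(decisions)
        )
        self.notifications.append(f"New proposed use: {use}")

    def decide(self, use, approved):
        """Approve or withdraw a single use without touching the others."""
        self.entries[use].approved = approved

    def permitted_uses(self):
        return sorted(u for u, e in self.entries.items() if e.approved)
```

A short usage example: after `dc = DynamicConsent("p1")` and `dc.propose_use("cardiac_study", ["heart_rate"])`, the call `dc.permitted_uses()` returns an empty list until the participant calls `dc.decide("cardiac_study", True)`.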

Key Terms

Dynamic Consent: A framework where consent is ongoing, revisable, and use-specific, allowing users to update permissions as AI systems and their data uses evolve.

App Tracking Transparency (ATT): Apple's iOS 14.5 feature (2021) requiring explicit opt-in for cross-app tracking; 85% of US users chose to opt out when given a clear choice.

Algorithmic Transparency: The obligation to disclose how AI systems make decisions that affect users, enabling informed consent to those decision processes.

Privacy by Design: Ann Cavoukian's principle (adopted in GDPR Recital 78) that privacy protections should be embedded into system architecture from the start, not added after deployment.

What Real Reform Requires

The FTC's 2023 report on commercial surveillance identified three structural changes needed to make AI consent frameworks meaningful: first, a federal privacy law with a private right of action (so individuals can sue, not just regulators); second, data minimization requirements that limit what can be collected to what is genuinely necessary; and third, algorithmic transparency mandates that require AI systems to disclose not just what data they collect but what they infer and how those inferences are used in decisions.

The EU AI Act's Article 13 requires that high-risk AI systems be transparent enough that users can make informed decisions — including about whether to interact with the system at all. This is the first major legal framework to treat AI-system-level transparency as a precondition for valid consent.

The path from where consent frameworks stand today to where they need to be for AI is not primarily technical. The technical tools exist: differential privacy, federated learning, machine unlearning, consent dashboards, data minimization. What is lacking is the regulatory mandate and the economic incentive to deploy them at the cost of reduced data collection. Apple's ATT experiment suggests that when given clear choice, most people prefer privacy — and that genuine consent architecture might fundamentally change the economics of AI data collection.
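As an illustration of how accessible these tools are, here is a minimal sketch of differential privacy in its simplest form: releasing a count with Laplace noise. It is not how a production AI system applies the technique (differentially private model training is considerably more involved), and the function names and the epsilon parameter value used below are assumptions chosen for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


# Usage: how many people in a (made-up) dataset are 40 or older?
ages = [23, 41, 37, 58, 62, 19]
rng = random.Random()
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
```

A smaller epsilon means more noise and stronger protection for any one individual, at the cost of a less accurate answer. That trade-off is the economic point made above: the privacy guarantee is paid for in data utility.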

Lesson 4 Quiz

Building Consent That Actually Works · 5 questions

1. What did Apple's App Tracking Transparency (ATT) reveal when deployed in iOS 14.5?

Correct. The 85% opt-out rate demonstrated that prior consent architectures had produced artificially high apparent consent rates through friction and obscurity — not genuine user preference for tracking.

Incorrect. About 85% of US users opted out when given a clear choice, and Meta attributed a $10 billion revenue reduction partly to ATT. The prior high consent rates reflected design, not preference.

2. Why was the 2024 Mozilla Firefox "Privacy-Preserving Attribution" rollout criticized as a consent violation?

Correct. Good technical intentions do not substitute for consent process. Even a privacy improvement can constitute a consent violation if users are enrolled without notification and without an active opt-in.

Incorrect. The criticism was process-based: Mozilla enrolled users by default without notification. Good technical intent doesn't substitute for active, informed consent — even privacy improvements require it.

3. "Dynamic consent" differs from traditional one-time consent primarily in that it:

Correct. Dynamic consent treats consent as an ongoing relationship rather than a one-time transaction — users can update preferences, withdraw specific consents, and receive notifications about new proposed uses.

Incorrect. Dynamic consent means users can update, adjust, and withdraw consent over time as systems and their data uses evolve — not daily re-consent or automated inference of preferences.

4. What did Ireland's Data Protection Commission's €225 million fine against WhatsApp (2021) address?

Correct. The fine was substantially about transparency of the legal basis for processing — if users can't understand the privacy notice, they cannot give informed consent, regardless of whether they clicked "I agree."

Incorrect. The DPC's fine focused on transparency: WhatsApp's privacy notice was insufficiently clear about what users were actually agreeing to — a consent notice that can't be understood is effectively no consent notice at all.

5. According to the EU AI Act's Article 13, what is treated as a precondition for valid consent to interact with high-risk AI systems?

Correct. Article 13 establishes that transparency is not just good practice — it's legally required as the precondition for meaningful consent. Users cannot consent to a system they cannot understand.

Incorrect. Article 13 requires that high-risk AI systems be transparent enough for users to make informed decisions about interacting with them — transparency is a legal precondition for consent, not a post-deployment aspiration.

Lab 4 — Design a Consent Framework

Building genuine consent architecture for AI · AI Ethics M3

Your Task: Design a Consent Dashboard for a Real AI System

In this capstone lab, you'll design a concrete consent framework for a real AI application. Your design must address: what data is collected and for which specific purposes, what is inferred and disclosed, how users revoke consent, and how the system handles existing trained models if consent is withdrawn.

Complete at least three exchanges. Reference at least one real regulatory standard (GDPR, EU AI Act, CCPA, FTC guidelines) and one technical mechanism (differential privacy, federated learning, machine unlearning, data minimization) in your design.

Start here: Choose a real AI system — a health app, a credit scoring model, a hiring algorithm, or a recommendation engine — and begin sketching what a genuine consent dashboard for it would need to include.

Consent Design Lab

Welcome to Lab 4 — the capstone for Module 3. I'm your AI ethics partner for designing practical consent frameworks. Tell me which real AI system you've chosen to analyze, and let's build out what a genuinely consent-compatible version of it would look like — with specific regulatory references and technical mechanisms, not just aspirational goals.

Module 3 Test

The Consent You Never Gave · 15 questions · Pass at 80%

1. Max Schrems's 2018 GDPR complaints against Google and Facebook on day one of the regulation centered on which specific consent problem?

Correct. Forced consent (bundling access with data collection) was Schrems's core argument — it renders consent involuntary, violating GDPR's requirement that refusal must be a genuine option.

Incorrect. Schrems argued these platforms used "forced consent" — no access without data collection acceptance — making consent structurally involuntary under GDPR Article 7.

2. Which of the following is a valid component of legally meaningful consent?

Correct. Specificity is one of the four required components of valid consent. Blanket consent to "anything we might ever do" does not meet the standard of specific, informed agreement.

Incorrect. Valid consent must be informed, voluntary, specific, and revocable. "Permanent," "comprehensive," and "presumptive" are all characteristics of consent frameworks that fail the legal standard.

3. The Common Crawl dataset raises a consent concern because:

Correct. Legality and consent are distinct. Scraping public web pages may be legal, but content creators were never asked whether they wished to serve as training data for commercial AI systems.

Incorrect. The consent issue is that billions of individuals contributed words to the web in specific contexts and never consented to their writing being used as commercial AI training data — regardless of the scraping's legality.

4. Target's 2012 pregnancy prediction model violated consent norms by:

Correct. This is the secondary use problem at its most personal: purchase data collected for one purpose (discounts) was transformed into sensitive health profiling without consent for that purpose.

Incorrect. The problem was secondary use: data shared for coupons was used to infer pregnancy — a sensitive attribute customers never agreed to have derived from their purchases.

5. Helen Nissenbaum's "contextual integrity" principle holds that privacy violations occur when:

Correct. Contextual integrity focuses on whether the flow of information matches the norms of its original context — your medical history shared with a doctor flows appropriately to other physicians, not to employers.

Incorrect. Contextual integrity is about matching information flows to the norms of the context where data was originally shared — not about borders, retention periods, or licensing fees.

6. Facebook's defense of its 2014 emotional contagion experiment (conducted on 700,000 users without explicit consent) was that:

Correct. The "it's in the ToS" defense became a landmark example of how platform consent frameworks can be stretched far beyond their intended scope to cover activities — like psychological experimentation — users could not have anticipated.

Incorrect. Facebook argued the experiment was covered by its data use policy. This became a landmark case for how broad terms of service cannot substitute for specific, informed consent to psychological research.

7. Italy's Garante temporarily blocked ChatGPT in March 2023. Which of the following was NOT cited as a reason?

Correct. The Garante's actual grounds were: no legal basis for training data collection, absent age verification, and the impossibility of correcting false personal information in model weights. Financial fraud was not cited.

Incorrect. The Garante cited: absent legal basis for training data collection, no age verification, and inability to correct false personal information baked into model weights — not financial fraud.

8. "Machine unlearning" refers to:

Correct. Machine unlearning attempts to reduce a model's dependence on specific training examples — but 2023 research showed residual knowledge persists at 3–28% even after applying these techniques.

Incorrect. Machine unlearning refers to technical methods for reducing a model's reliance on specific training data points — an imperfect approximation of erasure that does not guarantee complete deletion.

9. The 2021 FTC settlement with Everalbum was historically significant because it:

Correct. Algorithmic disgorgement means illegally collected data taints the models built from it — both must be deleted. This was the first time the FTC applied this remedy, setting a major precedent.

Incorrect. The settlement's significance was requiring model deletion alongside training data deletion — "algorithmic disgorgement" — the first time this remedy was applied by a US federal regulator.

10. Apple's App Tracking Transparency feature found that approximately what percentage of US users opted out of cross-app tracking when given a clear choice?

Correct. The 85% opt-out rate demonstrated that high prior apparent consent rates reflected design friction and opacity, not genuine user preference — a stark indictment of consent architectures that made refusal difficult.

Incorrect. About 85% of US users opted out when given a clear, friction-free choice. This rate revealed that prior high consent figures were products of confusing design, not genuine preference.

11. The UK Biobank's dynamic consent model is considered a gold standard because it:

Correct. The UK Biobank model treats consent as an ongoing relationship rather than a one-time form — participants can update, specify, and withdraw consent for particular uses over time.

Incorrect. The UK Biobank allows participants to actively manage their ongoing consent through a portal — updating preferences, specifying approved uses, and receiving notifications — treating consent as a living relationship.

12. Differential privacy as applied to AI training primarily addresses which consent-related risk?

Correct. Differential privacy adds calibrated noise to limit individual data influence, making it statistically improbable that specific individuals' information can be memorized or extracted from the model.

Incorrect. Differential privacy limits the influence of individual training examples on model outputs, reducing memorization risk — it doesn't prevent collection, generate consent forms, or encrypt data for individuals.

13. Ireland's DPC imposed a €225 million fine on WhatsApp in 2021 primarily because:

Correct. Transparency of legal basis is a prerequisite for informed consent. If users cannot understand a privacy notice, their click of "I agree" does not constitute genuine consent — regardless of what they technically agreed to.

Incorrect. The fine was about transparency: WhatsApp's notice didn't clearly explain the legal basis for processing, making genuine informed consent impossible regardless of users' formal acceptance.

14. The 2023 Google DeepMind memorization study showed that personal data in AI training sets can be extracted because:

Correct. The "divergence" technique — asking the model to repeat a word indefinitely — caused it to reproduce memorized training content including real people's names, addresses, and phone numbers.

Incorrect. The study showed that adversarial prompting (like asking the model to repeat a word indefinitely) can cause LLMs to reproduce verbatim training data, including personal information of real individuals.

15. Under the EU AI Act's Article 13, what is treated as a legal precondition for valid consent to high-risk AI systems?

Correct. Article 13 establishes system-level transparency as a consent prerequisite — users cannot meaningfully consent to a system they cannot understand. This is the first major legal framework to codify this principle.

Incorrect. Article 13 requires high-risk AI systems to be transparent enough for users to make informed decisions about interacting with them — transparency is legally required as the foundation of valid consent.