Module 7 · Lesson 1

Training Data and Where Bias Begins

What a system learns depends entirely on what it was shown — and what was left out.

Why does a facial-recognition system trained mostly on one type of face fail on others?

In February 2018, Joy Buolamwini and Timnit Gebru published "Gender Shades," a landmark audit of the gender-classification features in commercial face-analysis systems from IBM, Microsoft, and Face++. They tested each system on a benchmark of 1,270 faces of parliamentarians, balanced across skin type and gender. Darker-skinned women were misclassified at error rates of up to 34.7 percent, while the maximum error rate for lighter-skinned men was 0.8 percent. The systems were not broken. They had most likely learned from training datasets built overwhelmingly from lighter-skinned individuals.

The training data was the bias. The model faithfully reproduced it.

How Training Data Shapes What a Model Learns

A visual AI model learns by looking at thousands or millions of labeled images. If those images skew toward certain faces, objects, or settings, the model builds a world model that reflects that skew. It is not making moral judgments — it is doing statistics on the data it was given.

ImageNet, the dataset that powered the modern deep-learning revolution starting around 2012, contains roughly 14 million images scraped largely from the internet. Internet photographs are not a neutral sample of humanity. They over-represent the Global North, younger adults, and indoor consumer settings. Models trained on ImageNet inherit those emphases silently.

A 2017 study by Zhao et al., "Men Also Like Shopping," examined two datasets widely used for image recognition, imSitu and MS-COCO. In imSitu, images of cooking were about 33 percent more likely to involve women than men, and a model trained on that data widened the disparity to 68 percent in its predictions. The model amplified the bias during training rather than merely reflecting it.
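Zhao et al. quantify amplification by comparing how skewed an activity's gender split is in the training labels with how skewed it is in the model's predictions. The sketch below is a simplified, single-activity version of that comparison. The counts are hypothetical and chosen only to show the arithmetic, and the function and variable names are illustrative, not taken from the paper's code.

```python
def bias_score(counts, group):
    """Share of an activity's examples (or predictions) that involve `group`."""
    return counts[group] / sum(counts.values())


# Hypothetical counts for the activity "cooking", for illustration only.
training_counts = {"woman": 660, "man": 340}   # labels in the training set
predicted_counts = {"woman": 840, "man": 160}  # model predictions on test images

train_bias = bias_score(training_counts, "woman")
pred_bias = bias_score(predicted_counts, "woman")
amplification = pred_bias - train_bias  # > 0 means the model exaggerates the skew

print(f"training share: {train_bias:.2f}")
print(f"predicted share: {pred_bias:.2f}")
print(f"amplification: {amplification:+.2f}")
```

A positive amplification means the model's outputs are more skewed than the data it learned from, which is the pattern Zhao et al. reported.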

Real Case — IBM Face Recognition Audit (2018–2019)

After the Gender Shades paper, IBM and Microsoft released updated systems with lower error rates for darker-skinned faces. A 2019 follow-up audit by Inioluwa Deborah Raji and Joy Buolamwini found that all three audited vendors, Face++ included, had substantially reduced their error rates for darker-skinned women. This demonstrated that bias is not inevitable, but that correcting it requires deliberate audit and correction. Without external audits like Buolamwini and Gebru's, the disparities would likely have persisted unnoticed inside commercial products used by employers, governments, and police.

Three Categories of Training Data Bias

Representation bias occurs when certain groups or categories appear far less often in training data than in the real world. A model that sees 95% lighter-skinned faces will have poor internal representations for darker-skinned ones.

Label bias occurs when the annotations attached to images reflect human prejudice. If human annotators label the same facial expression as "aggressive" on one face and "assertive" on another based on race, the model learns that association as fact.

Historical bias occurs even in perfectly representative data when the real-world patterns being captured are themselves the product of historical inequity. A model trained to predict "likely job candidate" from photos will re-encode discrimination if the historical hiring data reflects discriminatory hiring practices.

Representation bias —Some groups are underrepresented in training data, so the model performs worse on them.

Label bias —Human annotators apply inconsistent or prejudiced labels, which the model learns as ground truth.

Historical bias —Training data reflects real-world inequity, so the model perpetuates it statistically.
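Representation bias can be reproduced in a few lines. The toy sketch below rests on a deliberately simplified assumption, not a model of real face data: two hypothetical groups need different decision boundaries on a single feature, and one threshold rule is fit to a training set that is 95 percent group A. Because the fitting step minimizes overall error, it settles on group A's boundary, and group B absorbs most of the mistakes.

```python
import random


def make_group(n, low, high, boundary, rng):
    """Sample n (feature, label) pairs; the true label is 1 when the feature exceeds this group's boundary."""
    xs = [rng.uniform(low, high) for _ in range(n)]
    return [(x, int(x > boundary)) for x in xs]


def error_rate(data, threshold):
    """Fraction of examples misclassified by the rule 'predict 1 if feature > threshold'."""
    wrong = sum(1 for x, y in data if int(x > threshold) != y)
    return wrong / len(data)


def fit_threshold(data):
    """Choose the single threshold that minimizes overall training error (brute-force search)."""
    candidates = sorted(x for x, _ in data)
    return min(candidates, key=lambda t: error_rate(data, t))


rng = random.Random(0)
# Hypothetical groups: A's true boundary is 0.0, B's is 1.0.
# The training data is 95% group A and 5% group B.
train = make_group(950, -2.0, 2.0, 0.0, rng) + make_group(50, -1.0, 3.0, 1.0, rng)
threshold = fit_threshold(train)

# Separate, equally sized test sets for each group.
test_a = make_group(2000, -2.0, 2.0, 0.0, rng)
test_b = make_group(2000, -1.0, 3.0, 1.0, rng)

print(f"learned threshold:   {threshold:.2f}")
print(f"overall train error: {error_rate(train, threshold):.1%}")
print(f"group A test error:  {error_rate(test_a, threshold):.1%}")
print(f"group B test error:  {error_rate(test_b, threshold):.1%}")
```

Overall training error stays low, which is exactly why the problem goes unnoticed unless errors are broken down by group.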

Why This Matters Beyond Face Recognition

Medical imaging AI trained on datasets from large academic hospitals in wealthy countries may perform poorly on images from community clinics in lower-income regions, where equipment, lighting, and patient demographics differ. Dermatology is a documented example: a 2022 study by Daneshjou et al. found that several dermatology AI models performed measurably worse on images of darker skin tones, a gap with direct consequences for skin-cancer detection.

The critical insight is that bias enters at data collection, compounds at labeling, and magnifies at model deployment. Auditing only the final deployed model — without examining each upstream step — means catching only the last symptom of a problem that started much earlier.

Lesson 1 Quiz

Training Data and Where Bias Begins

What did the Gender Shades study (Buolamwini & Gebru, 2018) primarily find?

Correct. The audit tested IBM, Microsoft, and Face++ on a balanced dataset of 1,270 faces and found dramatic error-rate disparities, worst for darker-skinned females.

Not quite. Review the Gender Shades findings — large disparities existed across all three commercial vendors tested.

Which type of bias occurs when human annotators apply inconsistent labels to images based on their own prejudices?

Correct. Label bias occurs when the annotations attached to training images reflect human prejudice, which the model then learns as statistical fact.

Not quite. Label bias specifically refers to inconsistent or prejudiced annotation — distinct from who appears in the data or what historical patterns the data reflects.

A model trained to predict job candidates from historical hiring data re-encodes past discrimination. This is an example of:

Correct. Historical bias occurs when training data accurately reflects real-world patterns that are themselves the result of systemic inequity.

Not quite. Historical bias is the term for when data correctly mirrors a world shaped by past injustice — making the data statistically accurate but ethically problematic.

Lab 1: Tracing Bias to Its Source

Conversation lab — explore how training data creates visual AI bias

Your Task

In this lab you'll interrogate how training data creates downstream bias in visual AI systems. Use the AI tutor to work through real scenarios and deepen your understanding of where bias originates.

Starter prompt: "A hospital wants to deploy a skin-lesion detection AI trained primarily on images from European patients. Walk me through where bias could enter — at data collection, labeling, and deployment — and what the real-world consequences might be."

AI Tutor — Bias Origins

Module 7 · Lab 1

Welcome to Lab 1. I'm here to help you trace the origins of training-data bias in visual AI. Try the starter prompt above, or ask me about any real case where a visual AI system showed unexpected failures. What would you like to explore?

Module 7 · Lesson 2

Facial Recognition and the Criminal Justice System

When an algorithm's error has the power to deprive someone of their freedom.

What happens when a facial recognition system confidently identifies the wrong person — and police act on it?

Between 2019 and 2023, at least six Americans were wrongfully arrested after facial recognition algorithms misidentified them, and all six were Black. Robert Williams of Detroit was arrested in January 2020 after an AI system matched his driver's license photo to a blurry surveillance image of a shoplifter. He was held for 30 hours before police acknowledged the match was incorrect. In 2019, Nijeer Parks of New Jersey spent ten days in jail on similar grounds. In both cases, the human investigators treated the algorithm's output as primary evidence rather than as an investigative lead requiring verification.

Why False Positives in Criminal Justice Are Different

In most AI applications, a false positive is an inconvenience. A spam filter lets through a newsletter. A recommendation engine suggests an irrelevant product. In criminal justice, a false positive can mean handcuffs, a cell, a lost job, family separation, and lifelong stigma — even if charges are later dropped.

The NIST Face Recognition Vendor Test (FRVT) program published a major report on demographic effects in December 2019, covering 189 algorithms from 99 developers. The majority showed measurably higher false-positive rates for Black and Asian faces than for white faces, and for women than for men. In one-to-one matching, these differentials often ranged from a factor of 10 to a factor of 100, depending on the algorithm; in one-to-many searches of the kind police run, false-positive rates were highest for Black women.

These are not rounding errors. At the scale of a city's surveillance network processing millions of faces per day, a 100x disparity translates into dramatically more false alerts targeting one demographic than another.
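A back-of-the-envelope calculation shows why. The sketch below assumes a one-to-many deployment in which every captured face is compared against every entry on a watchlist, and each comparison independently risks a false match. Real systems use candidate lists and thresholds, so treat all the numbers, which are hypothetical, as an illustration of scale rather than a measurement.

```python
def expected_false_alerts(faces_per_day, watchlist_size, false_match_rate):
    """Expected false alerts per day, assuming every captured face is compared
    against every watchlist entry and each comparison independently yields a
    false match with probability false_match_rate (a simplifying assumption)."""
    return faces_per_day * watchlist_size * false_match_rate


# Hypothetical numbers for illustration only, not NIST measurements.
FACES_PER_GROUP_PER_DAY = 500_000
WATCHLIST_SIZE = 1_000
BASELINE_FMR = 1e-7   # per-comparison false match rate for group A
DISPARITY = 100       # group B's false match rate is 100x higher

alerts_a = expected_false_alerts(FACES_PER_GROUP_PER_DAY, WATCHLIST_SIZE, BASELINE_FMR)
alerts_b = expected_false_alerts(FACES_PER_GROUP_PER_DAY, WATCHLIST_SIZE, BASELINE_FMR * DISPARITY)

print(f"group A: ~{alerts_a:,.0f} false alerts per day")
print(f"group B: ~{alerts_b:,.0f} false alerts per day")
```

Under these assumptions, the same camera network produces roughly 50 false alerts a day for one group and roughly 5,000 for the other, each of which could become a police stop.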

Real Case — Detroit Police Department, 2020

Robert Williams was arrested at his home in front of his children after a facial recognition search matched his driver's license photo to surveillance footage of a shoplifting suspect. Detroit detectives then showed a photo lineup that included Williams to a store loss-prevention contractor who had seen only the surveillance video, and she picked him out: an informal process that did nothing to independently verify the match. The ACLU represented Williams, and the case became a landmark in the debate over facial recognition use by law enforcement. Detroit subsequently restricted, though did not ban, its use of facial recognition; a facial recognition result can no longer serve as the sole basis for an arrest.

How Investigator Reliance Amplifies AI Error

A consistent pattern across wrongful-arrest cases is automation bias — the tendency of human decision-makers to over-trust algorithmic outputs. When an AI system produces a confident-seeming match, investigators often spend less effort looking for contradicting evidence. The algorithm's output becomes a framing device that shapes how all subsequent information is interpreted.

The RAND Corporation and the Georgetown Law Center on Privacy and Technology have both documented that many US police departments using facial recognition had no written policies governing its use, no requirements for corroborating evidence, and no obligation to disclose to defendants that the identification was AI-assisted.

False positive —The system incorrectly identifies a match when none exists; in facial recognition, this means flagging an innocent person.

Automation bias —The tendency for humans to over-rely on automated system outputs, reducing their own critical scrutiny.

Policy Responses (Real, Documented)

San Francisco banned government use of facial recognition in 2019. Portland, Oregon banned it in 2020 for both city agencies and private businesses in places of public accommodation. The EU's AI Act, adopted in 2024, prohibits real-time remote biometric identification in publicly accessible spaces for law enforcement, with narrow exceptions, and classifies most other remote biometric identification as high-risk. These are direct legislative responses to documented misidentification harm.

The core problem is not that the technology makes mistakes — all technology does. The problem is that the mistakes are distributed unequally along racial lines, and that institutional processes have been built around the technology without accounting for that disparity or ensuring adequate human oversight before consequential action is taken.

Lesson 2 Quiz

Facial Recognition and the Criminal Justice System

According to the NIST FRVT 2019 report, which demographic showed some of the highest false-positive rates in tested facial recognition algorithms?

Correct. The NIST report found that Black women had the highest false-positive rates in one-to-many searches, with demographic differentials across algorithms often reaching a factor of 10 to 100.

Not quite. The NIST FRVT data specifically highlighted dramatically elevated false-positive rates for Black women in many of the 189 algorithms tested.

Robert Williams was wrongfully arrested in Detroit in 2020. What was the core failure in the process?

Correct. Investigators treated the algorithm's match as a near-certain identification and "confirmed" it with a photo lineup shown to someone who had only seen the surveillance footage, a process that reflected automation bias rather than independent verification.

Not quite. The critical failure was how investigators used the output — they relied on it as near-certain identification rather than a lead to be independently verified.

What is "automation bias" in the context of facial recognition investigations?

Correct. Automation bias causes human decision-makers to treat algorithm outputs as more reliable than they are, framing all subsequent evidence through the lens of that output.

Not quite. Automation bias is a human psychological tendency, not a technical flaw — it describes how people respond to algorithmic confidence, not how algorithms compute.

Lab 2: Facial Recognition and Justice

Conversation lab — examine real-world wrongful-identification cases

Your Task

You'll dig into the intersection of facial recognition accuracy disparities and criminal justice consequences. Examine how the Robert Williams and Nijeer Parks cases expose systemic gaps in how AI evidence is used.

Starter prompt: "Compare the Robert Williams and Nijeer Parks wrongful arrest cases. What were the similarities in how facial recognition was used? What policy safeguards, if any, would have prevented these outcomes?"

AI Tutor — Facial Recognition & Justice

Module 7 · Lab 2

Welcome to Lab 2. I can help you analyze documented wrongful-arrest cases involving facial recognition misidentification, examine the role of automation bias, and explore policy responses. Use the starter prompt or bring your own question. What aspect would you like to explore first?

Module 7 · Lesson 3

Errors in Medical and Commercial Visual AI

Bias is not only a social justice problem — it is a patient safety problem and a consumer equity problem.

When a medical imaging AI performs well on some patients and poorly on others, who bears the cost of that disparity?

In 2017, a study published in Nature (Esteva et al.) described a deep-learning system for skin-cancer classification trained on a dataset of 129,450 clinical images. The system matched the diagnostic accuracy of board-certified dermatologists on its test sets, but the study did not report performance by skin tone, and critics noted that its image sources came predominantly from lighter-skinned patients. Later evaluations on skin-tone-diverse image sets found that dermatology AI models performed measurably worse on darker skin. A missed melanoma on a darker-skinned patient is not a data-quality problem. It is a life-threatening failure of deployment without adequate validation.

The Pulse Oximeter Problem: A Pre-AI Parallel

Visual AI bias in medicine did not emerge from nowhere. The pulse oximeter, a device that reads blood oxygen levels by shining light through the skin, was developed and calibrated primarily on lighter-skinned individuals. A 2020 study in the New England Journal of Medicine found that pulse oximeters were nearly three times more likely to miss low oxygen levels in Black patients compared to white patients — a hardware bias with direct consequences during the COVID-19 pandemic, when blood oxygen monitoring was critical.
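The NEJM study (Sjoding et al.) measured this as "occult hypoxemia": an arterial blood-gas oxygen saturation (SaO2) below 88 percent in a patient whose pulse oximeter reading (SpO2) was between 92 and 96 percent. A minimal sketch of that calculation, run separately for each group, follows. The paired readings are invented for illustration and are not data from the study.

```python
def occult_hypoxemia_rate(readings):
    """Among paired (SpO2, SaO2) readings where the oximeter showed 92-96%,
    return the fraction whose arterial SaO2 was actually below 88%."""
    eligible = [(spo2, sao2) for spo2, sao2 in readings if 92 <= spo2 <= 96]
    if not eligible:
        return float("nan")
    missed = sum(1 for _, sao2 in eligible if sao2 < 88)
    return missed / len(eligible)


# Hypothetical paired readings per group, for illustration only.
# Pairs outside the 92-96% SpO2 window are ignored by the metric.
readings = {
    "group A": [(95, 93), (94, 90), (92, 89), (96, 94), (93, 87), (90, 84)],
    "group B": [(95, 86), (94, 91), (92, 85), (96, 93), (93, 87), (98, 97)],
}

for group, pairs in readings.items():
    print(f"{group}: occult hypoxemia in {occult_hypoxemia_rate(pairs):.0%} of eligible readings")
```

The metric only becomes visible when device readings are checked against a ground-truth measurement and broken down by group, the same requirement that applies to auditing visual AI.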

Visual AI that analyzes medical images can repeat this pattern: training on non-representative data → validation on similar non-representative data → deployment without adequate equity testing → harm to underrepresented populations who receive worse diagnostic support.

Real Case — Amazon Rekognition and Darker Skin

In 2018, the ACLU tested Amazon's commercial Rekognition API by searching photos of all 535 members of Congress against a database of 25,000 publicly available arrest photos. The system produced 28 false matches, and people of color were disproportionately represented among them: nearly 40 percent of the false matches, versus about 20 percent of Congress. Amazon disputed the methodology, arguing that the ACLU had used the default 80 percent confidence threshold rather than the 99 percent threshold it recommends for law enforcement. In June 2020, amid widespread concern about racial bias and misuse, Amazon announced a moratorium on police use of Rekognition.
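The threshold dispute is easy to see in miniature. A face-matching service returns a similarity score for each pair, and anything at or above the confidence threshold counts as a match. The sketch below uses invented scores on a 0 to 1 scale for pairs of *different* people; the function name and numbers are hypothetical, not Rekognition's API or the ACLU's data.

```python
def false_matches(impostor_scores, threshold):
    """Count pairs of different people whose similarity score still reaches the threshold."""
    return sum(1 for score in impostor_scores if score >= threshold)


# Hypothetical similarity scores for non-matching pairs, for illustration only.
impostor_scores = {
    "group A": [0.42, 0.55, 0.81, 0.63, 0.70, 0.38],
    "group B": [0.84, 0.91, 0.66, 0.88, 0.995, 0.72],
}

for threshold in (0.80, 0.99):
    counts = {group: false_matches(scores, threshold) for group, scores in impostor_scores.items()}
    print(f"threshold {threshold:.2f}: {counts}")
```

Raising the threshold cuts false matches, but in this toy data group B still bears the only remaining error. A higher threshold changes how many errors occur; it does not by itself remove a disparity in who receives them.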

Commercial Computer Vision: Hiring and Emotional AI

Beyond medicine and law enforcement, commercial visual AI has expanded into hiring. Companies including HireVue have sold video interview analysis products that claim to assess candidate suitability from facial expressions, voice tone, and micro-movements. The scientific basis for such "emotional AI" is contested — a 2019 review in Psychological Science in the Public Interest by Lisa Feldman Barrett and colleagues found that facial expressions do not reliably encode discrete emotions in a way that generalizes across cultures, individuals, or contexts.

When these systems encode assumptions about which facial expressions indicate confidence or competence, they build in cultural and demographic bias that can systematically disadvantage candidates who do not fit the implicit model of the "ideal hire" that the training data encodes. In 2021, HireVue discontinued its facial analysis component following sustained criticism from AI ethics researchers and regulators.

Validation gap —When a model is tested only on populations similar to its training set, its true performance on other populations remains unknown until deployment.

Emotional AI —Systems that claim to infer emotional state from visual signals; scientifically contested and prone to demographic bias.

The FDA and Medical AI Equity

The U.S. Food and Drug Administration's January 2021 action plan for AI/ML-based Software as a Medical Device (SaMD) explicitly called for methods to identify and reduce algorithmic bias. Developers of medical imaging AI are increasingly expected to provide performance breakdowns by demographic subgroup, not just aggregate accuracy figures, when seeking clearance or approval. This is a direct institutional response to documented racial disparities in medical AI performance.

The thread connecting these cases is the same: a system performs well on the population most like its training data, and the people furthest from that template receive the worst outcomes. The harm is not random — it is structured by who had access to the institutions that generated the training data in the first place.

Lesson 3 Quiz

Errors in Medical and Commercial Visual AI

The 2020 New England Journal of Medicine study on pulse oximeters found what problem relevant to AI bias?

Correct. The case shows that calibrating a sensing technology on non-representative populations produces a hardware bias — directly parallel to how training data bias works in visual AI.

Not quite. The finding was a roughly threefold difference in missed low-oxygen readings for Black patients — a bias introduced at the calibration/training stage, not through deliberate design.

Why did HireVue discontinue its facial analysis component in 2021?

Correct. Researchers including Lisa Feldman Barrett had published peer-reviewed challenges to the foundational premise — that facial expressions reliably encode discrete emotions across individuals and cultures.

Not quite. HireVue discontinued the feature voluntarily after sustained scientific and regulatory scrutiny of whether facial expressions actually encode stable, generalizable emotional states.

What does "validation gap" mean in the context of medical imaging AI?

Correct. A validation gap means the model's limitations remain invisible until real-world deployment exposes patients from underrepresented groups to unreliable diagnostic support.

Not quite. A validation gap specifically refers to testing only on populations that resemble the training set — creating a false picture of generalized accuracy.

Lab 3: Medical and Commercial AI Errors

Conversation lab — probe the validation gap and equity failures in real deployments

Your Task

Examine how validation gaps in medical AI and unscientific claims in commercial emotional AI create real-world harm. Use the AI tutor to think through how equity auditing should work before deployment.

Starter prompt: "A healthcare company wants to deploy a chest X-ray analysis AI trained on images from three major US academic medical centers. Design a validation protocol that would specifically test for demographic performance disparities before deployment."

AI Tutor — Medical & Commercial Visual AI

Module 7 · Lab 3

Welcome to Lab 3. I can help you think through equity validation protocols for medical AI, discuss what the FDA's AI/ML SaMD framework requires, or dig into the science behind emotional AI claims. Use the starter prompt or bring your own scenario. What would you like to tackle?

Module 7 · Lesson 4

Auditing, Accountability, and Fixing Visual AI Bias

Bias in visual AI is not inevitable — but fixing it requires deliberate work at every stage, by someone with authority to act.

What does it actually take to audit a visual AI system for bias, and who is responsible for doing it?

The Gender Shades paper did not just document bias; it triggered a market response. Within months of publication, IBM released an updated facial analysis system with reduced error-rate disparities, and Microsoft announced that it had cut error rates for darker-skinned men and women by up to 20 times. The audit worked as a forcing function precisely because it was external, independent, methodologically rigorous, and public. The companies had the capability to improve their systems before the audit. They lacked the external pressure to prioritize doing so.

What a Genuine Bias Audit Includes

An audit is only as useful as the questions it asks. The NIST AI Risk Management Framework (AI RMF 1.0), released in January 2023, and NIST Special Publication 1270 on identifying and managing bias in AI (2022) are voluntary guidance, but together they describe several core components of a meaningful bias evaluation. Applied to visual AI systems, these include:

Disaggregated performance metrics: Overall accuracy is insufficient. The system must be evaluated separately on relevant demographic subgroups — by race, gender, age, skin tone, and any other dimension where disparity may harm users.

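Representative test sets: The evaluation dataset must include enough examples from every relevant subgroup, not just the majority, to estimate each subgroup's error rate reliably. In a test set of 10,000 images that is 95% one demographic, the remaining 500 images yield only rough estimates for everyone else, and far rougher ones once that 5% is split further by gender or age.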

Real-world deployment monitoring: Pre-deployment testing is necessary but not sufficient. Systems must be monitored after deployment because real-world conditions — different cameras, different lighting, different demographic compositions — differ from controlled test conditions.
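The first two components can be sketched together. The function below computes error rates per subgroup instead of one overall figure, and attaches an approximate 95 percent Wilson score interval to each, so that small subgroups show their larger uncertainty. The (group, prediction, label) record format, the function names, and the 9,500/500 split of results are all hypothetical, chosen for illustration. The same per-group computation, rerun on labeled samples from live deployment, is one simple form of post-deployment monitoring.

```python
import math
from collections import defaultdict


def wilson_interval(errors, n, z=1.96):
    """Approximate 95% Wilson score interval for an error rate of errors / n."""
    if n == 0:
        return (0.0, 1.0)
    p = errors / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def disaggregated_error_rates(records):
    """records: iterable of (group, prediction, label).
    Returns {group: (error_rate, ci_low, ci_high, n)}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, pred, label in records:
        totals[group] += 1
        errors[group] += int(pred != label)
    return {
        g: (errors[g] / totals[g], *wilson_interval(errors[g], totals[g]), totals[g])
        for g in totals
    }


# Hypothetical results: 9,500 majority-group images, 500 minority-group images.
records = ([("majority", 1, 1)] * 9_215 + [("majority", 0, 1)] * 285
           + [("minority", 1, 1)] * 400 + [("minority", 0, 1)] * 100)

overall = sum(pred != label for _, pred, label in records) / len(records)
print(f"overall error: {overall:.2%}")
for group, (rate, low, high, n) in disaggregated_error_rates(records).items():
    print(f"{group:>8}: {rate:.1%} error (95% CI {low:.1%}-{high:.1%}, n={n})")
```

Here an overall error of 3.85 percent hides a 20 percent error rate for the minority group. That group's interval is also several times wider than the majority's, because it rests on 500 images rather than 9,500.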

Real Case — COMPAS Recidivism Algorithm (Northpointe)

While COMPAS is not a visual AI system, its 2016 audit by ProPublica helped establish an outcome-based methodology that later audits, including audits of visual AI, have drawn on. ProPublica obtained COMPAS scores for more than 7,000 people arrested in Broward County, Florida, and found that Black defendants who did not reoffend were nearly twice as likely as white defendants to be labeled higher risk, while white defendants who did reoffend were more likely to be mislabeled as low risk. The COMPAS case demonstrated that bias auditing requires access to ground-truth outcome data, not just algorithmic outputs, and that vendor claims about fairness cannot substitute for independent empirical testing.
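The key step in that methodology is pairing each score with what actually happened. A minimal sketch, assuming each row records a hypothetical group label, whether the tool flagged the person as high risk, and whether they later reoffended: the false-positive rate is the share of non-reoffenders who were flagged, and the false-negative rate is the share of reoffenders who were not. The counts are invented to mirror the shape of ProPublica's finding, not its numbers.

```python
def error_rates_by_group(rows):
    """rows: iterable of (group, flagged_high_risk, reoffended).

    Returns {group: (false_positive_rate, false_negative_rate)}.
    Assumes every group has at least one person in each outcome class.
    """
    counts = {}
    for group, flagged, reoffended in rows:
        c = counts.setdefault(group, {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
        if reoffended:
            c["tp" if flagged else "fn"] += 1
        else:
            c["fp" if flagged else "tn"] += 1
    return {
        g: (c["fp"] / (c["fp"] + c["tn"]), c["fn"] / (c["fn"] + c["tp"]))
        for g, c in counts.items()
    }


def make_rows(group, flagged_no_reoffend, clear_no_reoffend, flagged_reoffend, clear_reoffend):
    """Build hypothetical rows from the four cells of a confusion matrix."""
    return ([(group, True, False)] * flagged_no_reoffend
            + [(group, False, False)] * clear_no_reoffend
            + [(group, True, True)] * flagged_reoffend
            + [(group, False, True)] * clear_reoffend)


rows = make_rows("group A", 40, 60, 70, 30) + make_rows("group B", 20, 80, 50, 50)
for group, (fpr, fnr) in error_rates_by_group(rows).items():
    print(f"{group}: false-positive rate {fpr:.0%}, false-negative rate {fnr:.0%}")
```

Neither rate can be computed from the tool's scores alone; without the `reoffended` outcome column, the errors in opposite directions for the two groups would be invisible.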

Who Is Responsible — and What the Law Is Starting to Require

Responsibility for visual AI bias is distributed, contested, and increasingly regulated. The developer who trains the model, the vendor who sells it, the organization that deploys it, and the regulator who sets the rules are all implicated — and all have historically found ways to defer responsibility to each other.

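The EU AI Act (adopted 2024) places legal obligations on both providers (developers) and deployers of high-risk AI systems, including systems used in biometrics, education, employment, and law enforcement. Providers must complete a conformity assessment before placing a system on the market, apply data-governance practices that include examining training data for possible biases, and report serious incidents; deployers must use systems as instructed, assign human oversight, and monitor their operation. The US has moved more slowly. The White House Blueprint for an AI Bill of Rights (2022) and the NIST AI RMF (2023) are voluntary frameworks, but the Equal Employment Opportunity Commission has indicated that existing employment discrimination law already applies to algorithmic hiring tools.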

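New York City's Local Law 144 (enforced from July 2023) requires employers and employment agencies that use automated employment decision tools to commission an independent bias audit each year, publish a summary of the results, and notify candidates that such a tool is being used. It was the first such local ordinance in the US with real enforcement teeth, backed by civil penalties for violations.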
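The audits required under Local Law 144 center on an "impact ratio." For each demographic category, the selection rate (candidates selected or advanced divided by candidates assessed) is divided by the selection rate of the most-selected category. A minimal sketch with hypothetical counts follows; the category names and numbers are placeholders.

```python
def impact_ratios(selected, applicants):
    """Selection rate of each category divided by the highest category's selection rate."""
    rates = {c: selected[c] / applicants[c] for c in applicants}
    top = max(rates.values())
    return {c: rate / top for c, rate in rates.items()}


# Hypothetical audit counts, for illustration only.
applicants = {"category A": 400, "category B": 300, "category C": 100}
selected = {"category A": 120, "category B": 60, "category C": 24}

for category, ratio in impact_ratios(selected, applicants).items():
    print(f"{category}: impact ratio {ratio:.2f}")
```

Local Law 144 requires publishing these ratios but does not itself set a pass/fail cutoff. The 0.8 "four-fifths" rule of thumb from federal employment-selection guidance is a common reference point, and by that yardstick category B (0.67) would warrant a closer look.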

Disaggregated metrics —Accuracy figures broken down by demographic subgroup, rather than reported as a single overall average that can mask large disparities.

Conformity assessment —Under the EU AI Act, a required pre-market evaluation that high-risk AI systems must pass before deployment.

What Individuals Can Ask and Do

As AI systems increasingly affect employment, healthcare, and public safety, individuals have emerging rights: the right to know an automated system was used in a decision affecting them (required in some jurisdictions), the right to human review of algorithmic decisions, and the right to challenge outcomes under existing anti-discrimination law. Knowing these rights exist — and that the organizations deploying AI are increasingly required to document bias testing — is itself a form of practical power.

The arc of this module is from diagnosis to accountability. Bias enters at data collection and labeling. It causes harm at deployment — in police arrests, in medical diagnoses, in job rejections. It can be measured through rigorous independent audit. It can be reduced through deliberate data work, better validation, and ongoing monitoring. And it can be governed through law — but only when regulators, developers, deployers, and users all understand what is at stake. You now do.

Lesson 4 Quiz

Auditing, Accountability, and Fixing Visual AI Bias

What made the Gender Shades paper effective as a forcing function for change in commercial facial recognition?

Correct. The combination of independence, rigor, and public visibility created market pressure that moved vendors to improve systems they had the capability — but not the incentive — to improve earlier.

Not quite. The paper's power came from being an independent, external, publicly released audit — the vendors lacked the prior external pressure to prioritize fixing the disparities.

New York City's Local Law 144 (2023) requires employers using automated hiring tools to do what?

Correct. Local Law 144 is notable as the first US local ordinance with real enforcement teeth requiring bias audits and public disclosure for automated employment tools.

Not quite. Local Law 144 specifically requires bias audits and public publication of audit results — a transparency and accountability measure, not a ban or transfer requirement.

Why is a single aggregate accuracy figure insufficient when auditing a visual AI system for bias?

Correct. A system that is 95% accurate overall could still have 40% error rates for a minority demographic — and the aggregate figure would hide that entirely.

Not quite. Aggregate accuracy hides subgroup disparities. A 99% accurate system on the majority group can still fail catastrophically on underrepresented groups — you need disaggregated metrics to see it.

Lab 4: Designing a Bias Audit

Conversation lab — build an audit framework for a real-world visual AI deployment

Your Task

Apply what you've learned across all four lessons to design a bias audit for a real-world visual AI deployment. The AI tutor will challenge your reasoning and push you to consider accountability gaps.

Starter prompt: "A city government wants to deploy a visual AI system to analyze body camera footage from police officers — automatically flagging 'escalation events' for supervisor review. Design a bias audit framework for this system before deployment, drawing on the NIST AI RMF principles and the lessons from facial recognition misidentification cases."

AI Tutor — Bias Auditing

Module 7 · Lab 4

Welcome to Lab 4 — the capstone lab for this module. I'll help you build a rigorous bias audit framework drawing on the Gender Shades methodology, NIST AI RMF, and the lessons from wrongful arrest and medical AI cases. Use the starter prompt to begin, or bring your own high-stakes visual AI scenario. What would you like to audit?

Module 7 Test

Bias and Errors in Visual AI — 15 questions · Pass at 80%

1. The Gender Shades paper (Buolamwini & Gebru, 2018) tested facial analysis systems from which three vendors?

Correct. The three vendors audited were IBM, Microsoft, and Face++, on a dataset of 1,270 balanced faces.

Not quite. The three vendors were IBM, Microsoft, and Face++.

2. In the Gender Shades study, what was the highest error rate recorded for any demographic group, and which group was it?

Correct. Darker-skinned women were misclassified at error rates of up to 34.7 percent, while the maximum error rate for lighter-skinned men was 0.8 percent.

Not quite. The highest error rate, up to 34.7 percent, was for darker-skinned women; no system exceeded 0.8 percent error for lighter-skinned men.

3. "Representation bias" in a training dataset means:

Correct. Representation bias is about proportion — some populations are underrepresented, so the model has weaker internal representations for them.

Not quite. Representation bias specifically refers to unequal frequency of different groups in training data — not annotation quality or consent.

4. Robert Williams was wrongfully arrested in Detroit in January 2020. How long was he held before the error was acknowledged?

Correct. Williams was held for approximately 30 hours before police acknowledged the facial recognition match was incorrect.

Not quite. Williams was held for about 30 hours. Nijeer Parks spent ten days in jail in a separate NJ case.

5. The NIST Face Recognition Vendor Testing (FRVT) 2019 report tested how many algorithms?

Correct. The breadth of the NIST evaluation — 189 algorithms from 99 developers — makes its finding of near-universal demographic disparities especially significant.

Not quite. NIST tested 189 algorithms from 99 developers, making it the most comprehensive facial recognition bias audit of its time.

6. Which city became the first major US city to ban government use of facial recognition technology, in 2019?

Correct. San Francisco banned government use of facial recognition in 2019, the first major US city to do so.

Not quite. San Francisco was the first major US city to ban government facial recognition, in 2019.

7. When dermatology AI was evaluated on skin-tone-diverse image sets, on which patient group did it perform worst?

Correct. The training dataset's underrepresentation of darker skin tones directly reduced diagnostic accuracy for those patients.

Not quite. The training images skewed heavily toward lighter-skinned patients, leaving the system less reliable for those with darker skin tones.

8. What did the 2020 New England Journal of Medicine study find about pulse oximeters during the COVID-19 pandemic?

Correct. The calibration bias introduced when devices were designed using lighter-skinned participants produced life-threatening measurement gaps during COVID-19.

Not quite. The study found a roughly threefold disparity in missed low-oxygen readings for Black patients — a direct consequence of non-representative calibration data.

9. The ACLU's 2018 test of Amazon Rekognition matched photos of U.S. Congress members to a mugshot database. What did it find?

Correct. The ACLU test produced 28 false matches, with people of color disproportionately misidentified — a demonstration of demographic error disparity in a commercial system.

Not quite. The test produced 28 false matches, and people of color in Congress were disproportionately among those falsely matched.

10. HireVue discontinued its facial analysis component in 2021 primarily because:

Correct. The foundational scientific premise — that facial expressions reliably indicate emotional states in a generalizable way — was challenged by rigorous peer-reviewed work, undermining the product's basis.

Not quite. The discontinuation followed sustained scientific scrutiny questioning whether facial expressions encode stable, cross-cultural emotional signals at all.

11. What is the primary requirement of New York City's Local Law 144 (2023)?

Correct. Local Law 144 requires bias audits with public disclosure — making it the first US ordinance with real enforcement teeth on algorithmic hiring tools.

Not quite. Local Law 144 mandates bias audits and public reporting of results — it does not ban AI hiring tools, it requires accountability for their use.

12. Under the EU AI Act (2024), real-time remote biometric identification in publicly accessible spaces for law enforcement is:

Correct. The EU AI Act lists real-time remote biometric identification in publicly accessible spaces for law enforcement among its prohibited practices, with narrow exceptions.

Not quite. The EU AI Act prohibits real-time remote biometric identification in public spaces by law enforcement, allowing only narrow exceptions.

13. "Disaggregated metrics" in an AI bias audit means:

Correct. Disaggregated metrics break down performance by subgroup — the only way to detect disparities that overall accuracy figures can hide.

Not quite. Disaggregation means analyzing results separately per demographic group — essential because a high overall accuracy can mask poor performance on minority subgroups.

14. The ProPublica COMPAS audit (2016) contributed to visual AI bias methodology by demonstrating that:

Correct. ProPublica's methodology — matching algorithmic predictions against actual outcomes for 7,000+ defendants — established the template for rigorous, outcome-based bias auditing.

Not quite. The COMPAS audit's key methodological contribution was requiring ground-truth outcomes to evaluate fairness, not just the algorithm's internal scores or the vendor's claims.

15. Which statement best summarizes the central lesson of Module 7?

Correct. This summarizes the full arc: origin → harm → auditability → reducibility → governance.

Not quite. The module shows that bias is not inevitable — it has documented origins, measurable impacts, and proven paths to reduction through rigorous auditing and institutional accountability.