Module 8 · Lesson 1

When Vision Systems Fail: Documented Harms

The failures weren't hypothetical. They happened to real people, in real courtrooms, at real borders.

What does a pattern of documented failures tell us about what must change?

In January 2020, Robert Williams was arrested in his driveway in front of his daughters by Detroit police. The charge: robbery. The evidence: a facial recognition match to a blurry surveillance image. The match was wrong. Williams was held for 30 hours before detectives admitted the AI had misidentified him. He became the first documented U.S. case of wrongful arrest driven by facial recognition — but not the last.

The Pattern Behind the Cases

Williams' case is one of several. Michael Oliver (arrested 2019, Detroit) and Nijeer Parks (arrested 2019, New Jersey) are also Black men who were wrongfully arrested after facial recognition matched them to crimes they did not commit; their arrests came earlier than his but were publicly documented later. All three were eventually cleared. All three experienced the system's failure in an acutely racialized way: commercially available facial recognition tools consistently show higher error rates for darker-skinned faces, particularly those of dark-skinned women.

The MIT Media Lab's 2018 Gender Shades study by Joy Buolamwini and Timnit Gebru quantified this precisely: commercial facial analysis tools from IBM, Microsoft, and Face++ misclassified the gender of darker-skinned women at error rates of up to 34.7%, while error rates for lighter-skinned men never exceeded 0.8%. The systems had not been tested rigorously on diverse populations before deployment.

Documented Case — Detroit, 2020

Robert Williams' arrest was the result of Detroit Police Department using DataWorks Plus facial recognition software. A detective had submitted a still image from a store surveillance video to the Michigan State Police database. The system returned Williams as a candidate match. No human examiner independently verified the match before an arrest warrant was issued. The ACLU later filed a formal complaint on Williams' behalf.

Airports, Borders, Benefits

The harms extend well beyond policing. In 2019, a UK Home Office visa photo system rejected thousands of applications from British citizens of East and South Asian descent, flagging their photos as having "eyes closed" — a biometric system misreading normal facial variation. The department quietly revised the rejection guidance after public pressure.

In the United States, U.S. Customs and Border Protection expanded biometric facial matching at airports to cover nearly all international travelers by 2023, with a stated accuracy goal of 99%. However, audits by the Government Accountability Office found that CBP had not systematically measured false match rates disaggregated by race, age, or gender across its full operational deployment — meaning the claimed accuracy figures were not verified under real-world, demographically diverse conditions.

Meanwhile, automated benefit systems using document and face matching denied or delayed claims for individuals whose IDs didn't match facial recognition standards — including elderly claimants and people with certain disabilities affecting facial appearance.

Why Failures Cluster

These harms share a structure. Training data imbalances mean models learn faces from populations that were easier to photograph and label — historically, lighter-skinned, younger, male faces. Threshold calibration decisions — how confident the system must be before returning a match — are often made without accounting for different false-positive rates across demographic groups. And deployment without independent audit means errors go undetected until someone is in handcuffs or denied a visa.

False positive rate disparity: When a system incorrectly identifies a person as matching a target at significantly different rates across demographic groups, causing differential harm.

Threshold calibration: The confidence score a recognition system must reach before returning a positive identification, a design choice with direct impact on error rates and on who bears the cost of errors.

Demographic disaggregation: Measuring system performance separately across race, gender, age, and other groups to reveal disparities that aggregate accuracy figures conceal.
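To make the last two terms concrete, here is a minimal Python sketch of how an aggregate false positive rate can hide a gap between groups. The group labels and counts are invented for illustration and do not describe any real system; the function name `false_positive_rates` is likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical audit records: (group, system_said_match, is_true_match).
# All counts are invented for illustration only.
records = (
    [("group_a", False, False)] * 990 + [("group_a", True, False)] * 10
    + [("group_b", False, False)] * 95 + [("group_b", True, False)] * 5
)

def false_positive_rates(rows):
    """Return (aggregate FPR, per-group FPR), counting true non-matches only."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in rows:
        if not actual:  # only a true non-match can become a false positive
            negatives[group] += 1
            false_pos[group] += int(predicted)
    per_group = {g: false_pos[g] / negatives[g] for g in negatives}
    aggregate = sum(false_pos.values()) / sum(negatives.values())
    return aggregate, per_group

aggregate, per_group = false_positive_rates(records)
print(f"aggregate FPR: {aggregate:.4f}")
for group, rate in sorted(per_group.items()):
    print(f"{group} FPR: {rate:.4f}")
```

In this toy data the aggregate rate sits close to the larger group's rate, so a single headline number would say little about the fivefold higher rate in the smaller group.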

Core Insight

When a system's errors are not randomly distributed but cluster along demographic lines, the harm isn't just technical — it's discriminatory. Accountability begins with measuring who pays the price when the system is wrong.

Lesson 1 Quiz

When Vision Systems Fail: Documented Harms

Robert Williams' 2020 wrongful arrest in Detroit was driven by which technology failure?

Correct. Detroit police used DataWorks Plus facial recognition; a false candidate match was accepted without independent examiner verification, leading to Williams' arrest and 30-hour detention.

Not quite. The ACLU-documented case established that facial recognition software returned a false match, and investigators did not verify it independently before arresting Williams.

The 2018 Gender Shades study by Buolamwini and Gebru found that commercial facial analysis tools had the highest error rates for which group?

Correct. The study found error rates of up to 34.7% for darker-skinned women, compared with under 1% for lighter-skinned men, across tools from IBM, Microsoft, and Face++.

Not correct. The study specifically identified darker-skinned women as experiencing the highest misclassification rates across all three commercial platforms tested.

A Government Accountability Office audit of CBP's airport facial matching found that the agency had not:

Correct. The GAO audit found CBP had not systematically verified its accuracy claims under real-world, demographically diverse operational conditions.

Not quite. The GAO's concern was specifically that CBP's stated accuracy figures weren't validated with demographic disaggregation across its real-world deployment population.

Lab 1: Analyzing a Documented Failure

Discuss the structure of visual AI harms with your AI lab assistant

Your Task

You've read about three wrongful arrests and the Gender Shades study. Use this lab to explore the structural factors that allowed these harms to occur — and what earlier interventions might have prevented them.

Start here: "What does the Williams case reveal about the gap between a system's claimed accuracy and its real-world error distribution?"

AI Lab Assistant

Documented Failures

Welcome to Lab 1. We're examining the structural anatomy of documented visual AI failures — the Williams wrongful arrest, the Gender Shades findings, the CBP audit gaps. What aspect would you like to dig into first?

Module 8 · Lesson 2

Algorithmic Auditing: What It Is and Why It's Hard

Measuring a system fairly requires access, methodology, and independence — three things organizations routinely resist granting.

What would it actually take to audit a deployed visual AI system, and who should be allowed to do it?

In December 2019, the National Institute of Standards and Technology published Face Recognition Vendor Test (FRVT) results showing dramatic performance disparities across 189 algorithms from 99 developers. The agency had collected algorithms submitted by developers, run them against standardized test sets, and published disaggregated error rates. It was the most rigorous public audit of commercial facial recognition ever conducted, and it revealed that many algorithms vendors had sold to law enforcement performed far worse on Black and Asian faces than on white ones.

The report also revealed something more uncomfortable: some systems that performed well in lab conditions degraded significantly when tested on the mugshot and visa photos that represented real-world government databases — poor lighting, inconsistent image quality, aging subjects.

What a Rigorous Audit Requires

The NIST FRVT framework demonstrates what genuine algorithmic auditing looks like: a controlled test environment, demographically diverse ground-truth datasets with verified identity labels, standardized error metrics reported by subgroup, independent execution by an entity with no financial stake in the outcome, and public disclosure of results.

Most deployed systems have received none of this. Vendors typically conduct their own internal evaluations and publish summary statistics ("99.9% accuracy") without disclosing test set composition. Buyers — police departments, border agencies, employers — rarely have the technical capacity to challenge these claims. Third-party auditors are often denied API access or sufficient data samples to run independent tests.
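A short worked calculation shows why a summary figure like "99.9% accuracy" says little on its own. The sketch below assumes, purely for illustration, that the figure means a 0.1% false match rate per comparison and that comparisons are independent; neither assumption comes from any vendor's documentation, and the function names are hypothetical.

```python
def expected_false_matches(false_match_rate: float, gallery_size: int) -> float:
    """Expected wrong candidates when one probe image is compared to every gallery image."""
    return false_match_rate * gallery_size

def prob_at_least_one_false_match(false_match_rate: float, gallery_size: int) -> float:
    """Chance that a single search returns one or more wrong candidates.

    Assumes each comparison is independent, which is a simplification.
    """
    return 1.0 - (1.0 - false_match_rate) ** gallery_size

fmr = 0.001  # "99.9% accurate", read here as a 0.1% false match rate (an assumption)
for gallery in (1_000, 100_000, 10_000_000):
    expected = expected_false_matches(fmr, gallery)
    chance = prob_at_least_one_false_match(fmr, gallery)
    print(f"gallery={gallery:>10,}  expected false matches={expected:,.0f}  "
          f"P(at least one)={chance:.3f}")
```

Under these assumptions, the same per-comparison rate that sounds reassuring produces many wrong candidates once a probe image is searched against a large database, which is why the test set, the gallery size, and the error type all need to be disclosed alongside the headline number.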

Audit Type: Internal (Vendor-Run)

Most common. Conducted by the developing organization. Results often not published. Test sets may not reflect deployment demographics. Subject to publication bias.

Audit Type: Regulatory (Government)

NIST FRVT is the best example. Requires algorithm submission and standardized test execution. Limited to submitted versions; does not test post-deployment updates.

Audit Type: Independent (Academic / NGO)

Gender Shades, the ACLU's surveillance studies. Often rely on publicly accessible APIs or scraped data. Vendors may restrict access once audits reveal problems. High impact but limited scope.

Audit Type: Adversarial (Red Team)

Structured attempts to find failure modes, not just measure average performance. Increasingly required by EU AI Act for high-risk systems. Rare in facial recognition deployment contexts to date.

The Access Problem

In 2020, after Joy Buolamwini's research revealed performance disparities, IBM announced it would exit the facial recognition market entirely, citing concerns about mass surveillance and racial bias. Microsoft stated it would not sell facial recognition to police until federal regulation was in place. These were genuine responses — but they also illustrated a structural gap: the audits that revealed the problems depended on access that companies can revoke at any time.

Researchers attempting to audit Amazon's Rekognition system found that after they published studies showing performance disparities, Amazon changed its API in ways that made direct comparison to previous results difficult. Whether this was product improvement or access restriction is contested, but the episode illustrates why audit rights must be legally protected rather than granted at vendor discretion.

EU AI Act — High-Risk System Requirements

Under the EU AI Act (passed 2024), real-time remote biometric identification systems used in publicly accessible spaces are classified as prohibited or high-risk depending on application. High-risk systems must maintain technical documentation, enable conformity assessments, register in an EU database, and be subject to market surveillance. Notably, the Act requires accuracy metrics to be reported across demographic groups — codifying what NIST's voluntary tests demonstrated was possible.

What Good Audit Metrics Look Like

A technically sound audit of a visual recognition system measures at minimum: false match rate (FMR) — how often the system incorrectly identifies a non-target as the target; false non-match rate (FNMR) — how often a true match is missed; and failure to acquire rate — how often the system cannot process an image at all. Each must be reported by demographic subgroup, not just as an aggregate. The NIST FRVT 1:1 report showed that some algorithms had FMRs 100 times higher for Black women than for white men at the same decision threshold.
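The three metrics above can be sketched in a few lines of Python. This is a simplified illustration, not NIST's methodology: the `Trial` record, the group labels, the scores, and the 0.6 threshold are all hypothetical, and failure to acquire is approximated as the share of trials with no usable score.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trial:
    group: str              # demographic subgroup label (hypothetical)
    same_person: bool       # ground truth: do both images show one person?
    score: Optional[float]  # similarity score, or None if the image could not be processed

def _rate(count: int, total: int) -> float:
    """Safe ratio; NaN when a subgroup has no trials of the relevant kind."""
    return count / total if total else float("nan")

def subgroup_metrics(trials, threshold):
    """Report FMR, FNMR, and failure-to-acquire rate per subgroup at one threshold."""
    report = {}
    for group in sorted({t.group for t in trials}):
        rows = [t for t in trials if t.group == group]
        scored = [t for t in rows if t.score is not None]
        impostors = [t for t in scored if not t.same_person]
        genuines = [t for t in scored if t.same_person]
        report[group] = {
            "fmr": _rate(sum(t.score >= threshold for t in impostors), len(impostors)),
            "fnmr": _rate(sum(t.score < threshold for t in genuines), len(genuines)),
            "fta": _rate(len(rows) - len(scored), len(rows)),
        }
    return report

trials = [
    Trial("group_a", False, 0.20), Trial("group_a", False, 0.35),
    Trial("group_a", True, 0.90), Trial("group_a", True, 0.80),
    Trial("group_b", False, 0.65), Trial("group_b", False, 0.30),
    Trial("group_b", True, 0.55), Trial("group_b", True, 0.85),
    Trial("group_b", True, None),
]
metrics = subgroup_metrics(trials, threshold=0.6)
for group, values in metrics.items():
    print(group, values)
```

The design point is that every subgroup is evaluated at the same decision threshold, since that is the condition under which a deployed system actually operates.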

False match rate (FMR): The probability that a system incorrectly declares a match between two images of different individuals, the error that drove the wrongful arrests in Lesson 1.

NIST FRVT: The U.S. National Institute of Standards and Technology's Face Recognition Vendor Test program, the most comprehensive independent evaluation of commercial facial recognition algorithms, with results published publicly since 2018.

Audit access rights: Legal or contractual entitlements allowing independent researchers or regulators to test AI systems. They are currently absent in most jurisdictions, leaving audits dependent on vendor cooperation.

Lesson 2 Quiz

Algorithmic Auditing: What It Is and Why It's Hard

What did NIST's FRVT 2019 report reveal about many facial recognition algorithms sold to law enforcement?

Correct. The NIST FRVT analysis of 189 algorithms found widespread performance disparities by race, with false match rates sometimes 100 times higher for Black women than white men at the same threshold.

Not quite. NIST found systematic disparities — particularly elevated false match rates for Black and Asian faces — across a large proportion of the commercial algorithms tested.

Why is independent algorithmic auditing structurally difficult for facial recognition systems?

Correct. The access problem is structural: vendors control API access, procurement agencies often lack technical capacity, and audit rights aren't legally protected in most jurisdictions.

Not correct. The difficulty is primarily about access and legal rights — vendors can change or restrict their APIs after audits reveal problems, as happened with Amazon Rekognition research.

Under the EU AI Act, what is required for high-risk biometric systems regarding accuracy reporting?

Correct. The EU AI Act codifies demographic disaggregation of accuracy metrics as a requirement for high-risk systems — translating what NIST showed was methodologically possible into a legal obligation.

Not quite. The EU AI Act specifically requires disaggregated accuracy metrics — meaning reported separately for demographic subgroups — for high-risk biometric systems.

Lab 2: Designing an Audit Protocol

Work through the design of a rigorous facial recognition audit with your AI lab assistant

Your Task

A city police department is considering deploying a commercial facial recognition system for cold case investigations. They've received a vendor report claiming 98.5% accuracy. You are advising the city on what independent audit requirements to demand before deployment approval.

Start here: "What are the first three questions I should ask about that 98.5% accuracy figure before accepting it?"

AI Lab Assistant

Audit Design

I'm ready to help you build an audit protocol. You're advising a city considering facial recognition for cold case work — the vendor claims 98.5% accuracy. Let's figure out what that number actually means and what you need to verify. What would you like to start with?

Module 8 · Lesson 3

Regulatory Frameworks: From Moratoria to Mandates

Cities banned it. The EU regulated it. The U.S. federal government debated it. What each approach actually achieved — and left undone.

What does the spread of real regulatory responses reveal about what governance of visual AI requires?

In May 2019, San Francisco became the first city in the United States to ban government use of facial recognition technology. The ordinance, passed by the Board of Supervisors, applied to city agencies — including police — and required that any surveillance technology acquisition be approved by the board. It was followed by Oakland, Somerville, Boston, Portland, and more than a dozen other cities through 2020–2022.

The bans were significant but narrow: they applied to government actors within specific jurisdictions and said nothing about private employers, landlords, or retail stores. A San Francisco resident could be protected from facial recognition use by the SFPD while their workplace used it to log attendance and their grocery store used it for loss prevention.

The EU's Tiered Approach

The EU AI Act (formally adopted June 2024) established the most comprehensive binding framework for AI governance globally. For facial recognition specifically, it created a tiered structure: real-time biometric identification in publicly accessible spaces is prohibited for law enforcement except in tightly defined exceptions (searching for missing children, preventing specific imminent threats, prosecuting certain serious crimes). Post-hoc biometric identification is classified as high-risk and requires conformity assessment, registration, and ongoing monitoring.

Critically, the Act placed obligations not just on deployers but on providers, the companies building and selling AI systems. Providers of high-risk systems must maintain technical documentation enabling audit, complete a conformity assessment, and register systems in an EU database before placing them on the market, while certain deployers must conduct fundamental rights impact assessments before use. Penalties for the most serious violations reach €35 million or 7% of global annual turnover.

2019

San Francisco Ban. First U.S. city ordinance prohibiting government facial recognition. Sparks wave of municipal legislation.

2020

IBM, Microsoft, Amazon pause or exit facial recognition sales to police following George Floyd protests and mounting research showing bias. Voluntary, not legally binding.

2021

EU Artificial Intelligence Act proposal. European Commission publishes draft; biometric surveillance provisions generate intense lobbying from tech industry.

2022

Illinois BIPA enforcement. Illinois Biometric Information Privacy Act — enacted 2008 — sees major litigation. Facebook settles for $650M, Clearview AI faces state suits.

2023

U.S. federal stall. Several proposed federal bills (including the Facial Recognition and Biometric Technology Moratorium Act) fail to advance. No federal law enacted.

2024

EU AI Act adopted. World's first comprehensive AI law enters into force with phased implementation timeline. Prohibitions on banned practices apply from February 2025; most high-risk obligations apply from August 2026.

The Illinois Model: Consent and Notice

Illinois' Biometric Information Privacy Act (BIPA), passed in 2008, predates the current AI governance debate but proved surprisingly powerful. BIPA requires any private entity collecting biometric data — including facial geometry — to: obtain informed written consent before collection; provide a publicly available retention and destruction schedule; and prohibit sale or profit from biometric data. Crucially, it creates a private right of action, meaning individuals can sue violators without waiting for a government enforcement decision.

This mechanism produced the largest AI-related settlements in U.S. history by 2023 — including Facebook's $650 million settlement over its Tag Suggestions feature (which scanned faces in photos to identify users) and ongoing litigation against employers who used facial recognition time clocks without employee consent.

What Moratoria Cannot Do

Municipal bans and voluntary corporate moratoria share a structural weakness: they leave private-sector deployment unregulated while AI identification capabilities continue to improve. During the period when major tech companies paused sales to police, smaller vendors (Clearview AI being the most documented example) continued supplying facial recognition services to law enforcement in jurisdictions with no prohibition, scraping billions of images from public social media to build the largest private face database in existence.

Clearview AI: The Private-Sector Gap

Clearview AI built a database of approximately 30 billion facial images scraped from social media without user consent and sold access to law enforcement agencies. By 2023, it had been used by over 3,100 agencies in the U.S. alone. It violated terms of service of every major platform it scraped. It was fined or banned in multiple countries (UK: £7.5M fine, Canada: ordered to delete Canadian data, Australia: found to have breached privacy law). It continued operating in the U.S. in the absence of federal legislation.

Private right of action: A legal mechanism allowing individuals to sue directly for violations of a statute, rather than relying solely on government enforcement. This is the feature that made Illinois BIPA effective where other privacy laws stalled.

Fundamental rights impact assessment: An EU AI Act requirement for high-risk AI deployers to evaluate potential harms to fundamental rights before deployment, analogous to environmental impact assessments for infrastructure projects.

Lesson 3 Quiz

Regulatory Frameworks: From Moratoria to Mandates

What was the key structural limitation of municipal facial recognition bans like San Francisco's 2019 ordinance?

Correct. Municipal bans regulated government use within city limits but said nothing about employers, retailers, or private entities — leaving large parts of daily life unprotected.

Not quite. The bans applied to city government agencies but left private-sector facial recognition entirely untouched — including workplace and retail uses in the same city.

What made Illinois' Biometric Information Privacy Act (BIPA) unusually effective compared to other state privacy laws?

Correct. BIPA's private right of action enabled direct litigation by individuals, producing major settlements including Facebook's $650M payout — without relying on government enforcement decisions.

Not correct. BIPA's power came from its private right of action — individuals could file suit themselves, not just wait for regulators to act. This produced the largest AI-related settlements in U.S. history.

Clearview AI's continued operation despite bans and fines in multiple countries demonstrated what about facial recognition governance?

Correct. Clearview AI was fined or ordered to cease operations in the UK, Canada, and Australia but continued U.S. operations in the absence of federal law — illustrating the fragmentation problem of jurisdiction-by-jurisdiction governance.

Not quite. Clearview's case specifically illustrates the jurisdictional gap problem: bans in some countries while operating freely in others, particularly in the U.S. where no federal law applied.

Lab 3: Comparing Regulatory Approaches

Explore the trade-offs between different governance models with your AI lab assistant

Your Task

You've examined moratoria, the EU tiered model, and BIPA's private right of action. A state legislature has asked you to advise on the single most impactful provision they could include in a facial recognition governance bill. Use this lab to work through the options.

Start here: "If I could only pick one provision — mandatory audit, private right of action, or deployment moratorium — which would have the broadest protective effect and why?"

AI Lab Assistant

Regulatory Design

Good framing. You're advising a state legislature and need to identify the single highest-impact provision for a facial recognition governance bill. Let's work through the options — mandatory audits, private right of action, and deployment moratoria all have different leverage points. What's your initial thinking?

Module 8 · Lesson 4

Technical Accountability: Dataset Cards, Model Cards, and Human Review

Transparency tools, human-in-the-loop requirements, and participatory design — the technical and procedural levers that make systems more answerable.

What concrete practices make a visual AI system more accountable — not just on paper, but in the field?

In 2019, researchers at Google published a paper proposing Model Cards for Model Reporting — standardized documents that would accompany ML models and disclose: intended use cases, evaluation results by subgroup, performance limitations, and ethical considerations. The proposal emerged directly from the Gender Shades work and conversations about how facial analysis systems had been deployed without disclosing demographic performance disparities.

By 2023, Model Cards had become a standard practice on HuggingFace, where most model uploads now include them — though compliance is voluntary and card quality varies enormously. Dataset Cards, a parallel initiative, document the composition, provenance, and known limitations of training datasets.
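As a rough illustration of what such a document can look like in machine-readable form, the sketch below models a Model Card as a Python dataclass using the fields named above. The schema, the `missing_subgroups` helper, and the example values are hypothetical; this is not the format of the original paper or of any hosting platform.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class ModelCard:
    model_name: str
    intended_uses: List[str]
    out_of_scope_uses: List[str]
    limitations: List[str]
    ethical_considerations: List[str]
    # metric name -> subgroup -> value, e.g. {"false_match_rate": {"group_a": 0.001}}
    subgroup_results: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def missing_subgroups(self, required: Set[str]) -> Dict[str, Set[str]]:
        """For each reported metric, list required subgroups that have no value."""
        gaps = {}
        for metric, by_group in self.subgroup_results.items():
            absent = required - set(by_group)
            if absent:
                gaps[metric] = absent
        return gaps

card = ModelCard(
    model_name="example-face-matcher",  # hypothetical model
    intended_uses=["1:1 identity verification with consent"],
    out_of_scope_uses=["sole basis for an arrest"],
    limitations=["not evaluated on low-resolution surveillance stills"],
    ethical_considerations=["false matches can lead to wrongful detention"],
    subgroup_results={"false_match_rate": {"group_a": 0.001}},
)
print(card.missing_subgroups({"group_a", "group_b"}))
```

A check like `missing_subgroups` is one way a reviewer could flag a card that reports a metric for some groups but not others, which is the kind of variation in card quality noted above.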

Dataset Transparency: The Training Data Problem

Most documented visual AI failures trace back to training data. The faces a model learns from determine what it learns to see accurately. Large Face DB, MS-Celeb-1M, and other widely used facial recognition training datasets were scraped from the web without consent and without demographic auditing. When researchers began examining these datasets — notably the Excavating AI project by Kate Crawford and Trevor Paglen in 2019 — they found troubling patterns: faces labeled with offensive or stereotyped descriptors, heavy skew toward white male faces, images collected without subject knowledge.

IBM subsequently released its Diversity in Faces dataset, explicitly designed to include balanced representation across Fitzpatrick skin types, face shapes, and age groups, with transparent documentation of collection methodology. Microsoft later retracted MS-Celeb-1M entirely after researchers identified that it included images of private individuals collected without consent.

Accountability Tool: Model Cards

Documents intended use, out-of-scope uses, evaluation metrics by demographic subgroup, performance limitations. Proposed by Mitchell et al. (Google), 2019. Widely adopted on HuggingFace.

Accountability Tool: Dataset Cards (Datasheets)

Documents dataset composition, collection methodology, consent practices, known biases, recommended uses. Proposed by Gebru et al. (Microsoft Research), 2018. Now standard on HuggingFace.

Accountability Tool: Human-in-the-Loop Requirements

Mandatory human review before consequential action on AI output. Baltimore, Detroit, and NYC have enacted laws requiring human examiner confirmation before facial recognition results are used as evidence.

Accountability Tool: Participatory Design

Involving affected communities in system design and governance decisions. Rare in commercial practice; piloted in some civic tech contexts. ACLU has documented cases where community input changed deployment decisions.

Human Review in Practice: Detroit's Policy

Following the Williams, Oliver, and Parks wrongful arrests, Detroit City Council passed an ordinance in 2021 requiring that facial recognition results used by police must be reviewed by a trained human examiner before generating a lead, and that the technology cannot be the sole basis for an arrest. Officers must have additional corroborating evidence. The system must be used only for violent crimes, not misdemeanors. Results must be logged and subject to annual audit.

This policy doesn't eliminate the risk of wrongful arrest — human examiners make errors too, and confirmation bias can cause an examiner to accept a weak match — but it creates a multi-layered decision chain and a documented record. Accountability requires a trail: someone must be identifiable as having made each consequential decision.
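The decision chain and audit trail described here can be sketched as a simple gate. The Python below encodes only the safeguards as this lesson summarizes them (eligible offense, examiner confirmation, corroborating evidence, logging); the record fields, function name, and outcome strings are hypothetical and do not reproduce the ordinance's text or any department's software.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class MatchReview:
    case_id: str
    offense_is_violent: bool
    examiner_id: Optional[str] = None  # set once a trained examiner has reviewed the match
    examiner_confirmed: bool = False
    corroborating_evidence: List[str] = field(default_factory=list)

# Illustrative in-memory log; a real system would need durable, append-only storage.
audit_log: List[dict] = []

def decide(review: MatchReview) -> str:
    """Apply the safeguards in order and record who was responsible for the outcome."""
    if not review.offense_is_violent:
        outcome = "rejected: offense not eligible for facial recognition"
    elif review.examiner_id is None or not review.examiner_confirmed:
        outcome = "rejected: no confirmation by a trained examiner"
    elif not review.corroborating_evidence:
        outcome = "lead only: arrest requires corroborating evidence"
    else:
        outcome = "lead with corroboration: may be considered for arrest"
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "case_id": review.case_id,
        "examiner_id": review.examiner_id,
        "outcome": outcome,
    })
    return outcome

print(decide(MatchReview("case-001", offense_is_violent=True)))
print(decide(MatchReview("case-002", True, "examiner-17", True, ["witness statement"])))
```

Every call writes a log entry, including rejections, so that an annual audit can see not only which matches became leads but also who reviewed them and which ones were stopped.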

The NYC Local Law 144 Model — Hiring Algorithms

New York City's Local Law 144 (effective 2023) requires employers using automated employment decision tools — including image-based assessment tools — to conduct annual bias audits by independent auditors, publish summary results publicly, and notify candidates that such tools are being used. It is the first U.S. law mandating third-party auditing of AI hiring tools. Compliance has been inconsistent, but the law establishes the principle that consequential algorithmic decisions require demonstrated fairness, not assumed fairness.
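One metric commonly used in bias audits of selection tools is the impact ratio: each group's selection rate divided by the rate of the most-selected group. The Python sketch below shows that calculation on invented numbers. It is offered as an assumption about how such an audit might be computed, not as a statement of what the law's text requires, and the function name and group labels are hypothetical.

```python
from typing import Dict

def impact_ratios(selected: Dict[str, int], assessed: Dict[str, int]) -> Dict[str, float]:
    """Selection rate of each group divided by the highest group's selection rate."""
    rates = {group: selected[group] / assessed[group] for group in assessed}
    top_rate = max(rates.values())
    return {group: rates[group] / top_rate for group in rates}

# Invented counts for illustration only.
assessed = {"group_a": 200, "group_b": 150}
selected = {"group_a": 80, "group_b": 36}

ratios = impact_ratios(selected, assessed)
for group, ratio in sorted(ratios.items()):
    print(f"{group}: impact ratio {ratio:.2f}")
```

A ratio well below 1.0 for a group signals that the tool selects that group less often than the most-selected group, which is the kind of result a published audit summary would make visible.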

The Limits of Technical Fixes

Technical accountability tools — model cards, datasheets, audits — are necessary but not sufficient. A Model Card can accurately describe a system's demographic performance disparities, and an organization can choose to deploy it anyway. An audit can document bias, and a buyer can ignore the findings. Transparency creates the conditions for accountability but does not guarantee it. The complementary requirement is consequence: mechanisms — legal, financial, reputational — that make deploying a biased system costly rather than merely documented.

This is why the most effective accountability architectures combine technical transparency (Model Cards, audit reports) with legal enforcement (private rights of action, mandatory audit requirements) and procedural safeguards (human-in-the-loop requirements, logging, community oversight boards). Each layer compensates for the limits of the others.

The Accountability Stack

Effective visual AI accountability operates at four levels simultaneously: technical (diverse training data, demographic disaggregation of metrics, adversarial testing); organizational (internal review processes, human-in-the-loop requirements, audit trails); legal (mandatory third-party audits, private rights of action, penalties for disparate harm); and participatory (affected communities have voice in deployment decisions and access to findings).

Model Card: A standardized documentation format disclosing a machine learning model's intended use, evaluation results by demographic subgroup, performance limitations, and ethical considerations. Proposed by Google researchers in 2019.

Datasheet for Datasets: A documentation standard for training datasets disclosing composition, collection methodology, consent practices, and known biases, enabling accountability for the data that shapes model behavior.

Human-in-the-loop: A procedural requirement that a human must review and confirm AI output before consequential action is taken, a safeguard that creates accountability trails and reduces the risk of automated harm.

Lesson 4 Quiz

Technical Accountability: Dataset Cards, Model Cards, and Human Review

Model Cards were proposed in 2019 directly in response to which research finding?

Correct. Model Cards emerged from conversations within Google and the research community directly following the Gender Shades findings — the need to disclose performance disaggregated by demographic group before deployment.

Not quite. Model Cards were proposed as a direct response to the Gender Shades research — the finding that facial analysis systems had been sold and deployed without disclosing that they performed far worse on certain demographic groups.

Detroit City Council's 2021 facial recognition ordinance required which specific safeguard?

Correct. Detroit's post-wrongful-arrest ordinance required trained human examiner review of any facial recognition output before it could generate a police lead, and prohibited use as a sole basis for arrest.

Not correct. Detroit didn't ban the technology — it required a human-in-the-loop: trained examiner review before a lead is generated, plus corroborating evidence beyond the AI output before an arrest.

Why are technical accountability tools like Model Cards described as "necessary but not sufficient"?

Correct. Transparency creates the conditions for accountability but doesn't guarantee it — the complementary requirement is consequence: mechanisms that make deploying a biased system costly, not just documented.

Not quite. The core limitation is that transparency without enforcement is optional. A Model Card can accurately document disparate performance, and a deployer can choose to ignore it. Effective accountability requires both disclosure and consequence.

Lab 4: Building the Accountability Stack

Design a multi-layer accountability architecture for a real deployment scenario

Your Task

A mid-sized city's transit authority wants to deploy facial recognition to identify individuals on a watch list for violent incidents at stations. You've been asked to design the full accountability stack — technical, organizational, legal, and participatory layers — for this deployment. Use the lab to work through what each layer should require.

Start here: "Walk me through what a Model Card for this transit facial recognition system must disclose, and what the most important metrics are for the demographic audit."

AI Lab Assistant

Accountability Architecture

You're designing a full accountability stack for a transit authority facial recognition deployment. This is exactly the kind of high-stakes context where all four accountability layers matter — technical, organizational, legal, and participatory. Let's start with the Model Card and audit metrics. What would you want to know first about how the system was built and tested?

Module 8 Test

Building a More Accountable Visual AI · 15 questions · Pass at 80%

1. Robert Williams was wrongfully arrested in 2020 after Detroit police used facial recognition. What critical procedural failure compounded the AI error?

Correct. The lack of independent human verification before acting on the AI output was the critical procedural failure that turned a false positive into a wrongful arrest.

Not quite. The compounding failure was that no trained human examiner independently verified the AI-generated candidate match before Detroit police sought and executed an arrest warrant.

2. The Gender Shades study (Buolamwini and Gebru, 2018) found commercial facial analysis tools had error rates for darker-skinned women up to how many percentage points higher than for lighter-skinned men?

Correct. The Gender Shades study documented error-rate disparities of up to 34.7 percentage points across tools from IBM, Microsoft, and Face++.

Not correct. The Gender Shades study found disparities as large as 34.7 percentage points between darker-skinned women and lighter-skinned men in commercial facial analysis tools.

3. NIST's FRVT program is significant because it represents which type of audit?

Correct. NIST FRVT is an independent government testing program — vendors submit algorithms, NIST runs standardized evaluations, and results including demographic breakdowns are published publicly.

Not correct. NIST FRVT is a regulatory/government audit: vendors submit algorithms, NIST independently tests them on standardized datasets, and publishes results including demographic disaggregation.

4. False Match Rate (FMR) measures which type of error in a facial recognition system?

Correct. FMR is the false positive rate — the probability the system incorrectly declares a match between images of different individuals. This is the error type most directly linked to wrongful identification.

Not correct. FMR is the false positive rate: the probability the system incorrectly declares a match between two images of different people — the error type that drove the documented wrongful arrests.

5. San Francisco's 2019 facial recognition ban was followed by at least how many other U.S. cities enacting similar measures through 2022?

Correct. Oakland, Somerville, Boston, Portland, and more than a dozen other cities followed San Francisco's lead, banning government use of facial recognition between 2019 and 2022.

Not quite. More than a dozen U.S. cities enacted similar municipal bans following San Francisco, including Oakland, Somerville, Boston, and Portland.

6. What was Illinois BIPA's key enforcement advantage over most other U.S. state privacy laws?

Correct. BIPA's private right of action enabled individuals to file suit directly, producing major settlements including Facebook's $650M payout — without relying solely on government enforcement decisions.

Not correct. BIPA's distinguishing feature was its private right of action — affected individuals could sue directly without waiting for a government agency to act, making enforcement far more reliable.

7. Clearview AI's continued U.S. operation despite fines and orders to cease in the UK, Canada, and Australia illustrated which governance failure?

Correct. Clearview's continued U.S. operation while facing bans and fines elsewhere shows the fragmentation problem: jurisdictional gaps let actors operate wherever regulation is absent.

Not correct. Clearview's case is a textbook example of the jurisdictional gap: bans in some countries while operating freely in others, specifically exploiting the absence of U.S. federal facial recognition law.

8. The EU AI Act classifies real-time remote biometric identification in publicly accessible spaces for law enforcement as:

Correct. The EU AI Act prohibits real-time biometric identification in public spaces for law enforcement except in tightly defined exceptions — searching for missing children, preventing imminent specific threats, or prosecuting certain serious crimes.

Not correct. The EU AI Act prohibits real-time remote biometric identification in publicly accessible spaces for law enforcement; it is not permitted under standard licensing. Only narrow exceptions exist, for specific, defined threat scenarios.

9. Model Cards were proposed by researchers at which organization in 2019?

Correct. The Model Cards proposal (Mitchell et al., 2019) came from Google researchers and emerged from conversations following the Gender Shades findings about undisclosed demographic performance disparities.

Not correct. Model Cards were proposed by Mitchell et al. at Google in 2019, in the context of the research community's response to the Gender Shades findings.

10. Microsoft's retraction of the MS-Celeb-1M dataset was prompted by what finding?

Correct. Researchers investigating MS-Celeb-1M found it included images of private individuals scraped without consent, prompting Microsoft to retract it — illustrating the provenance and consent problems in facial recognition training data.

Not quite. MS-Celeb-1M was retracted after researchers found it included images of private individuals collected without their knowledge or consent — a training data provenance and ethics problem.

11. Detroit's 2021 facial recognition ordinance (post-wrongful-arrest) was limited to which types of cases?

Correct. Detroit's ordinance restricted facial recognition use to violent crime investigations, explicitly prohibiting its use for misdemeanors — a scope limitation intended to match the technology's risk to the seriousness of the crime.

Not correct. Detroit's ordinance specifically restricted facial recognition use to violent crimes and prohibited it for misdemeanor investigations — recognizing that the risk-benefit calculation differs by offense type.

12. New York City's Local Law 144 (2023) was significant because it was the first U.S. law to require:

Correct. Local Law 144 mandated third-party bias audits of automated employment decision tools annually, with public disclosure of results — establishing the principle that algorithmic hiring tools must demonstrate fairness, not assume it.

Not correct. NYC Local Law 144 was specifically about AI hiring tools — it required annual independent bias audits and public disclosure of results, making it the first U.S. law mandating third-party auditing of employment AI.

13. The "Excavating AI" project (Crawford and Paglen, 2019) investigated facial recognition training datasets and found:

Correct. Crawford and Paglen's Excavating AI investigation found troubling labeling practices, demographic skew toward white male faces, and images collected without subjects' knowledge — foundational problems in the training data underpinning many commercial systems.

Not correct. Excavating AI found the opposite: offensive labeling, demographic skew, and unconsented image collection were pervasive in widely used facial recognition training datasets.

14. Why is transparency through Model Cards described as insufficient on its own to ensure accountability?

Correct. Transparency creates conditions for accountability but doesn't guarantee it. Without enforcement mechanisms — legal liability, financial penalties, community oversight — disclosed harms can simply be accepted and ignored.

Not quite. The core problem is consequence: a deployer can read an accurate Model Card documenting demographic performance disparities and deploy the system anyway. Transparency requires enforcement mechanisms to become accountability.

15. The "accountability stack" framework in this module argues that effective visual AI accountability requires layers at which four levels?

Correct. The four layers are: technical (diverse data, disaggregated metrics, adversarial testing), organizational (human-in-the-loop, audit trails), legal (mandatory audits, private rights of action), and participatory (community voice in deployment decisions).

Not correct. The accountability stack described in this module operates at four levels: technical, organizational, legal, and participatory — each compensating for the limits of the others.