In 1888, George Eastman introduced the hand-held Kodak camera and, almost immediately, lawyers began arguing about something no one had anticipated: could a photograph taken of a person in public be published without their consent? Two years later, in 1890, Louis Brandeis and Samuel Warren published "The Right to Privacy," warning that "instantaneous photographs" threatened to expose private life in ways the law had never contemplated. The technology arrived first; the ethics and law scrambled to follow for the next four decades, culminating in portrait-rights statutes that varied wildly by state well into the 1930s.
The same scramble is happening now, but compressed into years rather than decades. Between 2016 and 2023, algorithmic systems moved from novelty to infrastructure: they informed bail decisions in Broward County, Florida; filtered job applications at Amazon; scored the creditworthiness of tens of millions of bank customers; and flagged welfare fraud in the Netherlands, where a 2020 court ruling found the government's SyRI risk-scoring system violated human rights law and ordered it shut down. Each of these deployments was decided by engineers and product managers working under commercial pressure, without any agreed ethical framework governing what they were building.
This course will not make you an AI engineer. It will make you a sharper reader of the systems that govern modern life: someone who can identify when an automated decision deserves scrutiny, who bears accountability when it goes wrong, and what structural checks can constrain the worst outcomes. Four lessons, four labs, one module test. The goal is practical literacy, not false comfort.
If you finish every module, here's who you become:
Every defendant who passed through Broward County, Florida's criminal courts between 2013 and 2016 received a COMPAS score, a number between one and ten rating their likelihood of reoffending. Judges used those scores in bail and sentencing decisions. In 2016, investigative journalists at ProPublica analyzed 7,000 cases and published their findings under the headline "Machine Bias." The core finding: COMPAS was nearly twice as likely to falsely flag Black defendants as future criminals, and nearly twice as likely to falsely label white defendants as low risk. The algorithm's creator, Northpointe Inc., disputed the methodology. The Wisconsin Supreme Court, in State v. Loomis (2016), ruled that COMPAS scores could be used in sentencing as long as they were not the "determinative" factor. No court has ever had access to the algorithm's source code; it remains a trade secret.
The COMPAS case distills nearly every tension this module will address: opacity (no one could inspect the model), disparate impact (statistically unequal outcomes by race), accountability gaps (Northpointe bears no legal liability for wrongful sentences), and automation bias (judges deferring to a number they couldn't interrogate). None of these problems required bad intentions to materialize. They emerged from design choices made before a single defendant ever stood in court.
An automated decision-making system (ADMS) is any process in which a computational model produces an output that directly or substantially influences a consequential outcome for a person, without that person having real-time recourse to a human reviewer. The key word is consequential: a spam filter is automated, but missing an email is recoverable. A bail algorithm is automated, and the consequences of its errors fall on people who may spend months in pretrial detention they cannot afford to challenge.
Three components are present in virtually every ADMS. First, a training dataset: the historical records from which the model learns patterns. Second, a feature set: the variables the model actually uses (age, zip code, prior arrests, credit history). Third, an objective function: the thing the model is mathematically optimized to predict (recidivism, default probability, click-through rate). The ethical problems of AI decision-making cluster around these three components more than anywhere else.
In 2018, researchers at MIT and Microsoft published an audit of three major commercial facial analysis systems (Microsoft Azure, IBM Watson, and Face++) showing that error rates for darker-skinned women ran as high as 34.7%, while error rates for lighter-skinned men were below 1%. The researchers, Joy Buolamwini and Timnit Gebru, described the pattern as "intersectional accuracy disparities." Their paper, "Gender Shades," became the most-cited work in algorithmic fairness research that year. The cause was straightforward: all three systems had been trained on datasets that were disproportionately composed of light-skinned male faces, because that is what was easiest to scrape from publicly available sources in the early 2010s.
The lesson is not that datasets are always biased, but that they always reflect the conditions under which they were collected. If a hospital system trains a diagnostic model on patient records from 2000–2015, it inherits whatever systematic disparities existed in who received diagnoses and what treatments were documented. A model predicting loan default trained on FICO data inherits decades of redlining-era credit exclusions. The model does not invent discrimination; it compresses historical discrimination into a number and applies it at scale.
In 2019, researchers at UC Berkeley published a study in Science showing that a health-care algorithm used by major US hospital systems (affecting an estimated 200 million people) systematically under-prioritized Black patients for high-risk care programs. The algorithm used past health care spending as a proxy for illness severity. Because Black patients had historically received less care, they had lower spending records, so the model rated them as less sick even when their clinical indicators were equivalent or worse.
Proxy discrimination occurs when a model uses a variable that is not a protected characteristic (race, gender, religion) but that correlates so strongly with a protected characteristic that it effectively substitutes for it. Zip code is the canonical example: in US cities shaped by residential segregation, zip code correlates powerfully with race. A model that uses zip code as a feature for loan approval, insurance pricing, or criminal risk assessment can reproduce race-based discrimination without ever containing a "race" variable.
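The mechanism can be sketched in a few lines of Python. Everything here is hypothetical: the applicants, the zip codes, the groups "A" and "B", and the `approve` rule are invented for illustration and do not describe any real lender. The point is only that a rule which never reads the group label can still produce unequal approval rates when zip code tracks group membership.

```python
from collections import defaultdict

# Hypothetical applicants: (zip_code, group, income_in_thousands).
# Invented data: zip code tracks group closely, as it can in a
# residentially segregated city, while incomes are similar across groups.
APPLICANTS = [
    ("10001", "A", 52), ("10001", "A", 48), ("10001", "A", 61), ("10001", "B", 55),
    ("20002", "B", 53), ("20002", "B", 47), ("20002", "B", 60), ("20002", "A", 50),
]


def approve(zip_code: str, income: int) -> bool:
    """Toy approval rule that never sees `group`; it sets a higher bar in one zip code."""
    threshold = 45 if zip_code == "10001" else 58
    return income >= threshold


def approval_rate_by_group(applicants):
    """Return {group: share of that group's applicants approved by the toy rule}."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for zip_code, group, income in applicants:
        total[group] += 1
        approved[group] += approve(zip_code, income)
    return {group: approved[group] / total[group] for group in total}


rates = approval_rate_by_group(APPLICANTS)
for group, rate in sorted(rates.items()):
    print(f"group {group}: {rate:.0%} approved")
```

In this invented data the two groups have nearly identical average incomes, yet their approval rates differ, because the zip-code threshold carries the group information the rule was never given.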
In 2014, the Federal Trade Commission published a report examining the relationship between big data and discrimination, warning that zip code, purchasing behavior, and even browsing history could function as proxies for protected class membership. In 2017, a ProPublica investigation into car insurance pricing in four states found that major insurers charged higher premiums in minority neighborhoods than in white neighborhoods with identical risk profiles, apparently because of zip-code-based modeling. No insurer's algorithm contained the word "race."
A model optimizes for what it is told to optimize for, and nothing else. This seems obvious until you realize how difficult it is to specify what you actually want in mathematical terms. Amazon ran an internal machine-learning recruiting tool from 2014 to 2017 that scored résumés on a scale of one to five. The model was trained on résumés submitted over the previous decade, a dataset that skewed heavily male, reflecting tech industry demographics. By 2015, Amazon's engineers discovered the system had learned to penalize résumés containing the word "women's" (as in "women's chess club") and had downgraded graduates of two all-women's colleges. Amazon disbanded the project in 2017 without ever deploying the tool publicly, a fact that became public only through a Reuters investigation in October 2018.
The objective function was "predict which candidates Amazon will hire." The model correctly learned that pattern. The problem is that "candidates Amazon historically hired" was not the same as "best qualified candidates." The optimization target was precisely defined but conceptually wrong. This gap, between the quantity you can measure and the outcome you actually care about, is often discussed under the name Goodhart's Law: when a measure becomes a target, it ceases to be a good measure.
Every ADMS embeds three sets of human value judgments: which data to collect, which features to use, and what outcome to optimize for. These are not technical choices. They are ethical choices made by specific people under specific constraints, and the machine will execute them at scale without further deliberation.
You will analyze a real automated decision-making system with the AI assistant below. Choose one of the following systems and interrogate it: the COMPAS recidivism scoring system, the Amazon résumé screener, or the health-care risk-scoring algorithm identified in the 2019 UC Berkeley study published in Science.
For your chosen system, work through: (1) what training data it likely used and what biases that data may have embedded, (2) what its objective function was and how that objective may have diverged from the actual goal, and (3) who bears accountability when the system produces a harmful outcome.
In the months after ProPublica published "Machine Bias," Northpointe and a group of academic statisticians published rebuttals arguing that COMPAS was, in fact, fair, because among defendants who scored as high risk, Black and white defendants reoffended at statistically similar rates. ProPublica's team agreed with this fact but maintained the system was unfair, because the false positive rate (being labeled high risk when you would not reoffend) was dramatically higher for Black defendants. Both sides were correct by their own definition of fairness. The conflict was not about data; it was about which definition of fairness ought to govern.
In December 2016, three independent research groups (at Cornell, Google, and the Max Planck Institute) published papers within weeks of each other demonstrating a formal impossibility result: calibration (a given score implying the same probability of the outcome in every group) and error-rate parity (equal false positive and false negative rates across groups) cannot both be satisfied when base rates differ between groups, except in the degenerate case of a perfect predictor. This became known in the research literature as the "fairness impossibility theorem." No algorithm, however well-designed, can be fair by all reasonable definitions at once. This is not a temporary problem awaiting a better model. It is a permanent mathematical constraint.
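For a binary high-risk/low-risk label, the constraint can be written as a single identity that follows from the definitions of the error rates. The symbols are introduced here for illustration: p is a group's base rate, PPV is the share of high-risk labels that turn out to be correct (calibration in its binary, predictive-parity sense), FNR is the false negative rate, and FPR is the false positive rate. The numbers in the worked example are invented.

```latex
% Identity linking base rate, predictive value, and error rates in one group:
\[
  \mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr)
\]
% Worked example with PPV = 0.6 and FNR = 0.4 held equal in both groups:
\[
  p = 0.5:\quad \mathrm{FPR} = \frac{0.5}{0.5}\cdot\frac{0.4}{0.6}\cdot 0.6 = 0.40
  \qquad
  p = 0.2:\quad \mathrm{FPR} = \frac{0.2}{0.8}\cdot\frac{0.4}{0.6}\cdot 0.6 = 0.10
\]
```

If PPV and FNR are held equal across two groups, the only remaining term is p/(1-p), so different base rates force different false positive rates.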
Academic and industry researchers have proposed more than twenty distinct formal definitions of algorithmic fairness since 2016. The most frequently debated cluster around five core concepts:
Demographic parity requires that the algorithm's positive outcome rate be equal across demographic groups. If 30% of white loan applicants are approved, 30% of Black loan applicants must also be approved. This definition ignores whether individuals within each group differ in their actual default risk.
Equalized odds, proposed by Hardt, Price, and Srebro at NeurIPS 2016, requires that both the true positive rate and the false positive rate be equal across groups. Unlike demographic parity, it conditions on the actual outcome: it compares error rates among people who did, and who did not, go on to experience the predicted event.
Calibration (also called predictive parity) requires that a score of, say, 7 out of 10 mean the same probability of the predicted event across all groups. If the score means a 70% reoffense probability for white defendants, it should also mean 70% for Black defendants.
Individual fairness, formalized by Dwork et al. in 2012, requires that similar individuals receive similar outcomes. It sidesteps group comparisons entirely but requires agreement on what "similar" means, which is itself a value-laden judgment.
Counterfactual fairness, proposed by Kusner et al. in 2017, asks: would the outcome change if the individual's protected characteristic were different, holding all else equal? This approach grapples with the fact that changing one attribute, like race, counterfactually would change many others in a racially structured society.
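The three group-level definitions above (demographic parity, equalized odds, and calibration in its predictive-parity form) can all be read off a per-group confusion matrix. The following is a minimal sketch under stated assumptions: the `Confusion` class, the group names, and every count are hypothetical, and "positive" here means "labeled high risk." Individual and counterfactual fairness cannot be computed from counts like these, so they are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Outcome counts for one group (all numbers below are invented)."""
    tp: int  # labeled high risk, event occurred
    fp: int  # labeled high risk, event did not occur
    fn: int  # labeled low risk, event occurred
    tn: int  # labeled low risk, event did not occur

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def positive_rate(self) -> float:
        """Share labeled high risk; equal across groups under demographic parity."""
        return (self.tp + self.fp) / self.total

    @property
    def tpr(self) -> float:
        """True positive rate; equal across groups under equalized odds."""
        return self.tp / (self.tp + self.fn)

    @property
    def fpr(self) -> float:
        """False positive rate; also equal across groups under equalized odds."""
        return self.fp / (self.fp + self.tn)

    @property
    def ppv(self) -> float:
        """Share of high-risk labels that were correct; equal under predictive parity."""
        return self.tp / (self.tp + self.fp)


GROUPS = {
    "group A": Confusion(tp=30, fp=20, fn=20, tn=30),  # base rate 50%
    "group B": Confusion(tp=12, fp=8, fn=8, tn=72),    # base rate 20%
}

for name, c in GROUPS.items():
    print(f"{name}: positive rate {c.positive_rate:.2f}, TPR {c.tpr:.2f}, "
          f"FPR {c.fpr:.2f}, PPV {c.ppv:.2f}")
```

With these invented counts, PPV and TPR match across the two groups while the positive rate and FPR do not, so the same tool passes one fairness test and fails two others.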
When base rates differ between groups, as they do for recidivism, loan default, and many other predicted outcomes, you cannot simultaneously achieve calibration and equalized false positive rates. Choosing one definition means accepting worse performance on another. This is not a technical problem. It is a political and moral choice about whose errors society is willing to tolerate.
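The trade-off also runs in the other direction: forcing error rates to be equal makes the meaning of a high-risk label differ by group. A small sketch, using Bayes' rule and invented numbers (the function name `ppv_given_error_rates`, the base rates, and the error rates are all assumptions for illustration):

```python
def ppv_given_error_rates(base_rate: float, tpr: float, fpr: float) -> float:
    """Share of high-risk labels that are correct, implied by Bayes' rule."""
    true_positives = base_rate * tpr            # fraction of group: flagged and event occurs
    false_positives = (1.0 - base_rate) * fpr   # fraction of group: flagged, no event
    return true_positives / (true_positives + false_positives)


# Hypothetical scenario: both groups are given identical error rates.
TPR, FPR = 0.6, 0.2
for name, base_rate in [("group A", 0.5), ("group B", 0.2)]:
    ppv = ppv_given_error_rates(base_rate, TPR, FPR)
    print(f"{name}: base rate {base_rate:.0%}, "
          f"high-risk label correct {ppv:.0%} of the time")
```

Under these assumed numbers the label is right about three times in four for the higher-base-rate group and well under half the time for the other, which is the calibration failure that equalizing false positive rates buys.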
In 2018, the New York City government passed Local Law 49, creating a task force to study the use of automated decision systems in city agencies. The task force's 2019 report, a landmark document in municipal AI governance, found that no city agency had documented which fairness definition, if any, it had used in procuring or deploying algorithmic tools. Vendors provided accuracy statistics; they did not provide fairness audits.
The same year, NIST (the National Institute of Standards and Technology) began a years-long project to evaluate facial recognition algorithms, eventually publishing results showing that nearly all commercial systems had higher false match rates for Black women, East Asian men, and older adults than for young white men, with false match rates in some cases 10 to 100 times higher. The NIST results were not disputed. What remained disputed was whether those error differentials were acceptable and, if not, who had the authority to say so.
In the European Union, the AI Act (proposed in 2021 and adopted in 2024) moved toward resolving the governance question by designating certain AI applications, including AI used in criminal justice, credit scoring, and hiring, as "high-risk systems" subject to mandatory conformity assessments, transparency requirements, and human oversight obligations. The Act does not specify which fairness definition applies; it requires that developers document which definition they have chosen and justify it. This shifts the burden from implicit to explicit, which is itself a significant change.
Any time an organization deploys an automated decision system and claims it is "fair," the first question to ask is: fair by which definition? The second question is: who made that choice, when, and with whose input? In the absence of explicit answers, the choice was made implicitly: by an engineer's default settings, a vendor's benchmark, or no one in particular.
In 2020, a research team at Stanford published a study in Science examining a child welfare screening tool used in Allegheny County, Pennsylvania. The system, called the Allegheny Family Screening Tool (AFST), scored families referred to child protective services to help prioritize investigations. The researchers found that the tool performed with statistical parity across racial groups on some metrics but not others, and that the choice of which metric to prioritize had been made by the county's data analytics director, a single official, without a public deliberation process.
The Allegheny case is instructive not because the tool was irresponsible (it was, by many accounts, one of the more carefully designed public-sector AI systems in the United States) but because it illustrates that even careful deployment embeds value choices that were never put to a democratic vote. The families whose lives the tool affected had no mechanism to challenge the definition of fairness that governed it.
You are advising a county government that is evaluating a pretrial risk assessment tool. The tool predicts whether a defendant will miss their court date. Two fairness definitions are in conflict: calibration (the scores mean the same probability across racial groups) and equalized false positive rates (equal rates of incorrect high-risk labels across groups). The base rates of prior court appearance differ between groups due to historical enforcement patterns.
Work with the AI assistant to: (1) explain why both definitions cannot be satisfied simultaneously in this case, (2) articulate the real-world consequences of prioritizing one over the other, and (3) identify who should legitimately make this choice and through what process.
Between 2014 and 2020, the Dutch government operated the System Risk Indication (SyRI), a centralized data-fusion platform that combined seventeen categories of government data (tax records, benefits claims, employment records, housing registrations) to generate risk scores for welfare fraud. Municipalities could request SyRI analyses of entire neighborhoods; the outputs were lists of individuals scored as high-risk for investigation. Crucially, the scored individuals were never told they had been scored, could not inspect the algorithm, and had no mechanism to challenge their placement on an investigation list. In February 2020, a Dutch court ruled that SyRI violated Article 8 of the European Convention on Human Rights (the right to private life), partly because the government could not explain how the algorithm worked even to the court. The system was shut down. No individual official was found liable. No vendor faced penalties.
The SyRI case illustrates what legal scholars call the accountability gap in automated decision-making: when a harm occurs, responsibility disperses across the chain of actors (the government ministry that contracted the system, the vendor that built it, the municipality that deployed it, the caseworkers who acted on its outputs) in ways that allow each actor to credibly claim limited responsibility while no single actor bears the full burden of accountability.
The accountability gap in AI systems is structural, not incidental. It arises from three features of how complex algorithmic systems are built and deployed. First, distributed development: modern AI systems are built from open-source libraries, third-party APIs, fine-tuned foundation models, and proprietary components, each developed by different organizations under different governance frameworks. No single actor has full visibility into the whole system.
Second, contractual diffusion: service agreements between AI vendors and deploying organizations typically limit vendor liability to the direct cost of the software contract, exclude consequential damages, and place responsibility for deployment decisions on the customer. When a hospital uses a third-party diagnostic AI, the contract may specify that the hospital bears sole responsibility for clinical decisions, even if those decisions were driven by the AI's outputs.
Third, automation bias creates a paradox: the more consequential the decision, the more likely humans are to defer to the algorithmic output, yet the more likely those humans are to claim the decision was "ultimately human" when accountability is sought. Judges who routinely followed COMPAS scores but occasionally overrode them could truthfully claim they made their own decisions. The algorithm provided cover without providing accountability.
When an Uber autonomous test vehicle struck and killed Elaine Herzberg in Tempe, Arizona, in March 2018 (the first pedestrian fatality involving a self-driving car), investigators found the vehicle's software had detected Herzberg roughly 6 seconds before impact but classified her as an "unknown object," then as a "vehicle," and then as a "bicycle"; it never correctly identified her as a pedestrian. Emergency braking had been disabled to prevent erratic vehicle behavior. The human safety driver was watching a streaming video. Uber faced no criminal conviction. The safety driver was charged with negligent homicide in 2020. The question of whether Uber's engineers bore criminal responsibility remained contested through 2023 without resolution.
Existing legal frameworks for accountability were built around human decision-makers and identifiable physical products, not statistical models whose outputs are probabilistic and whose inner workings may be opaque even to their creators. Three frameworks have been attempted with limited success.
Products liability applies when a product is defective by design or manufacture. Courts in several jurisdictions have considered whether an AI system that produces discriminatory outputs is a "defective product," but the probabilistic nature of algorithmic outputs makes the causation analysis complex: the system did not malfunction; it performed exactly as designed, and the design reflected contested value choices.
Negligence requires a duty of care, a breach of that duty, causation, and harm. The challenge in AI contexts is establishing both duty (did the vendor owe a duty to end users it never contracted with?) and causation (was the harm caused by the algorithm's output, or by the human who acted on that output?).
Anti-discrimination law, specifically disparate impact doctrine under US civil rights law, prohibits employment practices that produce statistically significant disparate outcomes across protected groups unless the employer can show business necessity. In 2019, Illinois passed the Artificial Intelligence Video Interview Act (in effect since January 2020), requiring employers using AI to analyze video interviews to disclose their use and provide a mechanism for job applicants to request the removal of their biometric data. It was one of the first US laws imposing affirmative obligations on AI deployers toward affected individuals.
The EU's GDPR (General Data Protection Regulation, effective 2018) includes a right not to be subject to solely automated decisions that produce significant effects, and a right to an explanation of such decisions. In practice, "explanation" has been interpreted minimally by most companies, often amounting to a general description of the model's purpose rather than an account of why this individual received this outcome. The gap between legal requirement and practical implementation remains wide.
In 2021, the US Federal Trade Commission published guidance titled "Aiming for Truth, Fairness, and Equity in Your Company's Use of AI," listing practices the agency considered potentially unfair or deceptive, including using biased data, failing to test for disparate impact, and deploying AI without meaningful human oversight. The FTC's guidance stopped short of creating new legal rights but signaled the agency's intent to use existing Section 5 unfair practices authority against egregious AI deployments.
The most concrete structural intervention to date is the EU AI Act's "conformity assessment" requirement for high-risk systems: before deployment, vendors must document intended use, training data characteristics, performance metrics disaggregated by relevant groups, residual risk, and human oversight mechanisms. These requirements do not guarantee accountability after harm, but they create a paper trail that makes accountability claims more tractable.
A municipal hospital has deployed a third-party AI diagnostic support tool that flagged a patient incorrectly as low-risk for sepsis. The patient deteriorated without intervention and suffered permanent organ damage. The tool was built by a startup using a foundation model licensed from a large AI company. The hospital purchased the tool after a procurement review. The treating physician saw the AI output and did not order additional tests.
Work with the AI assistant to map every actor in this chain, identify what accountability claim each actor could make, and identify where, if anywhere, accountability actually lodges. Then propose at least one structural change that would close the accountability gap.
In 2016, Facebook employed approximately 4,500 human content reviewers globally to oversee content moderation decisions. By 2021, the company had grown that workforce to approximately 15,000 while operating platforms used by 3.5 billion people. Each reviewer was responsible for assessing hundreds of pieces of content per shift, in languages they were often not native speakers of, under time-pressure metrics that penalized slow decisions. Algorithmic systems made initial classification decisions; humans reviewed appeals and edge cases. When the Facebook Oversight Board, an independent body established in 2020, reviewed cases referred to it, it overturned Facebook's decisions in more than 80% of cases in its first year. Human oversight existed at every formal level of the system. It was also structurally incapable of meaningfully reviewing more than a fraction of a percent of consequential decisions.
The Facebook moderation case illustrates the core tension in oversight design: adding human reviewers to a system operating at machine scale does not produce meaningful oversight unless those reviewers have the time, information, authority, and incentive to actually override algorithmic recommendations. In the absence of those conditions, human review provides liability cover without providing genuine accountability.
In 2021, the EU AI Act proposed a spectrum of human oversight requirements calibrated to risk level. At the highest risk level, which includes biometric identification systems, credit scoring, employment screening, and criminal justice tools, the Act requires that human overseers be able to "fully understand the system's capabilities and limitations," "monitor its operation," and "override or refuse" its outputs. Crucially, the Act requires that human overseers have "the necessary competence, training and authority" to perform these functions.
Academic researchers studying human-AI decision-making teams have documented consistent findings on the conditions under which humans actually override algorithmic recommendations. A 2019 study by Ben Green and Yiling Chen at Harvard found that when human decision-makers were given algorithmic risk scores for recidivism, they did not simply defer to the algorithm, but they also did not improve on it. Instead, they exhibited a systematic pattern: they deferred to the algorithm on cases where the algorithmic prediction aligned with their intuition, and overrode it in cases where it did not, producing outcomes no better than the algorithm alone while introducing additional inconsistency from case to case.
The presence of a human in a decision process does not guarantee meaningful oversight. Research by Deanna Messervey and colleagues (2019) found that decision-makers under time pressure with high caseloads routinely approved algorithmic recommendations without substantive review, a phenomenon they called "check-box compliance." The human fulfilled the formal oversight requirement while providing no substantive check on the algorithm's outputs.
Transparency is frequently proposed as the primary mechanism for enabling accountability: if affected individuals can understand why an algorithmic decision was made, they can challenge it; if the public can inspect algorithmic systems, they can demand corrections. Both propositions are weaker in practice than they appear in theory.
In 2020, ProPublica launched its "Machine Bias" follow-up, examining three cities' use of predictive policing software: PredPol (now Geolitica) in Santa Cruz, New Orleans, and Los Angeles. In Los Angeles, the LAPD used the system for years without disclosing its use to the public or the city council. When disclosure was finally made, the department released aggregated accuracy statistics but declined to provide the data inputs or model specifications, citing the vendor contract. Santa Cruz became the first US city to ban predictive policing outright in 2020, citing civil rights concerns. New Orleans terminated its contract after it emerged that the city had entered into a secret arrangement with Palantir Technologies without council approval, a fact reported by The Verge in 2018.
Transparency requirements face two structural obstacles. First, vendors frequently claim trade secret protection for model architectures and training data, which courts have generally upheld, leaving affected individuals unable to inspect the systems that govern them. Second, even when model details are technically disclosed, meaningful interpretation requires statistical expertise that most affected individuals and their advocates do not have. Transparency that requires a PhD to decode is not, in practice, transparency for affected communities.
Algorithmic auditing (systematic external evaluation of an AI system's inputs, outputs, and processes) has emerged as a proposed middle ground between full transparency and opacity. Joy Buolamwini's Algorithmic Justice League and the AI Now Institute have developed audit frameworks. In 2021, the FTC's policy guidance endorsed regular auditing by qualified third parties as a best practice. New York City's Local Law 144 (enforced from 2023) became the first US law to require bias audits of AI hiring tools, with public disclosure of results, a model that may spread to other jurisdictions.
Research on effective human-AI oversight converges on several structural requirements that go beyond simply placing a human in the approval chain. First, overseers need calibrated uncertainty information: not just the algorithm's recommendation but a clear representation of the confidence level and the conditions under which the model is known to perform poorly. Second, overseers need adequate time: high-volume review under time pressure consistently degrades decision quality to or below the algorithm's baseline. Third, overseers need access to information the algorithm did not use: if the human reviewer sees only what the algorithm saw, override becomes nearly impossible to justify.
The 2022 US Executive Order on Improving Government's Investigative and Review Capabilities for AI incorporated several of these principles into federal guidance, requiring that AI systems used in government benefit and services decisions include "meaningful human review" defined as review by someone with independent access to the underlying case file, not just the algorithmic output. Whether this guidance will be consistently implemented and enforced remains an open question as of 2024.
You are advising a state benefits agency that is deploying an AI system to flag benefit applications for additional review. The system will process approximately 50,000 applications per month. The agency has proposed a "human oversight" mechanism in which 12 reviewers spend 2 minutes per flagged application checking the AI's recommendation before approving or denying.
Work with the AI assistant to: (1) identify the specific ways this oversight mechanism fails to meet the criteria for meaningful oversight, (2) propose a redesigned oversight structure that addresses those failures, and (3) estimate the resource and process implications of your proposed design. Draw on the research and cases from Lesson 4.
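For part (3), a back-of-the-envelope capacity calculation is a useful starting point. The sketch below takes the 50,000 applications, 12 reviewers, and 2 minutes from the scenario; the flag rates and the 140 working hours per reviewer per month are assumptions added for illustration, since the scenario does not state them.

```python
APPLICATIONS_PER_MONTH = 50_000   # from the scenario
REVIEWERS = 12                    # from the scenario
MINUTES_PER_CASE = 2              # from the scenario
HOURS_PER_REVIEWER = 140          # assumption: monthly review hours per person


def review_hours_per_reviewer(flag_rate: float) -> float:
    """Monthly hours each reviewer spends at the proposed minutes per case."""
    flagged = APPLICATIONS_PER_MONTH * flag_rate
    return flagged * MINUTES_PER_CASE / 60 / REVIEWERS


def minutes_available_per_case(flag_rate: float) -> float:
    """Minutes per flagged case if reviewers spent all assumed hours on review."""
    flagged = APPLICATIONS_PER_MONTH * flag_rate
    return REVIEWERS * HOURS_PER_REVIEWER * 60 / flagged


# Hypothetical flag rates: the share of applications the AI sends for review.
for flag_rate in (0.10, 0.50, 1.00):
    print(f"flag rate {flag_rate:.0%}: "
          f"{review_hours_per_reviewer(flag_rate):.1f} h per reviewer at 2 min/case, "
          f"{minutes_available_per_case(flag_rate):.1f} min/case available at full capacity")
```

Changing the flag rate or the minutes per case shows how quickly a realistic review time, with access to the underlying case file, outgrows a twelve-person team.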