Module 7 · Lesson 1

What Does Autonomy Mean for a Machine?

Defining the spectrum from automation to moral agency — and why the distinction matters

At what point does a machine's decision stop being our decision?

On March 18, 2018, at 9:58 p.m., on a darkened stretch of North Mill Avenue in Tempe, Arizona, a Volvo XC90 operated by Uber's Advanced Technologies Group struck and killed Elaine Herzberg as she walked her bicycle across the road. The vehicle's LIDAR detected her 5.6 seconds before impact. Its software classified her, in sequence, as an unknown object, then a vehicle, then a bicycle, never stabilizing on "pedestrian." Uber engineers had disabled the Volvo's automatic emergency braking system to reduce what they called "erratic behavior." A human safety operator was inside the vehicle, eyes on a phone. The system did not alert her in time. No one braked before impact.

The National Transportation Safety Board's 2019 report found that the system had no contingency for objects outside its anticipated categories. The "autonomous" vehicle could act — but it could not reason about what it did not expect. That gap, between the ability to act and the capacity to reason responsibly, is where this module begins.

The Autonomy Spectrum

Autonomy in machines exists on a continuum. At the lowest end sits simple automation: a thermostat that turns heat on when temperature drops below a threshold. It executes a fixed rule with no perception of context. At the highest end sits a hypothetical moral agent: a system that perceives its environment, evaluates options against internalized ethical principles, acts, and reflects on the consequences of its action.

Most AI systems deployed today occupy the middle ranges. The Society of Automotive Engineers (SAE) formalized this for vehicles with its J3016 standard, which defines six levels of driving automation, from Level 0 (no automation) to Level 5 (full automation, which remains hypothetical). The Uber ATG vehicle was nominally operating at SAE Level 3, where the system handles driving tasks but the human must be ready to intervene. The tragedy in Tempe revealed that Level 3 creates a dangerous ambiguity: the machine acts with authority, the human retains responsibility, but the handoff between them is unreliable.
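As a rough illustration, the J3016 scale can be written down as an enumeration running from 0 to 5. The level names follow the standard's published labels; the helper human_fallback_expected is a hypothetical name introduced here, and its cutoff compresses the standard's much longer definitions into a single comparison.

```python
from enum import IntEnum


class SAELevel(IntEnum):
    """SAE J3016 levels of driving automation (six levels, numbered 0 to 5)."""
    NO_DRIVING_AUTOMATION = 0
    DRIVER_ASSISTANCE = 1
    PARTIAL_DRIVING_AUTOMATION = 2
    CONDITIONAL_DRIVING_AUTOMATION = 3
    HIGH_DRIVING_AUTOMATION = 4
    FULL_DRIVING_AUTOMATION = 5


def human_fallback_expected(level: SAELevel) -> bool:
    """Hypothetical helper: True for Levels 0-3, where a human is still
    expected to drive, supervise, or take over when the system asks."""
    return level <= SAELevel.CONDITIONAL_DRIVING_AUTOMATION


for level in SAELevel:
    print(level.value, level.name, human_fallback_expected(level))
```

The boundary between Level 3 and Level 4 is where the helper flips, which is the same boundary at which the handoff problem described above arises.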

Philosophers distinguish between behavioral autonomy (the system acts without moment-to-moment human direction) and moral autonomy (the system bears genuine responsibility for its choices). Contemporary AI systems achieve increasing degrees of behavioral autonomy while possessing zero moral autonomy in any philosophically defensible sense. They cannot be punished, feel remorse, or update their values through lived experience.

Key Distinction

A system can be behaviorally autonomous — acting without human oversight in the moment — while still being morally heteronomous, meaning its values and constraints were entirely set by designers. The system's "choices" are always traceable back to human decisions made earlier in its design process.

Three Frameworks for Thinking About Machine Agency

Intentional Stance (Dennett, 1987): Daniel Dennett argued that we can usefully treat any system — thermostat, chess engine, or person — as if it has beliefs, desires, and intentions, as a predictive shortcut. This does not mean the system actually has those mental states. When we say a chess engine "wants to protect its queen," we are using the intentional stance to predict behavior, not asserting inner experience.

Degrees of Moral Patiency vs. Agency: Luciano Floridi and J.W. Sanders proposed in their 2004 paper "On the Morality of Artificial Agents" that moral agency need not be binary. They suggested AI systems can be evaluated on three criteria: interactivity (does the system respond to its environment?), autonomy (does it act without direct human control?), and adaptability (does it modify its behavior based on experience?). By these criteria, many current AI systems qualify as low-degree moral agents — not because they have inner lives, but because their behaviors have moral consequences without continuous human direction.

Responsibility Gap (Matthias, 2004): Andreas Matthias identified what he called the "responsibility gap" — the fact that as machine learning systems become more autonomous and less predictable, there may be harms for which no human individual is clearly responsible. The programmer could not have foreseen the specific decision. The operator did not make it. The manufacturer disclaimed it contractually. The gap is not a technical failure. It is a structural feature of how autonomous systems distribute — and sometimes dissolve — accountability.

Real Case · Knight Capital Group, August 1 2012

Knight Capital Group's trading algorithm executed 4 million trades in 45 minutes after engineers accidentally deployed old test code. The system lost $440 million, more than Knight's net income for the prior year. No single human made any of those 4 million decisions. The responsibility gap was financial, immediate, and total. Knight needed an emergency rescue from outside investors within days and was acquired by a rival firm the following year.

Why This Module Matters Now

Autonomous systems are making or substantially influencing decisions in at least six domains where those decisions carry serious moral weight: lethal military targeting, criminal sentencing, medical diagnosis, child welfare risk scoring, autonomous vehicle navigation, and financial credit allocation. In each domain, the speed and scale at which these systems operate makes meaningful human review of individual decisions practically impossible. We are building the infrastructure of distributed moral action faster than we are building the philosophical and legal frameworks to govern it.

The lessons in this module trace four dimensions of that challenge: what autonomy means (this lesson), how autonomous systems fail and who bears responsibility (Lesson 2), how we design meaningful human oversight (Lesson 3), and how emerging international frameworks attempt to regulate systems that no single jurisdiction fully controls (Lesson 4).

Behavioral Autonomy: A system's capacity to act without moment-to-moment human direction; does not imply moral responsibility.

Moral Agency: The capacity to make choices that are genuinely one's own, grounded in values the agent holds and can reflectively endorse.

Responsibility Gap: The structural situation in which an autonomous system causes harm that cannot be straightforwardly attributed to any individual human decision-maker.

Intentional Stance: Dennett's concept: treating a system as if it has beliefs and desires to predict behavior, without claiming it actually possesses those mental states.

Lesson 1 Quiz

What Does Autonomy Mean for a Machine? — 5 questions

1. In the 2018 Uber ATG fatality in Tempe, Arizona, what specific technical decision by engineers directly contributed to the crash?

Correct. Uber engineers had disabled the Volvo's automatic emergency braking system. The NTSB found this was a deliberate engineering choice to reduce the system's tendency toward sudden stops, creating a critical gap when the pedestrian was detected.

Not quite. The NTSB's 2019 report specifically identified the deliberate disabling of automatic emergency braking as a key contributing factor. The LIDAR was operational — it detected Herzberg 5.6 seconds before impact.

2. Andreas Matthias's concept of the "responsibility gap" refers to which structural problem?

Correct. Matthias identified a structural gap: as ML systems become more autonomous and less predictable, harms occur for which no programmer, operator, or manufacturer bears clear individual responsibility.

Not quite. Matthias's responsibility gap is specifically about accountability attribution — when an autonomous system causes harm, the responsibility cannot be cleanly assigned to any individual human decision-maker.

3. Daniel Dennett's "intentional stance" is best described as:

Correct. Dennett's intentional stance is a pragmatic predictive tool — not an ontological claim. We adopt it because it works, not because it reveals something about the inner life of the system.

Not quite. The intentional stance is explicitly not a claim that systems have genuine mental states. It is a predictive heuristic: treating systems as if they have beliefs and desires to better forecast their behavior.

4. Knight Capital Group's August 2012 algorithmic trading disaster illustrates which concept most directly?

Correct. Knight Capital's system executed 4 million trades in 45 minutes with no human directing individual decisions — a textbook example of the responsibility gap creating catastrophic consequences with no clear individual responsible party.

Not quite. The Knight Capital case is the clearest financial example of Matthias's responsibility gap: massive harm, no single human decision-maker directing the specific trades that caused it.

5. According to Floridi and Sanders (2004), which three criteria determine whether a system qualifies as a moral agent in their graduated framework?

Correct. Floridi and Sanders proposed that moral agency be evaluated on interactivity (responds to environment), autonomy (acts without direct human control), and adaptability (modifies behavior through experience) — bypassing the need to resolve consciousness questions.

Not quite. Floridi and Sanders deliberately avoided consciousness-based criteria. Their three criteria were: interactivity, autonomy, and adaptability — allowing degrees of moral agency to be assessed functionally.

Lab 1 — Mapping the Autonomy Spectrum

Discuss real systems and where they fall on the spectrum from automation to moral agency

Your Task

In this lab you will engage with an AI tutor to analyze real autonomous systems and classify their position on the autonomy spectrum. Consider both behavioral autonomy (how much it acts without human direction) and moral autonomy (whether it can bear responsibility).

Explore at least three different systems — such as a spam filter, a credit scoring algorithm, a self-driving car, or a lethal autonomous weapon system — and discuss where each sits on the spectrum and why the distinction matters ethically.

Start by naming one autonomous system you encounter in daily life and asking: does it have behavioral autonomy, moral autonomy, both, or neither?

Autonomy Spectrum Tutor

AI Ethics M7

Welcome to Lab 1. I'm here to help you map real AI systems onto the autonomy spectrum — from simple automation to approaching moral agency. Tell me about an autonomous system you encounter in everyday life. Where do you think it sits? Does it make decisions without human direction? Could it ever bear responsibility for those decisions?

Module 7 · Lesson 2

How Autonomous Systems Fail — and Who Pays

Failure modes, accident causation, and the distribution of accountability across design chains

When a machine harms someone, who should answer for it?

In 2016, ProPublica published an analysis of the COMPAS recidivism risk-scoring algorithm used in Broward County, Florida, to inform bail, sentencing, and parole decisions. Examining 7,000 individuals arrested in 2013 and 2014, ProPublica found that Black defendants were nearly twice as likely as white defendants to be falsely flagged as high-risk for future violent crime, while white defendants who did go on to reoffend were more likely to have been incorrectly flagged as low-risk.

Northpointe, the company that built COMPAS, responded that the algorithm was equally accurate across racial groups as measured by a different metric: calibration. Both claims were mathematically correct. But satisfying one fairness criterion while violating another is not a technical accident — it is a reflection of a structural impossibility. When base rates of arrest differ across groups, you cannot simultaneously achieve equal false positive rates, equal false negative rates, and equal calibration. The algorithm did not create this tension. American criminal justice history did. The algorithm encoded it and applied it at scale.

Taxonomies of Autonomous System Failure

Autonomous system failures cluster into four broad categories, each with distinct accountability implications:

1. Specification Failures: The system optimizes for the wrong objective. Amazon's internal AI recruiting tool, whose abandonment Reuters reported in 2018, was trained on a decade of résumés from a male-dominated industry. It learned to penalize résumés containing the word "women's" (as in "women's chess club") and to downgrade graduates of two all-women's colleges. No engineer specified "discriminate by gender." The objective function simply reflected the historical data, and the historical data reflected historical hiring bias. Garbage in, injustice out.

2. Distribution Shift Failures: The system encounters conditions outside its training distribution. The Uber ATG vehicle had not been trained on pedestrians who do not cross at crosswalks or whose classification oscillates between categories. Healthcare AI trained on data from large academic medical centers may fail when deployed in rural community hospitals with different patient demographics and documentation practices.

3. Adversarial Failures: Deliberate manipulation by external actors. Research published by McAfee in 2020 demonstrated that adding a small strip of tape to a speed limit sign could cause a Tesla Model S to misread a 35 mph sign as 85 mph. The physical world can be modified to fool AI perception systems in ways a human observer would barely notice.

4. Emergent Failures from System Interaction: Individual systems behave as designed, but their interaction produces unanticipated and harmful outcomes. The 2010 "Flash Crash," in which the Dow Jones Industrial Average fell nearly 1,000 points in minutes before partially recovering, was caused by automated trading systems reacting to each other's outputs in a feedback loop that no individual system's designers had modeled.
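The distribution shift category in particular lends itself to a small sketch. One simple (and deliberately crude) guard is to remember the range of each feature seen during training and flag any incoming case that falls outside it. The function names, feature names, and values below are hypothetical, and production systems generally rely on richer out-of-distribution detectors than a range check.

```python
from typing import Dict, List, Tuple

Ranges = Dict[str, Tuple[float, float]]


def fit_ranges(training_rows: List[Dict[str, float]]) -> Ranges:
    """Record the (min, max) observed for each feature in the training data."""
    features = training_rows[0].keys()
    return {
        f: (min(row[f] for row in training_rows),
            max(row[f] for row in training_rows))
        for f in features
    }


def out_of_range_features(row: Dict[str, float], ranges: Ranges) -> List[str]:
    """Return the features of a new case that lie outside the training range."""
    return [f for f, (low, high) in ranges.items() if not low <= row[f] <= high]


# Hypothetical training data from one hospital population.
train = [
    {"age": 34, "systolic_bp": 118},
    {"age": 61, "systolic_bp": 142},
]
ranges = fit_ranges(train)

# A new patient older than anyone in the training data is flagged on "age".
flags = out_of_range_features({"age": 88, "systolic_bp": 130}, ranges)
print(flags)
```

A non-empty result does not mean the prediction is wrong, only that the system is being asked about a case unlike those it learned from, which is a reasonable trigger for human review.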

The Fairness Impossibility (Chouldechova, 2017)

Alexandra Chouldechova formally proved that when prevalence of an outcome differs across groups, a classifier cannot simultaneously achieve: (1) equal false positive rates, (2) equal false negative rates, and (3) equal calibration. Any choice among these fairness criteria is a moral and political choice, not a technical one. Embedding COMPAS in criminal justice without acknowledging this is not technical neutrality — it is moral abdication disguised as objectivity.
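The constraint can be checked numerically. From the definitions of the error rates, the false positive rate is tied to the prevalence p, the positive predictive value (PPV, the predictive-parity form of calibration used in the result), and the false negative rate by FPR = p/(1-p) × (1-PPV)/PPV × (1-FNR). The sketch below holds PPV and FNR equal across two groups and lets only the base rate differ. The numbers are invented for illustration and are not COMPAS figures.

```python
def implied_fpr(prevalence: float, ppv: float, fnr: float) -> float:
    """False positive rate forced by prevalence, PPV and FNR.

    Follows from the confusion-matrix definitions:
    FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)
    """
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)


# Hypothetical groups: identical PPV and FNR, different base rates.
ppv, fnr = 0.6, 0.3
fpr_group_a = implied_fpr(prevalence=0.5, ppv=ppv, fnr=fnr)
fpr_group_b = implied_fpr(prevalence=0.3, ppv=ppv, fnr=fnr)

print(f"group A FPR: {fpr_group_a:.3f}")
print(f"group B FPR: {fpr_group_b:.3f}")
```

With these inputs the higher-prevalence group ends up with a false positive rate of roughly 0.47 against 0.20 for the other group. Equalizing the two would require giving up equal PPV or equal FNR, which is the choice the text describes as moral and political.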

Who Bears Responsibility — The Design Chain

When an autonomous system causes harm, accountability is distributed across a chain of actors, each of whom made decisions that contributed to the outcome:

Data collectors determined which data to gather, how to label it, and whose experience would be represented. Algorithm designers chose the objective function, the model architecture, and the training methodology. Deployers decided the context of use, the population affected, and the degree of human oversight retained. Regulators established — or failed to establish — requirements for testing, transparency, and accountability. Users placed trust in outputs without necessarily understanding their limitations.

The legal scholar Frank Pasquale has noted that in many of these chains, no single actor made a decision they would recognize as "I am choosing to harm this person." Each made a locally reasonable choice. The harm is an emergent property of the chain, not the intent of any individual link.

Case · Boeing 737 MAX MCAS, 2018–2019

The Maneuvering Characteristics Augmentation System (MCAS) on the Boeing 737 MAX caused two fatal crashes — Lion Air Flight 610 (October 2018, 189 dead) and Ethiopian Airlines Flight 302 (March 2019, 157 dead) — by repeatedly pushing the nose down based on faulty angle-of-attack sensor data. The system could override pilot inputs. Pilots had not been informed it existed. A 2020 House Transportation Committee report found that Boeing and the FAA had traded independence for efficiency, with the FAA delegating safety certification to Boeing itself. The responsibility chain ran from sensor design through software architecture through certification delegation through pilot training decisions. Every actor had authorized their specific decision. No one had owned the whole.

Precautionary Accountability Principles

Several frameworks have emerged to address distributed accountability. The EU AI Act (2024) introduces risk-tiered obligations: high-risk AI systems, including those used in criminal justice, employment, and critical infrastructure, must maintain human oversight, be explainable to affected parties, and undergo conformity assessments before deployment. Most of these obligations fall on the provider that places the system on the market, while deployers carry additional duties of their own, such as assigning human oversight, regardless of whether they built the system.

The IEEE's Ethically Aligned Design framework recommends that autonomous systems be designed with traceability: the ability to reconstruct which decision pathways led to a specific output, and which humans authorized those pathways at design time. Without traceability, post-hoc accountability is impossible — you can identify that harm occurred without being able to identify where in the design chain it originated.
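A minimal sketch of what a traceability record might contain is shown below. The class name, field names, and example values are hypothetical and are not taken from the IEEE framework; the point is only that each output is stored alongside the model version, the data it was trained on, and the human who authorized that version.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    """One audit entry linking an output to the design decisions behind it."""
    decision_id: str
    model_version: str       # which model produced the output
    training_data_ref: str   # pointer to the dataset snapshot used in training
    inputs: dict             # the features the system actually saw
    output: str              # what the system decided or recommended
    approved_by: str         # human who authorized this model version
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_audit_line(record: DecisionRecord) -> str:
    """Serialize a record as one JSON line for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True)


record = DecisionRecord(
    decision_id="d-001",
    model_version="risk-model-1.4",
    training_data_ref="dataset-2023-q4",
    inputs={"age": 30, "prior_arrests": 1},
    output="high",
    approved_by="j.doe",
)
print(to_audit_line(record))
```

Given a harmful output, a reviewer could then work backwards from the decision to the model version and to the person who signed off on it.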

Specification Failure: When an autonomous system optimizes for a proxy objective that diverges from the true intended goal, often encoding historical biases.

Distribution Shift: Failure that occurs when a system encounters real-world conditions outside the statistical range of its training data.

Fairness Impossibility: Chouldechova's formal result that multiple fairness criteria cannot be simultaneously satisfied when base rates differ across groups.

Traceability: The capacity to reconstruct which decision pathways led to a system's output and which humans authorized those pathways at design time.

Lesson 2 Quiz

How Autonomous Systems Fail — 5 questions

1. ProPublica's 2016 analysis of COMPAS found which specific racial disparity in the algorithm's false positive rate?

Correct. ProPublica's analysis found Black defendants were nearly twice as likely to be falsely flagged as future violent offenders — a false positive rate disparity that Northpointe acknowledged but argued was consistent with equal calibration across groups.

Not quite. ProPublica found that Black defendants — not white defendants — bore the higher false positive rate, being nearly twice as likely to be incorrectly flagged as high risk for violent reoffending.

2. What is Chouldechova's (2017) key formal finding about algorithmic fairness?

Correct. Chouldechova proved mathematically that these three fairness criteria are mutually incompatible when outcome prevalence differs between groups — making any choice among them a moral and political decision.

Not quite. Chouldechova's formal proof showed a fundamental incompatibility: when base rates differ, you cannot simultaneously have equal false positive rates, equal false negative rates, and calibration. Choosing any one sacrifices the others.

3. What type of failure does Amazon's 2018 AI recruiting tool illustrate?

Correct. Amazon's tool was trained on a decade of résumés submitted to a male-dominated company — a proxy for historical hiring outcomes, not for job performance potential. It learned to replicate the bias in its training data.

Not quite. Amazon's case is a classic specification failure: the system optimized for a proxy objective (resemblance to historical successful hires) rather than the actual goal (identifying the best candidates regardless of gender).

4. In the Boeing 737 MAX crashes of 2018–2019, what did the MCAS system do that contributed to the accidents?

Correct. MCAS repeatedly activated based on a single faulty angle-of-attack sensor, pushing the nose down against pilot resistance. Critically, pilots had not been informed the system existed, making effective intervention impossible.

Not quite. MCAS repeatedly pushed the nose down based on faulty angle-of-attack sensor data and could override pilot inputs. The system's existence had not been disclosed to pilots, eliminating their ability to respond appropriately.

5. The 2010 "Flash Crash" in financial markets is best categorized as which type of autonomous system failure?

Correct. The Flash Crash was caused by automated trading systems reacting to each other's outputs. Each system behaved within its individual design parameters, but their interactions produced a collectively catastrophic feedback loop no designer had modeled.

Not quite. The Flash Crash is the archetypal emergent failure: multiple automated trading systems each behaving as individually designed, but creating a dangerous feedback loop through their interactions that no single system's designers had anticipated.

Lab 2 — Dissecting Failure Chains

Analyze real autonomous system failures and trace accountability through the design chain

Your Task

Work with the AI tutor to dissect at least two real autonomous system failures. For each case, identify: (1) what type of failure it represents, (2) which actors in the design chain made consequential decisions, and (3) which fairness criteria were violated and which were preserved.

Consider cases like COMPAS, the Boeing 737 MAX MCAS, Amazon's recruiting tool, or the 2010 Flash Crash. You may also introduce a case you are familiar with.

Choose a failure case and describe what went wrong. Then ask: what type of failure is this, and who in the design chain bears responsibility?

Failure Chain Analyst

Welcome to Lab 2. I'll help you dissect real autonomous system failures and trace accountability through the full design chain. Pick any case we studied — COMPAS, the Boeing 737 MAX, Amazon's recruiting tool, the 2010 Flash Crash — or bring one you know. What type of failure was it, and where in the design chain did the critical decisions get made?

Module 7 · Lesson 3

Meaningful Human Control

What it takes for human oversight to be real — not theatrical — in high-stakes autonomous systems

Is a human "in the loop" if they cannot meaningfully understand or override what the machine decides?

The USS Vincennes was equipped with the Aegis Combat System, one of the most sophisticated autonomous tracking and weapons systems of its era. On July 3, 1988, Aegis tracked Iran Air Flight 655, a civilian Airbus A300, as it ascended on a scheduled commercial route. The system correctly identified the aircraft's Mode III transponder — the civilian code — but operators in the combat information center, operating under stress during a simultaneous naval engagement, misread or misreported its altitude data. They believed the aircraft was descending toward the ship in an attack profile.

Captain Will Rogers III ordered the aircraft shot down. All 290 passengers and crew died. The human was technically "in the loop": no automated system fired on its own, and he gave the verbal order himself. But the information environment that surrounded him had been shaped by Aegis's classification outputs, by the stress of simultaneous combat, and by an organizational culture that prioritized rapid action over deliberation. He was in the loop. He was not in control.

The Theater of Human Oversight

Human-in-the-loop (HITL) requirements are the most common policy response to concerns about autonomous system accountability. They appear in the EU AI Act's high-risk provisions, in U.S. Department of Defense Directive 3000.09 on autonomous weapons, and in dozens of corporate AI governance frameworks. The logic is intuitive: if a human must approve consequential decisions, humans retain moral and legal responsibility.

The problem is that "human in the loop" can describe radically different realities. Researchers Daniele Amerini and others have documented what is sometimes called automation bias: the tendency of human operators to accept algorithmic recommendations without independent analysis, particularly under time pressure or cognitive load, or when disagreeing with the algorithm requires visible, justifiable dissent. When a parole board member can see that COMPAS scored a defendant 8/10 for recidivism risk, studies suggest they are substantially more likely to deny parole than if they had not seen the score, regardless of other case information. The human is in the loop. The algorithm is running the loop.

A related phenomenon is the speed-accuracy tradeoff: many autonomous systems operate faster than humans can deliberate. A high-frequency trading algorithm executes in microseconds. A drone swarm target identification system may present and close a targeting window in seconds. A cybersecurity intrusion detection system flags thousands of anomalies per hour. In each case, a human can technically approve or override, but the operational tempo makes genuine deliberation practically impossible.
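The arithmetic behind this point is easy to sketch. The functions below are hypothetical helpers, and the 60-second minimum review time is an assumed figure chosen for illustration rather than an empirical one.

```python
def seconds_per_decision(decisions_per_hour: float, reviewers: int = 1) -> float:
    """Average time each reviewer can spend on one decision."""
    if decisions_per_hour <= 0 or reviewers <= 0:
        raise ValueError("decisions_per_hour and reviewers must be positive")
    return 3600.0 * reviewers / decisions_per_hour


def deliberation_feasible(decisions_per_hour: float,
                          min_review_seconds: float,
                          reviewers: int = 1) -> bool:
    """True if the available time per decision meets the assumed minimum."""
    return seconds_per_decision(decisions_per_hour, reviewers) >= min_review_seconds


# Hypothetical intrusion detection queue: 3,000 alerts per hour, one analyst.
budget = seconds_per_decision(3000)
print(f"{budget:.1f} seconds per alert")
print(deliberation_feasible(3000, min_review_seconds=60))
```

At that volume a single analyst has about 1.2 seconds per alert, so meeting a 60-second review standard would take fifty analysts working in parallel. An oversight requirement that ignores this budget is satisfied on paper only.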

Automation Bias — Research Evidence

A 2010 review by Parasuraman and Manzey in Human Factors found that human operators commit significantly more errors when an automated aid is present and wrong than when no automated aid is present at all, because operators stop independently verifying decisions the machine has already made. The presence of automation can degrade human oversight quality even while technically satisfying oversight requirements.

Four Conditions for Meaningful Human Control

The researcher Heather Roff and Richard Moyes of the disarmament organization Article 36 have proposed four conditions that must be met for human control over an autonomous system to be genuinely meaningful rather than nominal:

1. Understanding: The operator must comprehend what the system is doing and why — at a level of detail sufficient to evaluate whether the action is appropriate. This excludes black-box systems where outputs are unexplainable to any human in the decision chain.

2. Ability to Intervene: The operator must have a genuine, operable mechanism to stop or modify the system's action before it causes harm. A review process that requires more time than the action window allows is not meaningful intervention capacity.

3. Authority: The operator must have organizational and legal authority to override the system's recommendation without personal career or legal risk for doing so. If overriding an algorithm requires justification that is institutionally costly, the override mechanism is formally available but practically inaccessible.

4. Accountability: The operator must genuinely bear responsibility for the outcome — not liability that can be transferred to the algorithm's developer or to the ambiguity of the decision chain. Accountability that is diffuse across a chain is accountability that belongs to no one.
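The four conditions above can be treated as a checklist in which failing any one condition means control is nominal. The sketch below is one possible encoding; the class and method names are hypothetical, and the example assessment is an invented scenario rather than a finding about any real deployment.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ControlAssessment:
    """Yes/no judgment on each of the four conditions for one oversight setup."""
    understanding: bool
    ability_to_intervene: bool
    authority: bool
    accountability: bool

    def unmet(self) -> List[str]:
        """Names of the conditions that are not satisfied."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_meaningful(self) -> bool:
        """Control counts as meaningful only if every condition holds."""
        return not self.unmet()


# Invented scenario: the operator understands the output and can halt the
# system, but overriding it carries a career cost, so authority fails.
assessment = ControlAssessment(
    understanding=True,
    ability_to_intervene=True,
    authority=False,
    accountability=True,
)
print(assessment.is_meaningful(), assessment.unmet())
```

Real assessments are rarely binary, so a fuller version might score each condition on a scale. The all-or-nothing rule is kept here because the text treats each condition as necessary.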

Case · HART Algorithm, Durham Constabulary, 2013–2017

Durham Police in England deployed the Harm Assessment Risk Tool (HART) to predict whether suspects should be placed in custody or released. The algorithm generated predictions officers could see on their screens. A 2018 independent review by Marion Oswald found that officers rarely overrode HART predictions, even when they possessed additional contextual information the algorithm could not access. Officers reported feeling that overriding the algorithm required formal justification they were reluctant to provide. The review concluded that HART satisfied nominal human-in-the-loop requirements while functionally operating as if decisions were fully automated.

Design for Meaningful Control

Several design interventions have been proposed to make human oversight substantively rather than nominally real. Forcing functions require operators to input independent assessments before seeing algorithmic outputs, preventing anchoring to the algorithm's recommendation. Red-teaming and adversarial testing probe whether operators can detect and override errors, and train them to maintain healthy skepticism of system outputs. Explanation requirements mandate that systems provide not just a prediction but the key factors driving it, at a level human operators can evaluate.
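A forcing function of the kind described can be sketched as an interface that refuses to reveal the algorithm's score until the operator has recorded a judgment of their own. The class and method names are hypothetical, and the 1-to-10 scale is an assumption made for the example.

```python
from typing import Optional


class ForcedIndependentReview:
    """Hides the algorithm's score until the operator commits to their own."""

    def __init__(self, algorithm_score: int) -> None:
        self._algorithm_score = algorithm_score
        self._operator_score: Optional[int] = None

    def record_operator_score(self, score: int) -> None:
        """Store the operator's independent score; it cannot be changed later."""
        if self._operator_score is not None:
            raise RuntimeError("operator score already recorded")
        self._operator_score = score

    def reveal_algorithm_score(self) -> int:
        """Return the algorithm's score, but only after the operator's is stored."""
        if self._operator_score is None:
            raise PermissionError("record an independent assessment first")
        return self._algorithm_score

    def disagreement(self) -> int:
        """Absolute gap between the two scores, useful for later audit."""
        return abs(self.reveal_algorithm_score() - self._operator_score)


review = ForcedIndependentReview(algorithm_score=8)
review.record_operator_score(5)
print(review.reveal_algorithm_score(), review.disagreement())
```

Because the operator's score is locked in before the reveal, large disagreements become visible data instead of being silently resolved in the algorithm's favor.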

The European Commission's High-Level Expert Group on AI recommended in 2019 that oversight mechanisms be designed with attention to operationally realistic conditions — meaning oversight requirements should be tested under the actual time pressures, cognitive loads, and organizational cultures in which they will be applied, not in idealized laboratory conditions.

Automation Bias: The tendency of human operators to accept algorithmic recommendations without independent verification, particularly under time pressure or cognitive load.

Human-in-the-Loop (HITL): A system design in which a human must approve or can override consequential decisions, but which may be nominal rather than meaningful depending on operational conditions.

Forcing Function: A design mechanism that requires humans to form independent judgments before seeing algorithmic outputs, reducing anchoring bias.

Meaningful Human Control: Oversight that satisfies the conditions of understanding, intervention ability, authority, and accountability, as opposed to nominal human presence in a decision process.

Lesson 3 Quiz

Meaningful Human Control — 5 questions

1. In the USS Vincennes incident, what does the phrase "in the loop but not in control" mean in terms of human oversight?

Correct. Captain Rogers technically issued the order — no automated system fired autonomously. But the information he received was mediated and partially incorrect, the operational tempo was extreme, and organizational culture encouraged rapid action. His "choice" was profoundly constrained.

Not quite. The Aegis system did not fire automatically — Rogers gave the order. The problem was that the information environment created by Aegis's outputs and the stress of simultaneous combat made genuine deliberation practically impossible despite nominal human authorization.

2. What does research on automation bias (Parasuraman and Manzey, 2010) suggest about human performance when automated aids are present?

Correct. This is the paradox of automation bias: a wrong automated aid is worse than no aid at all, because operators stop independently verifying decisions the machine has already made — leading to undetected errors they would have caught unaided.

Not quite. Parasuraman and Manzey found that operators perform worse when an automated aid is wrong than when no aid is present — because the presence of automation reduces independent verification. A wrong algorithm is sometimes more dangerous than no algorithm.

3. According to Roff and Moyes's framework, which of the following would NOT constitute meaningful human control?

Correct. Roff and Moyes's framework requires that override authority be genuinely exercisable without institutional cost. A formally available override that is practically inaccessible due to organizational culture fails the authority condition of meaningful control.

Not quite. Roff and Moyes's authority condition requires that operators can actually exercise override power. If overriding requires costly justification in an environment where doing so is professionally risky, the override is nominally available but practically inaccessible.

4. The HART algorithm review in Durham found that officers rarely overrode the system's predictions. What was the primary identified reason?

Correct. The independent review found that organizational culture made overriding HART costly in terms of documentation and justification. Officers with additional relevant information often deferred to the algorithm rather than bear the institutional cost of overriding it.

Not quite. The review found that officers could formally override HART but rarely did — because doing so required formal justification they were reluctant to provide. The override mechanism existed. The practical authority to use it was institutionally constrained.

5. What is a "forcing function" in the context of designing meaningful human oversight of AI systems?

Correct. Forcing functions are design interventions that prevent operators from seeing algorithmic outputs until they have independently assessed the situation — countering the anchoring effect that makes automation bias so persistent.

Not quite. A forcing function in this context is a design choice that requires operators to form their own independent assessment before the system reveals its recommendation — structurally preventing anchoring to the algorithm's output.
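
As an illustration of the forcing-function idea in question 5, the sketch below withholds an algorithm's recommendation until the operator has recorded an independent assessment. The class and method names (`ForcedReview`, `record_assessment`, `reveal_recommendation`, `disagrees`) are hypothetical and not taken from any real oversight system. It is a minimal sketch of the design pattern, not a production interface.

```python
class ForcedReview:
    """Hypothetical review session that hides the algorithm's output at first."""

    def __init__(self, recommendation: str) -> None:
        self._recommendation = recommendation  # algorithm's output, kept hidden
        self._assessment = None                # operator's own judgment, set first

    def record_assessment(self, assessment: str) -> None:
        """Store the operator's independent judgment; blank input is rejected."""
        if not assessment.strip():
            raise ValueError("An independent assessment is required.")
        self._assessment = assessment.strip()

    def reveal_recommendation(self) -> str:
        """Return the algorithm's output only after the operator has committed."""
        if self._assessment is None:
            raise PermissionError(
                "Record your own assessment before viewing the recommendation."
            )
        return self._recommendation

    def disagrees(self) -> bool:
        """True when the operator's judgment differs from the algorithm's."""
        return self._assessment != self.reveal_recommendation()


# Usage: the recommendation stays locked until the operator commits to a view.
review = ForcedReview(recommendation="high risk")
try:
    review.reveal_recommendation()
except PermissionError as err:
    print(err)

review.record_assessment("low risk")
print(review.reveal_recommendation())
print(review.disagrees())
```

Flagging disagreements, rather than silently accepting either view, is one possible way to make overrides visible for later review; whether that lowers the institutional cost of overriding depends on the organization, not the code.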

Lab 3 — Designing Oversight That Works

Evaluate proposed human control mechanisms against Roff and Moyes's four conditions

Your Task

Work with the AI tutor to evaluate proposed human oversight mechanisms for real autonomous systems. For each mechanism, apply the four conditions of meaningful human control: understanding, ability to intervene, authority, and accountability.

Consider situations like: a radiologist reviewing an AI cancer diagnosis tool, a judge who has seen a COMPAS risk score, an air traffic controller monitoring an autonomous collision avoidance system, or a loan officer with an algorithmic credit decision on screen.

Describe one of these oversight situations. Does the human have understanding, genuine intervention ability, real authority, and clear accountability? Which conditions are satisfied? Which are not?

Oversight Design Evaluator

Welcome to Lab 3. I'll help you evaluate whether proposed human oversight mechanisms actually satisfy the four conditions of meaningful control: understanding, ability to intervene, authority, and accountability. Pick a real scenario where a human is supposed to oversee an AI decision — and let's test whether that oversight is genuinely meaningful or merely nominal.

Module 7 · Lesson 4

Regulating Systems That Cross Borders

International frameworks, jurisdictional gaps, and the geopolitics of autonomous AI governance

Can any national law govern a system trained in one country, deployed in another, and affecting people in a third?

Clearview AI scraped billions of facial images from public websites — Instagram, Facebook, news archives, LinkedIn — without consent and built a facial recognition database marketed primarily to law enforcement. By 2020 it had contracts with over 600 U.S. law enforcement agencies. The system was an autonomous classifier operating across jurisdictions: images of people from Canada, Australia, the UK, and the EU were in its training data; its deployment decisions were made in the United States.

What followed was a fragmented global enforcement action. The UK's Information Commissioner's Office fined Clearview £7.5 million in 2022 and ordered deletion of UK residents' data. Italy's Garante fined it €20 million. Australia's Privacy Commissioner found it in breach of the Privacy Act 1988. The Canadian privacy authorities concluded it violated PIPEDA and demanded it cease operations in Canada. Clearview refused to pay most fines, arguing it had no legal presence in those jurisdictions. As of 2024, the company continued to operate in the United States, where no equivalent federal privacy law existed. The same technology, the same data, the same harms, yet a different legal outcome in each jurisdiction. This is the regulatory gap writ large.

The Three Levels of Regulatory Response

International responses to autonomous AI systems have operated at three levels, each with distinct scope and enforceability:

National Legislation: The EU AI Act (formally adopted June 2024) is the most comprehensive binding framework. It takes a risk-tiered approach: prohibited practices (social scoring by public authorities; real-time remote biometric identification in publicly accessible spaces for law enforcement, with narrow exceptions), high-risk applications (education, employment, criminal justice, and critical infrastructure, all subject to conformity assessments and human oversight requirements), and lower-risk applications (chatbots) requiring transparency disclosures. The Act has extraterritorial reach by design: systems deployed in the EU market must comply regardless of where their developers are headquartered, a deliberate mirror of GDPR's enforcement model. China's Interim Measures for the Management of Generative Artificial Intelligence Services took effect in 2023, requiring content labeling, algorithmic transparency registrations, and security assessments for generative AI systems. Their emphasis differs from the EU's rights-based framework, but they show similar extraterritorial ambitions for systems affecting Chinese users.

International Guidelines (Non-Binding): The OECD Principles on AI (2019, endorsed by the G20) establish five principles: inclusive growth and sustainable development; human-centered values and fairness; transparency and explainability; robustness, security, and safety; and accountability. These are influential but unenforceable. The UNESCO Recommendation on the Ethics of AI (2021), adopted by 193 member states, addresses AI's impact on human rights, the environment, and epistemology, but carries no compliance mechanism. Recommendations travel faster than legislation; enforcement follows neither.

Sectoral and Platform-Level Rules: In practice, much autonomous system governance happens at the platform and sector level — not through legislation but through terms of service, API policies, and self-regulatory frameworks. OpenAI's usage policies, Google's prohibited use cases for Gemini, and Apple's App Store policies each constrain AI deployment for hundreds of millions of users in ways that no international treaty currently does. The limitation is that these frameworks are set by the platforms themselves, can be changed unilaterally, and reflect commercial interests alongside ethical ones.
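
The EU AI Act's risk-tiered structure described above can be pictured as a lookup from application type to tier and obligation. The sketch below uses only the examples named in this lesson. The enum, the dictionary contents, and the `obligations_for` helper are hypothetical illustrations, not the Act's legal text or an official classification tool, and the fallback message for unlisted applications is an assumption of this sketch.

```python
from enum import Enum


class RiskTier(Enum):
    PROHIBITED = "prohibited"
    HIGH_RISK = "high-risk"
    LOWER_RISK = "lower-risk"


# Example applications taken from this lesson only; not an exhaustive legal list.
EXAMPLE_TIERS = {
    "social scoring by public authorities": RiskTier.PROHIBITED,
    "education": RiskTier.HIGH_RISK,
    "employment": RiskTier.HIGH_RISK,
    "criminal justice": RiskTier.HIGH_RISK,
    "critical infrastructure": RiskTier.HIGH_RISK,
    "chatbot": RiskTier.LOWER_RISK,
}

# Obligations paraphrased from the lesson text.
OBLIGATIONS = {
    RiskTier.PROHIBITED: "may not be deployed",
    RiskTier.HIGH_RISK: "conformity assessment and human oversight",
    RiskTier.LOWER_RISK: "transparency disclosure",
}


def obligations_for(application: str) -> str:
    """Look up the tier and obligation for one of the lesson's example applications."""
    tier = EXAMPLE_TIERS.get(application.strip().lower())
    if tier is None:
        return "not covered by this lesson's examples"
    return f"{tier.value}: {OBLIGATIONS[tier]}"


print(obligations_for("Employment"))
print(obligations_for("chatbot"))
print(obligations_for("weather forecasting"))
```

A real classification turns on the Act's annexes and on how a system is actually used, so a flat lookup like this is only a memory aid for the tier structure.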

The Brussels Effect

Legal scholar Anu Bradford coined the term "Brussels Effect" to describe how EU regulations effectively become global standards — because multinationals find it more efficient to apply the world's strictest standard everywhere than to maintain separate compliance tracks. GDPR drove global privacy policy updates far beyond EU borders. The EU AI Act is likely to produce a similar effect on AI governance, particularly for high-risk AI systems sold to enterprise clients who face their own EU market exposure.

Lethal Autonomous Weapons — The Governance Frontier

The sharpest test of international autonomous AI governance is lethal autonomous weapons systems (LAWS): weapons that can select and engage targets without human authorization for individual strikes. States parties to the Convention on Certain Conventional Weapons (CCW) have discussed LAWS since 2014, and a Group of Governmental Experts on LAWS has met under the CCW framework since 2017. As of 2024, no binding international agreement exists. The United States, Russia, China, Israel, and South Korea, all states with substantial autonomous weapons development programs, have collectively blocked binding restrictions while endorsing non-binding "responsible use" guidelines.

The International Committee of the Red Cross (ICRC) recommended in 2023 that states adopt new legally binding rules specifically prohibiting autonomous weapons that (a) cannot be used in compliance with International Humanitarian Law (IHL), (b) apply force against persons without human judgment over individual targeting decisions, or (c) have unpredictable effects. The ICRC's argument is that IHL requirements (distinction between combatants and civilians, proportionality, and precaution) demand value judgments that cannot be encoded in current AI systems.

Several deployments have already been reported without this governance in place. The Turkish-made Kargu-2 autonomous drone was reportedly used in the Libya conflict in 2020. The Israeli Harop, an autonomous loitering munition, has been sold to multiple states. The U.S. Navy's autonomous patrol vessels operate in contested waters. Each deployment tests the claim that existing IHL is sufficient to govern systems the law was never written to address.

Case · U.S. DoD Directive 3000.09 (2012, revised 2023)

The U.S. Department of Defense's governing policy on autonomous weapons requires "appropriate levels of human judgment over the use of force." The vagueness of "appropriate levels" is deliberate: it allows flexibility across future weapon systems while satisfying nominal accountability requirements. Critics including Human Rights Watch and the Campaign to Stop Killer Robots have argued that "appropriate human judgment" without a requirement for individual targeting authorization is insufficient to satisfy IHL obligations. The DoD maintains that the policy is adequate. No international body has authority to adjudicate the disagreement.

What Effective Governance Requires

Scholars across political science, law, and ethics broadly agree that effective governance of autonomous systems across jurisdictions requires at least four elements that current frameworks lack.

Common definitional standards: What counts as "autonomous," what constitutes "meaningful human control," and what risks qualify as "high-risk" must be defined consistently enough that regulatory arbitrage (moving development to lower-standard jurisdictions) does not hollow out stronger frameworks.

Mutual recognition or harmonization: The EU AI Act, China's regulations, and the U.S. Executive Order on AI (October 2023) each create distinct compliance obligations; multinationals face compliance stacks rather than a coherent framework.

Enforcement mechanisms with extraterritorial reach: The Clearview AI case illustrates that fines are ineffective against entities with no physical presence in the fining jurisdiction.

Inclusive representation: The states most affected by autonomous weapons, predictive policing systems, and credit scoring algorithms are often least represented in the bodies that set governance standards for them.

Brussels Effect: The phenomenon by which EU regulations become effective global standards as multinationals apply the world's strictest compliance standard everywhere rather than maintain separate tracks.

LAWS: Lethal Autonomous Weapons Systems. Weapons capable of selecting and engaging targets without human authorization for individual strike decisions.

Regulatory Arbitrage: The practice of situating development or operations in jurisdictions with weaker regulatory requirements to avoid stronger-jurisdiction obligations.

IHL: International Humanitarian Law. The laws of armed conflict, including requirements of distinction, proportionality, and precaution that the ICRC argues cannot be met by current autonomous weapons systems.

Lesson 4 Quiz

Regulating Systems That Cross Borders — 5 questions

1. In the Clearview AI enforcement actions of 2021–2023, what strategic argument did Clearview use to avoid compliance with European and other national regulators?

Correct. Clearview's central argument against paying European fines was that it lacked legal presence in those jurisdictions, making enforcement practically impossible — illustrating the core jurisdictional gap in international AI governance.

Not quite. Clearview's primary argument was jurisdictional: it refused most fines on the grounds that having no legal establishment in the UK, Italy, or Australia meant those regulators lacked enforceable authority over it.

2. What does Anu Bradford's "Brussels Effect" describe in the context of AI regulation?

Correct. Bradford's Brussels Effect is a market-driven phenomenon: multinationals with EU market exposure apply EU compliance standards globally because building separate compliance tracks for lower-standard jurisdictions is more expensive than uniform adherence to the strictest standard.

Not quite. The Brussels Effect is not about political coercion or formal treaty adoption. It describes how market forces cause EU regulations to effectively globalize: multinationals apply the EU's strict standard everywhere because divergent compliance tracks are operationally costly.

3. The ICRC's 2023 recommendation on lethal autonomous weapons argues that IHL requirements cannot be met by current AI systems. What specific IHL principles does it cite?

Correct. The ICRC's argument is that IHL's core operational principles — distinction, proportionality, and precaution — require contextual value judgments that current autonomous systems cannot reliably make.

Not quite. The ICRC specifically cited the three core operational principles of IHL: distinction between combatants and civilians, proportionality in attack, and precaution in targeting — arguing these require human moral judgment that current AI systems cannot exercise.

4. The EU AI Act categorizes AI systems into risk tiers. Which application is classified as PROHIBITED under the Act?

Correct. The EU AI Act explicitly prohibits social scoring by public authorities — the practice of rating citizens based on their behavior, social status, or personal characteristics in ways that lead to unjustified differential treatment.

Not quite. Social scoring by public authorities is one of the explicitly prohibited practices under the EU AI Act. Medical diagnosis AI and employment screening AI are classified as high-risk (subject to requirements, not banned). Undisclosed chatbots face transparency requirements, not prohibition.

5. Which of the following best describes "regulatory arbitrage" in the context of autonomous AI systems?

Correct. Regulatory arbitrage is the strategic location of development in permissive jurisdictions. It is a central challenge for AI governance: strong national regulations can be circumvented by moving development offshore, unless international standards achieve sufficient harmonization.

Not quite. Regulatory arbitrage in AI governance refers to the practice of situating development or operations in jurisdictions with weaker regulatory standards to avoid the compliance costs imposed by stricter jurisdictions — a core challenge for any unharmonized international framework.

Lab 4 — International Governance Gaps

Analyze real jurisdictional conflicts and propose governance mechanisms for cross-border AI systems

Your Task

Work with the AI tutor to examine real cases where autonomous systems have crossed jurisdictional boundaries and existing governance frameworks have failed to address the harms. Analyze what governance mechanisms would need to exist to address each gap.

Consider: the Clearview AI multi-jurisdiction enforcement failure, the LAWS governance vacuum at the UN, or the challenge of applying EU AI Act standards to U.S.-developed models. What would a workable international framework need to include?

Identify a specific cross-border AI governance gap — a case where current frameworks fail to address real harms. What structural element is missing, and what would an effective mechanism look like?

International Governance Analyst

Welcome to Lab 4. I'll help you analyze the real gaps in international AI governance — cases where autonomous systems cross borders and existing frameworks fail to protect people. Tell me about a specific governance gap you want to explore: the Clearview enforcement problem, the LAWS treaty vacuum, the EU-US regulatory divergence, or another case you have in mind. What structural element is missing, and what would an effective solution require?

Module 7 Test

Autonomous Systems and Moral Agency — 15 questions · Pass at 80%

1. What did the NTSB's 2019 report on the Uber ATG Tempe fatality identify as a key contributing engineering decision?

Correct. The NTSB found Uber engineers had deliberately disabled the Volvo's automatic emergency braking system, leaving the vehicle with no automated response when the pedestrian was detected.

Incorrect. The NTSB's key finding was that Uber engineers had disabled the automatic emergency braking system — a deliberate design choice, not a mapping or sensor configuration issue.

2. The "responsibility gap" identified by Andreas Matthias refers to:

Correct. Matthias identified a structural gap: as AI systems become more autonomous and less predictable, there emerge harms that cannot be clearly attributed to any individual human decision-maker in the design or deployment chain.

Incorrect. The responsibility gap is specifically about accountability attribution — when no individual human's decision can be clearly identified as the cause of harm produced by an autonomous system.

3. Floridi and Sanders' graduated moral agency framework evaluates AI systems on which three criteria?

Correct. Floridi and Sanders deliberately bypassed consciousness-based criteria and proposed interactivity (responds to environment), autonomy (acts without direct human control), and adaptability (modifies behavior through experience).

Incorrect. Floridi and Sanders used functional rather than phenomenological criteria: interactivity, autonomy, and adaptability — allowing degrees of moral agency to be assessed without resolving consciousness questions.

4. Amazon's AI recruiting tool was scrapped in 2018. What type of autonomous system failure does it exemplify?

Correct. Amazon's tool was trained on historical hiring data from a male-dominated industry, learning to associate male characteristics with hiring success — a classic specification failure where the proxy objective diverged from the true goal.

Incorrect. This is a specification failure. The system was optimizing for the wrong thing — resemblance to historical hires rather than actual candidate quality — and that historical data encoded gender bias.

5. Chouldechova's (2017) formal proof established which impossibility result about algorithmic fairness?

Correct. Chouldechova proved that differing base rates make simultaneous satisfaction of multiple fairness criteria mathematically impossible — meaning any choice among them is a moral and political decision, not a technical one.

Incorrect. Chouldechova's proof specifically addresses the incompatibility of fairness criteria when base rates differ: you cannot simultaneously have equal false positive rates, equal false negative rates, and calibration across groups.
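
The arithmetic behind this impossibility can be checked directly. The sketch below uses the identity from Chouldechova (2017) that ties the false positive rate (FPR) to the base rate p, the positive predictive value (PPV), and the false negative rate (FNR): FPR = p/(1-p) × (1-PPV)/PPV × (1-FNR). Equal PPV across groups is the predictive parity criterion, which is closely related to the calibration criterion mentioned above. The base rates and the PPV and FNR values are made-up numbers for illustration, not data from any real system.

```python
def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate forced by a given base rate, PPV, and FNR."""
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)


# Two hypothetical groups held to the same PPV and FNR, with different base rates.
PPV, FNR = 0.8, 0.2
fpr_group_a = implied_fpr(base_rate=0.5, ppv=PPV, fnr=FNR)
fpr_group_b = implied_fpr(base_rate=0.3, ppv=PPV, fnr=FNR)

print(f"Group A implied FPR: {fpr_group_a:.3f}")
print(f"Group B implied FPR: {fpr_group_b:.3f}")
```

With these illustrative numbers the implied false positive rates come out at roughly 0.20 and 0.09. Once PPV and FNR are equalized, the false positive rates must differ whenever the base rates differ, so deciding which criterion to give up is a value choice rather than a tuning problem.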

6. The 2010 "Flash Crash" is best classified as which type of autonomous system failure?

Correct. The Flash Crash was caused by automated trading systems reacting to each other's outputs. Each individual system behaved as designed; the catastrophic market drop was an emergent property of their interaction.

Incorrect. The Flash Crash is the archetypal emergent failure: multiple automated systems each operating within design parameters, but creating a destructive feedback loop through their real-time interactions.

7. In the USS Vincennes incident (1988), what does the case illustrate about human oversight of autonomous systems?

Correct. Vincennes demonstrates that nominal human authorization (Rogers gave the order) is not the same as meaningful human control when the information environment is mediated by an autonomous system and deliberation time is effectively zero.

Incorrect. Vincennes illustrates the gap between nominal and meaningful human control: the human technically authorized the action, but the conditions for genuine deliberation — accurate information, sufficient time, freedom from pressure — were not present.

8. Which of the four conditions for meaningful human control (Roff and Moyes) addresses the problem of override mechanisms that are formally available but practically inaccessible due to organizational culture?

Correct. The authority condition requires that operators can actually exercise override power without institutional or career cost. When overriding an algorithm requires costly justification in a risk-averse culture, authority is formally present but practically absent.

Incorrect. This is the authority condition: the requirement that operators have genuine organizational and legal authority to override the system without institutional cost — not merely a formal override mechanism that carries practical risk to use.

9. Automation bias research (Parasuraman and Manzey) shows that human operators with a faulty automated aid perform worse than operators with no aid. Why?

Correct. Automation bias operates by reducing independent verification — operators trust the machine's output and skip the checks they would otherwise perform. A wrong automated aid thus produces errors that go undetected.

Incorrect. Automation bias works through reduced verification: when an automated system produces an output, operators tend to accept it without independent checking — causing them to miss errors they would have caught through their own process.

10. The Boeing 737 MAX MCAS system's two fatal crashes are best understood as which type of accountability failure?

Correct. The House Transportation Committee found responsibility distributed across Boeing engineering, FAA certification delegation, and pilot training decisions — a classic distributed design chain failure in which every actor had authorized their specific decision but no one owned the whole system's safety.

Incorrect. MCAS represents distributed design chain failure: sensor design, MCAS software architecture, the FAA's delegation of certification to Boeing, and pilot training decisions all contributed. No single actor's decision was the sole cause.

11. What was the outcome of the Clearview AI enforcement actions across multiple jurisdictions, and what governance failure does it illustrate?

Correct. Clearview refused to pay most fines and argued it had no legal presence in fining jurisdictions. It continued to operate in the U.S. where no equivalent federal law existed — illustrating the core jurisdictional enforcement gap in AI governance.

Incorrect. Clearview refused most fines and continued operating, demonstrating that without physical legal presence in a jurisdiction, regulatory fines lack enforcement mechanisms — a critical gap in international AI governance.

12. The EU AI Act's "prohibited" category includes which of the following applications?

Correct. Social scoring by public authorities — rating citizens based on their behavior or personal characteristics in ways that lead to unjustified differential treatment — is explicitly prohibited under the EU AI Act.

Incorrect. The EU AI Act explicitly prohibits social scoring by public authorities. University admissions AI and content recommendation systems fall in different risk tiers and face requirements rather than prohibition.

13. What distinguishes "behavioral autonomy" from "moral autonomy" in autonomous systems?

Correct. Current AI systems achieve increasing degrees of behavioral autonomy (acting without moment-to-moment human direction) while possessing zero moral autonomy — they cannot hold values they reflectively endorse, feel remorse, or bear genuine responsibility.

Incorrect. Behavioral autonomy is about the absence of moment-to-moment human control. Moral autonomy requires that the agent's choices be genuinely its own — grounded in values it holds and can reflectively endorse. Current AI systems have the former and lack the latter.

14. The ICRC's 2023 recommendation on lethal autonomous weapons argues they cannot comply with IHL. Which three IHL principles does it cite as requiring human moral judgment that current AI cannot exercise?

Correct. The ICRC's argument is that distinction, proportionality, and precaution each require contextual value judgments — assessing civilian presence, weighing military advantage against civilian harm, taking all feasible precautions — that current autonomous systems cannot reliably perform.

Incorrect. The ICRC specifically cited the three core operational principles: distinction (combatants from civilians), proportionality (military advantage vs. civilian harm), and precaution in attack — each requiring human contextual judgment.

15. What does "regulatory arbitrage" mean in the context of international AI governance, and which case from this module best illustrates it?

Correct. Regulatory arbitrage in AI governance is the strategic use of permissive jurisdictions. Clearview AI's continued U.S. operation while refusing European compliance obligations is the clearest example from this module.

Incorrect. Regulatory arbitrage is about exploiting jurisdictional differences — operating in permissive jurisdictions to avoid stricter ones. Clearview AI's refusal to comply with European orders while operating freely in the U.S. is the textbook example from this module.