Module 5 · Lesson 1

Why Arms Control Is Hard for AI

Nuclear treaties relied on counting warheads. You cannot count an algorithm.
What makes AI fundamentally different from every weapons system arms controllers have tried to constrain before?

When U.S. and Soviet negotiators concluded START I in July 1991, verification rested on a deceptively simple premise: missiles are physical objects. Satellites could count SS-18 silos, inspectors posted at the Votkinsk plant could watch mobile missiles leave the factory gate, and on-site teams could count re-entry vehicles on missile front sections. The treaty's 700 pages of text, protocols, and annexes worked because thermonuclear warheads cannot be e-mailed. Thirty years later, the code that controls an autonomous targeting system can cross a border in milliseconds.

The Verification Problem

Successful strategic arms control treaties have depended on what the treaties themselves call national technical means (primarily reconnaissance satellites), supplemented by on-site inspections. Both methods assume the object being counted has a stable physical signature: a missile's heat bloom, a re-entry vehicle's distinctive shape, a submarine's acoustic profile.

AI systems shatter that assumption. A trained neural network is, at its core, a large array of floating-point numbers. It occupies no unique geography. One model can be copied indefinitely at near-zero marginal cost. A state that agrees to limit "autonomous lethal targeting systems" could comply on paper while distributing identical weights across civilian cloud infrastructure the moment inspectors depart.
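A minimal Python sketch makes the point concrete. The 70-billion-parameter size and 16-bit (2-byte) storage are hypothetical round numbers, and the function name `weights_size_gb` is illustrative: a model's weights are just bytes, they fit on commodity storage, and a duplicate is bit-for-bit identical to the original, so nothing in the copy reveals which instance was the declared one.

```python
import hashlib
import random


def weights_size_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Storage needed for a model's weights, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Hypothetical model: 70 billion parameters stored at 16-bit precision.
size_gb = weights_size_gb(70_000_000_000)
print(f"{size_gb:.0f} GB of weights")  # 140 GB of weights

# A toy stand-in for a weight file: 1 KiB of random bytes.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(1024))
duplicate = bytearray(original)  # an independent byte-for-byte copy

# Identical digests: nothing physical distinguishes the two instances.
same = hashlib.sha256(original).digest() == hashlib.sha256(duplicate).digest()
print(same)  # True
```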

The 2019 U.S. National Intelligence Strategy explicitly named adversarial AI as an emerging threat but offered no verification mechanism. The OECD AI Principles (May 2019), endorsed by 42 countries, established norms around transparency and human oversight — but contained zero enforcement provisions. The gap between aspiration and verification is the central challenge of AI arms control.

The Dual-Use Dilemma

The same computer vision model that guides an autonomous drone can sort medical images for cancer detection. The same reinforcement-learning algorithm that trains a cyber-intrusion agent can train a robotic surgery system. And unlike uranium, where the level of enrichment separates reactor fuel from weapons material, there is no detectable physical difference between a "weapons" AI and a "civilian" AI.

Lessons from Nuclear, Chemical, and Biological Arms Control

Three historical regimes offer partial lessons. The Nuclear Non-Proliferation Treaty (1968) succeeded partly because fissile material production requires massive, visible infrastructure, such as enrichment cascades and reactors, that satellites can detect. The Chemical Weapons Convention (1993) relies on declared facilities and short-notice challenge inspections; it has largely worked for states parties because industrial-scale chemical weapons production leaves physical evidence. The Biological Weapons Convention (1972) is the cautionary tale: it banned an entire class of weapons but has no verification protocol at all, and several states violated it, most massively the Soviet Union through its Biopreparat program, which was exposed by defectors beginning in 1989 and acknowledged by Boris Yeltsin in 1992.

AI is closer to bioweapons than to nuclear weapons in its verification difficulty. Both involve dual-use knowledge; both can be developed in small, concealed facilities; both lack a single detectable physical signature. The BWC failure is directly instructive.

Key Distinction

Nuclear arms control counts objects. Chemical weapons control monitors processes. Biological weapons control attempts — and largely fails — to monitor knowledge. AI arms control must confront the same knowledge-control problem, but at digital speed and global scale.

The Speed-of-Action Problem

A second fundamental challenge is temporal. Cold War arms control assumed that decision-making occurred on human timescales — minutes at minimum, typically hours or days. The 1963 Hotline Agreement between Washington and Moscow assumed humans would be in the loop during crises.

AI systems can execute kill chains in milliseconds. In August 2020, in the final round of the U.S. Defense Advanced Research Projects Agency's AlphaDogfight Trials, an AI agent developed by Heron Systems defeated an experienced human F-16 pilot 5-0 in simulated combat. The winning agent reacted approximately eight times faster than a human. A conflict involving autonomous systems operating at machine speed could escalate from first contact to existential exchange before any human has an opportunity to intervene, let alone invoke crisis communication protocols designed for human decision-makers.

This compresses the window arms control must protect. Treaties designed around human reaction times may be functionally irrelevant in a conflict conducted by autonomous systems.
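The compression can be put as simple arithmetic: if a human crisis channel needs a window of T to act and each autonomous decision cycle takes t, roughly T / t machine-speed exchanges can occur before any human responds. The 10-minute window and 100-millisecond cycle below are hypothetical placeholders, not measured values for any real system, and `cycles_within_window` is an illustrative name.

```python
def cycles_within_window(human_window_ms: int, machine_cycle_ms: int) -> int:
    """Complete machine decision cycles that fit inside one human response window."""
    # Integer milliseconds avoid floating-point rounding in floor division.
    return human_window_ms // machine_cycle_ms


# Hypothetical: a 10-minute hotline response versus a 100 ms autonomous cycle.
window_ms = 10 * 60 * 1000
print(cycles_within_window(window_ms, 100))  # 6000
```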

Verification gap: The inability of existing inspection and monitoring methods to confirm compliance with limits on AI capabilities, because AI has no stable physical signature.
Dual-use problem: The same AI system or underlying model can serve both civilian and weapons purposes, making restriction by capability extremely difficult.
Machine-speed escalation: The risk that conflicts involving autonomous systems progress faster than human decision-makers can intervene to halt or de-escalate.

Lesson 1 Quiz

Why Arms Control Is Hard for AI — 4 questions
1. What property made nuclear weapons relatively easier to count and verify under START I compared to AI systems?
Correct. START I verification relied on counting physical objects — silos, launchers, re-entry vehicles — using satellites and on-site inspections. AI code has no equivalent physical signature.
Not quite. The key distinction is physical detectability: missiles occupy fixed geography and have observable signatures; AI weights do not.
2. Which historical arms control regime is most analogous to the AI verification challenge, and why?
Correct. Like bioweapons knowledge, AI capability is dual-use, concealable, and lacks a unique physical signature — the BWC's verification failure is the most relevant cautionary precedent.
The BWC analogy is strongest. Both AI and bioweapons involve knowledge-control problems, dual-use potential, and small concealed development environments.
3. The 2019 OECD AI Principles were endorsed by 42 countries. What is their primary limitation for arms control purposes?
Correct. The OECD Principles set aspirational standards around transparency and human oversight but include no mechanism to verify compliance or punish violations.
The critical gap is enforcement. Voluntary norms without verification machinery cannot reliably constrain states that choose to defect.
4. What did DARPA's 2020 AlphaDogfight Trials demonstrate about the speed-of-action problem?
Correct. The AlphaDogfight result illustrated that AI agents can operate at speeds that compress or eliminate the human decision windows that Cold War arms control assumed would exist.
The AlphaDogfight AI won 5-0 and acted about eight times faster than humans — a direct challenge to the temporal assumptions underlying existing crisis-management protocols.

Lab 1: The Verification Design Challenge

Structured conversation · ~3 exchanges to complete

Your Mission

You are a policy analyst advising a fictional international working group on AI arms control. Your task is to propose at least one novel verification mechanism for a hypothetical treaty limiting autonomous lethal AI systems — and to pressure-test it against the dual-use and speed-of-action challenges covered in Lesson 1.

Start by proposing a verification approach. The AI advisor will challenge it. Defend, refine, or pivot based on the critique. Use real treaty precedents where possible.
AI Policy Advisor
Arms Control Design
Welcome to the working group session. We're tasked with designing a verification mechanism for a hypothetical treaty limiting autonomous lethal AI targeting systems. Before we dive in — what verification approach would you propose, and why do you think it could survive the dual-use objection? Be specific: cite any precedent from existing arms control regimes if you can.
Module 5 · Lesson 2

Existing Frameworks and Their Limits

From the CCW debates to the Bletchley Declaration — what the international community has actually tried.
Which existing multilateral forum comes closest to a binding framework for autonomous weapons, and why has it stalled?

Delegates to the Convention on Certain Conventional Weapons Group of Governmental Experts on Lethal Autonomous Weapons Systems filed into the Palais des Nations for what campaigners had hoped would be a breakthrough session. The Campaign to Stop Killer Robots, a coalition of 270 NGOs, had been lobbying for a legally binding instrument since 2012. After years of meetings, the GGE produced another non-binding set of guiding principles. Russia and the United States blocked a mandate to negotiate a treaty. The session ended with a press release.

The CCW GGE Process

The Convention on Certain Conventional Weapons (CCW), adopted in 1980, restricts or prohibits weapons deemed excessively injurious or to have indiscriminate effects. Its Protocol IV, adopted in 1995, banned blinding laser weapons before they were fielded, a preemptive-ban precedent that campaigners cite for autonomous weapons. Since 2014, states parties have met to discuss lethal autonomous weapons systems (LAWS), defined roughly as systems that select and engage targets without human intervention: first in informal expert meetings and, since 2017, as a formal Group of Governmental Experts (GGE).

The GGE has produced a set of eleven guiding principles, agreed in 2019, that affirm international humanitarian law applies to LAWS and that human responsibility must be maintained. What the GGE has not produced: a definition of LAWS that all states accept, a prohibition on any category of autonomous weapon, or any verification mechanism.

The fundamental impasse is structural. The CCW operates by consensus — any state party can block progress. Russia has argued that autonomous systems are simply automated weapons governed by existing IHL. The United States has resisted a ban it fears would constrain advantageous U.S. systems while being easily evaded by adversaries. China has publicly called for a ban on autonomous systems that "kill" independently but resisted any definition that would apply to its own programs.

The Definition Problem

No multilateral forum has agreed on what "autonomous" means in "lethal autonomous weapons system." Does a Phalanx close-in weapon system — which automatically engages incoming missiles — qualify? What about a loitering munition with a 30-minute time-on-station limit? The inability to agree on scope has paralyzed every normative effort.

The Bletchley Process and AI Safety Summits

In November 2023, the United Kingdom convened the AI Safety Summit at Bletchley Park, the first major multilateral gathering focused specifically on frontier AI risks. Twenty-eight countries, including the United States and China, together with the European Union, signed the Bletchley Declaration, acknowledging that frontier AI poses potential catastrophic risk and committing to share information on safety.

China's signature was diplomatically significant — Beijing had been absent or obstructive at previous technology governance forums. However, the Declaration is explicitly non-binding. It contains no commitments to limit development, share model weights, or submit to inspection. A follow-on summit was held in Seoul in May 2024, producing the Seoul Statement, which added language on government-industry cooperation but again stopped short of binding obligations.

The Bletchley process represents the first attempt to bring major AI-developing states together around shared safety concerns — but critics note it focuses on "frontier AI" risks (primarily large language models and general-purpose AI) rather than specifically military applications or autonomous weapons.

U.S. Executive Action: The Political Declaration

Separately from multilateral forums, the United States in February 2023 issued a Political Declaration on Responsible Military Use of Artificial Intelligence and Autonomy. By late 2023, 52 states had endorsed it. The Declaration commits signatories to develop AI consistent with international humanitarian law, maintain human judgment in nuclear command and control, and take steps to minimize unintended engagements.

The Declaration is not legally binding. It has no verification mechanism. Notable non-signatories include Russia and China. Its practical impact remains unclear, but it represents the most explicit multilateral statement of principle on military AI to date, and it explicitly addresses the nuclear nexus — a critical area given concerns about AI integration into early-warning systems.

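| Framework | Year | Binding? | Verification? | Key States Absent |
|---|---|---|---|---|
| CCW GGE Guiding Principles | 2019 | No | No | |
| OECD AI Principles | 2019 | No | No | Russia, China |
| U.S. Political Declaration | 2023 | No | No | Russia, China |
| Bletchley Declaration | 2023 | No | No | |
| Seoul Statement | 2024 | No | No | |
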
Pattern Recognition

Every existing multilateral AI or LAWS framework is non-binding and contains no verification mechanism. This is not an oversight — it reflects the genuine difficulty of verifying compliance and the unwillingness of major military powers to accept binding constraints on capabilities they are actively developing.

CCW GGE: The Convention on Certain Conventional Weapons Group of Governmental Experts on LAWS, the primary UN-affiliated forum for autonomous weapons discussions (informal expert meetings from 2014, a formal GGE since 2017), operating by consensus.
Bletchley Declaration: November 2023 non-binding statement by 28 countries and the European Union acknowledging frontier AI risks; notable for including China's signature.
Political Declaration on Military AI: U.S.-led 2023 non-binding statement on responsible military AI use, endorsed by 52 states; Russia and China have not endorsed it.

Lesson 2 Quiz

Existing Frameworks and Their Limits — 4 questions
1. What structural feature of the CCW has most directly prevented progress on a binding LAWS treaty?
Correct. The CCW consensus rule gives every state party an effective veto. Russia and the U.S. have repeatedly used this to prevent a mandate to negotiate binding obligations.
The consensus rule is the key structural barrier. Both Russia and the U.S. have blocked progress on binding negotiations for different reasons.
2. Why was China's signature on the Bletchley Declaration considered diplomatically significant?
Correct. Beijing's participation was notable given its previous resistance to Western-led technology governance norms — though the Declaration remained non-binding and contained no verification provisions.
China made no binding commitments at Bletchley — the significance was its presence and signature on a non-binding statement acknowledging catastrophic risk, given prior patterns of non-participation.
3. The U.S. Political Declaration on Responsible Military Use of AI and Autonomy explicitly addressed which particularly sensitive issue?
Correct. The Declaration's explicit statement on nuclear command and control was significant — reflecting growing concern about AI being integrated into early-warning and launch-authorization systems.
The Declaration's nuclear provision was one of its most substantive elements, committing signatories to preserve human judgment in nuclear C2 — a direct response to AI integration concerns.
4. What pattern unifies all five AI/LAWS frameworks listed in the lesson table?
Correct. Every existing framework — CCW GGE principles, OECD principles, U.S. Political Declaration, Bletchley Declaration, Seoul Statement — is non-binding and verification-free. This pattern reflects both the technical difficulty and the political unwillingness to accept binding constraints.
The consistent pattern across all five is that none is legally binding and none has a verification mechanism — reflecting both technical difficulty and political resistance from major military powers.

Lab 2: Diagnosing Framework Failure

Structured conversation · ~3 exchanges to complete

Your Mission

You are a researcher preparing a policy brief on why AI arms control efforts have stalled. The AI advisor will play devil's advocate — defending existing frameworks as "better than nothing." Your task is to diagnose the specific structural failures that have prevented binding agreements, using the CCW GGE and Bletchley processes as primary case studies.

Begin by arguing why existing non-binding frameworks are insufficient. Reference the CCW consensus rule, the definition problem, and the pattern of major-power non-participation where relevant.
AI Policy Advisor
Framework Analysis
I'll take the contrarian position: non-binding frameworks like the Bletchley Declaration and the CCW GGE guiding principles are valuable even without verification mechanisms. They build shared vocabulary, establish norms, and create political costs for obvious violations. Why do you think that's wrong — and what would a binding framework actually add that norms alone cannot provide?
Module 5 · Lesson 3

Proposed Approaches: What Experts Are Actually Debating

Compute governance, behavioral norms, red lines, and the case for meaningful human control.
If traditional verification is impossible for AI, what alternative mechanisms might actually work — and what are their specific failure modes?

When the Biden administration issued Executive Order 14110 on AI safety in October 2023, buried in its 111 pages was a requirement that cloud providers report when foreign customers rent computing clusters above a threshold level. The logic was precise: training frontier AI models requires massive compute. You cannot train GPT-4-scale systems on a laptop. If you control the compute, you control access to the capability — without needing to inspect the model itself.
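For reference, EO 14110 stated its interim triggers as raw operation counts: models trained using more than 10^26 integer or floating-point operations, and computing clusters co-located in a single datacenter, networked at over 100 Gbit/s, with a theoretical maximum capacity of 10^20 operations per second. A simplified Python sketch of those checks follows; the function names and example inputs are hypothetical, and the sketch illustrates the thresholds rather than reproducing the regulatory text.

```python
TRAINING_OPS_THRESHOLD = 1e26   # total integer or floating-point operations
CLUSTER_PEAK_OPS_PER_S = 1e20   # theoretical maximum operations per second
CLUSTER_NETWORK_GBPS = 100      # datacenter networking speed, Gbit/s


def training_run_reportable(total_ops: float) -> bool:
    """True if a training run exceeds the total-compute trigger."""
    return total_ops > TRAINING_OPS_THRESHOLD


def cluster_reportable(peak_ops_per_s: float, network_gbps: float,
                       single_datacenter: bool) -> bool:
    """True if a co-located, fast-networked cluster reaches the capacity trigger."""
    return (
        single_datacenter
        and network_gbps > CLUSTER_NETWORK_GBPS
        and peak_ops_per_s >= CLUSTER_PEAK_OPS_PER_S
    )


print(training_run_reportable(3e25))            # False
print(training_run_reportable(2e26))            # True
print(cluster_reportable(1.2e20, 400, True))    # True
print(cluster_reportable(1.2e20, 400, False))   # False
```

Note that the thresholds say nothing about what a model does; they work only because raw compute is a measurable proxy for frontier capability.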

Compute Governance: Monitoring the Hardware Layer

The most technically sophisticated arms control proposal currently circulating in policy circles is compute governance — the idea that since training advanced AI requires specialized semiconductor hardware (primarily Nvidia H100-class GPUs and TPUs), controlling the manufacture, sale, and operation of that hardware is more tractable than controlling software.

The logic has three steps. First, frontier model training is compute-constrained: current leading models require clusters of thousands of specialized chips running for months. Second, these chips are manufactured by a small number of firms (primarily TSMC in Taiwan) using equipment from an even smaller number of suppliers (ASML in the Netherlands is the sole supplier of extreme ultraviolet lithography machines). Third, the U.S. export control regime, expanded in October 2022 and October 2023, already restricts export of advanced AI chips to China, demonstrating that compute governance has partial precedent.
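The "thousands of chips for months" claim can be checked with back-of-the-envelope arithmetic. Compute delivered is roughly chips × peak operations per second per chip × utilization × seconds of training; a widely used rule of thumb for dense transformer models puts the requirement at about 6 × parameters × training tokens. Every input below (10,000 chips at 10^15 operations per second, 40% utilization, 90 days; a 10^11-parameter model trained on 10^13 tokens) is a hypothetical round number, and the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def cluster_training_ops(num_chips: int, peak_ops_per_chip: float,
                         utilization: float, days: float) -> float:
    """Operations a cluster actually delivers over a training run."""
    return num_chips * peak_ops_per_chip * utilization * days * SECONDS_PER_DAY


def dense_model_training_ops(num_params: float, num_tokens: float) -> float:
    """Rule-of-thumb requirement: about 6 operations per parameter per training token."""
    return 6 * num_params * num_tokens


supply = cluster_training_ops(10_000, 1e15, 0.4, 90)
demand = dense_model_training_ops(1e11, 1e13)
print(f"cluster delivers ~{supply:.1e} ops; model needs ~{demand:.1e} ops")
print("run fits on this cluster:", supply >= demand)
```

Under the same assumptions, a 50-chip installation would need roughly nine and a half years for the same run, which is why large, identifiable clusters make a plausible choke point.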

Researchers at the Centre for the Governance of AI and Oxford's Future of Humanity Institute have proposed embedding on-chip monitoring hardware in advanced AI accelerators — essentially a trusted reporting module that logs compute usage and reports to an international registry without revealing the content of computations. This would allow verification of whether a state is training models above a threshold size, analogous to IAEA safeguards on nuclear material.
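The reporting idea can be sketched in Python under loudly stated assumptions: each accelerator holds a secret key provisioned at manufacture, signs periodic usage reports with HMAC-SHA256, and a hypothetical international registry verifies the reports and totals compute per state. Real proposals would more likely rely on hardware-rooted attestation with asymmetric signatures; HMAC is swapped in here only to keep the example to the standard library. All names (`UsageReport`, `sign_report`, `Registry`) are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageReport:
    device_id: str
    state: str    # jurisdiction in which the accelerator operates
    period: str   # reporting window, e.g. "2024-W01"
    ops: int      # operations executed in the window; no workload content

    def payload(self) -> bytes:
        return f"{self.device_id}|{self.state}|{self.period}|{self.ops}".encode()


def sign_report(report: UsageReport, device_key: bytes) -> bytes:
    """Runs inside the (hypothetical) trusted module on the chip."""
    return hmac.new(device_key, report.payload(), hashlib.sha256).digest()


class Registry:
    """Hypothetical registry: verifies reports and totals compute per state."""

    def __init__(self, device_keys: dict, threshold_ops: int):
        self.device_keys = device_keys   # device_id -> key provisioned at manufacture
        self.threshold_ops = threshold_ops
        self.totals = defaultdict(int)   # state -> total reported operations
        self.seen = set()                # (device_id, period) pairs already accepted

    def submit(self, report: UsageReport, tag: bytes) -> bool:
        key = self.device_keys.get(report.device_id)
        if key is None:
            return False                 # unregistered device
        expected = hmac.new(key, report.payload(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False                 # forged or altered report
        if (report.device_id, report.period) in self.seen:
            return False                 # replayed report
        self.seen.add((report.device_id, report.period))
        self.totals[report.state] += report.ops
        return True

    def states_over_threshold(self) -> list:
        return sorted(s for s, total in self.totals.items() if total > self.threshold_ops)


keys = {"chip-1": b"key-1", "chip-2": b"key-2"}
registry = Registry(keys, threshold_ops=10**26)
r1 = UsageReport("chip-1", "State A", "2024-W01", 8 * 10**25)
r2 = UsageReport("chip-2", "State A", "2024-W01", 3 * 10**25)
print(registry.submit(r1, sign_report(r1, keys["chip-1"])))          # True
print(registry.submit(r2, sign_report(r2, keys["chip-2"])))          # True
understated = UsageReport("chip-2", "State A", "2024-W02", 10)
print(registry.submit(understated, sign_report(r2, keys["chip-2"])))  # False: tag mismatch
print(registry.states_over_threshold())                              # ['State A']
```

The registry learns only operation totals per device and window, never what was computed. It also cannot tell a chip whose module has been disabled from one that is idle: a silent device simply stops reporting.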

Limitation: The Inference Problem

Compute governance can potentially monitor training — the most resource-intensive phase. But deploying an already-trained model (inference) requires far less compute. A state could train a system covertly before controls are implemented, then run it indefinitely on modest hardware. The October 2022 U.S. chip export restrictions came years after China had already acquired significant compute stockpiles.
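The asymmetry can be quantified with common rules of thumb for dense transformer models: training costs roughly 6 × parameters × training tokens, while generating one token at inference costs roughly 2 × parameters. The model size, training-token count, and inference volume below are hypothetical, as are the function names.

```python
def training_ops(params: float, train_tokens: float) -> float:
    """Rule of thumb for dense models: ~6 operations per parameter per training token."""
    return 6 * params * train_tokens


def inference_ops(params: float, generated_tokens: float) -> float:
    """Rule of thumb for dense models: ~2 operations per parameter per generated token."""
    return 2 * params * generated_tokens


params = 7e10                          # hypothetical 70B-parameter model
train = training_ops(params, 1.4e12)   # trained once on 1.4 trillion tokens
serve = inference_ops(params, 1e9)     # then used to generate one billion tokens
print(f"training: {train:.2e} ops")
print(f"serving:  {serve:.2e} ops ({serve / train:.4%} of training)")
```

On these assumptions, generating a billion tokens costs about 0.02% of the one-time training compute, which is why a regime that watches only large training runs leaves deployment largely unobserved.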

Behavioral Red Lines: Prohibiting Specific Applications

A second approach, advocated by legal scholars and humanitarian organizations, focuses not on capabilities but on specific prohibited behaviors. Rather than banning "autonomous AI" (difficult to define), a treaty could prohibit specific applications: autonomous targeting of humans without human confirmation, AI systems in nuclear launch chains, or autonomous cyber attacks on critical civilian infrastructure.

This approach draws on the Chemical Weapons Convention model: rather than banning chemistry, the CWC prohibits the use of any toxic chemical as a weapon (its "general purpose criterion") and subjects specific listed chemicals to declaration and inspection. Behavioral red lines are arguably easier to define than capability thresholds and easier to attribute when violated.

The International Committee of the Red Cross has advocated specifically for a rule requiring human control over the decision to use force against persons — effectively banning fully autonomous lethal targeting. This framing aligns with international humanitarian law's requirement of distinction (combatants from civilians) and proportionality, which critics argue autonomous systems cannot reliably perform.

Meaningful Human Control: The IHL Anchor

The concept of meaningful human control (MHC) has emerged as a possible normative anchor. Introduced by the UK-based NGO Article 36, a member of the Campaign to Stop Killer Robots, and elaborated by scholars including Heather Roff and Richard Moyes, MHC requires that a human operator understand the context of an attack, be able to intervene to prevent it, and retain moral and legal responsibility for the outcome.

MHC is intentionally vague enough to command broad support while precise enough to exclude "fire-and-forget" fully autonomous systems. States including the Netherlands and Austria have endorsed MHC language in CCW discussions. The United States has resisted it, arguing that its preferred formulation, "appropriate levels of human judgment over the use of force," is more operationally realistic and avoids prohibiting beneficial automation such as missile defense.

The practical gap between MHC and U.S. policy is smaller than it appears: U.S. Department of Defense Directive 3000.09, revised in January 2023, requires senior official approval for any autonomous weapons system that falls outside defined parameters — but explicitly permits autonomous functions within those parameters, which critics argue is functionally indistinguishable from machine-speed autonomous engagement.
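The difference between the two formulations can be reduced to a deliberately simplified toy model, assuming each policy amounts to a single gate check before force is used. Neither Directive 3000.09 nor any MHC proposal is written this way, and all names below are hypothetical; the point is only where the human decision sits.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    MEANINGFUL_HUMAN_CONTROL = "mhc"          # human decides at the point of use
    PRE_AUTHORIZED_PARAMETERS = "parameters"  # human sets bounds in advance


@dataclass(frozen=True)
class EngagementRequest:
    within_authorized_parameters: bool
    operator_confirmed: bool  # a human confirmed this specific engagement


def may_proceed(policy: Policy, req: EngagementRequest) -> bool:
    if policy is Policy.MEANINGFUL_HUMAN_CONTROL:
        # Each engagement needs a contemporaneous human decision.
        return req.within_authorized_parameters and req.operator_confirmed
    # Pre-authorization: human judgment was exercised when the bounds were set,
    # so no per-engagement confirmation is checked here.
    return req.within_authorized_parameters


req = EngagementRequest(within_authorized_parameters=True, operator_confirmed=False)
print(may_proceed(Policy.MEANINGFUL_HUMAN_CONTROL, req))   # False: no human decision yet
print(may_proceed(Policy.PRE_AUTHORIZED_PARAMETERS, req))  # True: bounds were set in advance
```

Under the second policy the only human input is upstream of the engagement, which is the critics' point about machine-speed autonomy within pre-set parameters.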

Confidence-Building Measures and Risk Reduction

A third category, more modest in ambition, focuses on confidence-building measures (CBMs) — steps short of binding limits that reduce the risk of accidental conflict. These include: notification when deploying autonomous systems in proximity to adversary forces; shared incident reporting channels for AI-related military accidents; agreements to maintain human control over nuclear command and control regardless of AI integration elsewhere; and agreed technical standards for autonomous system behavior in contested environments.

CBMs have a strong historical precedent. The 1972 U.S.-Soviet Incidents at Sea Agreement reduced dangerous naval encounters without limiting either fleet. The 1971 Accidents Measures Agreement committed both sides to notify each other immediately of accidental, unauthorized, or unexplained incidents involving nuclear weapons. Analogous AI-focused CBMs could reduce escalation risk even without constraining development.

The Spectrum of Ambition

AI arms control proposals range from maximally ambitious (binding ban on all autonomous lethal systems, verified by compute monitoring) to modestly practical (notification agreements and shared incident channels). The more ambitious the proposal, the more verification it requires — and the less likely major powers are to accept it. The pragmatic question is where on this spectrum meaningful risk reduction can actually be achieved.

Compute governance: Controlling advanced AI chip manufacture, export, and operation as a proxy for controlling frontier AI development capability.
Meaningful human control (MHC): The requirement that a human operator understands context, can intervene, and retains legal responsibility before lethal force is used; the primary normative anchor in LAWS debates.
Confidence-building measures (CBMs): Steps short of binding limits (notifications, incident channels, shared standards) that reduce the risk of accidental conflict without constraining development.

Lesson 3 Quiz

Proposed Approaches — 4 questions
1. What is the core logic of compute governance as an arms control mechanism?
Correct. Since frontier AI training is compute-constrained and chips come from very few manufacturers (primarily TSMC using ASML equipment), the hardware layer is a more verifiable control point than the software layer.
Compute governance targets the hardware layer — the specialized chips required for frontier training — not software. The key insight is that you cannot train a GPT-4-scale system on ordinary hardware.
2. What is the "inference problem" as a limitation of compute governance?
Correct. Once a model is trained, running it (inference) requires far less compute. A state could train systems before restrictions and then operate them on modest hardware indefinitely — compute governance cannot retroactively constrain acquired capability.
The inference problem is temporal: training is compute-intensive and monitorable, but deploying an already-trained model is cheap. China's compute stockpiles accumulated before 2022 restrictions illustrate this.
3. How does the "meaningful human control" concept differ from the U.S. DoD's preferred formulation of "appropriate human judgment"?
Correct. MHC requires contextual understanding and intervention capability — the U.S. formulation allows autonomous engagement within pre-set parameters, which critics argue is functionally autonomous targeting even if a human authorized the parameters in advance.
The distinction matters: MHC requires active human understanding and intervention capability at the point of use. "Appropriate human judgment" can be satisfied by pre-authorization, which critics say permits machine-speed autonomous engagement.
4. Which historical agreement best illustrates the value of confidence-building measures without binding limits?
Correct. The Incidents at Sea Agreement reduced escalation risk through behavioral rules and communication protocols without either side giving up ships or capabilities — a model for AI-focused CBMs.
The 1972 U.S.-Soviet Incidents at Sea Agreement is the classic CBM model — behavioral rules for close naval encounters that reduced risk without constraining fleet size. This is directly analogous to proposed AI notification and incident-reporting agreements.

Lab 3: Designing a Compute Governance Regime

Structured conversation · ~3 exchanges to complete

Your Mission

You are advising a state proposing a multilateral compute governance regime at an international AI safety forum. Your task is to design the key elements of such a regime — thresholds, monitoring mechanisms, and enforcement — while addressing the inference problem and the objection that compute controls are technologically nationalist rather than genuinely safety-oriented.

Propose specific elements of a compute governance treaty: what threshold triggers reporting? Who monitors? How is the inference gap handled? The AI advisor will push back on feasibility and political economy.
AI Policy Advisor
Compute Governance Design
Let's design a compute governance regime. Before you propose specifics, I want to flag the political economy problem: compute governance looks suspiciously like the United States and its allies using safety language to entrench their semiconductor lead over China. How would you design a regime that's genuinely multilateral rather than a disguised technology-containment tool — and what specific compute threshold would trigger reporting obligations? Start there.
Module 5 · Lesson 4

The U.S.-China Dimension and the Path Forward

No AI arms control regime can work without both major powers. What does that conversation actually look like?
What has actually happened in U.S.-China dialogue on AI risk reduction, and what structural barriers remain?

In November 2023, on the sidelines of the APEC summit in San Francisco, Presidents Biden and Xi met for four hours at the Filoli Estate. Among the outcomes: an agreement to resume military-to-military communications suspended after Nancy Pelosi's August 2022 Taiwan visit, and a commitment to convene government-to-government talks on AI risk. It was the first explicit bilateral commitment by the world's two leading AI powers to discuss the technology's risks. The talks were not arms control. They were not binding. But they were a beginning.

The U.S.-China AI Dialogue: What Happened

Following the Biden-Xi commitment, the first formal U.S.-China AI government-to-government talks were held in Geneva in May 2024. The State Department described them as "substantive and candid." No joint statement was issued. No agreements were announced. But the meeting represented the first structured bilateral exchange on AI safety between the two governments — analogous to the earliest Soviet-American strategic stability talks before any treaties existed.

The structural context is deeply challenging. U.S.-China relations are characterized by strategic competition, mutual distrust over technology transfer, and active military competition in the Taiwan Strait and South China Sea. The October 2022 U.S. chip export controls were explicitly designed to degrade China's AI capabilities — making it difficult to simultaneously ask China to participate in cooperative AI governance arrangements.

China's domestic AI governance framework — including its 2023 Generative AI Regulations and the Cyberspace Administration of China's algorithm governance rules — is sophisticated but focuses primarily on domestic content control and social stability rather than international security. Beijing's stated position favors "multilateral" rather than "U.S.-led" AI governance, making it resistant to frameworks where Washington sets the terms.

Nuclear Lessons for U.S.-China AI Talks

The history of U.S.-Soviet arms control offers an instructive parallel. The first strategic stability talks began in 1969; the first treaty (SALT I) was signed in 1972; the first treaty with real reductions (INF) came in 1987. The process took eighteen years from initial dialogue to binding limits — and proceeded through multiple crises, betrayed agreements, and periods of complete breakdown.

Several specific nuclear precedents are relevant. The 1963 Limited Test Ban Treaty was achievable before comprehensive verification was possible because it banned testing only in the atmosphere, in outer space, and underwater, where explosions could be detected by national technical means such as airborne sampling of radioactive debris, while leaving underground testing, which was much harder to verify, unrestricted. Analogously, some AI behaviors might be verifiable (training above a compute threshold, deploying AI in nuclear C2) before general AI capability limits are tractable.

The 1972 Anti-Ballistic Missile Treaty succeeded partly because both sides calculated that unconstrained ABM deployment would be mutually destabilizing — a shared strategic interest in stability that transcended political tensions. A similar logic may apply to AI: both the U.S. and China have reason to fear accidental escalation from autonomous systems misidentifying threats. That shared interest in avoiding inadvertent war is the foundation any AI arms control must build on.

The Missing Ingredient: Verification Culture

U.S.-Soviet arms control succeeded partly because both sides developed what scholars call a "verification culture" — shared technical understanding, agreed data exchanges, and mutual confidence that violations would be detected. U.S.-China AI talks are starting from a baseline of near-zero mutual transparency on military AI programs. Building verification culture takes time and usually requires small initial agreements that build confidence before ambitious ones become possible.

What a Realistic Near-Term Agenda Looks Like

Most scholars working in this space converge on a realistic near-term agenda that prioritizes achievable risk reduction over transformative arms control. The key elements:

1. Nuclear AI firewall. A bilateral or multilateral agreement that AI systems will not be integrated into nuclear launch authorization chains — and that early-warning systems will maintain human confirmation requirements. This is arguably the most urgent risk because AI false positives in early-warning could trigger inadvertent nuclear launch. The U.S. Political Declaration already endorsed this principle; getting Russia and China to agree bilaterally would be the next step.

2. Incident reporting channel. A dedicated channel for rapid communication when AI-related military incidents occur — comparable to the 1963 Hotline but specifically scoped to autonomous system incidents. This addresses the speed-of-action problem by ensuring human decision-makers can communicate even when systems are operating at machine speed.

3. Shared definitions working group. An ongoing technical forum to develop shared definitions of autonomous systems, agreed taxonomies of autonomy levels, and common vocabulary for negotiations — addressing the definitional paralysis that has stalled CCW talks for a decade.

4. Compute transparency pilot. A voluntary transparency measure in which major AI-developing states report compute usage above a threshold to a neutral registry — not mandatory limits, but data collection that could eventually support verification.
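The reporting logic behind item 4 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a proposed standard: the threshold value, the `TrainingRun` record, and the function names are hypothetical, and training compute is estimated with the common rule of thumb of roughly 6 × parameters × training tokens for dense transformer models rather than measured on hardware. What it shows is that a registry entry can carry a compute total while revealing nothing about the model's architecture, data, or purpose.

```python
from dataclasses import dataclass

# Hypothetical, illustrative threshold; the pilot described above names no figure.
REPORTING_THRESHOLD_FLOP = 1e26


@dataclass
class TrainingRun:
    """Hypothetical record a state might keep for one training run."""
    run_id: str
    parameters: float  # model parameter count (N)
    tokens: float      # number of training tokens (D)


def estimated_training_flop(run: TrainingRun) -> float:
    """Rough training compute using the ~6 * N * D rule of thumb
    for dense transformers (an approximation, not a measurement)."""
    return 6.0 * run.parameters * run.tokens


def registry_report(runs: list[TrainingRun],
                    threshold: float = REPORTING_THRESHOLD_FLOP) -> list[dict]:
    """Return entries only for runs at or above the threshold.

    Each entry holds an identifier and a compute estimate; nothing about
    model design, training data, or intended use is included.
    """
    reports = []
    for run in runs:
        flop = estimated_training_flop(run)
        if flop >= threshold:
            reports.append({"run_id": run.run_id, "estimated_flop": flop})
    return reports


if __name__ == "__main__":
    runs = [
        TrainingRun("run-a", parameters=7e9, tokens=2e12),   # about 8.4e22 FLOP
        TrainingRun("run-b", parameters=1e12, tokens=2e13),  # about 1.2e26 FLOP
    ]
    for report in registry_report(runs):
        print(report)
```

With the illustrative threshold, only `run-b` is reported. The sketch also makes the pilot's limits visible: states self-report, and nothing here checks that the figures are true. Closing that gap is what on-chip monitoring proposals aim at.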

The Long View

Effective AI arms control — if it comes — will likely follow the nuclear model: decades of iterative dialogue, small initial agreements, and treaty architecture built on verified confidence. The window for early agreements may be narrowing as autonomous systems are deployed and strategic interests become more entrenched. The urgency is real; so is the complexity. The gap between them defines the central challenge of AI governance for the next generation of policymakers.

Nuclear AI firewall: A proposed commitment that AI systems will not be integrated into nuclear launch authorization chains and that human confirmation requirements will be maintained in early-warning systems.
Verification culture: The shared technical understanding, data exchange practices, and mutual confidence that violations would be detected. It is a prerequisite for functioning arms control and took decades to develop in U.S.-Soviet relations.
Compute transparency pilot: A proposed voluntary measure in which states report AI training compute above a threshold to a neutral registry, as a confidence-building step preceding formal verification.

Lesson 4 Quiz

The U.S.-China Dimension and the Path Forward — 4 questions
1. What was the significance of the Biden-Xi meeting at the Filoli Estate in November 2023 for AI governance?
Correct. The Filoli meeting produced a commitment to resume mil-to-mil communications and to hold AI risk talks — significant as a first step, not because any binding agreements resulted.
No binding agreements resulted from Filoli. Its significance was the bilateral commitment to structured AI risk dialogue — the first of its kind between the two leading AI powers.
2. Why does the U.S. chip export control policy complicate AI arms control diplomacy with China?
Correct. When the U.S. uses technology controls to strategically disadvantage China, it undermines the credibility of U.S. invitations to participate in "mutually beneficial" AI governance — from Beijing's perspective, it looks like rules designed by a competitor.
The credibility problem is central: simultaneously degrading China's AI capabilities through export controls and asking China to participate in U.S.-shaped governance frameworks creates an obvious tension that Beijing has exploited diplomatically.
3. What characteristic made the 1963 Limited Test Ban Treaty achievable before comprehensive verification was possible — and what is its AI analogy?
Correct. The LTBT was achievable because atmospheric tests are physically detectable without intrusive inspection. Similarly, AI arms control might start with prohibitions on behaviors that leave detectable signatures — like AI integration in nuclear C2 or training above a compute threshold.
The LTBT's key feature was banning a verifiable behavior (atmospheric testing, detectable by sensors) rather than requiring intrusive internal inspection. This is the model for early AI agreements: prohibit what you can detect.
4. Why is the "nuclear AI firewall" considered the most urgent near-term AI arms control priority by most scholars in this field?
Correct. An AI system falsely identifying an incoming nuclear attack — and triggering launch before a human can intervene — represents the most catastrophic plausible AI risk. This makes the nuclear firewall the highest-priority constraint regardless of verification difficulties.
The nuclear AI firewall's urgency comes from catastrophic tail risk: an AI false positive in early warning could trigger inadvertent nuclear exchange. No other AI arms control failure has equivalent consequences.

Lab 4: Negotiating the Nuclear AI Firewall

Structured conversation · ~3 exchanges to complete

Your Mission

You are a U.S. State Department official in a backchannel dialogue with a Chinese counterpart. Your task is to draft language for a bilateral joint statement committing both states to maintain human confirmation requirements in nuclear early-warning systems — without allowing either side to verify the other's internal systems or concede any strategic position.

Draft the key operative clause of such a statement. The AI advisor plays your Chinese counterpart, who insists that any agreed language must not allow U.S. inspection of Chinese nuclear command infrastructure and must be reciprocally applicable to U.S. systems including AI-enhanced early-warning.
AI Policy Advisor
Nuclear AI Firewall Negotiation
I appreciate the U.S. initiative on this issue. However, China has serious concerns. Any joint statement must be genuinely reciprocal — it cannot implicitly allow the United States to define compliance standards in ways that apply to China's systems but not to AEGIS, the Space Fence, or your own AI-enhanced early-warning satellites. Additionally, we will not accept language that could be interpreted as an invitation for U.S. technical personnel to review Chinese nuclear C2 architecture. Please draft proposed operative language — and explain how it satisfies our reciprocity requirement without creating verification provisions we cannot accept.

Module 5 Test

Arms Control for AI — 15 questions · 80% to pass
1. What fundamental property of nuclear weapons made them more amenable to arms control verification than AI systems?
Correct. The physical detectability of missiles, silos, and warheads enabled satellite and on-site verification — AI code has no equivalent signature.
Physical detectability is the key: missiles and warheads occupy fixed geography with observable signatures. AI weights have no equivalent physical presence.
2. Why is the Biological Weapons Convention the most instructive cautionary precedent for AI arms control?
Correct. The BWC's verification failure — most dramatically exposed by the Soviet Biopreparat program — directly parallels the AI verification problem: dual-use, concealable, no physical signature.
The BWC analogy works because both involve knowledge-control problems with no reliable detection method — and the Soviet Biopreparat violations demonstrate what happens when verification is absent.
3. In DARPA's 2020 AlphaDogfight Trials, an AI agent defeated an experienced human F-16 pilot 5-0 in simulated dogfights. What is the primary arms control implication of this result?
Correct. Machine-speed engagement means conflicts involving autonomous systems could progress faster than any human-in-the-loop arms control architecture can respond to.
The key implication is temporal: arms control assumes human reaction times. AI agents operating 8x faster than humans eliminate the decision windows that crisis communication protocols depend on.
4. The CCW preemptively banned blinding laser weapons in its 1995 Protocol IV. What structural feature has prevented similar progress on LAWS since 2014?
Correct. Consensus decision-making is the structural barrier — both Russia and the U.S. have exercised their effective vetoes over binding LAWS treaty negotiations for different strategic reasons.
The consensus rule is the key structural barrier. Russia argues LAWS are covered by existing IHL; the U.S. resists binding constraints on advantageous systems. Either state can block progress.
5. What made China's signature on the 2023 Bletchley Declaration diplomatically notable?
Correct. Beijing's participation broke a pattern of non-engagement with Western-shaped AI governance norms — even though the Declaration itself contained no binding commitments or verification measures.
The significance was China's presence and signature, not any substantive commitment. It broke a pattern of non-participation in Western-framed tech governance while making no binding concessions.
6. The U.S. Political Declaration on Responsible Military AI and Autonomy was endorsed by 52 states. Which notable omission most limits its effectiveness?
Correct. A declaration on responsible military AI signed by 52 states but not by Russia and China — the two primary strategic competitors — has limited effect on the most consequential military AI programs.
The critical gap is Russia and China's absence. Fifty-two signatories are significant, but the declaration's impact on global military AI development is constrained by the non-participation of the two states most likely to deploy competing autonomous systems.
7. What is the core logic of on-chip monitoring hardware as proposed for compute governance?
Correct. The proposal is for non-intrusive monitoring: report compute volume to a registry without exposing what is being computed — privacy-preserving verification analogous to nuclear material accounting.
On-chip monitoring would log usage statistics and report to a neutral registry without revealing computation content — a privacy-preserving approach that parallels IAEA material accounting for nuclear fuel.
8. Why does the inference problem undermine compute governance as a comprehensive arms control strategy?
Correct. Compute controls can only constrain future training. Any system trained before controls take effect can be run indefinitely on inexpensive hardware — making compute governance a partial rather than comprehensive solution.
The temporal gap is the problem: training is compute-intensive and monitorable; inference is cheap and already distributed. China's stockpiles of advanced chips acquired before the October 2022 U.S. export controls illustrate how this limits any future control regime.
9. The ICRC has advocated specifically for which prohibition in autonomous weapons debates?
Correct. The ICRC's position focuses on the human decision to use lethal force against persons — rooted in IHL's distinction and proportionality principles, which critics argue autonomous systems cannot reliably implement.
The ICRC specifically targets the decision to use force against persons, not autonomy generally. This framing is grounded in IHL — distinction (combatant vs. civilian) and proportionality are legal requirements that critics say machines cannot satisfy.
10. What is the critical difference between U.S. DoD Directive 3000.09's "appropriate levels of human judgment" standard and meaningful human control as defined by the Campaign to Stop Killer Robots?
Correct. The gap is between active human engagement at point of use (MHC) and pre-authorized autonomous engagement within parameters (DoD 3000.09). Critics argue the latter is functionally autonomous targeting even if humans set the parameters in advance.
The critical distinction: MHC requires active human understanding and intervention ability when force is used. DoD 3000.09 allows parameters to be set in advance, permitting machine-speed autonomous engagement within those parameters — which critics argue defeats the purpose of human control.
11. The 1972 Incidents at Sea Agreement is cited as a model for AI confidence-building measures. What made it effective despite containing no limits on fleet size?
Correct. Behavioral rules for dangerous interactions reduced accident risk without requiring capability limits or verification — directly analogous to proposed AI incident-reporting channels and autonomous system notification agreements.
The INCSEA Agreement worked through behavioral rules and communication, not capability limits. This is the CBM model: reduce the risk of dangerous encounters without either side conceding strategic position.
12. What pattern in U.S.-Soviet nuclear arms control history most directly informs realistic expectations for U.S.-China AI talks?
Correct. Eighteen years from first dialogue to binding reductions — with multiple crises, betrayed agreements, and small initial steps building toward larger treaties — is the realistic template for U.S.-China AI arms control.
The 18-year U.S.-Soviet trajectory from initial talks (1969) to real reductions (INF, 1987) is the relevant historical template — not a quick treaty. Iterative dialogue, small initial agreements, and patience are the model.
13. What characteristic of the 1963 Limited Test Ban Treaty made it achievable before comprehensive verification was possible, and what is its direct AI analogy?
Correct. Ban what you can detect. Atmospheric tests release radioactive debris and other signatures that external sensors can detect. The AI analogy is banning behaviors with external signatures, such as compute-intensive training runs or AI signals in nuclear command communications.
The LTBT's key feature was restricting a verifiable behavior rather than requiring intrusive inspection. "Ban what you can detect" is the strategic principle — applied to AI, it means targeting behaviors with observable signatures rather than attempting to verify internal capabilities.
14. Why do scholars consider the nuclear AI firewall the most urgent near-term arms control priority?
Correct. No other plausible AI arms control failure approaches the consequences of an AI-triggered inadvertent nuclear launch. Catastrophic tail risk, not technical tractability, drives the prioritization.
Urgency comes from catastrophic tail risk, not technical feasibility. An AI false positive triggering nuclear launch before any human can intervene is the worst-case scenario — which is why the firewall, however difficult to verify, is the highest priority.
15. What is "verification culture" and why is its absence a fundamental obstacle to U.S.-China AI arms control?
Correct. Verification culture is not a legal document — it is a relationship: shared technical vocabulary, practiced data exchange, and mutual confidence that defection would be noticed. The U.S. and China have almost none of this, while the U.S. and USSR built it over decades of interaction before major treaties became possible.
Verification culture is a relationship, not a legal structure. It requires shared vocabulary, practiced transparency, and confidence that cheating would be detected — developed through iterative interaction over years. U.S.-China AI relations are at approximately Year Zero of this process.