L1
Β·
Quiz
Β·
Lab
L2
Β·
Quiz
Β·
Lab
L3
Β·
Quiz
Β·
Lab
L4
Β·
Quiz
Β·
Lab
Module Test
Module 6 Β· Lesson 1

National AI Policies and Regulatory Frameworks

How governments are writing the rules for AI β€” from the EU AI Act to executive orders.
Who actually decides what AI is allowed to do β€” and how do those decisions get made?

In a single week of October 2023, two landmark documents landed within days of each other. The European Parliament's negotiators reached a provisional agreement on the EU AI Act, the world's first comprehensive AI law. Three days later, U.S. President Biden signed Executive Order 14110 on Safe, Secure, and Trustworthy AI β€” the most detailed federal directive on AI governance in U.S. history. Neither document mentioned the other. Both were urgent. Neither was sufficient alone.

The Spectrum of Regulatory Approaches

Governments have approached AI governance through three broad models: hard law (binding statutes with penalties), soft law (voluntary frameworks, guidelines, standards bodies), and sectoral regulation (applying existing rules β€” medical device law, financial regulation β€” to AI systems in specific domains). Most jurisdictions use some combination of all three.

The EU AI Act, which entered into force in August 2024, is the most ambitious hard-law effort. It classifies AI systems by risk level. Unacceptable-risk systems β€” such as real-time biometric surveillance in public spaces and social-scoring systems β€” are banned outright. High-risk systems, including AI used in credit scoring, hiring, medical diagnosis, and critical infrastructure, face mandatory conformity assessments, human oversight requirements, and transparency obligations before deployment. Lower-risk systems carry disclosure requirements but are otherwise lightly regulated. General-purpose AI models (GPAIs) with systemic risk β€” defined by compute thresholds β€” face additional obligations including red-team evaluations and incident reporting.

The U.S. approach under Biden's Executive Order 14110 worked differently. Rather than legislation, it used presidential authority to direct federal agencies. The Order required developers of frontier AI models trained above a specific compute threshold to share safety test results with the government before public deployment. It tasked the National Institute of Standards and Technology (NIST) with developing evaluation tools and tasked the Department of Commerce with establishing reporting requirements. When the Trump administration rescinded EO 14110 in January 2025, it directed agencies to draft a replacement strategy oriented around "AI dominance" rather than safety-first framing β€” illustrating how dramatically domestic AI policy can shift across administrations.

Policy Detail

The EU AI Act's compute threshold for "systemic risk" GPAI designation was set at 10²⁡ floating-point operations (FLOPs) of training compute β€” roughly the scale of GPT-4. Models above this threshold face adversarial testing, cybersecurity requirements, and mandatory incident reporting to the European AI Office, a new regulatory body established within the European Commission.

China's Layered Approach

China has taken a technology-specific approach, issuing separate regulations for different AI capabilities rather than one omnibus law. The Cyberspace Administration of China (CAC) issued rules on algorithmic recommendation systems in 2022, deep synthesis (deepfakes) in 2022, and generative AI services in August 2023. The generative AI rules require content to reflect "core socialist values," mandate labeling of AI-generated material, and hold service providers liable for user-generated content that violates rules. A key feature: the regulations apply to services "provided to the public within the territory of China," covering foreign providers serving Chinese users.

The UK's post-Brexit approach explicitly rejected a new AI-specific statute. The 2023 AI Safety White Paper proposed that existing sector regulators β€” financial conduct, medicines, competition β€” apply their domain expertise to AI within their remit, coordinated by a new central function. The UK also hosted the first AI Safety Summit at Bletchley Park in November 2023, convening 28 governments and major AI companies to sign the Bletchley Declaration β€” acknowledging frontier AI risks and committing to international cooperation on evaluation.

Key Terms
Risk-Based ClassificationRegulatory approach that assigns different compliance obligations based on the potential harm level of an AI application, rather than treating all AI identically.
Conformity AssessmentPre-deployment evaluation process (sometimes by third-party auditors) verifying that a high-risk AI system meets legal requirements before it can be sold or deployed.
Compute ThresholdA numeric training-compute cutoff used to define which AI models qualify as frontier or systemic-risk, triggering additional regulatory obligations.
Sectoral RegulationGovernance strategy applying existing domain-specific laws (medical devices, financial services, aviation) to AI systems operating in those sectors, rather than creating new AI-specific statutes.
Why It Matters for Alignment

Regulation shapes what alignment work gets done. When the EU AI Act mandates that high-risk systems include human oversight mechanisms, it creates legal demand for technical solutions to the oversight problem. When the U.S. requires safety testing before deployment, it creates pressure on labs to develop evaluation methods. Policy and technical research are not separate tracks β€” they set each other's agenda.

Apr 2021
EU AI Act proposal published by European Commission β€” first comprehensive AI regulatory framework.
Mar 2022
NIST AI Risk Management Framework (AI RMF) draft released; final version published Jan 2023.
Aug 2023
China's Generative AI Regulations take effect β€” first national rules specifically targeting LLM services.
Oct 2023
U.S. Executive Order 14110 signed; EU AI Act political agreement reached same week.
Nov 2023
Bletchley Declaration signed by 28 governments at the first AI Safety Summit.
Aug 2024
EU AI Act enters into force with phased implementation over 12–36 months.
Jan 2025
EO 14110 rescinded by incoming U.S. administration; replacement AI strategy ordered.
Lesson 1 Β· Quiz

National Policies Check

Three questions β€” select the best answer for each.
1. The EU AI Act classifies AI systems that perform real-time biometric surveillance in public spaces as which risk category?
Correct. Real-time public biometric surveillance falls under the "unacceptable risk" category and is prohibited outright under the EU AI Act, along with social-scoring systems and AI that exploits psychological vulnerabilities.
Not quite. The EU AI Act reserves its harshest category β€” outright prohibition β€” for real-time biometric surveillance in public spaces. High-risk systems face compliance requirements but are not banned.
2. What did U.S. Executive Order 14110 (October 2023) require from developers of frontier AI models?
Correct. EO 14110 used the Defense Production Act to require developers of models trained above a specific compute threshold to share safety evaluation results with the U.S. government prior to deployment.
Not quite. The Order required frontier AI developers to share safety test results with the government before deployment β€” a significant reporting obligation, though not a ban on releases.
3. China's Generative AI Regulations (2023) hold service providers liable for what type of content?
Correct. China's generative AI rules extend provider liability to user-generated content posted through their platforms β€” a broader scope than many Western frameworks, which typically exempt platforms from liability for user content.
Not quite. China's rules go further than most Western frameworks by holding providers liable for user-generated content that violates the regulations, not only for AI outputs.
Lesson 1 Β· Lab

Regulatory Design Workshop

Explore AI regulatory frameworks with your AI policy advisor.

Your Task

You are advising a mid-sized country drafting its first AI law. Your AI policy advisor can help you think through the trade-offs between different regulatory models β€” risk-based classification, sectoral regulation, compute thresholds, and voluntary frameworks.

Ask about specific provisions in the EU AI Act, compare approaches across jurisdictions, or explore the practical enforcement challenges any of these frameworks face.

Start here: "What are the main arguments for and against adopting a risk-based classification approach like the EU AI Act, versus a sectoral approach like the UK's?"
AI Policy Advisor
Regulatory Frameworks
Welcome to the regulatory design workshop. I'm your AI policy advisor β€” I can help you think through the trade-offs in different governance approaches. What aspect of AI regulation would you like to explore?
Module 6 Β· Lesson 2

International Coordination and the Summit Process

From Bletchley to Seoul to Paris β€” how nations are building (and struggling to build) shared safety standards.
Can countries that compete intensely in AI development still cooperate meaningfully on AI safety?

The venue was chosen deliberately. Bletchley Park, where Alan Turing's team broke the Enigma cipher during World War II, now hosted representatives of 28 governments β€” including the United States, China, and the European Union β€” alongside executives from OpenAI, Google DeepMind, Anthropic, and Meta. The Bletchley Declaration they signed was modest in commitments but historic in composition: it was the first time China and the U.S. had co-signed a document acknowledging that frontier AI posed potentially catastrophic risks and required international cooperation.

What the Summit Process Has Produced

The Bletchley Summit launched a series of intergovernmental AI safety meetings. The second AI Safety Summit was held in Seoul in May 2024, producing the Seoul Statement of Intent β€” a commitment by 16 AI companies to publish their safety frameworks and conduct pre-deployment evaluations for their most capable models. This was the first time major private-sector AI developers made explicit, public commitments on safety evaluation methodology. The Seoul Summit also announced a network of government AI Safety Institutes that would cooperate on technical evaluations.

The third summit, the Paris AI Action Summit of February 2025, shifted the emphasis. The headline outcome was a broad communiquΓ© on "AI for humanity" that emphasized economic opportunity alongside safety, signed by over 60 countries. Notably, the United States and United Kingdom did not sign the communiquΓ© β€” a signal of changing priorities in both governments. The Paris Summit also saw the first formal exercises between the U.S. AI Safety Institute (housed at NIST) and its international counterparts.

The AI Safety Institutes Network

By mid-2025, the United Kingdom, United States, Japan, Canada, France, Germany, South Korea, Singapore, and Australia had all established national AI Safety Institutes or equivalent bodies. The UK AI Safety Institute (AISI, later renamed the AI Security Institute) was first, established in October 2023 with a mandate to conduct pre-deployment evaluations of frontier models. These institutes have begun sharing evaluation methodologies and red-team findings, forming a de facto international technical network even where formal treaties don't exist.

The G7 Hiroshima AI Process

Parallel to the summit process, the G7 countries launched the Hiroshima AI Process in 2023, culminating in the International Code of Conduct for Advanced AI Systems released in October 2023 β€” the same week as EO 14110 and the EU AI Act political agreement. The Code of Conduct listed 11 guiding principles for frontier AI developers, covering transparency, bias testing, security vulnerabilities, and post-deployment monitoring. It was voluntary β€” a soft-law instrument β€” but it represented the first multilateral agreement on developer conduct across G7 jurisdictions.

The OECD AI Policy Observatory has tracked AI governance measures across its member countries since 2019. Its data shows that over 70 countries had adopted or were developing national AI strategies by 2024, but fewer than a dozen had enacted binding AI-specific legislation. The gap between strategy documents and enforceable rules remains wide.

Structural Obstacles to International Coordination

Several structural factors make binding international AI agreements difficult. First, definitional disagreement: countries disagree on what counts as a "frontier" or "high-risk" AI system, making it hard to agree on what should be regulated. Second, competitive dynamics: the U.S., China, and EU each see AI leadership as a strategic priority, creating incentives to avoid rules that might disadvantage their developers. Third, verification problems: unlike nuclear weapons treaties, there is no clear equivalent of an inspection regime for AI models β€” it is technically difficult to verify whether a model has been trained or deployed in violation of agreed-upon limits. Fourth, speed mismatch: AI capabilities advance faster than treaty negotiation timelines typically allow.

The mechanisms that have made progress tend to be softer: voluntary commitments, shared evaluation frameworks, information sharing among safety institutes, and bilateral technical dialogues. The U.S.-China AI talks held in Geneva in May 2024 β€” the first formal intergovernmental AI safety dialogue between the two countries β€” focused on establishing communication channels rather than making specific commitments, but were significant precisely because they existed at all.

Coordination vs. Competition

One recurring pattern in international AI governance: the same countries that cooperate on safety at summits compete aggressively on capabilities in their domestic AI investment strategies. Whether this cooperation-competition duality is sustainable β€” or whether one dynamic will eventually overwhelm the other β€” is one of the central questions for the field.

Bletchley DeclarationNovember 2023 joint statement by 28 governments acknowledging frontier AI risks and committing to international information sharing β€” notable for including both the U.S. and China.
AI Safety Institute (AISI)Government body dedicated to technical evaluation of frontier AI systems. The UK's was first (2023); the U.S., Japan, and others followed. These institutes form an international evaluation network.
Hiroshima AI ProcessG7 initiative producing the International Code of Conduct for Advanced AI Systems β€” a voluntary set of developer principles adopted in October 2023.
Verification ProblemThe challenge that, unlike nuclear arms treaties, there is no established technical or inspection mechanism to verify whether an AI model was trained or deployed in compliance with international agreements.
Lesson 2 Β· Quiz

International Coordination Check

Three questions on the global summit process.
1. The Bletchley Declaration (November 2023) was historically significant primarily because:
Correct. The Bletchley Declaration's political significance was that both the U.S. and China co-signed it β€” the first joint acknowledgment that frontier AI posed potentially catastrophic risks requiring international cooperation between the two competing powers.
Not quite. The Declaration had no binding force and created no new institutions. Its significance was that it represented the first U.S.-China joint acknowledgment of frontier AI as a shared risk requiring cooperation.
2. What was the distinctive commitment made by AI companies at the Seoul AI Safety Summit (May 2024)?
Correct. The Seoul Statement of Intent had 16 major AI companies β€” including OpenAI, Google, Anthropic, Meta, and others β€” commit to publishing their safety frameworks and conducting pre-deployment evaluations, the first such explicit public industry commitment.
Not quite. The Seoul commitments were softer: companies agreed to publish their safety frameworks and conduct pre-deployment evaluations β€” significant as public commitments, but voluntary rather than binding.
3. Which of the following is cited as a structural obstacle making binding international AI agreements difficult?
Correct. The verification problem β€” the absence of an inspection regime analogous to nuclear arms treaties β€” is a core structural obstacle. It is technically difficult to determine whether a model was trained or deployed in violation of an international agreement.
Not quite. The verification problem is key: unlike nuclear weapons, there's no established way to inspect or verify whether an AI model was developed in compliance with any international agreement β€” making binding commitments much harder to sustain.
Lesson 2 Β· Lab

Summit Design Challenge

Work through international coordination problems with an AI diplomacy advisor.

Your Task

You are helping design the agenda for an upcoming international AI safety summit. Your AI diplomacy advisor can help you think through what kinds of agreements are achievable, what the verification challenges are, and how to structure productive dialogue between countries with competing interests.

Explore specific proposals, ask about what has and hasn't worked in past summits, or push on the hardest coordination problems.

Start here: "Given the verification problem, what kinds of international AI agreements are actually achievable β€” and which ambitious proposals are likely to fail?"
AI Diplomacy Advisor
International Coordination
Welcome to the summit design lab. I'm your AI diplomacy advisor β€” I can help you think through what's politically and technically achievable in international AI governance. What aspect of the coordination challenge would you like to explore?
Module 6 Β· Lesson 3

Industry Self-Governance and Voluntary Commitments

Responsible scaling policies, safety frameworks, and the limits of voluntary pledges.
When companies write their own safety rules, what do those rules actually commit them to β€” and what do they leave open?

Seven AI companies β€” OpenAI, Google, Microsoft, Anthropic, Amazon, Meta, and Inflection β€” gathered at the White House to announce a set of voluntary commitments on AI safety. The companies agreed to share information about safety risks with governments and researchers, invest in cybersecurity, and develop technical mechanisms to indicate when content is AI-generated. The announcement was staged with ceremony but was explicitly voluntary, with no enforcement mechanism. Within the week, researchers were already debating whether the commitments were substantive or largely restatements of existing practice.

Responsible Scaling Policies

The most technically specific form of industry self-governance is the Responsible Scaling Policy (RSP) β€” a framework pioneered by Anthropic in September 2023 and subsequently adopted in various forms by other frontier labs. An RSP is a company's public commitment to conduct specific evaluations before training or deploying each new generation of models, and to pause or slow development if evaluations reveal capabilities crossing defined safety thresholds.

Anthropic's RSP defines AI Safety Levels (ASLs) analogous to biosafety levels in laboratory settings. ASL-1 applies to clearly non-dangerous models. ASL-2, the current default, covers models with limited uplift potential β€” models that can discuss dangerous topics but cannot provide meaningful assistance beyond what is freely available. ASL-3 would apply to models that could provide meaningful uplift to someone attempting to create weapons of mass disruption; hitting this threshold would trigger mandatory additional safety measures before deployment. ASL-4 and beyond represent levels where the company has committed to halt development until adequate safeguards exist.

Google DeepMind released its own Frontier Safety Framework in May 2024, using "Critical Capability Levels" (CCLs) rather than ASLs but covering similar ground: biological, chemical, nuclear, radiological, and cybersecurity uplift potential, plus autonomy and self-replication capabilities. OpenAI published its Preparedness Framework in December 2023, covering similar capability categories with four risk levels (low, medium, high, critical) and a commitment that only models at or below "medium" overall risk could be deployed.

The Credibility Problem

Voluntary frameworks face an inherent credibility challenge: the same organizations that write the rules also judge their own compliance. When Anthropic evaluated Claude 3 Opus against ASL-3 thresholds, it concluded the model was below the threshold β€” but the evaluation methodology was internal. External auditors had no independent access. Critics note that RSPs create no mechanism for a company to be penalized if it crosses a threshold and deploys anyway. Proponents argue they still improve internal decision-making and create reputational stakes that function as soft enforcement.

The White House Voluntary Commitments (2023)

The July 2023 White House commitments covered three areas: safety (pre-deployment red-teaming, sharing safety information across companies and with governments), security (protecting model weights from theft), and trust (developing technical content provenance standards, publishing transparency reports). A second round of commitments followed in September 2023 with 8 additional companies signing. The Biden administration presented these commitments as precursors to binding regulation; the Trump administration that followed was more interested in the voluntary commitments as an alternative to regulation.

Evaluating the July 2023 commitments is difficult because there is no independent monitoring body and no defined metrics. The commitment to share safety information, for example, does not specify what information, at what level of detail, on what timeline, or with which specific researchers. This vagueness is both a political feature (it was easier to sign) and a practical limitation.

Frontier Model Forum and Industry Consortia

In July 2023, Anthropic, Google, Microsoft, and OpenAI jointly established the Frontier Model Forum β€” an industry body focused on AI safety research, best practices, and information sharing among frontier labs. By 2024 the Forum had expanded to include Amazon and other members. Its stated activities include funding external safety research, developing evaluation standards, and facilitating government engagement. Critics observe that the Forum's funding and agenda are controlled by its member companies, raising questions about whether it can produce genuinely independent safety standards.

The Partnership on AI, established earlier (2016) by Amazon, Apple, Facebook, Google, IBM, and Microsoft, covers a broader range of AI ethics issues β€” bias, fairness, transparency β€” and includes civil society and academic members alongside industry. It has produced reports and frameworks on topics including AI safety, synthetic media, and publication norms, but has no enforcement authority over members.

What Voluntary Commitments Can and Cannot Do

Voluntary frameworks can raise internal standards, create external expectations, and establish precedents that future regulation can codify. They cannot impose costs on defectors, prevent competitive races to the bottom, or hold companies accountable when they fail to follow their own policies. The alignment community debates whether voluntary commitments speed or slow binding regulation β€” by demonstrating that industry can self-regulate, they might reduce pressure for legislation; by establishing norms, they might make future legislation easier to pass.

Responsible Scaling Policy (RSP)A company's public commitment to evaluate AI models against specific capability thresholds and slow or halt development if those thresholds are crossed. Pioneered by Anthropic in September 2023.
AI Safety Level (ASL)Anthropic's tiered framework for classifying models by risk level (ASL-1 through ASL-4+), analogous to biosafety levels, with increasingly stringent requirements at higher levels.
Frontier Model ForumIndustry consortium founded in 2023 by Anthropic, Google, Microsoft, and OpenAI to coordinate on safety research and practices among frontier AI developers.
Credibility ProblemThe challenge that voluntary safety frameworks are evaluated by the same organizations that write them, with no independent audit or enforcement mechanism.
Lesson 3 Β· Quiz

Industry Self-Governance Check

Three questions on voluntary commitments and RSPs.
1. In Anthropic's Responsible Scaling Policy, what is the defining feature of an "ASL-3" model?
Correct. ASL-3 in Anthropic's framework is triggered when a model could provide meaningful uplift β€” going beyond freely available information β€” to someone trying to create weapons of mass disruption. Reaching ASL-3 requires additional mandatory safeguards before deployment.
Not quite. ASL-3 is defined by the model's potential to provide meaningful uplift for creating weapons of mass disruption β€” going beyond what's freely available in ways that could materially help bad actors.
2. What is the "credibility problem" with industry Responsible Scaling Policies?
Correct. The credibility problem is structural: there is no independent auditor, no external evaluation of compliance, and no mechanism to impose costs if a company crosses a threshold and deploys anyway. The company is both rule-maker and judge.
Not quite. The core credibility problem is that companies are simultaneously the authors, the evaluators, and the enforcers of their own safety commitments β€” with no independent check on their self-assessments.
3. The Frontier Model Forum was founded in July 2023 by which combination of companies?
Correct. The Frontier Model Forum was founded by Anthropic, Google, Microsoft, and OpenAI β€” the four companies regarded as operating at the frontier of large model development at the time. Amazon later joined.
Not quite. The founding four were Anthropic, Google, Microsoft, and OpenAI β€” the leading frontier model developers β€” with Amazon subsequently joining the consortium.
Lesson 3 Β· Lab

RSP Stress Test

Probe the strengths and weaknesses of voluntary safety frameworks.

Your Task

You are a policy analyst evaluating the practical effectiveness of Responsible Scaling Policies and voluntary industry commitments. Your AI safety policy assistant can help you identify what RSPs actually commit companies to, where the gaps are, and how they compare to binding regulatory alternatives.

Push on specific provisions, ask about real-world enforcement cases, or explore how RSPs might be strengthened.

Start here: "If a company's internal evaluation concludes a model is below ASL-3 threshold, but external researchers disagree β€” what mechanisms exist to resolve that dispute, and what should exist?"
AI Safety Policy Assistant
Voluntary Frameworks
Welcome to the RSP stress test lab. I'm your AI safety policy assistant β€” I can help you analyze what voluntary safety commitments actually commit companies to and where the gaps are. What would you like to examine?
Module 6 Β· Lesson 4

Civil Society, Standards Bodies, and the Long Game

How researchers, auditors, technical standards, and public engagement shape AI safety beyond government and industry.
If governments are slow and companies have conflicts of interest, who else is doing the safety work β€” and does it matter?

On March 22, 2023, the Future of Life Institute published an open letter calling for a six-month pause on training AI systems more powerful than GPT-4. Within days, over 1,000 researchers and technologists had signed it β€” including Yoshua Bengio, Stuart Russell, and Elon Musk. Within a week, the number had grown to tens of thousands. The letter produced no pause. OpenAI, Google, Anthropic, and Meta continued their development programs without interruption. But the letter did something the signatories may not have fully anticipated: it made frontier AI risk a mainstream news story and accelerated the political pressure that led to the White House commitments of July 2023 and the Bletchley Summit of November 2023.

Technical Standards Bodies

Among the least visible but most consequential actors in AI governance are technical standards bodies. ISO/IEC JTC 1/SC 42, the joint ISO/IEC subcommittee on AI, has been developing international AI standards since 2017. Its output includes ISO/IEC 42001 (AI management systems), ISO/IEC 23053 (AI framework), and a growing suite of standards on bias, explainability, robustness, and risk management. Standards matter because they define what "conformity assessment" actually means β€” when the EU AI Act requires high-risk AI systems to pass conformity assessments, the technical criteria are drawn from standards developed by bodies like SC 42.

NIST's AI Risk Management Framework (AI RMF), published in January 2023, is a U.S. voluntary standard that has achieved significant adoption. It organizes AI risk management around four functions: Govern (organizational policies), Map (risk identification), Measure (risk assessment), and Manage (risk response). Because NIST standards are widely referenced in federal procurement and are being incorporated into state AI legislation, the AI RMF shapes AI development practices even though it is not legally binding.

IEEE and the Ethics of Autonomous Systems

The IEEE Standards Association's Ethically Aligned Design initiative, and the resulting P7000 series of standards, covers topics including algorithmic bias, data privacy, transparency, and fail-safe design for autonomous systems. The IEEE P7001 standard on transparency in autonomous systems, published in 2021, provides technical specifications for how autonomous systems should communicate their capabilities and limitations to users β€” directly relevant to the deceptive alignment problem in AI safety.

Academic and Civil Society Organizations

The AI safety research ecosystem includes a set of non-profit and academic institutions that operate independently of both government and industry. The Center for AI Safety (CAIS), founded by Dan Hendrycks, focuses on concrete technical safety research and has produced widely-cited work on adversarial robustness, out-of-distribution generalization, and AI risk evaluation. CAIS also organized the May 2023 statement β€” signed by hundreds of leading AI researchers β€” stating simply that "mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks."

The Future of Life Institute (FLI), which organized the March 2023 pause letter, focuses on advocacy and convening rather than primary research. The Alignment Research Center (ARC), founded by Paul Christiano after he left OpenAI, does primary technical research on interpretability and evaluation. ARC's ARC Evals division (now operating as METR) has conducted external capability evaluations for multiple frontier labs, including assessments of GPT-4, Claude, and Gemini β€” one of the few cases of independent third-party AI safety evaluation with real model access.

The AI Now Institute at NYU focuses on the social and political dimensions of AI β€” labor, discrimination, surveillance, accountability β€” rather than existential risk. This division within the AI safety/ethics field between "near-term harms" and "long-term risks" researchers reflects genuine disagreements about where to focus limited attention and resources, and has been a source of tension within the broader community.

The Role of Auditing and Red-Teaming

Independent auditing of AI systems is an emerging field with significant governance implications. Red-teaming β€” adversarial testing to find system failures β€” was standard practice in cybersecurity before AI, and has been adapted for AI safety. The U.S. AI Safety Institute organized the largest public AI red-team exercise to date at DEF CON 2023, where approximately 2,200 participants tested eight major AI systems for safety and security failures over three days. The exercise produced findings that were shared with the developers and the government.

Third-party AI auditing firms β€” including entities like KPMG's AI Assurance, Fairly AI, and academic-affiliated groups β€” provide conformity assessments for enterprise AI systems. The EU AI Act will create significant demand for this industry, as high-risk AI systems require third-party audits before deployment. The challenge is that auditing an AI system is fundamentally different from auditing financial statements: there are no universally accepted technical criteria, and system behavior can change after deployment.

The Long Game

Governance infrastructure takes decades to build. The nuclear non-proliferation regime, built through the NPT (1970), the IAEA, and successive arms control treaties, was still maturing fifty years after its founding. The AI governance landscape of 2024–2025 β€” with its mix of national legislation, voluntary commitments, nascent international forums, and emerging technical standards β€” looks more like 1946 nuclear governance than a mature regime. Whether the field can build robust institutions before capabilities create irreversible risks is the fundamental governance challenge of the coming decade.

ISO/IEC SC 42The international joint committee developing technical AI standards, including ISO/IEC 42001 on AI management systems. Its standards define the technical criteria used in regulatory conformity assessments.
NIST AI RMFThe U.S. National Institute of Standards and Technology's AI Risk Management Framework (January 2023) β€” a voluntary but widely-adopted standard organizing AI risk management around Govern, Map, Measure, and Manage functions.
METR (formerly ARC Evals)An independent organization that conducts third-party capability evaluations of frontier AI models, providing one of the few external assessments with direct model access.
Red-TeamingAdversarial testing in which trained evaluators systematically attempt to elicit unsafe, harmful, or deceptive behavior from an AI system β€” a key method for identifying safety failures before deployment.
Organization
Center for AI Safety
Technical safety research. Published the 2023 extinction-risk statement signed by hundreds of leading researchers including AI pioneers Geoffrey Hinton and Yoshua Bengio.
Organization
METR (Machine Evaluation & Testing for R&D)
Independent evaluation body conducting third-party capability assessments of frontier models including GPT-4, Claude, and Gemini. Formerly ARC Evals.
Organization
AI Now Institute
NYU-based research institute focused on near-term AI harms: labor displacement, algorithmic discrimination, surveillance, and corporate accountability.
Standards Body
ISO/IEC JTC 1/SC 42
Joint international committee developing AI standards since 2017. Its outputs define technical requirements referenced in AI legislation worldwide.
Lesson 4 Β· Quiz

Civil Society & Standards Check

Three questions on the broader governance ecosystem.
1. The NIST AI Risk Management Framework organizes AI risk management around four functions. Which of the following is NOT one of them?
Correct. The four functions of the NIST AI RMF are Govern, Map, Measure, and Manage. "Prohibit" is not one of them β€” the framework is voluntary and focuses on risk identification and management rather than prohibition.
Not quite. The four NIST AI RMF functions are Govern, Map, Measure, and Manage. "Prohibit" is not part of the framework β€” it is voluntary and structured around risk management, not mandates.
2. The March 2023 open letter calling for a six-month AI training pause was published by which organization?
Correct. The Future of Life Institute (FLI) published the open letter on March 22, 2023. Despite gathering tens of thousands of signatures, the letter produced no actual pause β€” but it significantly amplified public and political attention to frontier AI risk.
Not quite. The Future of Life Institute (FLI) organized the March 2023 pause letter. While it didn't achieve a pause, it became one of the most consequential public advocacy documents in AI governance history.
3. METR (formerly ARC Evals) is significant in AI governance because it:
Correct. METR is one of the very few organizations conducting genuinely independent third-party evaluations of frontier AI models β€” with actual model access, not just post-deployment testing. This makes it a critical node in the safety evaluation ecosystem.
Not quite. METR's significance is that it performs independent, third-party capability evaluations of frontier models β€” including GPT-4, Claude, and Gemini β€” with direct model access, filling a critical gap in the landscape of voluntary company self-assessment.
Lesson 4 Β· Lab

Governance Ecosystem Mapping

Explore how different actors in AI governance interact and reinforce β€” or undermine β€” each other.

Your Task

You are a researcher preparing a briefing on the AI governance ecosystem for a new foundation deciding where to direct its funding. Your AI governance research assistant can help you map the landscape, identify gaps, and think through where additional resources would have the most leverage.

Ask about specific organizations, funding gaps, tensions between different parts of the ecosystem, or how governance infrastructure might be strengthened.

Start here: "Where are the biggest gaps in the current AI governance ecosystem β€” areas where no organization is doing the necessary work, or where existing efforts are clearly insufficient?"
AI Governance Research Assistant
Ecosystem Analysis
Welcome to the governance ecosystem lab. I'm your AI governance research assistant β€” I can help you map the landscape of who is doing what in AI governance and where the gaps are. What aspect would you like to explore?
Module 6 Β· Final Assessment

Governance and Global Safety Efforts

15 questions β€” score 80% or higher to pass the module.
1. The EU AI Act's "unacceptable risk" category results in what outcome for affected AI systems?
Correct. Unacceptable-risk systems under the EU AI Act are banned entirely β€” they cannot be placed on the market or put into service in the EU.
Not quite. Unacceptable-risk systems are not subject to compliance requirements β€” they are simply prohibited. High-risk systems face mandatory audits; unacceptable-risk systems face prohibition.
2. What compute threshold does the EU AI Act use to define "systemic risk" GPAIs?
Correct. The EU AI Act set the systemic-risk GPAI threshold at 10²⁡ FLOPs of training compute β€” roughly the scale at which GPT-4 was trained. Models above this threshold face additional safety obligations.
Not quite. The threshold is 10²⁡ FLOPs β€” chosen to capture models at or above GPT-4 scale, triggering requirements for adversarial testing and incident reporting.
3. The UK's post-Brexit AI governance approach differs from the EU AI Act primarily in that:
Correct. The UK's 2023 AI Safety White Paper explicitly rejected creating a new AI-specific law, preferring instead to let existing domain regulators apply their expertise to AI in their sectors β€” a deliberate contrast with the EU's comprehensive statute approach.
Not quite. The UK took the opposite approach to the EU: it deliberately chose not to create an AI-specific statute, instead distributing governance responsibility to existing sector regulators (financial, medicines, competition) with central coordination.
4. China's Generative AI Regulations (effective August 2023) require AI-generated content to reflect:
Correct. China's Generative AI Regulations explicitly require that AI-generated content reflect "core socialist values" β€” a politically-specific content requirement with no direct equivalent in Western AI regulations.
Not quite. China's regulations explicitly require content to reflect "core socialist values" β€” a content mandate that distinguishes the Chinese approach from European and American frameworks focused primarily on safety and accuracy rather than political alignment.
5. The first AI Safety Summit at Bletchley Park (November 2023) produced which document?
Correct. The Bletchley Declaration β€” named for the summit venue β€” was signed by 28 governments including the U.S. and China, acknowledging frontier AI risks and committing to international cooperation.
Not quite. The Bletchley Declaration was the output of the first summit. The Seoul Statement came from the second summit (May 2024), and the Paris communiquΓ© from the third (February 2025).
6. The Seoul AI Safety Summit (May 2024) was notable primarily for:
Correct. The Seoul summit secured the first formal, public industry safety commitments β€” 16 companies agreeing to publish their safety frameworks and conduct pre-deployment evaluations β€” alongside the announcement of an AI Safety Institutes cooperation network.
Not quite. The Seoul summit's headline achievement was getting 16 major AI companies to formally commit to publishing safety frameworks and conducting pre-deployment evaluations β€” the first explicit public industry commitments of this kind.
7. Why is the "verification problem" a major obstacle to international AI governance agreements?
Correct. Unlike nuclear arms treaties, which benefit from physical inspection regimes, there is no equivalent mechanism for AI β€” making it technically difficult to know whether a country or company has violated agreed-upon limits on AI development.
Not quite. The verification problem is structural: the absence of any inspection-equivalent mechanism. Physical weapons inspectors can count warheads; there is no analogous way to verify training compute, model capabilities, or deployment scope.
8. Anthropic's ASL-3 threshold in its Responsible Scaling Policy is triggered by:
Correct. ASL-3 is defined by the capability to provide meaningful uplift β€” assistance beyond freely available information β€” to someone trying to create weapons of mass disruption. This is a capability-based threshold, not a scale or refusal-rate threshold.
Not quite. ASL-3 is a capability-based threshold: it applies when a model could meaningfully help someone cause catastrophic harm via weapons of mass disruption β€” specifically, assistance beyond what's already freely available.
9. The G7 Hiroshima AI Process produced which document in October 2023?
Correct. The G7 Hiroshima AI Process concluded in October 2023 with the International Code of Conduct for Advanced AI Systems β€” an 11-principle voluntary framework for frontier AI developers across G7 countries.
Not quite. The G7 Hiroshima Process produced the International Code of Conduct for Advanced AI Systems β€” a voluntary, 11-principle framework released in October 2023, notable as the first multilateral developer code of conduct.
10. What makes the credibility problem in voluntary safety frameworks fundamentally structural?
Correct. The structural problem is that the same organization writes the rules, conducts the evaluations, interprets the results, and decides whether to proceed β€” with no external check. This creates incentive problems regardless of the organization's good intentions.
Not quite. The structural issue is the absence of independence: when a company is simultaneously rulemaker, evaluator, and enforcer, there is no external check β€” even well-intentioned organizations face incentives that can distort self-assessment.
11. NIST's AI Risk Management Framework (AI RMF) is best described as:
Correct. The NIST AI RMF is voluntary, but its adoption in federal procurement requirements and state AI legislation gives it significant practical influence even without binding legal force.
Not quite. The AI RMF is a voluntary framework β€” but "voluntary" doesn't mean "without influence." Its incorporation into federal procurement standards and state legislation means it shapes AI development practices widely in practice.
12. ISO/IEC JTC 1/SC 42 is significant in AI governance because:
Correct. SC 42 develops the technical standards that give substance to regulatory requirements. When the EU AI Act mandates conformity assessments for high-risk systems, the specific technical criteria auditors use are drawn from standards like ISO/IEC 42001.
Not quite. SC 42 matters because its standards are the technical substrate of regulation. Laws mandate "conformity assessments" β€” but what those assessments actually measure is defined by SC 42's standards.
13. The DEF CON 2023 AI red-team exercise organized by the U.S. AI Safety Institute involved approximately how many participants testing AI systems?
Correct. Approximately 2,200 participants took part in the DEF CON 2023 red-team exercise over three days, testing eight major AI systems β€” making it the largest public AI red-team exercise conducted to that point.
Not quite. Approximately 2,200 participants joined the DEF CON 2023 red-team exercise, testing eight AI systems over three days in the largest public AI safety evaluation exercise to date.
14. What distinguishes METR (formerly ARC Evals) from most other AI safety organizations?
Correct. METR is one of the very few organizations that conducts independent evaluations with actual model access β€” rather than relying on post-deployment testing or company self-reports. This positions it uniquely in the safety evaluation ecosystem.
Not quite. METR's defining feature is independent evaluation with direct model access. This is rare: most safety evaluations are either internal (done by the company) or post-deployment (done without model access). METR bridges that gap.
15. The analogy between current AI governance and "1946 nuclear governance" is used to suggest:
Correct. The analogy points to a difficult truth: robust governance regimes take decades. The nuclear regime took 25+ years to mature after 1945. AI governance faces a similar maturation challenge but on a potentially much faster capability timeline.
Not quite. The 1946 analogy is about institutional maturity and time. The nuclear non-proliferation regime took decades to build after the first bombs. AI governance is in an early, fragile state β€” and the question is whether it can mature before capabilities create irreversible risks.