In the closing months of 2023, two landmark documents landed within weeks of each other. On October 30, U.S. President Biden signed Executive Order 14110 on Safe, Secure, and Trustworthy AI, the most detailed federal directive on AI governance in U.S. history. On December 8, EU negotiators reached a provisional agreement on the EU AI Act, the world's first comprehensive AI law. Neither document mentioned the other. Both were urgent. Neither was sufficient alone.
Governments have approached AI governance through three broad models: hard law (binding statutes with penalties), soft law (voluntary frameworks, guidelines, standards bodies), and sectoral regulation (applying existing rules, such as medical device law and financial regulation, to AI systems in specific domains). Most jurisdictions use some combination of all three.
The EU AI Act, which entered into force in August 2024, is the most ambitious hard-law effort. It classifies AI systems by risk level. Unacceptable-risk systems, such as real-time biometric surveillance in public spaces and social-scoring systems, are banned outright. High-risk systems, including AI used in credit scoring, hiring, medical diagnosis, and critical infrastructure, face mandatory conformity assessments, human oversight requirements, and transparency obligations before deployment. Lower-risk systems carry disclosure requirements but are otherwise lightly regulated. General-purpose AI models (GPAIs) with systemic risk, defined by compute thresholds, face additional obligations including red-team evaluations and incident reporting.
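The tiered structure described above can be sketched as a small lookup table. This is only an illustration built from the examples in this section: the names `RiskTier`, `OBLIGATIONS`, `EXAMPLE_USE_CASES`, `classify`, and `obligations_for` are hypothetical, the default to the lower tier for unlisted use cases is a simplifying assumption, and real classification depends on legal analysis of the Act itself.

```python
from enum import Enum


class RiskTier(Enum):
    """Risk tiers as summarized in the text (labels are illustrative)."""
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LOWER = "lower"


# Obligations per tier, paraphrased from the paragraph above.
OBLIGATIONS = {
    RiskTier.UNACCEPTABLE: ["prohibited"],
    RiskTier.HIGH: [
        "conformity assessment",
        "human oversight",
        "transparency obligations",
    ],
    RiskTier.LOWER: ["disclosure"],
}

# Example use cases named in the text, mapped to their tier.
EXAMPLE_USE_CASES = {
    "social scoring": RiskTier.UNACCEPTABLE,
    "real-time biometric surveillance in public spaces": RiskTier.UNACCEPTABLE,
    "credit scoring": RiskTier.HIGH,
    "hiring": RiskTier.HIGH,
    "medical diagnosis": RiskTier.HIGH,
    "critical infrastructure": RiskTier.HIGH,
}


def classify(use_case: str) -> RiskTier:
    """Look up a use case; unlisted cases default to LOWER (an assumption)."""
    return EXAMPLE_USE_CASES.get(use_case, RiskTier.LOWER)


def obligations_for(use_case: str) -> list[str]:
    """Return the obligations attached to the tier of the given use case."""
    return OBLIGATIONS[classify(use_case)]


if __name__ == "__main__":
    for case in ("hiring", "social scoring", "customer-service chatbot"):
        print(case, "->", classify(case).value, obligations_for(case))
```

The point of the sketch is that obligations attach to the tier rather than to the technology, so the regulatory burden on a system depends on where it is used.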
The U.S. approach under Biden's Executive Order 14110 worked differently. Rather than legislation, it used presidential authority to direct federal agencies. The Order required developers of frontier AI models trained above a specific compute threshold to share safety test results with the government before public deployment. It tasked the National Institute of Standards and Technology (NIST) with developing evaluation tools and tasked the Department of Commerce with establishing reporting requirements. When the Trump administration rescinded EO 14110 in January 2025, it directed agencies to draft a replacement strategy oriented around "AI dominance" rather than safety-first framing, illustrating how dramatically domestic AI policy can shift across administrations.
The EU AI Act's compute threshold for "systemic risk" GPAI designation was set at 10²⁵ floating-point operations (FLOPs) of training compute, roughly the scale of GPT-4. Models above this threshold face adversarial testing, cybersecurity requirements, and mandatory incident reporting to the European AI Office, a new regulatory body established within the European Commission.
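To get a feel for what a threshold of 10^25 training FLOPs means, the sketch below estimates training compute from model size and dataset size. It assumes the common rule of thumb of about 6 floating-point operations per parameter per training token. That heuristic comes from the scaling-law literature and is not the Act's official counting method. The function names and the two example models are hypothetical.

```python
# Threshold named in the text: 10^25 floating-point operations of training compute.
SYSTEMIC_RISK_THRESHOLD_FLOP = 1e25


def estimate_training_flop(n_parameters: float, n_tokens: float) -> float:
    """Rough estimate using the assumed heuristic: ~6 FLOP per parameter per token."""
    return 6.0 * n_parameters * n_tokens


def exceeds_systemic_risk_threshold(n_parameters: float, n_tokens: float) -> bool:
    """True if the estimated training compute is above the 10^25 threshold."""
    return estimate_training_flop(n_parameters, n_tokens) > SYSTEMIC_RISK_THRESHOLD_FLOP


if __name__ == "__main__":
    # Hypothetical model A: 7 billion parameters, 2 trillion training tokens.
    print(exceeds_systemic_risk_threshold(7e9, 2e12))   # False
    # Hypothetical model B: 1 trillion parameters, 10 trillion training tokens.
    print(exceeds_systemic_risk_threshold(1e12, 1e13))  # True
```

Under this heuristic the smaller model lands near 10^23 FLOPs, two orders of magnitude below the line. That illustrates why a compute threshold captures only the largest training runs.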
China has taken a technology-specific approach, issuing separate regulations for different AI capabilities rather than one omnibus law. The Cyberspace Administration of China (CAC) issued rules on algorithmic recommendation systems in 2022, deep synthesis (deepfakes) in 2022, and generative AI services in August 2023. The generative AI rules require content to reflect "core socialist values," mandate labeling of AI-generated material, and hold service providers liable for user-generated content that violates rules. A key feature: the regulations apply to services "provided to the public within the territory of China," covering foreign providers serving Chinese users.
The UK's post-Brexit approach explicitly rejected a new AI-specific statute. The March 2023 white paper, "A pro-innovation approach to AI regulation," proposed that existing sector regulators (financial conduct, medicines, competition) apply their domain expertise to AI within their remit, coordinated by a new central function. The UK also hosted the first AI Safety Summit at Bletchley Park in November 2023, convening 28 governments and major AI companies. The governments signed the Bletchley Declaration, acknowledging frontier AI risks and committing to international cooperation on evaluation.
Regulation shapes what alignment work gets done. When the EU AI Act mandates that high-risk systems include human oversight mechanisms, it creates legal demand for technical solutions to the oversight problem. When the U.S. requires safety testing before deployment, it creates pressure on labs to develop evaluation methods. Policy and technical research are not separate tracks; they set each other's agenda.
You are advising a mid-sized country drafting its first AI law. Your AI policy advisor can help you think through the trade-offs between different regulatory models: risk-based classification, sectoral regulation, compute thresholds, and voluntary frameworks.
Ask about specific provisions in the EU AI Act, compare approaches across jurisdictions, or explore the practical enforcement challenges any of these frameworks face.
The venue was chosen deliberately. Bletchley Park, where Alan Turing's team broke the Enigma cipher during World War II, now hosted representatives of 28 governments (including the United States, China, and the European Union) alongside executives from OpenAI, Google DeepMind, Anthropic, and Meta. The Bletchley Declaration they signed was modest in commitments but historic in composition: it was the first time China and the U.S. had co-signed a document acknowledging that frontier AI posed potentially catastrophic risks and required international cooperation.
The Bletchley Summit launched a series of intergovernmental AI safety meetings. The second, the AI Seoul Summit of May 2024, produced the Frontier AI Safety Commitments, a pledge by 16 AI companies to publish their safety frameworks and conduct pre-deployment evaluations for their most capable models. This was the first time major private-sector AI developers made explicit, public commitments on safety evaluation methodology. Governments at the summit also signed the Seoul Statement of Intent, announcing a network of government AI Safety Institutes that would cooperate on technical evaluations.
The third summit, the Paris AI Action Summit of February 2025, shifted the emphasis. The headline outcome was a broad communiqué on "AI for humanity" that emphasized economic opportunity alongside safety, signed by over 60 countries. Notably, the United States and United Kingdom did not sign the communiqué, a signal of changing priorities in both governments. The Paris Summit also saw the first formal exercises between the U.S. AI Safety Institute (housed at NIST) and its international counterparts.
By mid-2025, the United Kingdom, United States, Japan, Canada, France, Germany, South Korea, Singapore, and Australia had all established national AI Safety Institutes or equivalent bodies. The UK AI Safety Institute (AISI, later renamed the AI Security Institute) was first, established in October 2023 with a mandate to conduct pre-deployment evaluations of frontier models. These institutes have begun sharing evaluation methodologies and red-team findings, forming a de facto international technical network even where formal treaties don't exist.
Parallel to the summit process, the G7 countries launched the Hiroshima AI Process in 2023, culminating in the International Code of Conduct for Advanced AI Systems, released on October 30, 2023, the same day EO 14110 was signed. The Code of Conduct listed 11 guiding principles for frontier AI developers, covering transparency, bias testing, security vulnerabilities, and post-deployment monitoring. It was voluntary (a soft-law instrument), but it represented the first multilateral agreement on developer conduct across G7 jurisdictions.
The OECD AI Policy Observatory has tracked AI governance measures across its member countries since 2019. Its data shows that over 70 countries had adopted or were developing national AI strategies by 2024, but fewer than a dozen had enacted binding AI-specific legislation. The gap between strategy documents and enforceable rules remains wide.
Several structural factors make binding international AI agreements difficult. First, definitional disagreement: countries disagree on what counts as a "frontier" or "high-risk" AI system, making it hard to agree on what should be regulated. Second, competitive dynamics: the U.S., China, and EU each see AI leadership as a strategic priority, creating incentives to avoid rules that might disadvantage their developers. Third, verification problems: unlike nuclear weapons treaties, there is no clear equivalent of an inspection regime for AI models, and it is technically difficult to verify whether a model has been trained or deployed in violation of agreed-upon limits. Fourth, speed mismatch: AI capabilities advance faster than treaty negotiation timelines typically allow.
The mechanisms that have made progress tend to be softer: voluntary commitments, shared evaluation frameworks, information sharing among safety institutes, and bilateral technical dialogues. The U.S.-China AI talks held in Geneva in May 2024, the first formal intergovernmental AI safety dialogue between the two countries, focused on establishing communication channels rather than making specific commitments, but were significant precisely because they existed at all.
One recurring pattern in international AI governance: the same countries that cooperate on safety at summits compete aggressively on capabilities in their domestic AI investment strategies. Whether this cooperation-competition duality is sustainable, or whether one dynamic will eventually overwhelm the other, is one of the central questions for the field.
You are helping design the agenda for an upcoming international AI safety summit. Your AI diplomacy advisor can help you think through what kinds of agreements are achievable, what the verification challenges are, and how to structure productive dialogue between countries with competing interests.
Explore specific proposals, ask about what has and hasn't worked in past summits, or push on the hardest coordination problems.
In July 2023, seven AI companies (OpenAI, Google, Microsoft, Anthropic, Amazon, Meta, and Inflection) gathered at the White House to announce a set of voluntary commitments on AI safety. The companies agreed to share information about safety risks with governments and researchers, invest in cybersecurity, and develop technical mechanisms to indicate when content is AI-generated. The announcement was staged with ceremony but was explicitly voluntary, with no enforcement mechanism. Within the week, researchers were already debating whether the commitments were substantive or largely restatements of existing practice.
The most technically specific form of industry self-governance is the Responsible Scaling Policy (RSP), a framework pioneered by Anthropic in September 2023 and subsequently adopted in various forms by other frontier labs. An RSP is a company's public commitment to conduct specific evaluations before training or deploying each new generation of models, and to pause or slow development if evaluations reveal capabilities crossing defined safety thresholds.
Anthropic's RSP defines AI Safety Levels (ASLs) analogous to biosafety levels in laboratory settings. ASL-1 applies to clearly non-dangerous models. ASL-2, the current default, covers models with limited uplift potential: models that can discuss dangerous topics but cannot provide meaningful assistance beyond what is freely available. ASL-3 would apply to models that could provide meaningful uplift to someone attempting to create weapons of mass destruction; hitting this threshold would trigger mandatory additional safety measures before deployment. ASL-4 and beyond represent levels where the company has committed to halt development until adequate safeguards exist.
Google DeepMind released its own Frontier Safety Framework in May 2024, using "Critical Capability Levels" (CCLs) rather than ASLs but covering similar ground: biological, chemical, nuclear, radiological, and cybersecurity uplift potential, plus autonomy and self-replication capabilities. OpenAI published its Preparedness Framework in December 2023, covering similar capability categories with four risk levels (low, medium, high, critical) and a commitment that only models at or below "medium" overall risk could be deployed.
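The deployment rule described for the Preparedness Framework above (deploy only at or below "medium" overall risk) can be sketched as a simple gate. This is an illustration under stated assumptions, not OpenAI's implementation: treating overall risk as the maximum across categories is an assumption made here, and the category names and the `Risk`, `overall_risk`, and `may_deploy` names are hypothetical.

```python
from enum import IntEnum


class Risk(IntEnum):
    """The four risk levels named in the text, ordered from lowest to highest."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def overall_risk(scores: dict[str, Risk]) -> Risk:
    """Assumed aggregation: overall risk is the highest score in any category."""
    return max(scores.values())


def may_deploy(scores: dict[str, Risk]) -> bool:
    """Gate from the text: deploy only if overall risk is at or below MEDIUM."""
    return overall_risk(scores) <= Risk.MEDIUM


if __name__ == "__main__":
    # Hypothetical evaluation results for two models.
    model_a = {"cybersecurity": Risk.LOW, "biological": Risk.MEDIUM, "autonomy": Risk.LOW}
    model_b = {"cybersecurity": Risk.HIGH, "biological": Risk.LOW, "autonomy": Risk.LOW}
    print(may_deploy(model_a))  # True
    print(may_deploy(model_b))  # False
```

With a maximum-based aggregation, a single high-scoring category is enough to block deployment, however low the other scores are.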
Voluntary frameworks face an inherent credibility challenge: the same organizations that write the rules also judge their own compliance. When Anthropic evaluated Claude 3 Opus against ASL-3 thresholds, it concluded the model was below the threshold, but the evaluation methodology was internal. External auditors had no independent access. Critics note that RSPs create no mechanism for a company to be penalized if it crosses a threshold and deploys anyway. Proponents argue they still improve internal decision-making and create reputational stakes that function as soft enforcement.
The July 2023 White House commitments covered three areas: safety (pre-deployment red-teaming, sharing safety information across companies and with governments), security (protecting model weights from theft), and trust (developing technical content provenance standards, publishing transparency reports). A second round of commitments followed in September 2023 with 8 additional companies signing. The Biden administration presented these commitments as precursors to binding regulation; the Trump administration that followed was more interested in the voluntary commitments as an alternative to regulation.
Evaluating the July 2023 commitments is difficult because there is no independent monitoring body and no defined metrics. The commitment to share safety information, for example, does not specify what information, at what level of detail, on what timeline, or with which specific researchers. This vagueness is both a political feature (it was easier to sign) and a practical limitation.
In July 2023, Anthropic, Google, Microsoft, and OpenAI jointly established the Frontier Model Forum, an industry body focused on AI safety research, best practices, and information sharing among frontier labs. By 2024 the Forum had expanded to include Amazon and other members. Its stated activities include funding external safety research, developing evaluation standards, and facilitating government engagement. Critics observe that the Forum's funding and agenda are controlled by its member companies, raising questions about whether it can produce genuinely independent safety standards.
The Partnership on AI, established earlier (2016) by Amazon, Apple, Facebook, Google, IBM, and Microsoft, covers a broader range of AI ethics issues (bias, fairness, transparency) and includes civil society and academic members alongside industry. It has produced reports and frameworks on topics including AI safety, synthetic media, and publication norms, but has no enforcement authority over members.
Voluntary frameworks can raise internal standards, create external expectations, and establish precedents that future regulation can codify. They cannot impose costs on defectors, prevent competitive races to the bottom, or hold companies accountable when they fail to follow their own policies. The alignment community debates whether voluntary commitments speed or slow binding regulation: by demonstrating that industry can self-regulate, they might reduce pressure for legislation; by establishing norms, they might make future legislation easier to pass.
You are a policy analyst evaluating the practical effectiveness of Responsible Scaling Policies and voluntary industry commitments. Your AI safety policy assistant can help you identify what RSPs actually commit companies to, where the gaps are, and how they compare to binding regulatory alternatives.
Push on specific provisions, ask about real-world enforcement cases, or explore how RSPs might be strengthened.
On March 22, 2023, the Future of Life Institute published an open letter calling for a six-month pause on training AI systems more powerful than GPT-4. Within days, over 1,000 researchers and technologists had signed it, including Yoshua Bengio, Stuart Russell, and Elon Musk. Within a week, the number had grown to tens of thousands. The letter produced no pause. OpenAI, Google, Anthropic, and Meta continued their development programs without interruption. But the letter did something the signatories may not have fully anticipated: it made frontier AI risk a mainstream news story and accelerated the political pressure that led to the White House commitments of July 2023 and the Bletchley Summit of November 2023.
Among the least visible but most consequential actors in AI governance are technical standards bodies. ISO/IEC JTC 1/SC 42, the joint ISO/IEC subcommittee on AI, has been developing international AI standards since 2017. Its output includes ISO/IEC 42001 (AI management systems), ISO/IEC 23053 (AI framework), and a growing suite of standards on bias, explainability, robustness, and risk management. Standards matter because they define what "conformity assessment" actually means: when the EU AI Act requires high-risk AI systems to pass conformity assessments, the technical criteria are drawn from standards developed by bodies like SC 42.
NIST's AI Risk Management Framework (AI RMF), published in January 2023, is a U.S. voluntary standard that has achieved significant adoption. It organizes AI risk management around four functions: Govern (organizational policies), Map (risk identification), Measure (risk assessment), and Manage (risk response). Because NIST standards are widely referenced in federal procurement and are being incorporated into state AI legislation, the AI RMF shapes AI development practices even though it is not legally binding.
The IEEE Standards Association's Ethically Aligned Design initiative, and the resulting P7000 series of standards, covers topics including algorithmic bias, data privacy, transparency, and fail-safe design for autonomous systems. The IEEE P7001 standard on transparency in autonomous systems, published in 2021, provides technical specifications for how autonomous systems should communicate their capabilities and limitations to users, which is directly relevant to the deceptive alignment problem in AI safety.
The AI safety research ecosystem includes a set of non-profit and academic institutions that operate independently of both government and industry. The Center for AI Safety (CAIS), founded by Dan Hendrycks, focuses on concrete technical safety research and has produced widely-cited work on adversarial robustness, out-of-distribution generalization, and AI risk evaluation. CAIS also organized the May 2023 statement, signed by hundreds of leading AI researchers, stating simply that "mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."
The Future of Life Institute (FLI), which organized the March 2023 pause letter, focuses on advocacy and convening rather than primary research. The Alignment Research Center (ARC), founded by Paul Christiano after he left OpenAI, does primary technical research on interpretability and evaluation. ARC's ARC Evals division (now operating as METR) has conducted external capability evaluations for multiple frontier labs, including assessments of GPT-4, Claude, and Gemini. This is one of the few cases of independent third-party AI safety evaluation with real model access.
The AI Now Institute at NYU focuses on the social and political dimensions of AI (labor, discrimination, surveillance, accountability) rather than existential risk. This division within the AI safety/ethics field between "near-term harms" and "long-term risks" researchers reflects genuine disagreements about where to focus limited attention and resources, and has been a source of tension within the broader community.
Independent auditing of AI systems is an emerging field with significant governance implications. Red-teaming (adversarial testing to find system failures) was standard practice in cybersecurity before AI, and has been adapted for AI safety. The largest public AI red-team exercise to date took place at DEF CON 2023, organized by the conference's AI Village with White House support, where approximately 2,200 participants tested eight major AI systems for safety and security failures over three days. The exercise produced findings that were shared with the developers and the government.
Third-party AI auditing firms (including entities like KPMG's AI Assurance, Fairly AI, and academic-affiliated groups) provide conformity assessments for enterprise AI systems. The EU AI Act will create significant demand for this industry, as high-risk AI systems must pass conformity assessments before deployment, some of them conducted by independent third parties. The challenge is that auditing an AI system is fundamentally different from auditing financial statements: there are no universally accepted technical criteria, and system behavior can change after deployment.
Governance infrastructure takes decades to build. The nuclear non-proliferation regime, built through the NPT (1970), the IAEA, and successive arms control treaties, was still maturing fifty years after its founding. The AI governance landscape of 2024–2025, with its mix of national legislation, voluntary commitments, nascent international forums, and emerging technical standards, looks more like 1946 nuclear governance than a mature regime. Whether the field can build robust institutions before capabilities create irreversible risks is the fundamental governance challenge of the coming decade.
You are a researcher preparing a briefing on the AI governance ecosystem for a new foundation deciding where to direct its funding. Your AI governance research assistant can help you map the landscape, identify gaps, and think through where additional resources would have the most leverage.
Ask about specific organizations, funding gaps, tensions between different parts of the ecosystem, or how governance infrastructure might be strengthened.