On 13 March 2024, the European Parliament voted 523 to 46 to pass the EU AI Act — the world's first comprehensive legal framework for artificial intelligence. Negotiators had worked through the night in December 2023 after foundation models nearly derailed the entire bill. The compromise reached that night would redefine what "high-risk" AI meant, and who was responsible when it caused harm.
The EU AI Act represents years of legislative work that began with the European Commission's proposal in April 2021. Its passage illustrates both the ambition and the difficulty of AI governance: the original draft did not address general-purpose systems such as large language models, which had not yet reached mass-market use. The final text required late rewrites to cover GPT-4 and systems like it.
Simultaneously, the United States took a fundamentally different path. President Biden's Executive Order on Safe, Secure, and Trustworthy AI (October 2023) used existing executive authority to require safety evaluations for frontier models, directed agencies to develop sector-specific guidance, and invoked the Defense Production Act to require AI developers to share safety test results with the government. It was broad in ambition but limited in binding force.
The UK pursued a third model. Rather than legislating a new AI law, the government argued that existing sector regulators — the FCA for finance, the CQC for healthcare, the ICO for data — should apply their existing authority to AI. The AI Safety Summit held at Bletchley Park in November 2023 convened 28 countries plus the EU to sign the first international declaration on frontier AI risk, but produced no binding commitments.
The EU AI Act classifies AI systems into four risk tiers. "Unacceptable risk" (banned): social scoring by governments, real-time biometric surveillance in public spaces (with narrow law-enforcement exceptions). "High risk": AI in medical devices, CV screening, credit scoring. "Limited risk": chatbots must disclose that they are AI. "Minimal risk": spam filters, AI in video games. High-risk systems face mandatory conformity assessments, human oversight requirements, and registration in an EU database before deployment.
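The tiering described above can be sketched as a simple lookup. This is a minimal illustration, not a legal classifier: the use-case strings and obligation labels are taken from the paragraph above, while the names `RiskTier`, `EXAMPLE_TIERS`, `OBLIGATIONS`, and `obligations_for` are hypothetical and introduced only for this example.

```python
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# Example use cases from the text; not an exhaustive or authoritative list.
EXAMPLE_TIERS = {
    "government social scoring": RiskTier.UNACCEPTABLE,
    "real-time public biometric surveillance": RiskTier.UNACCEPTABLE,
    "medical device": RiskTier.HIGH,
    "cv screening": RiskTier.HIGH,
    "credit scoring": RiskTier.HIGH,
    "chatbot": RiskTier.LIMITED,
    "spam filter": RiskTier.MINIMAL,
    "video game ai": RiskTier.MINIMAL,
}

# Obligations per tier, as summarized in the text.
OBLIGATIONS = {
    RiskTier.UNACCEPTABLE: ["prohibited"],
    RiskTier.HIGH: [
        "conformity assessment",
        "human oversight",
        "EU database registration",
    ],
    RiskTier.LIMITED: ["disclose that the user is interacting with AI"],
    RiskTier.MINIMAL: [],
}


def obligations_for(use_case: str) -> list[str]:
    """Return the obligations attached to a known example use case."""
    tier = EXAMPLE_TIERS.get(use_case.strip().lower())
    if tier is None:
        # Unknown use cases need real legal analysis, not a default tier.
        raise KeyError(f"no example tier recorded for {use_case!r}")
    return OBLIGATIONS[tier]
```

Calling `obligations_for("CV screening")` returns the three high-risk duties, which is the starting point for the hiring-tool scenario later in this lesson. Raising on unknown use cases, rather than defaulting to "minimal," reflects that real classification depends on the Act's annexes and on legal judgment.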
China has moved fastest in issuing specific AI regulations. The Cyberspace Administration of China published binding rules for recommendation algorithms (effective March 2022), deep synthesis, meaning deepfakes and other synthetic media (effective January 2023), and generative AI services (effective August 2023). Between them, the rules require algorithm registration, content moderation, and labeling of AI-generated content.
The generative AI rules are particularly notable: any public-facing generative AI service must undergo a security assessment, ensure outputs "reflect core socialist values," and prevent generation of content that "subverts state power." Critics note these rules constrain domestic competition as much as they address safety.
The November 2023 Bletchley Park summit produced the first international consensus statement on "frontier AI safety." The 28 signatory countries (among them the US, UK, China, and India) and the EU agreed that frontier models pose "potential for serious, even catastrophic, harm" and committed to information sharing. Notably, China signed. But the declaration created no enforcement mechanism, no shared definitions, and no binding timelines.
Regulation without standards is difficult to enforce. The EU AI Act references technical standards from CEN-CENELEC and ETSI, but those standards don't yet fully exist. NIST in the US published its AI Risk Management Framework in January 2023, offering voluntary guidance that has become a de facto reference point for US companies. ISO/IEC published its AI management system standard, 42001, in December 2023 and is developing related standards, but international standards bodies move on multi-year timescales while models improve quarterly.
This gap between regulatory ambition and technical measurement capability is one of the defining problems of AI governance. You cannot enforce requirements for "transparency" or "robustness" until there is agreement on how to measure them.
You're advising a company preparing to deploy an AI hiring tool across the EU, US, and China. Each jurisdiction has different rules. Use this lab to map the requirements and identify conflicts.
Before Claude 2 was released in July 2023, Anthropic ran extensive internal red-teaming exercises — teams of researchers deliberately trying to elicit harmful outputs, test the model's capabilities in dangerous domains, and find gaps between intended and actual behavior. This practice, borrowed from cybersecurity, had become standard at frontier AI labs. The question was whether it was sufficient — and who got to decide.
Red-teaming in AI borrowed from military and cybersecurity traditions: assemble an adversarial team, give them the goal of breaking a system, and use what they find to harden it before deployment. For AI models, this means testing whether a system can be prompted to assist with weapons synthesis, generate child sexual abuse material, provide detailed cyberattack instructions, or exhibit deceptive behavior.
OpenAI published details of its GPT-4 red-teaming process in the model's technical report and system card (March 2023). External red-teamers were given early access under NDA to probe the model for months before release. They found, among other things, that earlier versions could provide detailed step-by-step instructions for synthesizing chemical weapons and assist with planning attacks. Those capabilities were then mitigated through refusal training and other safety fine-tuning before launch.
The METR (Model Evaluation and Threat Research) organization, formerly ARC Evals, has developed evaluations specifically for "dangerous capability" assessment: Can the model acquire resources autonomously? Can it assist with creating CBRN weapons? Can it subvert oversight mechanisms? These evaluations were used in pre-deployment assessments for GPT-4 and Claude 2.
ARC Evals (now METR) ran evaluations on GPT-4 before its March 2023 release, specifically testing "power-seeking" behaviors: whether the model could autonomously replicate itself, acquire computational resources, and resist shutdown. The team found GPT-4 in its evaluated form did not exhibit these behaviors — but noted the evaluation was not exhaustive and that more capable future models would need more sophisticated testing. OpenAI published a summary in the GPT-4 system card.
In July 2023, Anthropic, Google, Microsoft, and OpenAI jointly founded the Frontier Model Forum, an industry body to coordinate safety research and develop evaluation standards. Early commitments included a $10 million AI Safety Fund for independent research, announced that October, and sharing safety information with governments and each other in pre-competitive ways.
Critics noted the obvious tension: the companies most commercially invested in deploying powerful AI were also the primary evaluators of whether it was safe. The Forum acknowledged this but argued it was more practical than waiting for government capacity to develop. Amazon and Meta joined in May 2024.
Published in January 2023, the NIST AI RMF provides voluntary guidance organized around four functions: Govern (organizational policies), Map (identify context and risks), Measure (analyze and assess), and Manage (prioritize and respond). It deliberately avoided prescribing specific technical tests, arguing that AI evolves too fast for fixed requirements.
The framework, published as NIST AI 100-1, became a reference standard for US federal agencies and many private-sector organizations. The Biden EO directed NIST to develop additional guidelines specifically for generative AI and red-teaming; the resulting Generative AI Profile was published as NIST AI 600-1 in July 2024.
In September 2023, Anthropic published the first public "Responsible Scaling Policy": a self-imposed commitment to conduct capability evaluations before each major model release and to halt deployment if models exceed defined "AI Safety Level" thresholds for dangerous capabilities (CBRN assistance, autonomous replication) without the matching safeguards in place. The policy defined ASL-2 (current models) and ASL-3 thresholds, with pauses required whenever a model reaches a level whose safeguards are not yet ready. Other labs were publicly asked to adopt similar policies; OpenAI published a "Preparedness Framework" in December 2023, and Google DeepMind published a "Frontier Safety Framework" in May 2024.
The honest assessment from researchers at these organizations: current evaluations are far from comprehensive. Red-teaming finds what red-teamers think to look for. Capability evaluations measure current models, not what they might become with fine-tuning or additional context. There is no agreed standard for what "passing" a safety evaluation means.
The UK's AI Safety Institute (AISI), established at the time of the Bletchley Summit in November 2023, was created specifically to develop government capacity for independent evaluations rather than relying on company self-reporting. By early 2024 it had evaluated several frontier models and published preliminary findings. The US AI Safety Institute was established within NIST in November 2023 with a similar mandate.
You're part of a safety team at an AI lab preparing to release a new frontier model. Your job is to design a red-teaming and evaluation protocol. What would you test for, and how?
The AI Seoul Summit in May 2024 marked the second global gathering on frontier AI safety. Twenty-seven nations, plus the EU, signed the Seoul Ministerial Statement committing to establishing AI Safety Institutes and coordinating on evaluations. For the first time, major AI companies (including Anthropic, Google DeepMind, Meta, Microsoft, OpenAI, Samsung, and others) publicly committed to not deploying models if evaluations found they exceeded safety thresholds. But China was not among the ministerial statement's signatories, and the US-China dynamic continued to shape every conversation, even those where it went unmentioned.
The fundamental tension in international AI governance is that the two leading AI powers, the United States and China, are also engaged in deep strategic competition. The US imposed sweeping export controls on advanced AI chips (A100, H100, and successors) to China in October 2022, tightened in October 2023. China's 2017 national AI development plan targets global AI leadership by 2030, and Beijing is pursuing self-sufficiency in chips through companies like SMIC.
This competition makes certain kinds of cooperation difficult. Intelligence-sharing about dangerous AI capabilities requires trust that doesn't exist. Agreements not to develop certain AI applications militarily are hard to verify. Yet both sides have spoken about avoiding AI-triggered accidents — the US and China held the first direct AI safety talks between governments in May 2024, described by officials as "frank."
The dynamic has a historical parallel: US-Soviet arms control negotiations began even at the height of Cold War competition. The 1972 SALT I treaty, the 1987 INF Treaty — these showed that adversaries can agree on specific limitations even when broadly antagonistic. Whether AI safety admits of the same kind of bilateral treaty structure is an open and important question.
In October 2022, the Commerce Department's Bureau of Industry and Security issued new export controls restricting sale of Nvidia A100 and H100 chips, and equivalent products from AMD, to China without a license. The rules used a two-part "performance threshold" definition: an aggregate processing-performance score (TOPS multiplied by bit length) of 4,800 or more, combined with interconnect bandwidth of 600 GB/s or more. Nvidia responded with reduced-bandwidth variants (the A800 and H800) that fell below the interconnect threshold; the October 2023 controls closed that loophole. The controls directly constrain China's ability to train frontier-scale AI models, creating a hardware-based governance lever.
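The threshold logic behind the loophole can be sketched in a few lines. This is a simplified illustration that assumes the October 2022 test required both a processing-performance score (TOPS multiplied by bit length) of at least 4,800 and interconnect bandwidth of at least 600 GB/s. The actual rule text has more conditions, and the `Chip` class, `needs_license` function, and the two chips in the usage note are hypothetical rather than real product specifications.

```python
from dataclasses import dataclass

# Assumed thresholds; a simplification of the October 2022 rule.
PERFORMANCE_THRESHOLD = 4800       # TOPS x bit length
INTERCONNECT_THRESHOLD_GBPS = 600  # chip-to-chip bandwidth in GB/s


@dataclass(frozen=True)
class Chip:
    name: str
    tops: float               # tera-operations per second
    bit_length: int           # operand width used for the TOPS figure
    interconnect_gbps: float  # chip-to-chip bandwidth in GB/s


def needs_license(chip: Chip) -> bool:
    """True only when BOTH thresholds are met (the two-part test)."""
    score = chip.tops * chip.bit_length
    return (
        score >= PERFORMANCE_THRESHOLD
        and chip.interconnect_gbps >= INTERCONNECT_THRESHOLD_GBPS
    )
```

Under these assumptions, a hypothetical `Chip("full", 620, 8, 600)` needs a license, while `Chip("reduced-link", 620, 8, 400)` does not, even though its raw compute is identical. Because the test was a conjunction, lowering only the interconnect figure was enough to escape it, which is the gap the 2023 revision addressed.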
Beyond the US-China split, international AI governance reflects genuinely different philosophical approaches. The EU's approach is fundamentally rights-based: AI regulation is an extension of fundamental rights protection (dignity, privacy, non-discrimination), grounded in the Charter of Fundamental Rights. Harms to individuals are the primary concern.
The UK's approach emphasizes innovation-compatible governance: sector regulators apply existing law, avoiding new legislation that might stifle the industry. The goal is light-touch rules that don't disadvantage British AI companies relative to American competitors.
The US federal approach (post-EO 2023) is executive-led and sector-specific: individual agencies (FDA for medical AI, FTC for consumer AI, CFPB for financial AI) develop their own rules, with NIST providing cross-cutting frameworks. Congress has not passed comprehensive AI legislation as of mid-2024.
China's approach is authoritarian-technocratic: AI must serve state objectives, content must align with official ideology, and commercial AI deployment is conditional on security registration. This is regulation as state control as much as safety governance.
At the May 2024 Seoul Summit, 16 major AI companies — including Anthropic, Google DeepMind, Meta, Microsoft, Mistral, OpenAI, and Samsung — signed a "Frontier AI Safety Commitments" document. Companies pledged: to publish safety frameworks before releasing new frontier models, not to deploy models that evaluations show have crossed dangerous capability thresholds, to share safety information with governments and with each other, and to invest in safety research. This was the first time AI companies collectively and publicly accepted safety deployment conditions. Critics noted the absence of enforcement mechanisms and the fact that signatories defined their own thresholds.
Existing international institutions were not designed for AI governance. The UN Secretary-General's High-Level Advisory Body on AI, established in 2023, published interim recommendations in December 2023 calling for a new international scientific panel on AI (analogous to the IPCC for climate) and an international dialogue mechanism. UNESCO's Recommendation on the Ethics of AI was adopted by all 193 member states in November 2021, making it the broadest international AI agreement, though it is non-binding.
The ITU (International Telecommunication Union) and OECD have developed AI policy frameworks; the OECD's AI Principles (2019) were the first intergovernmental AI policy standard, endorsed by 46 countries including non-OECD members. But none of these bodies has enforcement authority.
The honest assessment from governance scholars: international AI governance is at roughly the stage that nuclear governance was in 1947 — urgently needed, technically complex, and stymied by geopolitical competition. The institutions don't yet exist at the scale the problem demands.
You're advising a working group at a fictional international AI governance body trying to draft a framework that major powers — including the US, EU, China, and India — might actually sign. Work through the political and technical obstacles.
On 17 November 2023, OpenAI's board of directors fired CEO Sam Altman, saying he had not been "consistently candid in his communications with the board." The stated reason was vague; the underlying tension between OpenAI's safety-focused nonprofit mission and the commercial pressure of its for-profit subsidiary was not. Within four days, more than 700 of OpenAI's roughly 770 employees had signed a letter threatening to resign if Altman wasn't reinstated. He was. Most of the board members who voted to fire him, including chief scientist Ilya Sutskever, who later recanted, left the board. The episode illuminated every question about corporate AI governance that had been building for years.
OpenAI was structured as a "capped-profit" company: a nonprofit board retained ultimate control over the mission, with the commercial arm able to earn returns up to 100x investment before remaining profits went to the nonprofit. The theory was that nonprofit oversight would keep safety paramount over commercial pressure.
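The cap arithmetic can be made concrete with a short sketch. It assumes the simplest reading of the paragraph above: a single investor whose returns are capped at 100 times the amount invested, with anything beyond the cap flowing to the nonprofit. The function name `split_returns` and its parameters are hypothetical, and real terms vary by investor and funding round.

```python
def split_returns(
    investment: float,
    gross_return: float,
    cap_multiple: float = 100.0,
) -> tuple[float, float]:
    """Split a return between a capped investor and the nonprofit.

    Returns (to_investor, to_nonprofit). Assumes one investor and a
    flat cap of cap_multiple x investment; illustrative only.
    """
    if investment < 0 or gross_return < 0 or cap_multiple < 0:
        raise ValueError("amounts and cap multiple must be non-negative")
    cap = investment * cap_multiple
    to_investor = min(gross_return, cap)
    to_nonprofit = max(gross_return - cap, 0.0)
    return to_investor, to_nonprofit
```

For example, an investment of 10 (in any currency unit) that eventually generates 1,500 would return 1,000 to the investor and 500 to the nonprofit, while a return of 400 stays entirely with the investor. The cap only binds at extremely high multiples, which is one reason critics questioned how much it constrained commercial incentives in practice.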
The November 2023 crisis exposed the limits of this theory. The board, which included figures with AI safety backgrounds, attempted to exercise the oversight the structure was designed to enable. The company's employees, investors, and Microsoft (which had invested a reported $13 billion but held no board seat at the time, gaining a non-voting observer seat only afterward) effectively reversed the decision within days.
Afterward, OpenAI restructured its governance: the nonprofit board retained some oversight but new independent directors with corporate governance expertise joined; the capped-profit structure was to be converted to a full for-profit entity pending regulatory approvals. Safety became a board-level committee rather than a cross-functional priority embedded in the corporate structure itself.
In May 2024, OpenAI's "Superalignment" team, responsible for developing technical approaches to aligning future superintelligent AI, began to collapse. Co-lead Ilya Sutskever left in May; co-lead Jan Leike resigned days later, publicly stating that "safety culture and processes have taken a backseat to shiny products." Leike's public posts were specific about resources: the Superalignment team had been promised 20% of OpenAI's compute for safety research, and he said the team had at times struggled to get the compute it needed. A cascade of senior safety researchers followed him out the door through mid-2024. OpenAI disputed some characterizations but acknowledged the departures.
Anthropic was founded in 2021 largely by researchers who left OpenAI, in part over concerns about safety culture. Its governance structure reflects those concerns: it is incorporated as a Public Benefit Corporation (PBC) in Delaware, legally requiring directors to consider effects on society and the environment, not only shareholder returns. Anthropic's Long-Term Benefit Trust, an independent body whose trustees hold no financial stake in the company, holds a special class of shares that gives it the power to elect a growing portion of the board over time.
Whether these structural protections are sufficient, or whether commercial pressure will erode them as the company scales, is an open question. Amazon committed up to $4 billion in 2023 and announced a further $4 billion in November 2024; Google committed up to $2 billion, including $500 million up front. As training runs grow more expensive and competitive pressure intensifies, the test of Anthropic's governance structure hasn't fully come.
One underappreciated governance lever is employee voice. The November 2023 OpenAI crisis showed that employee collective action can override board decisions — but in the direction of less safety oversight, not more. More constructively, safety researchers at major labs have historically raised concerns through internal channels, written research memos that became external publications, and resigned publicly when they felt safety was being compromised.
Geoffrey Hinton resigned from Google in May 2023, explicitly to speak freely about AI risks. Anthropic's alignment faking paper (December 2024) and OpenAI's research on emergent deceptive behaviors came partly from researchers willing to publish findings that complicated the companies' commercial narratives. The challenge is that the same market competition that funds safety research also creates incentives to suppress findings that delay deployment.
The "alignment problem" — ensuring AI systems pursue the goals humans actually want — has a corporate analog: ensuring AI companies pursue the goals society actually wants, not just those that maximize revenue. The technical and institutional challenges rhyme: both require specifying values precisely, building monitoring systems that catch deviations, and creating correction mechanisms that work even when the corrected entity has more capability than its overseers. The OpenAI crisis showed these problems are not merely theoretical.
From the evidence of 2023–2024, several governance approaches appear more robust than others. Mandatory safety reporting — requirements to share dangerous capability evaluations with government bodies before deployment — creates external oversight that doesn't depend on company culture. The Biden EO's requirement and the UK AISI's evaluation access are early examples.
Independent evaluation bodies with genuine resources and access give outside parties a real view of what models can do. The UK AISI and US AISI matter precisely because they do not rely on company self-reports. Whistleblower protections for AI safety researchers are discussed but not yet well-established in law.
The deeper question — whether any governance structure can keep pace with capabilities that improve quarterly while policy moves on multi-year timescales — remains genuinely open. The history of technology governance offers cautionary tales: social media platforms were largely ungoverned during the years when their behavioral design patterns were locked in. The lesson most AI governance advocates draw is that the time to build oversight institutions is before the technology is fully deployed, not after.
You're a governance consultant advising a new AI company that has just raised $500 million and plans to build a frontier model. The founders want to take safety seriously — but also need to compete commercially. Help them design governance structures that can actually hold under pressure.