Lesson 1 · Module 4

The Regulatory Landscape

How governments worldwide are racing to govern technology that moves faster than legislation

Why is AI governance so fragmented — and what does that mean for safety?

On 13 March 2024, the European Parliament voted 523 to 46 to pass the EU AI Act — the world's first comprehensive legal framework for artificial intelligence. Negotiators had worked through the night in December 2023 after foundation models nearly derailed the entire bill. The compromise reached that night would redefine what "high-risk" AI meant, and who was responsible when it caused harm.

The Governance Gap

The EU AI Act represents years of legislative work that began in April 2021. Its passage illustrates both the ambition and the difficulty of AI governance: the original draft didn't even mention large language models, which had not yet reached mainstream use at the time. The final text required last-minute rewrites to address GPT-4 and systems like it.

Simultaneously, the United States took a fundamentally different path. President Biden's Executive Order on Safe, Secure, and Trustworthy AI (October 2023) used existing executive authority to require safety evaluations for frontier models, directed agencies to develop sector-specific guidance, and invoked the Defense Production Act to require AI developers to share safety test results with the government. It was broad in ambition but limited in binding force.

The UK pursued a third model. Rather than legislating a new AI law, the government argued that existing sector regulators — the FCA for finance, the CQC for healthcare, the ICO for data — should apply their existing authority to AI. The AI Safety Summit held at Bletchley Park in November 2023 convened 28 countries plus the EU to sign the first international declaration on frontier AI risk, but produced no binding commitments.

Real Case — EU AI Act Risk Tiers

The Act classifies AI systems into four risk tiers. "Unacceptable risk" (banned): social scoring by governments, real-time biometric surveillance in public spaces. "High risk": AI in medical devices, CV screening, credit scoring. "Limited risk": chatbots must disclose they're AI. "Minimal risk": spam filters, AI in video games. High-risk systems face mandatory conformity assessments, human oversight requirements, and registration in an EU database before deployment.
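The tier structure described above can be pictured as a simple lookup from use case to tier to obligations. The following is a minimal sketch for study purposes only: the names `RiskTier`, `EXAMPLE_TIERS`, `OBLIGATIONS`, and `obligations_for` are hypothetical, the entries are just the examples named in this lesson, and real classification under the Act depends on legal analysis rather than a table.

```python
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# Example use cases taken from the lesson text (keys are lowercase).
# Illustrative only; not a legal classification.
EXAMPLE_TIERS = {
    "government social scoring": RiskTier.UNACCEPTABLE,
    "real-time public biometric surveillance": RiskTier.UNACCEPTABLE,
    "medical device ai": RiskTier.HIGH,
    "cv screening": RiskTier.HIGH,
    "credit scoring": RiskTier.HIGH,
    "chatbot": RiskTier.LIMITED,
    "spam filter": RiskTier.MINIMAL,
    "video game ai": RiskTier.MINIMAL,
}

# Obligations per tier, as summarized in the lesson.
OBLIGATIONS = {
    RiskTier.UNACCEPTABLE: ["prohibited"],
    RiskTier.HIGH: [
        "conformity assessment",
        "human oversight",
        "EU database registration",
    ],
    RiskTier.LIMITED: ["disclose that the user is interacting with AI"],
    RiskTier.MINIMAL: [],
}


def obligations_for(use_case: str) -> list[str]:
    """Return the lesson's summary obligations for a known example use case."""
    tier = EXAMPLE_TIERS[use_case.lower()]
    return OBLIGATIONS[tier]
```

For the hiring-tool scenario used later in the lab, `obligations_for("CV screening")` returns the three high-risk obligations, while `obligations_for("spam filter")` returns an empty list.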

China's Approach: Sector-by-Sector Rules

China has moved fastest in issuing specific AI regulations. The Cyberspace Administration of China published binding rules for recommendation algorithms (effective March 2022), deep synthesis, meaning deepfakes and other synthetic media (effective January 2023), and generative AI services (effective August 2023). Each required registration, content moderation, and watermarking of AI-generated content.

The generative AI rules are particularly notable: any public-facing generative AI service must undergo a security assessment, ensure outputs "reflect core socialist values," and prevent generation of content that "subverts state power." Critics note these rules constrain domestic competition as much as they address safety.

Key Concepts in AI Governance

Risk-based regulation: Calibrating legal obligations to the severity of potential harms, rather than applying uniform rules to all AI systems. The EU AI Act's tiered structure is the leading example.

Frontier model: A highly capable AI system at or near the cutting edge of performance, typically trained at enormous scale. Both the EU AI Act and Biden's EO created special obligations for these systems.

Regulatory fragmentation: The absence of coordinated international rules, resulting in companies facing different (sometimes contradictory) legal requirements in different jurisdictions.

Compute governance: Using control over AI training hardware (chips, data centers) as a regulatory lever. The US export controls on advanced semiconductors to China, issued October 2022 and tightened in 2023, are the primary real-world example.

The Bletchley Declaration

The November 2023 Bletchley Park summit produced the first international consensus statement on "frontier AI safety." Signatories, including the US, UK, EU, China, India, and 24 other countries, agreed that frontier models pose "potential for serious, even catastrophic, harm" and committed to information sharing. Notably, China signed. But the declaration created no enforcement mechanism, no shared definitions, and no binding timelines.

The Standards Problem

Regulation without standards is difficult to enforce. The EU AI Act references technical standards from CEN-CENELEC and ETSI, but those standards don't yet fully exist. NIST in the US published its AI Risk Management Framework in January 2023, offering voluntary guidance that has become a de facto reference point for US companies. ISO/IEC published its first AI management system standard, ISO/IEC 42001, in December 2023 and is developing further AI standards, but international standards bodies move on multi-year timescales while models improve quarterly.

This gap between regulatory ambition and technical measurement capability is one of the defining problems of AI governance. You cannot enforce requirements for "transparency" or "robustness" until there is agreement on how to measure them.

Lesson 1 Quiz

The Regulatory Landscape · Check your understanding

1. The EU AI Act was passed by the European Parliament with what vote margin in March 2024?

Correct. The 523–46 vote reflected broad cross-party support after a contentious negotiation over foundation models in December 2023.

Not quite. The vote was 523 to 46 — a very strong majority, though foundation models nearly derailed negotiations in late 2023.

2. Which legal authority did President Biden invoke in his October 2023 AI Executive Order to require safety test result sharing?

Correct. The Defense Production Act gave the administration authority to compel AI developers to share safety evaluations with the government before deployment.

Not quite. Biden invoked the Defense Production Act — a Korean War-era law that grants the executive branch broad authority over industries deemed critical to national security.

3. Under the EU AI Act's risk framework, which of the following is classified as "unacceptable risk" and therefore banned?

Correct. Government social scoring — like assigning citizens behavioral ratings that affect their rights — is banned outright. CV screening is high-risk (allowed with conditions); chatbots are limited-risk; spam filters are minimal-risk.

Not quite. Government social scoring systems are the banned category. CV screening is "high-risk" (permitted with requirements), not banned.

4. China's August 2023 generative AI regulations required what marking on AI-generated content?

Correct. China's generative AI rules mandate watermarking, along with security assessments and content moderation requirements tied to "core socialist values."

Not quite. China's rules require watermarking of AI-generated content, along with security registration and content requirements.

Lab 1 — Regulatory Comparisons

Explore how different governance models handle AI risk

Your Mission

You're advising a company preparing to deploy an AI hiring tool across the EU, US, and China. Each jurisdiction has different rules. Use this lab to map the requirements and identify conflicts.

Try asking: "What does the EU AI Act require for an AI CV screening system?" or "How do US and EU approaches to AI governance differ fundamentally?" or "What are the key risks of regulatory fragmentation for a global AI product?"

AI Governance Advisor

Lab 1

Welcome. I'm your AI governance advisor for this lab. You're mapping regulatory requirements for an AI hiring tool to be deployed in the EU, US, and China. What aspect of the regulatory landscape would you like to explore first?

Lesson 2 · Module 4

Safety Evaluations and Red-Teaming

The technical machinery being built to catch dangerous AI capabilities before deployment

How do you test a system for risks you haven't imagined yet?

Before Claude 2 was released in July 2023, Anthropic ran extensive internal red-teaming exercises — teams of researchers deliberately trying to elicit harmful outputs, test the model's capabilities in dangerous domains, and find gaps between intended and actual behavior. This practice, borrowed from cybersecurity, had become standard at frontier AI labs. The question was whether it was sufficient — and who got to decide.

What Red-Teaming Means for AI

Red-teaming in AI borrowed from military and cybersecurity traditions: assemble an adversarial team, give them the goal of breaking a system, and use what they find to harden it before deployment. For AI models, this means testing whether a system can be prompted to assist with weapons synthesis, generate child sexual abuse material, provide detailed cyberattack instructions, or exhibit deceptive behavior.

OpenAI published details of its GPT-4 red-teaming process in the model's technical report (March 2023). External red-teamers were given early access under NDA to probe the model for months before release. They found, among other things, that earlier versions could provide detailed step-by-step instructions for synthesizing chemical weapons and assist with planning attacks — capabilities that were then mitigated through fine-tuning and system prompts before launch.

The METR (Model Evaluation and Threat Research) organization, formerly ARC Evals, has developed evaluations specifically for "dangerous capability" assessment: Can the model acquire resources autonomously? Can it assist with creating CBRN weapons? Can it subvert oversight mechanisms? These evaluations were used in pre-deployment assessments for GPT-4 and Claude 2.

Real Case — GPT-4 Pre-Deployment Evaluation

ARC Evals (now METR) ran evaluations on GPT-4 before its March 2023 release, specifically testing "power-seeking" behaviors: whether the model could autonomously replicate itself, acquire computational resources, and resist shutdown. The team found GPT-4 in its evaluated form did not exhibit these behaviors — but noted the evaluation was not exhaustive and that more capable future models would need more sophisticated testing. OpenAI published a summary in the GPT-4 system card.

The Frontier Model Forum

In July 2023, Anthropic, Google, Microsoft, and OpenAI jointly founded the Frontier Model Forum — an industry body to coordinate safety research and develop evaluation standards. Initial commitments included a $10 million AI Safety Fund for independent research and sharing safety information with governments and each other in pre-competitive ways.

Critics noted the obvious tension: the companies most commercially invested in deploying powerful AI were also the primary evaluators of whether it was safe. The Forum acknowledged this but argued it was more practical than waiting for government capacity to develop. Amazon and Meta joined in May 2024.

NIST's AI Risk Management Framework

Published in January 2023, the NIST AI RMF provides voluntary guidance organized around four functions: Govern (organizational policies), Map (identify context and risks), Measure (analyze and assess), and Manage (prioritize and respond). It deliberately avoided prescribing specific technical tests, arguing that AI evolves too fast for fixed requirements.

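The framework became a reference standard for US federal agencies and many private-sector organizations. The Biden EO of October 2023 directed NIST to develop additional guidelines specifically for generative AI and red-teaming; the resulting Generative AI Profile (NIST AI 600-1) was released in draft in April 2024 and finalized in July 2024. NIST AI 100-1 is the designation of the original January 2023 framework itself.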

Dangerous capability evaluation: Structured tests, run before deployment, to determine whether an AI model possesses specific harmful capabilities such as assisting with weapons of mass destruction, autonomous replication, or cyberattack planning.

System card: A document published by an AI developer disclosing safety evaluations, known limitations, and mitigation measures for a specific model. OpenAI published system cards for DALL-E and GPT-4; Anthropic publishes model cards for Claude.

Responsible scaling policy: A commitment by an AI developer to pause or slow development if evaluations show capabilities crossing defined danger thresholds. Anthropic published its RSP in September 2023, a first in the industry.

Anthropic's Responsible Scaling Policy

In September 2023, Anthropic published the first public "Responsible Scaling Policy": a binding commitment to conduct capability evaluations before each major model release and to halt deployment if models exceed defined "AI Safety Level" thresholds for dangerous capabilities (CBRN assistance, autonomous replication). The policy defined ASL-2 (current models) and ASL-3 thresholds, with deployment pauses required at each level. Other labs were publicly asked to adopt similar policies; OpenAI published a "Preparedness Framework" in December 2023, and Google DeepMind published a "Frontier Safety Framework" in May 2024.
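The core logic of a threshold-based scaling policy (evaluate first, then hold deployment if any dangerous-capability threshold is crossed) can be sketched in a few lines. This is an illustrative assumption about how such a gate might be expressed, not any lab's actual process: `EvalResult`, `deployment_decision`, and the 0 to 1 scores and thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    capability: str   # e.g. "cbrn_assistance" (hypothetical label)
    score: float      # hypothetical measured capability, 0.0 to 1.0
    threshold: float  # hypothetical danger threshold for this capability


def deployment_decision(results: list[EvalResult]) -> tuple[bool, list[str]]:
    """Return (may_deploy, crossed): deployment is held if any threshold is met."""
    crossed = [r.capability for r in results if r.score >= r.threshold]
    return (len(crossed) == 0, crossed)


# Usage: one capability below its threshold, one at or above it.
results = [
    EvalResult("cbrn_assistance", score=0.2, threshold=0.5),
    EvalResult("autonomous_replication", score=0.6, threshold=0.5),
]
may_deploy, crossed = deployment_decision(results)
```

Even in this toy form, the sketch makes the critics' point visible: whoever sets `threshold` and measures `score` controls the outcome of the gate.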

The Limits of Current Evaluations

The honest assessment from researchers at these organizations: current evaluations are far from comprehensive. Red-teaming finds what red-teamers think to look for. Capability evaluations measure current models, not what they might become with fine-tuning or additional context. There is no agreed standard for what "passing" a safety evaluation means.

The UK's AI Safety Institute (AISI), established after the Bletchley Summit in November 2023, was created specifically to develop government capacity for independent evaluations — not relying on company self-reporting. By early 2024 it had evaluated several frontier models and published preliminary findings. The US AI Safety Institute was established within NIST in November 2023 with a similar mandate.

Lesson 2 Quiz

Safety Evaluations and Red-Teaming · Check your understanding

1. Which organization, formerly known as ARC Evals, conducted pre-deployment dangerous capability evaluations on GPT-4?

Correct. METR (then called ARC Evals) ran evaluations specifically testing for autonomous power-seeking behaviors in GPT-4 before its March 2023 release, with results summarized in OpenAI's system card.

Not quite. METR — formerly ARC Evals — conducted these specific evaluations. NIST provides frameworks; the UK AISI was founded later in 2023.

2. Anthropic's Responsible Scaling Policy, published in September 2023, was notable because it was the first industry policy to do what?

Correct. The RSP created binding internal commitments to pause development if models crossed "AI Safety Level" capability thresholds — a self-imposed brake mechanism on scaling.

Not quite. The RSP committed Anthropic to halt or slow development if capability evaluations showed models exceeding defined thresholds — the first such public commitment in the industry.

3. The NIST AI Risk Management Framework organizes its guidance around four functions. Which of the following is NOT one of them?

Correct. The four functions are Govern, Map, Measure, and Manage. "Prohibit" is not one of them — the NIST RMF is deliberately non-prescriptive and avoids banning specific uses.

Not quite. The four NIST AI RMF functions are Govern, Map, Measure, and Manage — not Prohibit. The framework is voluntary and avoids hard bans.

4. The Frontier Model Forum was founded in July 2023 by which four companies?

Correct. The four founding members were the leading frontier model developers at the time: Anthropic, Google, Microsoft, and OpenAI. Meta joined later.

Not quite. The founding members were Anthropic, Google, Microsoft, and OpenAI — the four companies then considered to be developing frontier-scale models.

Lab 2 — Designing Safety Evaluations

Think through what it means to rigorously test an AI system

Your Mission

You're part of a safety team at an AI lab preparing to release a new frontier model. Your job is to design a red-teaming and evaluation protocol. What would you test for, and how?

Try asking: "What categories of dangerous capabilities should we prioritize evaluating?" or "How would you test whether a model could assist with bioweapon synthesis?" or "What are the weaknesses of current red-teaming approaches?"

Safety Evaluation Advisor

Lab 2

Welcome to the safety evaluation lab. You're designing a pre-deployment evaluation protocol for a new frontier language model. What aspects of safety testing do you want to explore or design?

Lesson 3 · Module 4

International Coordination and Competing Visions

The geopolitics of intelligence — how great-power competition shapes the governance of transformative AI

Can nations that compete strategically over AI also cooperate to govern it safely?

The AI Seoul Summit in May 2024 marked the second global gathering on frontier AI safety. Twenty-seven nations, plus the EU, signed the Seoul Ministerial Statement committing to establishing AI Safety Institutes and coordinating on evaluations. For the first time, major AI companies (including Anthropic, Google DeepMind, Meta, Microsoft, OpenAI, Samsung, and others) publicly committed to not deploying models if evaluations found they exceeded safety thresholds. But China's government did not sign the ministerial statement, and the US-China dynamic continued to shape every conversation, including those China was not in the room for.

The US-China AI Competition

The fundamental tension in international AI governance is that the two leading AI powers — the United States and China — are also engaged in deep strategic competition. The US imposed sweeping export controls on advanced AI chips (A100, H100, and successors) to China in October 2022, tightened in October 2023. China's stated goal is AI self-sufficiency by 2030, including domestic chip production through companies like SMIC.

This competition makes certain kinds of cooperation difficult. Intelligence-sharing about dangerous AI capabilities requires trust that doesn't exist. Agreements not to develop certain AI applications militarily are hard to verify. Yet both sides have spoken about avoiding AI-triggered accidents — the US and China held the first direct AI safety talks between governments in May 2024, described by officials as "frank."

The dynamic has a historical parallel: US-Soviet arms control negotiations began even at the height of Cold War competition. The 1972 SALT I treaty, the 1987 INF Treaty — these showed that adversaries can agree on specific limitations even when broadly antagonistic. Whether AI safety admits of the same kind of bilateral treaty structure is an open and important question.

Real Case — US Export Controls on AI Chips

In October 2022, the Commerce Department's Bureau of Industry and Security issued new export controls restricting sale of Nvidia A100 and H100 chips, and equivalent products from AMD, to China without a license. The rules used a "performance threshold" definition: chips whose aggregate processing-performance score reached roughly 4,800 and whose chip-to-chip interconnect bandwidth also exceeded a set limit. Nvidia responded with a modified version (the A800) whose reduced interconnect bandwidth put it technically below the threshold; October 2023 controls closed that loophole. The controls directly constrain China's ability to train frontier-scale AI models, creating a hardware-based governance lever.
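A two-part threshold of this kind is easy to express as a check, which also shows why lowering just one number could take a chip out of scope. The sketch below assumes the commonly cited reading of the October 2022 rule: a performance score computed as TOPS multiplied by operation bit length, with a cutoff of 4,800, combined with an interconnect cutoff of 600 GB/s. The 600 GB/s figure and the scoring formula are not stated in this lesson and are labeled here as assumptions; the chip figures in the usage lines are hypothetical.

```python
# Assumed thresholds (see the note above); not taken from the lesson text.
PERFORMANCE_THRESHOLD = 4800        # TOPS x bit length
INTERCONNECT_THRESHOLD_GB_S = 600   # chip-to-chip bandwidth, GB/s


def is_controlled_2022(tops: float, bit_length: int, interconnect_gb_s: float) -> bool:
    """Return True only if BOTH thresholds are met, as in a two-part rule."""
    performance_score = tops * bit_length
    return (
        performance_score >= PERFORMANCE_THRESHOLD
        and interconnect_gb_s >= INTERCONNECT_THRESHOLD_GB_S
    )


# Hypothetical chips: same compute, different interconnect bandwidth.
full_chip = is_controlled_2022(tops=600, bit_length=8, interconnect_gb_s=600)
cut_down_chip = is_controlled_2022(tops=600, bit_length=8, interconnect_gb_s=400)
```

Because the rule is a conjunction, the cut-down chip escapes control without losing any raw compute, which is the kind of gap the 2023 revision was written to close.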

Different Governance Philosophies

Beyond the US-China split, international AI governance reflects genuinely different philosophical approaches. The EU's approach is fundamentally rights-based: AI regulation is an extension of fundamental rights protection (dignity, privacy, non-discrimination), grounded in the Charter of Fundamental Rights. Harms to individuals are the primary concern.

The UK's approach emphasizes innovation-compatible governance: sector regulators apply existing law, avoiding new legislation that might stifle the industry. The goal is light-touch rules that don't disadvantage British AI companies relative to American competitors.

The US federal approach (post-EO 2023) is executive-led and sector-specific: individual agencies (FDA for medical AI, FTC for consumer AI, CFPB for financial AI) develop their own rules, with NIST providing cross-cutting frameworks. Congress has not passed comprehensive AI legislation as of mid-2024.

China's approach is authoritarian-technocratic: AI must serve state objectives, content must align with official ideology, and commercial AI deployment is conditional on security registration. This is regulation as state control as much as safety governance.

AI Safety Institute (AISI): A government body established to conduct independent evaluations of frontier AI models. The UK established the first AISI in November 2023; the US established one within NIST the same month. By the Seoul Summit in May 2024, ten countries had committed to creating AISIs.

Compute governance: Using control over the hardware required to train AI (chips, data center infrastructure, cloud access) as a regulatory tool. US semiconductor export controls are the primary example.

Racing dynamics: The incentive structure where competing nations or companies rush to develop and deploy AI capabilities before rivals, potentially cutting corners on safety to maintain competitive position.

The Seoul AI Safety Commitments

At the May 2024 Seoul Summit, 16 major AI companies — including Anthropic, Google DeepMind, Meta, Microsoft, Mistral, OpenAI, and Samsung — signed a "Frontier AI Safety Commitments" document. Companies pledged: to publish safety frameworks before releasing new frontier models, not to deploy models that evaluations show have crossed dangerous capability thresholds, to share safety information with governments and with each other, and to invest in safety research. This was the first time AI companies collectively and publicly accepted safety deployment conditions. Critics noted the absence of enforcement mechanisms and the fact that signatories defined their own thresholds.

The Role of International Institutions

Existing international institutions were not designed for AI governance. The UN Secretary-General's High-Level Advisory Body on AI, established in 2023, published interim recommendations in December 2023 calling for a new international scientific panel on AI (analogous to the IPCC for climate) and an international dialogue mechanism. UNESCO's Recommendation on the Ethics of AI was adopted in November 2021 by all 193 member states, making it the broadest international AI agreement, though non-binding.

The ITU (International Telecommunication Union) and OECD have developed AI policy frameworks; the OECD's AI Principles (2019) were the first intergovernmental AI policy standard, endorsed by 46 countries including non-OECD members. But none of these bodies has enforcement authority.

The honest assessment from governance scholars: international AI governance is at roughly the stage that nuclear governance was in 1947 — urgently needed, technically complex, and stymied by geopolitical competition. The institutions don't yet exist at the scale the problem demands.

Lesson 3 Quiz

International Coordination · Check your understanding

1. The US export controls on AI chips imposed in October 2022 specifically targeted which Nvidia products?

Correct. The October 2022 controls targeted A100 and H100 chips (and AMD equivalents) — the data center GPUs used to train frontier AI models at scale.

Not quite. The controls targeted A100 and H100 data center chips used for training large AI models, not consumer GPUs. Nvidia attempted a workaround with the A800 before that loophole was closed in 2023.

2. The Seoul AI Safety Commitments (May 2024) were signed by how many major AI companies?

Correct. Sixteen major AI companies signed the Frontier AI Safety Commitments at Seoul, including Anthropic, Google DeepMind, Meta, Microsoft, Mistral, OpenAI, and Samsung.

Not quite. Sixteen major AI companies signed at Seoul — notably the first time companies collectively accepted safety deployment conditions in a public document.

3. UNESCO's Recommendation on the Ethics of AI was adopted in November 2021 by how many member states?

Correct. All 193 UNESCO member states adopted the Recommendation — making it the broadest international AI agreement to date, though it is non-binding.

Not quite. The UNESCO Recommendation was adopted by all 193 member states — the widest international buy-in of any AI agreement, though it carries no enforcement mechanism.

4. Which governance philosophy characterizes the EU's approach to AI regulation, according to the lesson?

Correct. The EU approach frames AI regulation as an extension of fundamental rights — dignity, privacy, non-discrimination — grounded in the Charter of Fundamental Rights.

Not quite. The EU takes a rights-based approach. Innovation-compatible describes the UK; executive-led agency rulemaking describes the US; authoritarian-technocratic describes China.

Lab 3 — International Governance Scenarios

Explore the real tensions in coordinating AI governance across competing powers

Your Mission

You're advising a working group at a fictional international AI governance body trying to draft a framework that major powers — including the US, EU, China, and India — might actually sign. Work through the political and technical obstacles.

Try asking: "What's the biggest obstacle to US-China cooperation on AI safety?" or "How could compute governance be used in an international AI treaty?" or "What lessons from nuclear arms control apply to AI?"

International AI Policy Advisor

Lab 3

Welcome. You're advising an international AI governance working group navigating the tensions between the US, EU, China, India, and other major AI powers. What dimension of international coordination do you want to explore?

Lesson 4 · Module 4

Corporate Governance and the Alignment Imperative

Inside the organizations building transformative AI — what structures, incentives, and crises shape safety decisions

When the people building powerful AI are also those evaluating its safety, what governance structures actually matter?

On 17 November 2023, OpenAI's board of directors fired CEO Sam Altman, saying he "was not consistently candid in his communications with the board." The stated reason was vague; the underlying tension, between OpenAI's safety-focused nonprofit mission and the commercial pressure of its for-profit subsidiary, was not. Within four days, nearly 700 of OpenAI's 770 employees had signed a letter threatening to resign if Altman wasn't reinstated. He was. Most of the board members who voted to fire him, including chief scientist Ilya Sutskever, who later recanted, were replaced. The episode illuminated every question about corporate AI governance that had been building for years.

The OpenAI Governance Crisis

OpenAI was structured as a "capped-profit" company: a nonprofit board retained ultimate control over the mission, with the commercial arm able to earn returns up to 100x investment before remaining profits went to the nonprofit. The theory was that nonprofit oversight would keep safety paramount over commercial pressure.
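The cap works as a simple split of returns. The following is a minimal arithmetic sketch, assuming a single investor and the 100x multiple mentioned above; `split_returns` and its parameters are hypothetical names, and real agreements involve multiple investor tranches and terms not described in this lesson.

```python
def split_returns(
    investment: float, total_return: float, cap_multiple: float = 100.0
) -> tuple[float, float]:
    """Split a return between the investor (up to the cap) and the nonprofit."""
    cap = investment * cap_multiple
    to_investor = min(total_return, cap)
    to_nonprofit = max(total_return - cap, 0.0)
    return to_investor, to_nonprofit


# Usage: a 1.0 unit investment that eventually returns 150.0 units.
investor_share, nonprofit_share = split_returns(1.0, 150.0)
```

With these inputs the investor receives 100.0 and the nonprofit 50.0. A return of 40.0 on the same investment would go entirely to the investor, which illustrates how rarely a 100x cap would bind in practice.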

The November 2023 crisis exposed the limits of this theory. The board, which included figures with AI safety backgrounds, attempted to exercise the oversight the structure was designed to enable. The company's employees, investors, and Microsoft (which had invested a reported $13 billion and afterward took a non-voting observer seat on the board) effectively reversed the decision within days.

Afterward, OpenAI restructured its governance: the nonprofit board retained some oversight but new independent directors with corporate governance expertise joined; the capped-profit structure was to be converted to a full for-profit entity pending regulatory approvals. Safety became a board-level committee rather than a cross-functional priority embedded in the corporate structure itself.

Real Case — OpenAI's Superalignment Team Departures

In May 2024, OpenAI's "Superalignment" team, responsible for developing technical approaches to aligning future superintelligent AI, began to collapse. Co-lead Ilya Sutskever left in May; co-lead Jan Leike resigned days later, publicly stating that "safety culture and processes have taken a backseat to shiny products." Leike's posts on X were specific: the Superalignment team had been promised 20% of compute for safety research; he alleged this commitment was not honored. A cascade of senior safety researchers followed him out the door through mid-2024. OpenAI disputed some characterizations but acknowledged the departures.

Anthropic's Governance Model

Anthropic was founded in 2021 largely by researchers who left OpenAI, in part over concerns about safety culture. Its governance structure reflects those concerns: it is incorporated as a Public Benefit Corporation (PBC) in Delaware, legally requiring directors to consider effects on society and the environment, not only shareholder returns. Anthropic's Long-Term Benefit Trust, an independent body charged with safeguarding the company's mission, has the power to elect a share of the board's members, giving it leverage if the board drifts from its safety mission.

Whether these structural protections are sufficient, or whether commercial pressure will erode them as the company scales, is an open question. Amazon committed up to $4 billion in 2023 (and announced a further $4 billion in late 2024); Google committed up to $2 billion, with $500 million up front. As training runs grow more expensive and competitive pressure intensifies, the test of Anthropic's governance structure hasn't fully come.

The Role of Employee Voice

One underappreciated governance lever is employee voice. The November 2023 OpenAI crisis showed that employee collective action can override board decisions — but in the direction of less safety oversight, not more. More constructively, safety researchers at major labs have historically raised concerns through internal channels, written research memos that became external publications, and resigned publicly when they felt safety was being compromised.

Geoffrey Hinton resigned from Google in May 2023, explicitly to speak freely about AI risks. Anthropic's alignment faking paper (December 2024) and OpenAI's research on emergent deceptive behaviors came partly from researchers willing to publish findings that complicated the companies' commercial narratives. The challenge is that the same market competition that funds safety research also creates incentives to suppress findings that delay deployment.

Capped-profit structure: A hybrid corporate form in which investors can earn returns up to a defined multiple of their investment, after which further profits flow to the nonprofit mission. OpenAI pioneered this structure; its governance failures led to restructuring toward a standard for-profit form.

Public Benefit Corporation: A US corporate form (available in Delaware and other states) that legally requires directors to weigh societal impact alongside shareholder returns. Anthropic and Kickstarter are examples. It provides some protection against purely profit-driven decisions but is not immune to commercial pressure.

Constitutional AI: Anthropic's technique for training models to be helpful, harmless, and honest using a set of explicit principles (a "constitution") and AI feedback, rather than only human feedback. Published in December 2022 as a proposed alignment methodology.

The Alignment Problem in Corporate Context

The "alignment problem" — ensuring AI systems pursue the goals humans actually want — has a corporate analog: ensuring AI companies pursue the goals society actually wants, not just those that maximize revenue. The technical and institutional challenges rhyme: both require specifying values precisely, building monitoring systems that catch deviations, and creating correction mechanisms that work even when the corrected entity has more capability than its overseers. The OpenAI crisis showed these problems are not merely theoretical.

What Governance Structures Actually Work

From the evidence of 2023–2024, several governance approaches appear more robust than others. Mandatory safety reporting — requirements to share dangerous capability evaluations with government bodies before deployment — creates external oversight that doesn't depend on company culture. The Biden EO's requirement and the UK AISI's evaluation access are early examples.

Independent evaluation bodies with genuine resources and access give outside parties a real view of what models can do. The UK AISI and US AISI matter precisely because they are not company self-reports. Whistleblower protections for AI safety researchers are discussed but not yet well-established in law.

The deeper question — whether any governance structure can keep pace with capabilities that improve quarterly while policy moves on multi-year timescales — remains genuinely open. The history of technology governance offers cautionary tales: social media platforms were largely ungoverned during the years when their behavioral design patterns were locked in. The lesson most AI governance advocates draw is that the time to build oversight institutions is before the technology is fully deployed, not after.

Lesson 4 Quiz

Corporate Governance and Alignment · Check your understanding

1. The November 2023 OpenAI board crisis was resolved in what way, and within approximately what timeframe?

Correct. Altman was reinstated within four days, after nearly 700 of OpenAI's 770 employees signed a letter threatening to resign if he was not brought back. The board members who fired him were largely replaced.

Not quite. Employee collective action — a letter signed by ~700 of 770 staff — effectively forced the board's reversal within four days, with Altman reinstated and the firing board members replaced.

2. Jan Leike's public resignation from OpenAI's Superalignment team in May 2024 alleged what specific broken commitment?

Correct. Leike specifically alleged that the Superalignment team had been promised 20% of compute for safety research, and that this commitment was not honored — signaling that product development was being prioritized over safety.

Not quite. Leike's specific allegation was that the Superalignment team's promised 20% compute allocation for safety research was not honored, and that safety culture had been displaced by product priorities.

3. Anthropic is incorporated as what type of legal entity, and what obligation does this structure impose on directors?

Correct. Anthropic is a Public Benefit Corporation (PBC), which legally requires directors to consider effects on society and the environment — not only financial returns to shareholders.

Not quite. Anthropic is a Public Benefit Corporation (PBC) — not a nonprofit or capped-profit structure like OpenAI's original model. PBC directors must legally weigh societal impact alongside shareholder returns.

4. Geoffrey Hinton resigned from Google in May 2023. What was his stated primary reason?

Correct. Hinton explicitly cited his desire to speak freely about AI risks as a reason for leaving — a position he said would have been complicated by his Google affiliation.

Not quite. Hinton stated he left Google specifically so he could speak freely about AI risks without the constraint of representing a company with commercial AI interests.

Lab 4 — Corporate AI Governance Design

Build governance structures that can survive commercial pressure

Your Mission

You're a governance consultant advising a new AI company that has just raised $500 million and plans to build a frontier model. The founders want to take safety seriously — but also need to compete commercially. Help them design governance structures that can actually hold under pressure.

Try asking: "What corporate structure best protects safety from commercial pressure?" or "How should a safety team be positioned to have real influence?" or "What can we learn from OpenAI's governance failures for designing better structures?"

AI Corporate Governance Advisor

Welcome. I'm your AI corporate governance advisor. You're designing governance structures for a well-funded new AI lab that wants to take safety seriously while remaining commercially competitive. What would you like to design or explore first?

Module 4 Test

Governing Transformative AI · 15 questions · Pass at 80%

1. In what month and year did the European Parliament pass the EU AI Act?

Correct. March 2024, with a 523–46 vote after contentious December 2023 negotiations over foundation models.

The EU AI Act passed in March 2024 with a 523–46 vote.

2. Which AI systems does the EU AI Act classify as "unacceptable risk" and ban outright?

Correct. Government social scoring and real-time biometric surveillance in public spaces are banned outright as "unacceptable risk."

The banned "unacceptable risk" category includes government social scoring and real-time biometric surveillance — not all LLMs or all hiring AI.

3. The NIST AI Risk Management Framework is organized around which four functions?

Correct. NIST AI RMF's four functions are Govern, Map, Measure, and Manage — published January 2023.

The NIST AI RMF functions are Govern, Map, Measure, and Manage.

4. What type of evaluation did METR (formerly ARC Evals) specifically conduct on GPT-4 before its release?

Correct. METR/ARC Evals specifically tested for autonomous power-seeking — whether GPT-4 could replicate itself, acquire resources, and resist shutdown.

METR ran dangerous capability evaluations specifically probing for autonomous power-seeking behaviors like self-replication and resource acquisition.

5. Anthropic published its Responsible Scaling Policy in what month and year?

Correct. September 2023 — the first public responsible scaling policy in the industry, establishing ASL safety level thresholds.

Anthropic's RSP was published in September 2023.

6. China's generative AI regulations, effective August 2023, required AI-generated content to do what?

Correct. China's generative AI rules mandate watermarking and content that "reflects core socialist values," along with security registration.

China's August 2023 generative AI rules require watermarking and content alignment with "core socialist values."

7. The Bletchley Park AI Safety Summit (November 2023) produced what type of international agreement?

Correct. The Bletchley Declaration was non-binding, signed by 28 countries plus the EU — including China — but created no enforcement mechanism or binding timelines.

The Bletchley Declaration was non-binding (a political statement, not a treaty) and was signed by 28 countries plus the EU, including China.

8. The Frontier Model Forum was founded by Anthropic, Google, Microsoft, and OpenAI in what month and year?

Correct. The Frontier Model Forum was founded in July 2023, with an initial $10 million AI Safety Fund for independent research.

The Frontier Model Forum was founded in July 2023.

9. US export controls imposed in October 2022 restricted export of which Nvidia chips to China?

Correct. The October 2022 controls targeted A100 and H100 data center chips. Nvidia's A800 workaround was closed by tightened October 2023 controls.

The controls targeted A100 and H100 data center chips — the primary hardware for training frontier AI models.

10. Approximately how many of OpenAI's employees signed the letter threatening resignation during the November 2023 board crisis?

Correct. Nearly 700 of OpenAI's approximately 770 employees signed the letter — an overwhelming supermajority that effectively forced the board's reversal.

About 700 of 770 employees signed — nearly the entire company, which effectively overrode the board's decision.

11. Jan Leike's resignation from OpenAI's Superalignment team alleged that what specific promised resource was not delivered?

Correct. Leike publicly stated the Superalignment team had been promised 20% of compute for safety research — a commitment he alleged was not honored.

Leike specifically alleged the promised 20% compute allocation for safety research was not delivered.

12. Anthropic's corporate structure includes what special governance body related to its mission?

Correct. Anthropic's Long-Term Benefit Trust is an independent body with the power to appoint a portion of the company's board members, intended to keep the company anchored to its safety mission.

Anthropic's Long-Term Benefit Trust can appoint a portion of the company's board members, which is the key structural protection distinguishing it from a standard PBC.

13. The Seoul AI Safety Summit in May 2024 resulted in how many nations plus the EU signing the ministerial statement?

Correct. 27 nations plus the EU signed the Seoul Ministerial Statement, which included commitments to establish AI Safety Institutes and coordinate evaluations.

27 nations plus the EU signed at Seoul, one fewer than Bletchley's 28, but with stronger company safety commitments attached.

14. The OECD AI Principles, the first intergovernmental AI policy standard, were adopted in what year?

Correct. The OECD AI Principles were adopted in 2019 and endorsed by 46 countries — the first intergovernmental AI policy standard.

The OECD AI Principles were adopted in 2019 — predating most national AI strategies and regulatory frameworks.

15. Which of the following best describes the fundamental challenge of AI governance timescales?

Correct. This mismatch — quarterly capability improvements versus multi-year policy cycles — is identified throughout the module as a defining structural challenge of AI governance.

The core timescale problem is that AI capabilities improve on quarterly cycles while policy, legislation, and standards development moves on multi-year cycles — creating a persistent governance gap.