On July 21, 2023, seven major AI companies (Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI) filed into the Roosevelt Room at the White House and emerged with a set of voluntary commitments. They pledged to share safety information, invest in cybersecurity, and watermark AI-generated content. The Biden administration called it a historic first step. Critics called it a press release. Both were partly right.
Within days, four of those companies (Anthropic, Google, Microsoft, and OpenAI) announced the Frontier Model Forum, a membership body that would advance AI safety research and define best practices. The question hanging in the air was familiar from every other regulated industry. Voluntary commitments are easy to make in a room full of cameras; what actually gets measured, and who enforces it?
Industry self-regulation in AI follows patterns seen in previous technology waves. Companies offer voluntary commitments (formal public pledges about responsible behavior) to signal trustworthiness, pre-empt legislation, and establish norms that favor incumbents with the resources to implement them. The July 2023 White House pledges covered eight areas: internal red-teaming before deployment, sharing safety information across companies, investing in cybersecurity, implementing watermarking of AI-generated content, reporting vulnerabilities, supporting research on AI risk, prioritizing research on societal risks, and developing technical mechanisms to identify AI-generated content.
The commitments were notable for what they did not include: no independent audits, no enforcement body, no withdrawal consequences, and no specific metrics. A company could claim compliance by taking any marginal action in each area. OpenAI, for instance, had already been running red-teaming internally; the pledge required no new behavior on its part.
Voluntary AI commitments consistently lack three elements that make regulatory frameworks effective: independent verification, quantitative benchmarks, and consequences for non-compliance. Without these, commitments function primarily as public relations instruments.
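The three-element test above can be written down as a simple checklist. The following is a minimal sketch, not an established assessment tool: the `Commitment` class, its field names, and both example pledges are hypothetical, and the True/False codings are illustrative rather than findings about any real commitment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Commitment:
    """One voluntary pledge, coded against the three elements named above."""
    name: str
    independent_verification: bool
    quantitative_benchmarks: bool
    consequences_for_noncompliance: bool

    def missing_elements(self) -> List[str]:
        """Return the elements this pledge lacks, in a fixed order."""
        checks = [
            ("independent verification", self.independent_verification),
            ("quantitative benchmarks", self.quantitative_benchmarks),
            ("consequences for non-compliance", self.consequences_for_noncompliance),
        ]
        return [label for label, present in checks if not present]

    def is_credible(self) -> bool:
        """Credible, in this sketch, means no element is missing."""
        return not self.missing_elements()


# Hypothetical pledges; the codings below are illustrative assumptions.
pledge_a = Commitment("Hypothetical watermarking pledge", False, False, False)
pledge_b = Commitment("Hypothetical audited security pledge", True, True, False)

for pledge in (pledge_a, pledge_b):
    print(pledge.name, "->", pledge.missing_elements() or "nothing missing")
```

A yes/no coding is deliberately crude. It makes the gaps visible but says nothing about how rigorous a verification process or benchmark actually is, which is where most of the analytical work lies.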
Announced on July 26, 2023, the Frontier Model Forum (FMF) was formed by Anthropic, Google, Microsoft, and OpenAI. Its stated goals included advancing AI safety research, identifying best practices for responsible deployment, sharing knowledge with policymakers, and supporting efforts to address AI risks. A $10 million AI Safety Fund was established.
The FMF structure reflects the tension at the heart of industry self-regulation: the companies funding the forum are also the companies whose conduct it is meant to oversee. Board membership is controlled by paying members. Decisions require consensus among major players who are simultaneously fierce commercial competitors. By early 2024, the FMF had published working papers on red-teaming methodologies and model evaluations, but had not produced binding safety standards or sanctioned any member for unsafe practices.
Critics including AI researcher Timnit Gebru and organizations like the Algorithmic Justice League noted that the FMF focused almost exclusively on catastrophic and existential AI risks, the concerns most relevant to large foundation model developers, while sidelining near-term harms like algorithmic discrimination, labor displacement, and surveillance, which affect marginalized communities most acutely.
Tech industry self-regulation has a mixed record. The Children's Online Privacy Protection Act (COPPA) of 1998 was partly a response to industry failure to voluntarily protect children's data, demonstrating that voluntary measures eventually yield to legislation when harms mount. By contrast, payment card security (PCI-DSS) shows a case where industry self-regulation produced technically specific, consistently enforced standards, though only after data breaches created powerful liability incentives.
The Global Network Initiative (GNI), founded in 2008 after Google and Yahoo faced criticism for cooperating with censorship in China, provides perhaps the most instructive model for AI. GNI requires member companies to undergo independent assessments of their human rights practices every two years. Membership nonetheless remains small, audits are limited in scope, and GNI has no power to expel or sanction members beyond reputational consequences.
The pattern across sectors is consistent: voluntary commitments proliferate when regulation threatens, establish norms that favor incumbents, and tend to address symptoms rather than structural causes of harm. The more specific and independently verified the commitment, the more costly it becomes to maintain and the less likely companies are to adopt it.
In 1930, the Hollywood film industry created the Hays Code as a voluntary self-censorship regime to avoid federal regulation. It persisted for 38 years. The MPAA ratings system that replaced it in 1968 also began as voluntary self-regulation and remains so today. Both illustrate how voluntary regimes can achieve longevity and social legitimacy, but also how they can entrench industry values at the expense of public ones.
You'll analyze real voluntary AI commitments using a framework that distinguishes credible pledges from performative ones. Consider the 2023 White House commitments, the Frontier Model Forum, and similar industry initiatives.
Discuss with the AI assistant what distinguishes a credible voluntary commitment from a public-relations exercise, and how you would design a stronger self-regulatory framework for frontier AI companies.
Google's Advanced Technology External Advisory Council (ATEAC) lasted little more than a week. Announced on March 26, 2019 as an external ethics board for Google AI, it collapsed after employees circulated petitions objecting to the inclusion of Heritage Foundation president Kay Coles James, whose organization had opposed LGBTQ rights. A second member resigned over drone warfare concerns. Google quietly announced on April 4 that the council was dissolved, with a statement acknowledging it had become "untenable."
The episode crystallized a structural problem that plagues internal AI governance everywhere: the company controls membership, agenda, funding, and the ability to disband the board. An ethics body that cannot survive its first controversy over membership has no credible authority over anything else.
Internal AI ethics boards proliferated between 2017 and 2022. Microsoft, IBM, Salesforce, SAP, and dozens of smaller companies created dedicated ethics teams or advisory councils. Their mandates typically included reviewing products for bias and fairness, advising on responsible deployment, and producing public principles documents. By 2022, the wave had crested, and in many cases reversed.
In November 2023, Meta disbanded its Responsible AI team, reassigning most members to generative AI product work. The team had been responsible for Meta's fairness toolkits and its Fundamental AI Research ethics work. The dissolution coincided with Meta's pivot toward aggressive AI product development under Yann LeCun's leadership. Meta did not publicly explain the decision.
At OpenAI, the departure of safety-oriented researchers became a recurring story. Ilya Sutskever, co-founder and chief scientist, departed in May 2024 after playing a role in the November 2023 board drama that briefly ousted CEO Sam Altman. Jan Leike, who co-led OpenAI's Superalignment team (tasked with solving alignment for superintelligent systems), resigned in May 2024 with an extraordinary public statement: "Safety culture and processes have taken a back seat to shiny products." The Superalignment team was effectively dissolved less than a year after its founding in July 2023.
Internal ethics boards face a fundamental principal-agent problem: they are funded by and accountable to the organization whose conduct they evaluate. This creates structural pressure to align findings with business needs, avoid blocking high-revenue products, and self-censor to preserve access and influence.
Red-teaming, meaning deliberately adversarial testing by an internal or external team, has become the primary technical safety process companies invoke. The term originated in Cold War military strategy, where "red teams" simulated Soviet attacks on US defenses. In AI, red-teaming means systematic adversarial prompting to elicit harmful outputs, test safety mitigations, and identify failure modes before deployment.
OpenAI's GPT-4 technical report (March 2023) described an extensive red-teaming process involving over 50 experts in domains including biosecurity, cybersecurity, and disinformation. The report documented specific risk categories tested and mitigations applied. This level of transparency was notable, and it prompted immediate questions about what was not disclosed.
The AI Safety Institute (AISI) in the UK, established under the Bletchley Declaration in November 2023, has worked to formalize red-teaming as a pre-deployment evaluation standard. AISI conducted evaluations of several frontier models before their public release. Its first published evaluation of Claude 3 Opus (April 2024) found that the model showed no uplift capability for creating chemical or biological weapons, though AISI acknowledged the methodology was still evolving.
Critics of company-conducted red-teaming note that the teams report to company leadership, test only what leadership decides to test, and have their findings filtered before public release. Independent red-teaming, as practiced by AISI, the US AI Safety Institute (USAISI), and academic researchers, addresses this but faces access challenges: companies control which model versions researchers can test.
Research on corporate governance suggests internal ethics mechanisms work best when they have: independent reporting lines (to boards rather than executives), veto power or meaningful delay authority over product launches, protected employment for ethics personnel, external validation of findings, and public accountability through disclosure of recommendations and outcomes.
Anthropic's Constitutional AI approach and its published Acceptable Use Policy represent an attempt to embed safety in the technical training process rather than relying solely on post-hoc review. Anthropic is nonetheless a private company with no obligation to disclose whether its internal safety recommendations have ever delayed or modified a product launch. Google DeepMind's published safety policies and regular model cards represent stronger disclosure practices than most, yet the merger of Google Brain and DeepMind in 2023 raised concerns about whether safety-focused research culture would survive commercial pressures.
On November 17, 2023, OpenAI's board, which under the company's unusual structure had a nonprofit governance mandate to ensure AI benefited humanity, fired CEO Sam Altman, citing concerns about his candor. Within five days, Microsoft had offered Altman a new role, nearly the entire OpenAI staff had threatened to resign and follow him, and the board had reversed course and reinstated him. Most of the board members who had voted to fire him subsequently left the board. The episode demonstrated that even a structurally unusual governance mechanism designed to prioritize safety over commercial interests could be rapidly overwhelmed by financial and employment pressure.
Given the failures of Google's ATEAC, the dissolution of Meta's Responsible AI team, and the OpenAI board crisis, you're tasked with designing a more credible internal AI governance structure for a hypothetical large AI company.
Discuss with the AI assistant what structural features would make an internal AI ethics board or safety team genuinely effective rather than performative, and what trade-offs companies face in implementing them.
In the marble corridors of ISO's Geneva offices, representatives from 167 national standards bodies spent three years negotiating ISO/IEC 42001, the world's first international standard for AI management systems. Published in December 2023, it specifies how organizations should plan, implement, and improve their AI governance. It does not tell them what to do about any specific AI capability.
The standard emerged from a working group heavily populated by representatives from large technology companies. IBM, Microsoft, and Google each had multiple delegates. Civil society organizations, affected communities, and academic researchers were nearly absent from the drafting process. This is not unusual; it is how standards are made. The question for AI governance is whether technical standards written by incumbent technology companies can adequately protect interests those companies have no financial incentive to prioritize.
Released in January 2023, the NIST AI Risk Management Framework (AI RMF 1.0) is the United States' primary voluntary technical guidance for AI governance. Developed through extensive public consultation, it organizes AI risk management around four functions (Govern, Map, Measure, and Manage), with detailed practices for each.
The AI RMF is notable for its comprehensiveness and its acknowledgment of sociotechnical risks including bias, privacy, and human rights. It draws directly on prior NIST frameworks in cybersecurity and privacy. The framework is voluntary and non-prescriptive: it tells organizations to think carefully about AI risks but does not specify what risk levels are acceptable or what mitigations are required.
By 2024, NIST had developed supplementary profiles including the Generative AI Profile (NIST AI 600-1), which addressed risks specific to large language models including confabulation, data privacy, and information integrity. Federal agencies were directed by executive order to use the AI RMF for managing AI risks in government applications.
The limitation of the NIST approach is characteristic of voluntary frameworks: adoption is uneven. Companies already committed to responsible AI adopt the framework and find it useful. Companies racing to ship products treat it as a documentation exercise. The framework cannot distinguish between these uses, and NIST has no enforcement authority.
Govern: Organizational policies, culture, and accountability structures for AI risk. Map: Categorizing AI contexts, intended uses, and potential harms. Measure: Analyzing and assessing AI risks quantitatively and qualitatively. Manage: Prioritizing, responding to, and monitoring AI risks throughout the system lifecycle.
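One way to picture how the four functions organize work is a risk register in which each entry records evidence under each function. This is a minimal sketch under stated assumptions: `RmfFunction`, `RiskEntry`, the `evidence` field, and the example entry are hypothetical names chosen for illustration, and nothing here is a schema defined by NIST.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class RmfFunction(Enum):
    """The four AI RMF functions, in the order the framework lists them."""
    GOVERN = "Govern"
    MAP = "Map"
    MEASURE = "Measure"
    MANAGE = "Manage"


@dataclass
class RiskEntry:
    """One identified risk plus a note of evidence per function (hypothetical layout)."""
    description: str
    evidence: Dict[RmfFunction, str] = field(default_factory=dict)

    def missing_functions(self) -> List[RmfFunction]:
        """Functions with no recorded evidence, in definition order."""
        return [f for f in RmfFunction if not self.evidence.get(f)]


# Illustrative entry: governance and mapping are documented, the rest is not.
entry = RiskEntry(
    description="Resume screener may disadvantage some applicant groups",
    evidence={
        RmfFunction.GOVERN: "Named owner and escalation policy",
        RmfFunction.MAP: "Intended use and affected groups documented",
    },
)

print([f.value for f in entry.missing_functions()])  # ['Measure', 'Manage']
```

The sketch also illustrates the framework's limitation: a register like this can show that something was written down under each function, but not whether the measurement was sound or the mitigation adequate.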
The IEEE's Ethically Aligned Design initiative, launched in 2016, produced its first full document in 2019: a 290-page framework for embedding ethical principles into autonomous and intelligent systems. The document drew on contributions from hundreds of experts globally and addressed topics ranging from algorithmic bias to autonomous weapons.
More concretely, the IEEE Standards Association launched the P7000 series, a collection of specific standards including P7001 (Transparency of Autonomous Systems), P7002 (Data Privacy Process), P7003 (Algorithmic Bias Considerations), and P7004 (Standard for Child and Student Data Governance). These process standards define how to think through specific risks, not what outcomes to achieve.
The adoption of IEEE P7000 standards by industry has been limited. Unlike ISO 9001 (quality management) or ISO 27001 (information security), which became near-universal requirements for enterprise contracting, the P7000 series has not achieved equivalent market pressure for adoption. No major corporate purchasing contract or government procurement requirement mandated compliance as of 2024.
Founded in 2016 by Amazon, Apple, DeepMind, Facebook, Google, IBM, and Microsoft, and later joined by academic and civil society members, the Partnership on AI (PAI) was established to study and formulate best practices for AI systems. Unlike the Frontier Model Forum, PAI includes civil society and academic members, making it structurally more representative.
PAI's published outputs include the Responsible Practices for Synthetic Media framework (2023), which provides guidance on deepfake labeling and content authentication. The framework was cited by several signatories as the basis for their voluntary watermarking commitments in the White House 2023 pledges. PAI also produced research on worker wellbeing in AI-impacted industries and fairness in algorithmic decision-making.
The structural tension in PAI is between its corporate funders, who control most of the operating budget, and its civil society members, who bring perspectives on affected communities. Corporate members have generally resisted positions that would require binding commitments or regulatory action. Several civil society organizations have publicly expressed frustration with the pace and depth of PAI's outputs, noting that consensus requirements among members with conflicting interests consistently produce the weakest possible recommendations.
Standards bodies consistently face the same structural problem: those with the most expertise in a technology (its developers) have the most influence in writing its standards, while those most affected by the technology (communities bearing its risks) have the least. The more technically complex the standard, the greater this expertise gap becomes. AI governance standards are among the most technically complex in history.
Standards bodies like ISO, IEEE, and NIST shape AI governance through technical norms that can be as influential as legislation. But the composition of drafting groups, consensus requirements, and adoption dynamics all affect whose interests these standards serve.
Explore with the AI assistant how to evaluate whether an AI standard genuinely protects public interests vs. encodes incumbent industry preferences β and what reforms to standards processes would produce better outcomes.
In the closing months of 2023, as EU negotiators worked through the final trilogues on the AI Act, a late-stage lobbying push sought to carve out foundation models from the most stringent requirements. The companies involved, primarily European voices for US foundation model developers, argued that regulating general-purpose AI at the model level would stifle innovation and impose compliance costs that favored large incumbents.
The final text created a tiered system: general-purpose AI models with high systemic impact face stricter obligations including model evaluations, transparency, and adversarial testing. But the threshold was set at 10^25 floating-point operations (FLOPs) of cumulative training compute, a level only a handful of the largest models crossed. Smaller but still powerful models face lighter obligations. Critics noted that this threshold would be technically obsolete within years as compute efficiency improved.
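A back-of-the-envelope calculation shows how few models the compute threshold captures. The sketch below assumes the common rule of thumb that training a dense transformer takes roughly 6 × parameters × training tokens floating-point operations. That heuristic, the function names, and both example model sizes are illustrative assumptions; they are not part of the Act and do not describe any real model.

```python
# Threshold from the text: 10^25 floating-point operations of training compute.
SYSTEMIC_RISK_THRESHOLD_FLOP = 1e25


def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Rough training compute via the 6 * N * D rule of thumb (an assumption)."""
    return 6.0 * n_params * n_tokens


def crosses_threshold(
    n_params: float,
    n_tokens: float,
    threshold: float = SYSTEMIC_RISK_THRESHOLD_FLOP,
) -> bool:
    """True if the estimated training compute meets or exceeds the threshold."""
    return estimate_training_flop(n_params, n_tokens) >= threshold


# Hypothetical models: (parameter count, training tokens).
examples = {
    "hypothetical-small": (7e9, 2e12),    # 6 * 7e9 * 2e12 = 8.4e22, below 1e25
    "hypothetical-large": (5e11, 1e13),   # 6 * 5e11 * 1e13 = 3e25, above 1e25
}

for name, (n_params, n_tokens) in examples.items():
    flop = estimate_training_flop(n_params, n_tokens)
    print(f"{name}: {flop:.1e} FLOP, crosses threshold: "
          f"{crosses_threshold(n_params, n_tokens)}")
```

Under this estimate the smaller hypothetical model sits more than two orders of magnitude below the line. This is the critics' point in numerical form: a fixed compute figure says nothing about capability gained through more efficient training.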
Co-regulation refers to regulatory frameworks where governments set overarching goals and accountability requirements, while delegating detailed rule-setting and some enforcement to industry bodies or technical standards organizations. It is distinct from pure self-regulation (industry governs itself entirely) and command-and-control regulation (government specifies all requirements).
Co-regulation has precedents in financial services (banks set internal risk models subject to regulatory approval), telecommunications (spectrum allocation with industry technical standards), and internet content moderation (platforms set policies within legal frameworks like NetzDG in Germany or the DSA in the EU). Each sector shows both the potential and the risks: co-regulation can leverage industry expertise and adapt quickly to technology change, but regulatory capture, where the regulated industry shapes regulation in its own interest, is a persistent risk.
For AI, co-regulation models take several forms. The EU AI Act's regulatory sandboxes allow companies to test high-risk AI systems under relaxed rules in exchange for data sharing with regulators. The UK's pro-innovation regulatory framework, outlined in its March 2023 AI regulation white paper, explicitly assigns responsibility to existing sector regulators (FCA for financial AI, CMA for competition, ICO for data) rather than creating a new AI-specific body, while the government-funded Frontier AI Taskforce, the precursor to the AI Safety Institute, provided technical expertise.
The EU AI Act (formally adopted June 2024) mandates that general-purpose AI model providers with high systemic impact conduct adversarial testing, share results with the European AI Office, and maintain model cards. For implementation details, however, the Act defers to codes of practice developed by industry groups working with the European AI Office, a classic co-regulatory structure.
The European AI Office, established within the European Commission in February 2024, is the primary supervisory body for general-purpose AI models under the AI Act. It oversees compliance, coordinates enforcement across member states, and, crucially, manages the development of codes of practice that will fill in the technical details of the Act's requirements.
The first AI Code of Practice drafting process, launched in late 2024, involved over 1,000 stakeholders including AI developers, civil society organizations, and member state representatives. The process was significantly more inclusive than ISO standards drafting, but also significantly more complex and slower. The codes of practice must be finalized before GPAI rules fully apply, creating a window during which enforcement depends on voluntary compliance.
A specific tension emerged around copyright and training data: the AI Act requires GPAI model providers to publish "sufficiently detailed summaries" of training data used. Several major providers argued this would require disclosing commercially sensitive information. The codes of practice process became a venue to negotiate how much transparency was actually required, illustrating how co-regulation often involves ongoing negotiation about the actual content of rules, not just their implementation.
The UK explicitly chose not to pass comprehensive AI legislation in 2023 to 2024, instead issuing a white paper directing existing sector regulators to apply their existing frameworks to AI. The Competition and Markets Authority (CMA) launched a foundation models review in 2023, examining whether frontier AI created anticompetitive market structures. The Information Commissioner's Office (ICO) issued guidance on generative AI and data protection. The Medicines and Healthcare products Regulatory Agency (MHRA) addressed AI in medical devices.
This distributed approach leverages sector-specific expertise and avoids creating a large new regulatory bureaucracy. Its weakness is coordination: an AI system used in healthcare, credit scoring, and employment decisions may be regulated by three different bodies with different standards, creating inconsistency and compliance complexity. The government's proposed AI Safety Institute and later AI Security Institute focused on frontier model evaluation rather than horizontal governance, leaving the coordination gap largely unfilled as of 2024.
The UK's approach also created uncertainty for industry: companies operating across EU and UK markets faced different regulatory requirements, and the UK's "pro-innovation" framing raised questions about whether safety would be adequately weighted when it conflicted with competitiveness goals.
Across the three models examined in this lesson (EU comprehensive legislation with co-regulatory implementation details, UK distributed sector regulation with voluntary guidelines, and US voluntary frameworks with some federal sector requirements), the EU model provides the strongest legal accountability while the US and UK models offer more industry flexibility. No model has yet produced independent verification of meaningful safety improvements for frontier AI systems. The empirical record remains limited because the models themselves are so new.
You've seen how the EU AI Act uses co-regulatory codes of practice, how the UK distributes AI oversight across sector regulators, and how voluntary frameworks fill the space where law hasn't yet reached. Now design your own co-regulatory model.
Choose a specific AI application domain (hiring algorithms, medical diagnosis AI, autonomous vehicles, content moderation systems, or credit scoring) and work with the AI assistant to build a co-regulatory framework that balances innovation with meaningful public accountability.