On May 30, 2023, Ilya Sutskever, co-founder of OpenAI and one of the most cited AI researchers alive, quietly signed a public statement. The statement was short. It said, in plain language, that mitigating the risk of extinction from AI should be a global priority alongside pandemics and nuclear war.
Over 350 AI scientists, executives, and engineers were among its first signatories. Many of them had spent years building the very systems they were now warning about. The statement was published by the Center for AI Safety, a small research nonprofit in San Francisco that most people outside the field had never heard of.
Reporters called it shocking. But to the researchers who had been working on AI safety for a decade, it wasn't shocking at all. It was, if anything, overdue. The question is: who had been working on this before it became a headline, and why did they start?
In the early 2000s, worrying about AI safety was considered eccentric, maybe even embarrassing. AI systems at the time could beat a world champion at chess but could barely recognize objects in photographs. Talking about existential risk from AI was the kind of thing that got researchers laughed out of conferences.
Around the same time, a philosopher named Nick Bostrom, who would go on to found the Future of Humanity Institute at Oxford University, was publishing papers asking a question almost nobody else was asking: what happens when AI systems become significantly more capable than humans? Not science-fiction AI, but real systems, built the same way the existing ones were built, just much better. He called the scenario he was worried about superintelligence: an AI whose reasoning ability far exceeds the best human experts across most or all domains.
Most people dismissed it. A handful didn't.
In 2000, a self-taught programmer and writer named Eliezer Yudkowsky co-founded the Singularity Institute for Artificial Intelligence, later renamed the Machine Intelligence Research Institute (MIRI) and now based in Berkeley, California. It became one of the first organizations dedicated specifically to making sure future AI systems wouldn't cause catastrophic harm. He was 20 years old. He had no PhD. He had been writing about advanced AI online since 1996, when the web was barely a few years old.
What made Yudkowsky take it seriously when almost nobody else did? He had thought through a specific scenario: an AI that is given a goal (almost any goal) and is sufficiently capable might pursue that goal in ways that humans don't anticipate, don't want, and can't stop. The problem wasn't that AI would be evil. The problem was that it might be indifferent to human welfare while being extremely effective at achieving something else. This is now usually described as misaligned goals, or misalignment: the AI optimizes hard for the wrong thing.
For years, MIRI operated on small donations, and its work was read mostly by people who found it through online forums. That changed in 2014, when Nick Bostrom published a book called Superintelligence. It became a bestseller. Elon Musk tweeted that it was worth reading and that AI was "potentially more dangerous than nukes." Bill Gates recommended it too. Suddenly, AI safety was in the news.
That same year, the concern began reaching the mainstream of computer science. One of the people voicing it was Stuart Russell, a computer science professor at UC Berkeley who had co-written the standard university textbook on AI. Russell wasn't a fringe figure. He was about as mainstream as AI research gets. And he started saying publicly that the field needed to think seriously about whether AI systems would actually do what humans wanted them to do.
In 2016, Russell helped found the Center for Human-Compatible AI (CHAI) at Berkeley, specifically to work on the technical problem of making AI systems that reliably understand and pursue human values, not just the specific goals written into their code.
Months earlier, in December 2015, Elon Musk, Sam Altman, and several top researchers had co-founded OpenAI as a nonprofit, explicitly framing it as a safety-focused AI lab. The stated reason: if powerful AI was coming regardless, it was better to have safety-focused researchers at the frontier than to leave the field to organizations that didn't think about the risks.
That logic (you can't stop it, so you'd better be the one doing it carefully) is one of the central tensions in AI safety, and you'll see it come up again and again.
If you believe a dangerous technology is inevitable, does building it yourself, while trying to be careful, actually make the world safer? Or does it just accelerate the timeline? There is no clean answer to this. Some of the people who co-founded OpenAI in 2015 later left, saying the lab had become what it was created to prevent.
Most people think AI safety research started recently, maybe in 2022 or 2023, when ChatGPT became famous and politicians started making speeches about it. You now know that the foundational work started decades earlier, built by a small number of people who were dismissed, then gradually taken seriously, then suddenly treated as prophets.
That history matters. It tells you that the organizations working on AI safety today didn't appear out of nowhere. They have intellectual genealogies: specific arguments, specific people, specific moments when the field shifted. MIRI, CHAI, OpenAI's safety team, the Center for AI Safety: these aren't interchangeable. They have different philosophies, different methods, and sometimes disagree sharply with each other.
Knowing where each organization came from helps you evaluate what they say, and what they might be missing.
When a tech CEO says their company "takes safety seriously," you can now ask: what organization did their safety team come from? What specific problem are they trying to solve? Are they focused on misaligned goals, on near-term harms, or on longer-term risks? The answers tell you far more than the headline does.
You've been asked to write a one-paragraph briefing on why AI safety research started when it did and who drove it. Your AI research partner has read the same material โ but will challenge your framing, ask for evidence, and point out what you might be getting wrong.
Your job is to defend a position, not just summarize facts. Take a stance: was the early AI safety movement too early, too late, or well-timed? Who deserves the most credit for making it a legitimate field?
On November 17, 2023, the board of directors of OpenAI fired Sam Altman, the company's CEO. The reason, they said, was that he had not been "consistently candid" with the board. What exactly he had withheld, they did not say publicly.
Within days, more than 700 of OpenAI's roughly 770 employees signed a letter threatening to quit unless the board reversed its decision. Several members of OpenAI's own safety team signed the letter too. Five days after the firing, Altman was reinstated. Most of the board members who had fired him left the board.
The episode revealed something that most people outside the industry didn't know: OpenAI's structure was specifically designed so that a safety-focused nonprofit board could override commercial decisions. That structure had just failed its first real test, or passed it, depending on who you ask. Either way, a safety mechanism had been installed, used, and then overridden, all in under a week.
Every major AI lab (OpenAI, Google DeepMind, Anthropic, Meta AI) now has some version of a dedicated safety team. But the word "safety" can mean very different things depending on which lab you're talking about.
At Anthropic, founded in 2021 by Dario Amodei, Daniela Amodei, and several colleagues who left OpenAI in large part over disagreements about safety, nearly the entire company is framed around safety research. Anthropic doesn't just have a safety team; it argues that safety and capability research are the same work. Its AI system, Claude, is trained in part using a method called Constitutional AI, in which the model learns to critique and revise its own outputs against a written set of principles, rather than relying only on human-labeled examples of good and bad behavior. Anthropic's stated mission is to be the company that figures out how to make powerful AI systems that don't harm people, while building some of the most powerful AI systems in existence. The tension there is the same one from Lesson 1.
At Google DeepMind, safety work is distributed across multiple teams. Some focus on near-term harms, such as bias in language models or dangerous misuse. Others, including the team once led by Geoffrey Irving and researchers like Jan Leike (who later moved to OpenAI and then to Anthropic), work on longer-term alignment problems: what happens as systems become more capable?
At Meta AI, the philosophy is different again. Yann LeCun, Meta's chief AI scientist, has publicly argued that current AI systems, including large language models, are not on a path toward dangerous superintelligence at all, and that treating them as if they are is a distraction from more immediate, tractable problems like bias and misinformation. This puts Meta's safety philosophy in direct conflict with the approaches at Anthropic and OpenAI.
Here is the structural problem with internal safety teams: they exist inside companies whose financial survival depends on shipping products. A safety team that says "we need to delay this release by six months" is costing the company money, competitive position, and possibly the ability to raise funding. The safety team doesn't control the budget. It doesn't set the product roadmap. It advises.
In May 2024, Jan Leike resigned from OpenAI. He had co-led the company's Superalignment team, which was created explicitly to solve the long-term alignment problem. In his resignation thread on X (formerly Twitter), he was unusually blunt: "safety culture and processes have taken a backseat to shiny products." He said he had been disagreeing with OpenAI's leadership about the company's core priorities for quite some time, and that he no longer believed the company was on the right path.
That same week, Ilya Sutskever, the same co-founder who signed the AI safety statement in 2023 and who co-led Superalignment with Leike, also announced he was leaving the company. Both departures were significant: these were the people who were supposed to be the internal check on OpenAI's development speed.
This is a pattern worth watching. Safety teams get founded with real authority, then gradually find that authority eroded as commercial pressures grow. By Leike's account, that is what happened at OpenAI. Critics sometimes call the end state "safety washing": safety becomes a brand and a talking point rather than an actual constraint on decisions.
If a safety researcher stays at a company they believe is making dangerous decisions, they might slow things down slightly from the inside. If they leave, they lose all influence over what the company does next. Which choice is more ethical: staying in a compromised position, or leaving with your integrity intact? This question has no clean answer. People who work in AI safety disagree about it bitterly.
When the Amodei siblings and their colleagues left OpenAI to found Anthropic in 2021, they didn't just start a new lab. They structured it differently. Anthropic is a public benefit corporation, a legal structure in the US that requires a company's directors to balance a stated public benefit against shareholders' financial interests, rather than pursuing profit alone. It also has a Long-Term Benefit Trust that is supposed to hold the company accountable to its safety mission even if investors push for faster, less careful development.
Whether that legal structure actually constrains behavior in practice is an open question, one that will probably be tested in the next few years as AI systems become more capable and the financial stakes get higher. But the structure itself reflects a lesson the founders drew from their time at OpenAI: if you want safety to be a real constraint, it has to be baked into the company's legal DNA, not just its culture.
You now understand something that even most journalists covering AI don't clearly grasp: the difference between a company that has a safety team and a company that has built safety into its governance structure is enormous. One is a department. The other is a legal obligation.
Next time you read that an AI company "prioritizes safety," look up how they are legally structured. Is safety embedded in their charter, their board, their investor agreements, or is it just a team name on an org chart? The answer tells you far more than their press releases do.
You're an independent auditor reviewing a fictional AI company called NovaMind. NovaMind has: a 20-person safety team, a Chief Safety Officer who reports to the CEO, a safety review process for new products, and a company culture document that says "safety first." But the CSO has no veto power over product releases, and safety team members are evaluated partly on product shipping metrics.
Your partner has reviewed the same materials. State your overall assessment: is NovaMind's safety structure genuine or mostly cosmetic? Then defend it as your partner pushes back.
On May 16, 2023, Sam Altman sat before a US Senate Judiciary subcommittee. Senators on both sides of the aisle asked him about AI risks. Altman agreed with many of their concerns. He suggested the government should create a new agency to regulate AI. He said OpenAI wanted oversight. The hearing was widely covered.
What got less coverage: two months later, on July 25, 2023, the same Senate panel heard testimony from Yoshua Bengio, one of the three researchers who shared the 2018 Turing Award (often called the Nobel Prize of computer science) for foundational work on the deep learning methods behind modern AI. Bengio had recently started speaking publicly about something he found deeply uncomfortable: he believed the systems he helped create were now potentially dangerous, and that the people building them commercially had too much financial incentive to minimize those risks.
Within a year, Bengio would become one of the most prominent voices calling for an international treaty on AI development: a binding agreement between governments, like the treaties on nuclear weapons and chemical weapons. That call came not from inside a lab, but from a university professor who had nothing to sell.
There's a structural reason why some of the bluntest warnings about AI risk come from academic researchers and nonprofit organizations rather than from the companies building the systems: independence means you don't have a product to protect.
When a researcher at Anthropic says a system might be dangerous, they are also saying something about a system their employer sells. When a researcher at a university says the same thing, they have nothing to lose financially. That's not to say company researchers can't be honest (many are), but the incentive structure is different, and incentive structures shape what gets said out loud.
The academic and nonprofit side of AI safety includes several distinct types of organizations. The Center for AI Safety (CAIS), run by Dan Hendrycks, focuses on specific technical risks and published the 2023 statement on extinction risk. It operates on donations and has no commercial AI products. The Future of Life Institute (FLI), founded in 2014 by physicist Max Tegmark and others, works on policy and published the 2023 open letter calling for a six-month pause in advanced AI development, a letter signed by over 30,000 people including Elon Musk and Steve Wozniak.
The AI Now Institute takes a very different approach. Founded in 2017 by Kate Crawford and Meredith Whittaker at New York University, it focuses not on hypothetical future superintelligence but on AI systems causing harm right now: surveillance, hiring discrimination, welfare systems that deny benefits based on faulty algorithms, and the concentration of AI power in a small number of companies.
These organizations are sometimes in conflict with each other, not just with AI companies. Whether you should focus on near-term concrete harms or longer-term speculative catastrophes is one of the most significant debates in the AI safety field.
In March 2023, the Future of Life Institute published an open letter calling for a six-month pause in training AI systems more powerful than GPT-4. The letter argued that no one (not the labs, not governments, not the public) was prepared to handle what came next, and that a pause would give society time to catch up.
The letter was signed by many prominent AI researchers and public figures. It was also sharply criticized by many others, including some who deeply care about AI safety.
Timnit Gebru, a researcher who was forced out of Google's AI ethics team in 2020 after co-authoring a paper about the risks of large language models, argued that the pause letter was a distraction. In her view, focusing on hypothetical future superintelligence allowed companies to avoid accountability for the harms their current systems were causing today. In 2021 she had founded the Distributed AI Research Institute (DAIR) specifically to do AI research that isn't funded by the big labs, and to focus on communities most affected by AI systems right now, not on scenarios that might happen in 20 years.
This is the kind of real, substantive disagreement that exists inside the AI safety world: not just between safety researchers and AI companies, but among people who all agree AI can cause serious harm. What kind of harm, when, and to whom? Those questions divide the field.
If you had to allocate $100 million in AI safety research funding, how much would you direct toward near-term harms (bias, surveillance, algorithmic discrimination affecting people today) versus long-term alignment research aimed at preventing catastrophic risks from future, more capable systems? There is no objectively correct answer. Thoughtful, informed people argue both ways. What does your reasoning tell you?
Yoshua Bengio is not the only Turing Award winner who became a prominent voice on AI safety. Geoffrey Hinton, who shared the 2018 Turing Award with Bengio and whose deep learning research was just as foundational, left Google in May 2023. His stated reason: he wanted to speak freely about AI risks without worrying about how it reflected on his employer. He said publicly that he regretted some of his life's work: he had contributed to building something he now worried about.
The fact that the people who helped build modern AI are now among its loudest critics is not a contradiction. It reflects something important: understanding a technology deeply enough to build it is exactly what generates the most credible fears about where it's going. These researchers aren't worried about science fiction. They are worried about specific mechanisms they understand firsthand.
You now understand something most people in a general conversation about AI don't: the AI safety field isn't a monolith. It contains Turing Award winners who helped build the technology and now warn about it, nonprofit researchers focused on discrimination and surveillance happening right now, policy advocates pushing for international treaties, and academics who believe the whole catastrophic-risk framing is overblown. Understanding which type of voice is speaking, and why, is the first step to evaluating anything they say.
When a headline says "AI safety researchers warn of risks," you can now ask: which kind? Near-term or long-term? Academic or nonprofit? Former industry, or always independent? A warning from Timnit Gebru about algorithmic bias in welfare systems and a warning from Yoshua Bengio about the need for AI treaties both come from people who deeply understand AI, but they are pointing at very different problems.
A fictional philanthropic foundation has $10 million to allocate to AI safety research. They can split it any way they want between two priorities: (A) near-term harms, such as bias in hiring algorithms, surveillance systems, and AI used in welfare decisions; or (B) long-term alignment, meaning research on making future, more capable AI systems reliably pursue human values.
Your partner will push back on whatever split you propose. You need to defend the reasoning, not just pick a number.
On November 1, 2023, representatives from 28 countries gathered at Bletchley Park, the site where British codebreakers, including Alan Turing, cracked Nazi communications during World War II. The location was not accidental. The British government, which organized the event, wanted to signal that this moment was historic: the first international summit on AI safety.
At the two-day summit, 28 countries and the European Union, including the United States and China, signed what became known as the Bletchley Declaration. It acknowledged that AI "presents enormous global opportunities" but also carries the potential for "serious, even catastrophic, harm," and called for international cooperation on safety. It was widely noted as a rare moment in which the US and China put their names to the same statement on AI risk.
What the declaration didn't do: set any binding rules, create any enforcement mechanism, or require any company to do anything differently. It was a statement of shared concern, not a shared set of actions. Critics called it a photo opportunity. Supporters called it the beginning of something. Both, in different ways, were right.
While individual countries debated and convened summits, the European Union moved faster than anyone else to actually write law. In March 2024, the European Parliament passed the EU AI Act, the first comprehensive law governing AI anywhere in the world.
The Act works by assigning AI systems to risk categories. Systems classified as high-risk (those that could affect people's fundamental rights or safety, like AI used in criminal sentencing, hiring, or critical infrastructure) face the strictest requirements short of an outright ban: mandatory testing, documentation, human oversight, and the right for people affected by AI decisions to seek an explanation and appeal. Systems with lower risks face lighter requirements. Purely recreational AI faces almost no restrictions.
The Act also bans some uses of AI entirely in Europe: real-time biometric surveillance in public spaces (with narrow exceptions), AI systems that exploit psychological vulnerabilities to manipulate behavior, and social scoring systems like the ones used in parts of China.
Who shaped the legislation? The European Commission drafted it, and Parliament and the member-state governments amended it with input from technical experts, but AI company lobbyists also weighed in heavily. OpenAI, Google, and other major AI companies spent significant resources trying to weaken parts of the Act. Some provisions were significantly watered down between the first draft and the final vote. This is not a conspiracy; it's how legislation almost always works. But it means the final law reflects what the industry would accept as much as what safety researchers recommended.
In the United States, there is no comprehensive AI law as of 2024. Instead, President Biden issued an executive order on AI in October 2023, which required companies developing the most powerful AI systems to share safety testing results with the government before public release. It also created new federal standards for AI used by the government itself and directed federal agencies to consider AI risks in their existing regulatory frameworks.
An executive order is not a law; a future president can revoke it. In January 2025, the incoming administration did exactly that, revoking the Biden AI executive order. This illustrates a core problem with the US approach: without Congress passing actual legislation, AI governance depends on who is in the White House and can change with each administration.
China offers a third model, and a useful point of comparison for a different reason. It has implemented real, binding AI regulations, including rules on recommendation algorithms, deepfakes, and generative AI, but they are aimed primarily at controlling the use of AI for political speech and content that challenges the government, rather than at protecting individuals from AI harm. China's regulations are comprehensive and enforced, but their purpose is fundamentally different from the EU's. What looks like "AI safety regulation" from the outside can mean very different things depending on what, and whom, the government is trying to protect.
The US, the EU, and China represent three genuinely different theories of what AI governance should do: the US preference for voluntary commitments and industry self-regulation; the EU preference for rights-based legal frameworks that protect individuals; and China's government-control model that regulates AI use primarily in the interest of political stability. None of these is simply "safe" or "unsafe." Each reflects different values about what kind of power is most dangerous.
If the EU AI Act restricts certain high-risk AI applications in Europe but those same applications are freely available in countries without such laws, has the regulation actually made anyone safer? Or has it just moved the risk to places with less protection? This is called regulatory arbitrage, and it is one of the hardest problems in international AI governance. There is no clean answer.
Here is the full picture you now have: AI safety is being worked on simultaneously by a small number of pioneering researchers who started in the early 2000s; by dedicated teams inside commercial AI labs whose power is structurally limited; by independent nonprofits and academics who can speak more freely precisely because they don't have products to sell; and now by governments, some of which are writing real law and some of which are issuing press releases.
These groups don't agree on what the problem is, don't agree on what the solutions are, and sometimes work actively against each other. The AI lab safety team and the independent nonprofit researcher may both use the phrase "AI safety" and mean entirely different things by it. A government regulation that one researcher calls a landmark achievement, another calls a captured compromise that protects industry more than people.
You are now equipped to navigate that landscape. When you see a headline about AI safety (a new regulation, a researcher resigning, a company announcing a safety commitment, a government signing a declaration), you can ask the right questions: Who has structural independence here? Who has financial incentives to minimize the risk? What specific harm are they trying to prevent? Does their proposed solution actually have enforcement power?
Most people reading that same headline will take it at face value. You won't have to.
Every AI safety announcement, whether from a lab, a government, or a nonprofit, is made by people with specific interests, specific theories about what is dangerous, and specific amounts of actual power to do anything about it. Knowing the difference between these groups is the core skill for reading the AI safety landscape as it actually is, not as the press releases describe it.
A fictional international body has asked for a draft provision, one specific rule, to be included in a global AI safety agreement. You decide what the rule is, who it applies to, and what the consequences are for breaking it. It should address a real risk from what you've learned in this module.
Your partner is a policy critic who has seen a lot of AI regulations watered down or fail in practice. They will ask: who enforces this? What stops companies from gaming it? Does it actually prevent the harm it claims to?