In November 2022, a company called OpenAI released a chatbot called ChatGPT. Within five days, one million people were using it. Within two months, an estimated one hundred million. No consumer technology had ever spread that fast: not television, not the internet, not the smartphone. Schools started banning it. Teachers panicked. Some students used it to produce essays they hadn't written themselves. Others used it to understand things their teachers hadn't explained clearly enough. The same tool, the same week, doing completely opposite things.
Here's what almost nobody talked about in all that noise: this wasn't really about cheating or chatbots. It was about something much older. Every powerful technology (electricity in the 1880s, the printing press in the 1440s, nuclear energy in the 1940s) arrived before anyone had figured out how to control it. People celebrated the power before understanding the cost. And then, sometimes years later, sometimes decades later, they had to go back and fix what they'd broken. With AI, we're living in that first window, the one before the fixing. And unlike electricity, this technology can make decisions.
This course won't give you all the answers, because nobody has all the answers yet. What it will give you is a way of thinking: a set of questions to ask when someone tells you AI is going to solve everything, or destroy everything, or is totally fine, or is the end of the world. After four lessons, you'll know what most adults in the room don't, and that's not an exaggeration. Most of the people making decisions about AI right now learned everything they know from headlines. You're going to do better than that.
In 2017, a teacher in Houston, Texas named Carolyn Blankenship was rated one of the worst teachers in her district. Her score came from a system called EVAAS (the Education Value-Added Assessment System), an algorithm designed to measure how much students improved because of a specific teacher. Blankenship had taught for over a decade. Her students liked her. Her principal liked her. But the score said she was failing, and under Houston's rules, a low score meant dismissal. She was fired.
When she and other teachers sued the school district, they asked a simple question: how does the algorithm actually work? The district's answer was remarkable. They said the formula was proprietary (owned by a private company) and could not be shared, even in court. A federal judge would later rule that this secrecy could violate the teachers' constitutional right to due process. But here is what stayed true: hundreds of teachers had already been evaluated, hired, fired, or denied raises based on a calculation that nobody in the school district, not the superintendent, not the school board, could fully explain. They had handed a decision that changed people's lives to a machine, and then forgotten to check whether the machine was right.
That's not a story from science fiction. That's a real city, real people, and a real algorithm, one still used in some form in school districts across the United States today.
An algorithm is just a set of instructions. A recipe is an algorithm. "If it's raining, take an umbrella" is an algorithm. The word sounds technical, but the idea is ancient: humans have been writing rules for decisions since before computers existed.
What changed in the past thirty years is scale. A human judge applies a rule to one case at a time, slowly, with the ability to notice when something feels off. An algorithm applies a rule to ten thousand cases in the time it takes you to read this sentence, with no ability to feel anything at all. That speed is powerful. It is also dangerous, because mistakes travel at the same speed as correct answers.
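To make the idea of scale concrete, here is a minimal Python sketch. The rule, the function name `needs_umbrella`, and the made-up weather data are all assumptions invented for illustration; they don't come from any real system. The rule has one small mistake built in, and the sketch shows that mistake being repeated on every matching case out of ten thousand.

```python
# A tiny "algorithm": a rule that turns an input into a decision.
# Hypothetical example; the rule deliberately contains a mistake.

def needs_umbrella(weather):
    """Return True if the rule says to take an umbrella."""
    # Mistake on purpose: the rule forgets that "drizzle" is also wet.
    return weather == "rain"

# Ten thousand made-up days, cycling through four kinds of weather.
kinds = ["sun", "rain", "drizzle", "cloud"]
days = [kinds[i % 4] for i in range(10_000)]

# The rule is applied to every day, instantly and without noticing anything odd.
decisions = [needs_umbrella(day) for day in days]

# Count the days the rule got wrong (a wet day with no umbrella, or the reverse).
wet = {"rain", "drizzle"}
mistakes = sum(1 for day, took in zip(days, decisions) if (day in wet) != took)

print(mistakes)  # 2500: every drizzle day gets the same wrong answer
```

A person who got soaked on the first drizzly day would probably change the rule. The code keeps making the same call until someone edits it.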
When we talk about AI today, we're usually talking about a special kind of algorithm called a machine learning model. Instead of a human writing every rule, the machine studies millions of examples and figures out its own rules. This is why these systems can do things that seem almost magical: recognizing faces, translating languages, writing passable college essays. But it also means that even the people who built the system often can't tell you exactly why it made a particular decision. The rules are buried inside millions of mathematical calculations nobody can read.
Here's the thing most people get wrong about AI safety: they think it's a technical problem, which means only technical people need to care about it. That's like saying only car engineers need to care about traffic laws. The engineers build the cars. Everyone else decides how they're used, where they're used, and what happens when they crash.
Algorithms are making consequential decisions right now: decisions that affect whether you get a loan, whether you get flagged as a shoplifter walking into a store, whether a social media feed shows you content that makes you feel worthless, or content that helps you understand yourself. These decisions are made by systems built by a fairly small number of engineers, often without the input of the people most affected by them.
In 2018, Reuters reported that Amazon had scrapped an AI recruiting tool it had been developing for years. The tool was supposed to identify the best job candidates from thousands of résumés. The problem: it had learned from Amazon's historical hiring data, and Amazon had historically hired mostly men. So the algorithm taught itself that being a woman was a negative signal. It penalized résumés that included the word "women's" (as in "women's chess club") and downgraded graduates of all-women's colleges. Amazon shut it down before deploying it, but the tool had been in development for years before anyone noticed. It took a human to catch what the machine had quietly decided.
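Here is a toy Python sketch of how a system can pick up bias from past decisions without anyone programming it in. This is not Amazon's actual system, whose details were never published; the data, the function name `learn_word_scores`, and the scoring method are all invented for illustration. The "model" simply scores each word by how often past résumés containing it led to a hire.

```python
from collections import defaultdict

# Made-up past decisions: (words on the resume, was the person hired?)
# The bias is in the history itself, not in the code below.
history = [
    ({"chess", "captain"}, True),
    ({"chess", "engineering"}, True),
    ({"women's", "chess", "captain"}, False),
    ({"women's", "engineering"}, False),
    ({"engineering", "captain"}, True),
]

def learn_word_scores(records):
    """Score each word by the fraction of past resumes containing it that were hired."""
    seen = defaultdict(int)
    hired = defaultdict(int)
    for words, was_hired in records:
        for word in words:
            seen[word] += 1
            if was_hired:
                hired[word] += 1
    return {word: hired[word] / seen[word] for word in seen}

scores = learn_word_scores(history)
print(scores["women's"])  # 0.0: the word alone now counts against a candidate
print(scores["captain"])  # about 0.67: 2 of the 3 resumes with it were hired
```

Nothing in `learn_word_scores` mentions gender. The low score for "women's" comes entirely from the pattern in the historical decisions it was given.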
Amazon's recruiting AI wasn't programmed to discriminate. It learned to discriminate from real data about how real humans had actually behaved. If the humans were biased, and the AI learned from the humans, who is responsible for the bias the AI produced?
This is the kind of question AI safety asks. Not "how do we build faster computers?" but "how do we make sure the decisions these systems make are fair, explainable, and correctable?" Those questions don't require a computer science degree. They require the ability to think clearly about power, fairness, and accountability, and that is something you can absolutely do right now.
AI safety researchers, the people whose entire job is thinking about what goes wrong, have identified a pattern that shows up again and again in cases like EVAAS and Amazon's recruiting tool. They call it different things, but here's a simple way to see it: there are three gaps that, when they open up, create real danger.
The Transparency Gap. When the people affected by a decision can't understand how it was made, they can't challenge it. Carolyn Blankenship couldn't appeal her firing because the algorithm's formula was hidden. Transparency means being able to look at the reasoning, not just the result.
The Accountability Gap. When something goes wrong with an AI system, it's often unclear who is responsible. Is it the engineer who wrote the code? The manager who deployed it? The company that sold it? The city that used it? Everyone points to someone else, and the person who was harmed is left with nobody to hold accountable.
The Feedback Gap. Human systems learn from their mistakes partly because the people making decisions feel the consequences of getting it wrong. A judge who makes a bad ruling might face public criticism. A teacher who fails a class that clearly didn't understand the material gets parent complaints and tries a new approach next year. An algorithm doesn't feel anything. It will make the same mistake in case 10,000 that it made in case 1, unless a human stops it.
The next time you hear about an AI system making a decision that affected real people (in hiring, policing, healthcare, education), you can now ask three specific questions: Can the people affected understand why the decision was made? Is there someone actually responsible if it goes wrong? And is there any mechanism to catch and correct mistakes? Most news coverage never gets this far. You just did.
Here is the genuinely hard question that sits underneath this entire lesson, and underneath a lot of AI safety research. Consider this: human decision-makers are also biased, inconsistent, and sometimes corrupt. A human judge might give harsher sentences to defendants who remind them of someone they dislike. A human hiring manager might unconsciously prefer candidates who went to the same school they did. A human teacher might grade essays more harshly on a Monday morning when they're tired.
Studies have repeatedly found that AI systems, even flawed ones, are sometimes more consistent than humans making the same decisions. If you replace a biased human judge with a flawed-but-less-biased algorithm, have you made things better or worse?
There is no clean answer. A system that's statistically more fair can still be completely unaccountable, and unaccountability is its own kind of harm. A human who is inconsistent can also be persuaded, appealed to, moved by a story, or corrected in the moment. An algorithm cannot. We are trading one set of problems for a different set of problems, and we haven't fully agreed on which problems are worse.
That tension doesn't go away. It'll come up in every lesson of this course. Sit with it. The people who are most certain they've resolved it are usually the people who've thought about it the least.
A school district in your city just announced it will use an AI system to automatically grade student essays. The company that built it says it's faster, more consistent, and removes teacher bias. The school board approved it unanimously without a public vote. You have three questions to answer before your article publishes.
On April 10, 2010, a Polish Air Force Tu-154 aircraft approached Smolensk North Airport in Russia carrying Polish President Lech Kaczyński and 95 other passengers and crew. The weather was terrible: dense fog, near-zero visibility. Air traffic controllers on the ground repeatedly told the crew to divert to another airport. The crew acknowledged the warnings. They also had onboard navigation systems giving them data about altitude and descent rate. And yet the plane continued its approach, descending too quickly in conditions it couldn't navigate. It clipped treetops and crashed. All 96 people aboard died.
Investigators later found something that has haunted aviation safety researchers ever since: the crew had become so accustomed to trusting their automated systems that when those systems gave ambiguous readings, and when the air traffic controllers gave clear human warnings, they weighted the machine's data more than the human's voice. This isn't exactly an AI story; the navigation systems weren't AI. But it illustrates a pattern that AI researchers call automation bias: the tendency for humans to over-rely on automated systems, even when better information is available from non-automated sources.
In the years since, automation bias has been documented not just in aviation but in medicine, law enforcement, financial trading, and content moderation. The pattern is consistent: once a machine is in the loop, humans tend to defer to it even when they shouldn't. This isn't stupidity. It's a well-documented feature of human psychology, and it's one of the central challenges of deploying AI in high-stakes environments.
In 2015, a study published in the journal Computers in Human Behavior gave participants a task: sort images into categories, with the help of an AI assistant that would recommend a category for each image. Sometimes the AI was right. Sometimes it was obviously wrong. The researchers found that when the AI was wrong, participants still followed its recommendation about 74% of the time, even when the correct answer was easy to see with their own eyes.
This is what automation bias looks like in practice: not someone blindly following a machine off a cliff, but someone who has a perfectly good instinct, looks at the machine's answer, and quietly decides the machine probably knows better. It happens to smart people. It happens to trained experts. It happens more often when people are tired, when they're under time pressure, or when they trust the general reputation of the system, even if this specific output is wrong.
For AI safety, this creates a specific problem: we build AI systems to assist human decision-making, but the presence of the AI can actually reduce the quality of human decision-making. The human becomes a rubber stamp on the machine's output rather than an independent check on it.
In January 2020, a research team at Google Health published a paper in Nature describing an AI system that could detect breast cancer in mammograms more accurately than radiologists, reducing false negatives by 9.4% on U.S. data. The result was celebrated widely. Headlines declared that AI would "beat doctors" at reading scans. The research was real and significant.
What received far less coverage: a follow-up analysis found that when radiologists worked alongside the AI (the "human-in-the-loop" condition that sounds like the responsible approach), some of their diagnostic accuracy actually decreased. The presence of the AI's recommendation changed how they looked at the image. Instead of starting with a fresh examination, they were now (consciously or not) confirming or questioning what the machine had already told them. Their thinking was anchored to the AI's output.
This is one of the most consequential unsolved problems in AI deployment. The intuitive solution to AI error ("just put a human in the loop") doesn't always work if the human-AI combination produces worse outcomes than either would alone. And yet removing the human from the loop entirely creates the accountability problem from Lesson 1. Both options have serious costs.
If an AI system is provably more accurate at detecting cancer than a radiologist working alone, but combining AI with a radiologist produces worse outcomes than the AI alone, is it ethical to require human approval of every AI diagnosis? You're not required to have an answer. You are required to take the question seriously.
On May 6, 2010, the U.S. stock market experienced what became known as the Flash Crash. In about 36 minutes, major U.S. stock indices dropped nearly 10%, erasing almost a trillion dollars in market value, before recovering almost as fast. The cause: automated trading algorithms, each reacting to the others' behavior, created a self-amplifying cascade that no single human had triggered and no single human could stop. The market eventually corrected itself, but for those 36 minutes, it operated in a state that no human had designed and no human was in control of.
Since then, regulators have added new "circuit breakers": rules that automatically pause trading if prices move too fast. This is a human-designed constraint on algorithmic behavior: a hard limit that says, "we don't care what the algorithm wants to do, it stops at this threshold." It's imperfect. It doesn't catch every problem. But it's an example of a principle that AI safety researchers now argue is essential: corrigibility, meaning that humans can always interrupt, correct, or shut down an automated system.
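The idea of a hard limit can be sketched in a few lines of Python. This is a simplified illustration, not how any real exchange implements its rules: the function name `run_with_circuit_breaker`, the 7% threshold, and the price list are all assumptions chosen for the example.

```python
def run_with_circuit_breaker(prices, max_drop=0.07):
    """Process prices in order; halt if the price falls too far from the start.

    max_drop is an illustrative threshold (7%), not a real market rule.
    Returns the prices that were processed and whether trading was halted.
    """
    start = prices[0]
    processed = []
    for price in prices:
        processed.append(price)
        drop = (start - price) / start
        if drop >= max_drop:
            # Hard stop: it does not matter what the trading algorithms "want".
            return processed, True
    return processed, False

# Made-up prices that fall faster and faster.
prices = [100.0, 99.0, 96.0, 92.0, 85.0, 70.0]
seen, halted = run_with_circuit_breaker(prices)
print(seen)    # [100.0, 99.0, 96.0, 92.0]
print(halted)  # True: trading stopped before the worst prices were reached
```

The check sits outside the trading logic on purpose. The limit applies no matter how the algorithms inside behave, which is the core of the idea.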
The question of who holds the off switch, and whether they can actually use it, becomes more urgent as AI systems are deployed in faster, more interconnected environments. A human can decide not to push the button. The question AI safety researchers are now asking is: can the AI system decide not to let you push it?
Every time you see a headline about AI being "more accurate than doctors" or "better than human experts," you now know to ask a second question: What happens to human judgment when the AI is in the room? The technology's accuracy in a controlled test and its effect on real-world decision-making can be completely different things.
Valley General Hospital is about to deploy an AI system that reads X-rays to flag potential fractures. It's 94% accurate, better than any radiologist on staff. The hospital's CTO wants to use it to pre-screen all X-rays before radiologists review them. You've been asked for a recommendation. Your AI advisor has seen the implementation plan.
In 2019, a product manager named Frances Haugen joined Facebook, where she gained access to internal research documents that the company had kept private. She wouldn't go public until 2021, but what she found dated back to at least 2016: Facebook's own internal research showed that its algorithm (the system that decided which posts, videos, and articles appeared in users' feeds) was specifically optimizing for engagement. Engagement meant likes, comments, shares, and time spent on the platform. And Facebook's researchers had found that content causing anger, outrage, and fear generated significantly more engagement than content causing happiness or calm. The algorithm hadn't been told to make people angry. It had simply learned, through millions of data points, that anger kept people scrolling.
What Haugen's documents revealed, and what made this a landmark moment in AI history, was that Facebook's engineers had identified this problem as early as 2018, and proposed a fix. A team built a system that would reduce "problematic content" in feeds. Senior leadership rejected it. The reason given in internal documents: the fix would reduce engagement metrics, which would reduce advertising revenue. A deliberate choice was made to keep an algorithm that the company's own researchers believed was causing social harm, because changing it would cost money.
This is a different kind of AI problem than the ones in Lessons 1 and 2. The algorithm wasn't secretly broken. It was doing exactly what it was designed to do. The values encoded in it (maximize engagement above all else) were a choice. And that choice had consequences felt by hundreds of millions of people who never agreed to it.
Every AI system makes tradeoffs. When you decide what to optimize for (what to measure, what counts as "good," what counts as "bad"), you are making a values choice. Sometimes this is obvious: a hiring algorithm that prioritizes candidates with Ivy League degrees encodes a value that Ivy League education predicts job performance. Sometimes it's hidden: a search algorithm that shows you results matching your past behavior encodes a value that past preferences should determine future information, a choice that limits what you're exposed to.
The technical term for what you measure and optimize is the objective function. Facebook's objective function was engagement. Amazon's early recruiting tool's objective function was similarity to past successful hires. EVAAS's objective function was student score improvement attributable to a single teacher. In each case, the objective function captured something real, but it missed things that turned out to matter enormously.
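A small Python sketch can show how much the choice of objective function changes the outcome. Everything here is invented for illustration: the posts, the numbers, the `wellbeing` measurement, and the weight of 100 are assumptions, and this is not how any real platform ranks its feed.

```python
# Made-up posts with two measurements each.
posts = [
    {"title": "Outrage headline", "engagement": 950, "wellbeing": -3},
    {"title": "Friend's good news", "engagement": 400, "wellbeing": 4},
    {"title": "How-to video", "engagement": 300, "wellbeing": 2},
]

def rank(posts, objective):
    """Order post titles from best to worst according to the objective function."""
    return [p["title"] for p in sorted(posts, key=objective, reverse=True)]

def engagement_only(post):
    """Objective 1: only clicks, likes, and shares count."""
    return post["engagement"]

def engagement_and_wellbeing(post):
    """Objective 2: a hypothetical blend; the weight of 100 is arbitrary."""
    return post["engagement"] + 100 * post["wellbeing"]

print(rank(posts, engagement_only))
# ['Outrage headline', "Friend's good news", 'How-to video']
print(rank(posts, engagement_and_wellbeing))
# ["Friend's good news", 'Outrage headline', 'How-to video']
```

The ranking code is identical in both cases. Only the definition of "good" changed, and that definition is a human decision.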
AI researchers call this the alignment problem. It applies to everything from social media feeds to self-driving cars to AI assistants: the difference between what we tell an AI to optimize and what we actually want is often a gap where serious harm can enter.
In 2023, a study by the AI Now Institute found that the five largest AI research laboratories in the world (OpenAI, Google DeepMind, Anthropic, Meta AI, and Microsoft Research) were all run by companies headquartered in the United States, most of them in the San Francisco Bay Area. The people who build the most widely deployed AI systems in the world represent a narrow demographic slice: predominantly American, predominantly male, predominantly from affluent technical backgrounds.
This matters because when you build something, your assumptions about the world (what's normal, what's harmful, what's funny, what's offensive, what a "good outcome" looks like) get embedded in it. In 2015, Google Photos' image recognition system labeled photos of Black users with the word "gorillas." This wasn't malicious. It was the product of training data that underrepresented people of color, built by a team that didn't catch the problem before deployment. Google's fix was to remove the category "gorilla" from the app's labels entirely, a workaround that was reportedly still in place as recently as 2023.
A 2019 paper published in Science found that a healthcare algorithm used by major U.S. hospital systems was systematically giving lower priority scores to Black patients than to equally sick white patients. The algorithm hadn't used race as a variable. It had used historical healthcare spending as a proxy for medical need, and because Black patients had historically received less healthcare spending (due to systemic inequities in the healthcare system), the algorithm interpreted this as lower medical need. A proxy for need that reflected past discrimination was being used to allocate future care.
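The proxy problem can be shown with two made-up patients in Python. This is only an illustration of the general idea, not the algorithm from the study: the field names, the numbers, and both scoring functions are assumptions.

```python
# Two made-up patients who are equally sick but had different past spending.
patients = [
    {"name": "Patient A", "chronic_conditions": 4, "past_spending": 9000},
    {"name": "Patient B", "chronic_conditions": 4, "past_spending": 5000},
]

def priority_by_spending(patient):
    """Proxy measure: assume people who cost more in the past need more care."""
    return patient["past_spending"]

def priority_by_health(patient):
    """Direct measure: count how many chronic conditions the patient has."""
    return patient["chronic_conditions"]

by_spending = [priority_by_spending(p) for p in patients]
by_health = [priority_by_health(p) for p in patients]

print(by_spending)  # [9000, 5000]: Patient B looks like a lower priority
print(by_health)    # [4, 4]: the two patients are equally sick
```

Neither function mentions race or any other group. If past spending was unequal for reasons unrelated to health, the proxy carries that inequality forward anyway.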
If an AI system is built by people who genuinely did not intend to cause harm, but the system causes harm anyway because of whose data it was trained on and whose experiences were missing from its development, who is responsible for fixing it? And more importantly: who gets to decide what "fixed" looks like?
In 2021, the European Union proposed the Artificial Intelligence Act, the first major attempt by a government to regulate AI systems based on the level of risk they pose. The Act categorizes AI applications into risk tiers. Systems used in critical infrastructure, hiring, healthcare, or law enforcement are classified as "high risk" and must meet strict transparency, testing, and human-oversight requirements before deployment. Systems that pose unacceptable risk (like real-time public biometric surveillance or AI that manipulates people using psychological techniques) are prohibited entirely.
The AI Act became law in 2024. It's the most significant legal framework for AI in the world and serves as a template other governments are now studying. It's imperfect: experts debate whether the risk categories are drawn correctly, whether enforcement is feasible, and whether it will slow beneficial AI development. But it represents a concrete answer to a concrete question: who makes the rules about whose values get encoded in AI systems that affect everyone?
The answer the EU chose: elected governments, not private companies. That's a values choice too, and not everyone agrees with it. Some argue that government regulation stifles innovation. Others argue that leaving the rules to companies is like asking banks to write their own banking regulations. Both arguments have merit. The tension between them is the central political debate about AI happening in legislatures and boardrooms right now.
You can now read debates about AI regulation, which will dominate policy news for the rest of your life, and understand what's actually at stake. Not just "is AI dangerous?" but: Who defines what harm means? Who verifies that AI systems meet those standards? And what mechanisms exist to correct course when they don't? These are questions of democratic governance, not just technology.
A major social media platform is about to deploy an AI content moderation system in 50 countries. It flags and removes posts that violate community standards. The training data came primarily from English-language content moderated by teams in California. The platform wants your board's sign-off before global rollout. Your AI colleague has reviewed the technical documentation.
In March 2023, an open letter was published and signed by over 1,000 AI researchers, engineers, and technology executives, including Elon Musk, Steve Wozniak, and Stuart Russell, one of the world's leading AI scientists. The letter called for a six-month pause in the training of AI systems more powerful than GPT-4, the system OpenAI had just released. It warned, plainly, that "AI systems with human-competitive intelligence can pose profound risks to society and humanity," and that AI labs were "locked in an out-of-control race to develop and deploy ever more powerful digital minds" that no one, not even their creators, could "understand, predict, or reliably control."
The pause never happened. Not for six months, not for six weeks. Development continued. The same companies whose employees signed the letter kept building. New systems were released. OpenAI, Anthropic, Google, and Meta all shipped increasingly capable models throughout 2023 and 2024. Some of the people who signed the letter went on to build or fund the very systems they'd warned about. Sam Altman, the CEO of OpenAI whose company had just released GPT-4, did not sign the letter, but had said publicly, in multiple interviews, that he believed he was building one of the most potentially dangerous technologies in human history and was doing it anyway.
This is not hypocrisy, exactly. Or at least, it's not just hypocrisy. It's a real strategic argument: if powerful AI is coming regardless, it's better to have safety-focused labs at the frontier than to cede that ground to developers less focused on safety. You can agree or disagree with that argument, but you need to know it exists, because it shapes almost every major decision in AI development right now.
There's a concept in game theory called a prisoner's dilemma, and it describes the AI development situation almost perfectly. Imagine two AI companies. Both would be better off if they both slowed down and focused on safety. But if only one slows down while the other keeps building, the one that slowed down loses market share, loses talent, loses investment, and eventually loses relevance. The rational move for each company, individually, is to keep building, even if both companies collectively would prefer a world where everyone slowed down.
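The two-company dilemma can be written out as a tiny Python sketch. The payoff numbers are made up (higher is better for the company) and follow the standard prisoner's dilemma pattern; the names `payoff` and `best_response` are assumptions for this example.

```python
# payoff[(my_choice, their_choice)] = how well things go for my company.
# Made-up numbers that follow the classic prisoner's dilemma pattern.
payoff = {
    ("slow", "slow"): 3,    # both focus on safety: good for both
    ("slow", "build"): 0,   # I slow down alone: I fall behind
    ("build", "slow"): 5,   # I race ahead alone: best for me
    ("build", "build"): 1,  # both race: worse for both than both slowing
}

def best_response(their_choice):
    """Return the choice that gives my company the highest payoff."""
    return max(["slow", "build"], key=lambda mine: payoff[(mine, their_choice)])

print(best_response("slow"))   # build
print(best_response("build"))  # build

# Yet both building leaves each company worse off than both slowing down.
print(payoff[("build", "build")] < payoff[("slow", "slow")])  # True
```

Whatever the other company does, "build" pays more for the individual company, which is why each one ends up racing even though both would prefer the world where everyone slowed down.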
This is why voluntary commitments to safety are considered insufficient by most policy researchers. A company that genuinely wants to be responsible still faces enormous competitive pressure to keep pace. The solution most frequently proposed: external rules that apply to everyone, so that slowing down for safety doesn't mean falling behind. Which brings us back to the question of governance.
AI safety isn't abstract. There are specific, concrete practices that safety-focused organizations use, and specific, concrete ways those practices get cut when they're expensive or inconvenient. Here are three that are currently being debated inside every major AI lab:
Red-teaming. Before deploying an AI system, safety teams deliberately try to break it: asking it harmful questions, looking for ways to make it produce dangerous outputs, probing for failure modes. In 2022, Anthropic published research describing how it had red-teamed its own language models. The practice is now standard at major labs, but it's expensive and time-consuming, and the pressure to ship faster constantly competes with the time needed to red-team thoroughly.
Staged rollouts. Rather than releasing a new AI system to the entire world at once, staged rollouts deploy first to a small group, watch for problems, and scale up only when the system behaves as expected. This is how responsible software engineers approach high-stakes deployments. It is not how every AI system has been deployed; the pressure for large launch-day numbers often wins.
Independent auditing. Third-party organizations (not the company that built the AI) review the system for bias, safety failures, and unexpected behaviors before or after deployment. This is analogous to financial auditing: companies don't grade their own earnings reports. AI auditing is still nascent; there's no standard framework, no universal requirement, and no agreed-upon credentials for who is qualified to do it. This is one of the most active areas of AI policy work happening right now.
AI governance decisions are being made right now: in the EU Parliament, in the U.S. Congress, in standard-setting bodies like NIST (National Institute of Standards and Technology), and in private meetings between AI company CEOs and government officials. The outcomes of those decisions will shape what AI systems are built, how they're tested, and who is allowed to challenge them when they fail. These are not inevitable outcomes. They are choices being made by specific people, and they can be influenced by an informed public.
Here is an honest answer to the question most people have after a course like this: what is a 12-year-old supposed to do about AI safety? The answer is not "nothing," but it's also not "go write your representative a letter" (though that's not a terrible idea). The answer is more immediate and more practical.
First: notice. You now have vocabulary and frameworks that most adults don't have. When AI appears in news coverage, and it appears constantly, you can ask the right questions. Is this system transparent? Who is accountable when it fails? Does it apply equally to all groups? Who decided what it was optimizing for? Does any independent body review it? These questions are not rhetorical. They are the scaffolding of responsible AI deployment, and asking them is itself a form of participation in how AI governance develops.
Second: resist simplification. AI debates get flattened into "AI good" vs. "AI bad." You now know that's useless. The real questions are specific: which system, doing what, with what safeguards, governed by whom, correctable how? The people most likely to shape AI governance well are the ones who resist the urge to pick a team and instead insist on the specific, complicated, uncomfortable questions.
Third: take the long view. Every technology that humans have struggled to control (nuclear weapons, social media, financial derivatives) eventually got some form of governance, some set of rules, some set of institutions. The governance was always imperfect, often late, and frequently contested. But it happened. AI will be no different. The question is whether the governance that emerges is shaped by people who actually understand the problem, or just by whoever was loudest or richest in the early years. You now understand the problem. That is a real advantage.
You started with a teacher who got fired by an algorithm she couldn't question. You now understand why that happened, and can name the specific gaps (transparency, accountability, feedback) that allowed it. You understand why having a human in the loop isn't automatically the answer. You understand that AI systems encode values, and those values reflect whoever built them. And you understand that keeping AI under control requires institutions, governance, and practices, not just good intentions from engineers. Most people who work in AI policy came to these ideas after years of technical or legal training. You got there in four lessons. That's a starting point, not an endpoint, but it's a real one.
Your city's council is voting next week on a proposal to allow AI systems across three domains simultaneously: automated essay grading in public schools, AI-assisted hiring for city jobs, and predictive policing software. The AI company says all three systems have passed internal testing. No independent audit has been done. You have three minutes to give your testimony. Your debate partner represents the company and believes all three should be approved.