In 2017, a 14-year-old girl in the UK named Molly Russell died. Her father later found out that Instagram and Pinterest had been recommending her content about self-harm and depression, over and over, algorithmically, without any human deciding to do so, right up until her death. The inquest concluded in 2022 that this online content, some of it selected and pushed to her by the platforms without her asking for it, had contributed to her death. No one programmed those systems to hurt anyone. The people who built them almost certainly never imagined this outcome. The algorithm was just doing what it was optimized to do: keep a user engaged.
That story isn't ancient history. It happened because of a specific set of choices — choices about what an AI system should optimize for, who should oversee it, and what guardrails should exist. Those choices were made by engineers and executives, many of them well-meaning, and the consequences landed on a family that had nothing to do with any of it. That gap — between who makes the decisions about AI and who lives with the results — is exactly what this course is about.
By the end of this module, you'll be able to read any headline about AI and immediately ask the right questions: What was this system optimizing for? Who decided that? Who didn't have a say? Those aren't just abstract questions. They're the difference between AI that serves everyone and AI that quietly works against most of us. You won't leave here with all the answers. But you'll have the framework — and most people making these decisions professionally don't even have that yet.
Starting around 2014, Amazon built an AI hiring tool designed to save time by automatically screening job applicants. The company trained it on ten years of résumés submitted to the company, most of them from men, and the system learned its idea of a strong candidate from those. By 2015, Amazon's own engineers had realized that the tool was downgrading résumés that included the word "women's," as in "women's chess club captain." It was also penalizing graduates of two all-women's colleges. Amazon eventually shut the project down, and Reuters revealed the story in 2018. The system had done exactly what it was designed to do: learn from historical hiring data. But that data reflected a decade of a male-dominated tech industry. The AI learned the bias and amplified it. Nobody programmed it to discriminate. It figured that out on its own.
That case is now one of the most studied examples in AI safety. And the uncomfortable question it raises is: what were they actually trying to build? A "good" hiring tool, obviously. But what does "good" mean when you train a system on data that reflects an unfair world?
Most people hear "AI safety" and picture science fiction: a robot uprising, a computer that decides to destroy humanity. That's not what this course is about, or at least, it's not the part that matters for the next ten years of your life.
AI safety is the field of making sure AI systems do what we actually want them to do, without causing harm in the process. That sounds obvious. It turns out to be incredibly hard.
The Amazon hiring tool wasn't unsafe because it was powerful. It was unsafe because the people who built it forgot to ask a crucial question: "If we train this on historical data, what values will it absorb?" They defined "good" as "similar to past successful hires" — and that definition quietly contained a decade of inequality.
Here's something worth holding onto: most AI safety problems are not technical problems at their core. They're value problems. The engineers knew how to build the system. They didn't know — or didn't ask — what the system should actually care about.
Every AI system is built around a goal — a thing it's trying to maximize or minimize. In machine learning, this is called the objective function (or sometimes the "reward"). The system gets better and better at achieving that goal. The problem: the goal you write down is almost never exactly what you wanted.
Think about YouTube's recommendation algorithm. Before 2019, it was optimized to maximize watch time — keep users watching as long as possible. The system got extraordinarily good at this. It also started routing huge numbers of people toward increasingly extreme content, because extreme content kept people watching. Zeynep Tufekci, a sociologist at the University of North Carolina, wrote about this in 2018, describing how watching moderate political videos would lead the algorithm to progressively more radical content. YouTube modified the algorithm in 2019. But for years, the system did exactly what it was told — maximize watch time — and the side effects spread across political discourse worldwide.
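The idea of an objective can be sketched in a few lines of Python. Everything here is invented for illustration: the videos, their numbers, and both scoring rules. Real recommendation systems combine many signals and are far more complex. The point is that the recommender itself never changes; only the objective handed to it does, and that alone changes what gets recommended.

```python
from typing import Callable, NamedTuple

class Video(NamedTuple):
    title: str
    expected_minutes: float        # hypothetical: how long a typical viewer keeps watching
    share_rated_worthwhile: float  # hypothetical: fraction of viewers who later say it was worth it

# Invented catalogue, for illustration only.
CATALOGUE = [
    Video("How elections are counted", 6.0, 0.85),
    Video("Ten-minute history of the printing press", 10.0, 0.80),
    Video("THEY don't want you to see this", 18.0, 0.25),
]

def recommend(catalogue: list[Video], objective: Callable[[Video], float]) -> str:
    """Return the title that scores highest under whatever objective is handed in."""
    return max(catalogue, key=objective).title

# Same system, two different definitions of "good".
print(recommend(CATALOGUE, lambda v: v.expected_minutes))        # THEY don't want you to see this
print(recommend(CATALOGUE, lambda v: v.share_rated_worthwhile))  # How elections are counted
```

Under the watch-time objective, the system picks whatever people keep watching longest, whether or not they were glad they watched it. Nothing in the code is malicious; it simply maximizes what it was told to maximize.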
You can't just tell an AI "be helpful." You have to define helpful so precisely that the system can't find a shortcut that technically satisfies your definition while violating your intent. This is called the alignment problem — aligning what the AI optimizes for with what humans actually value.
This matters to you right now because every platform you use — every feed, every recommendation, every autocomplete — is an optimization system. Once you understand this, you start noticing whose interests are baked into those objectives. Usually it's the company's revenue. Not yours.
Here's the part that most people miss when they hear "AI safety": it's not a problem for engineers to solve in private and then present to the rest of us as a solved product. The decisions that shape AI systems — what they optimize for, whose interests they serve, what harms are acceptable — are fundamentally political and ethical decisions. They belong to everyone.
But right now, they're mostly being made by a small number of people at a small number of companies, most of them in a few ZIP codes in California and Washington state. The rest of the world is downstream of those decisions.
In 2016, the investigative news organization ProPublica published an analysis of a software tool called COMPAS, used by judges in US courts to help predict whether a defendant would commit another crime. The tool assigned a "risk score" that influenced bail and sentencing decisions. ProPublica found that among defendants who did not go on to reoffend, Black defendants were nearly twice as likely as white defendants to have been falsely flagged as high risk. The company that built COMPAS, Northpointe, disputed the analysis and argued that its scores were fair by a different measure: people given the same score reoffended at similar rates regardless of race. The disagreement revealed something important: there is no single mathematical definition of "fair." When two groups have different underlying rates of the outcome being predicted, several common definitions of fairness cannot all be satisfied at once. Someone had to choose which one COMPAS used, and they chose without the affected communities having any say.
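The tension can be made concrete with a little arithmetic. The sketch below uses invented numbers, not ProPublica's data: two hypothetical groups of 100 people, where 50% of group A and 20% of group B reoffend. The hypothetical risk tool is equally "precise" for both groups (80% of the people it flags really do reoffend) and catches the same share of actual reoffenders in both (80%). Even so, people in group A who never reoffend are four times as likely to be flagged as people in group B who never reoffend.

```python
from dataclasses import dataclass

@dataclass
class GroupOutcomes:
    """Confusion-matrix counts for one group of defendants (invented numbers)."""
    true_pos: int   # flagged high risk, did reoffend
    false_pos: int  # flagged high risk, did NOT reoffend
    false_neg: int  # not flagged, did reoffend
    true_neg: int   # not flagged, did NOT reoffend

    def base_rate(self) -> float:
        """Share of the group that actually reoffended."""
        total = self.true_pos + self.false_pos + self.false_neg + self.true_neg
        return (self.true_pos + self.false_neg) / total

    def precision(self) -> float:
        """Of the people flagged, the share who actually reoffended."""
        return self.true_pos / (self.true_pos + self.false_pos)

    def catch_rate(self) -> float:
        """Of the people who reoffended, the share who were flagged (true positive rate)."""
        return self.true_pos / (self.true_pos + self.false_neg)

    def false_positive_rate(self) -> float:
        """Of the people who did NOT reoffend, the share flagged anyway."""
        return self.false_pos / (self.false_pos + self.true_neg)

# Group A: 50 of 100 reoffend. Group B: 20 of 100 reoffend.
group_a = GroupOutcomes(true_pos=40, false_pos=10, false_neg=10, true_neg=40)
group_b = GroupOutcomes(true_pos=16, false_pos=4, false_neg=4, true_neg=76)

for name, g in [("A", group_a), ("B", group_b)]:
    print(f"Group {name}: base rate {g.base_rate():.2f}, precision {g.precision():.2f}, "
          f"catch rate {g.catch_rate():.2f}, false positive rate {g.false_positive_rate():.2f}")
# Group A: base rate 0.50, precision 0.80, catch rate 0.80, false positive rate 0.20
# Group B: base rate 0.20, precision 0.80, catch rate 0.80, false positive rate 0.05
```

Adjusting this hypothetical tool so that false positive rates match would, unless the tool were perfect, force its precision or its catch rate to differ between the groups instead. Some kind of unevenness is unavoidable; the only question is which kind someone chooses.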
When someone says an AI system is "objective" or "unbiased," you now know to ask: objective according to what definition? Unbiased by whose measurement? The choice of objective is a value judgment, and value judgments are never neutral. Knowing this changes how you read every headline about AI.
This doesn't mean AI is always bad, or that the people building these systems are villains. Most of them are trying to build useful things. But "trying to be useful" and "actually understanding the consequences of your choices" are two different things. The gap between them is where AI safety work happens.
COMPAS made predictions about real people. Some of those predictions were wrong. In some cases, someone spent more time in jail because an algorithm flagged them as high risk when they weren't. In other cases, someone was released who went on to commit another crime.
Here's the uncomfortable truth: human judges make these kinds of errors too — and research consistently shows that human judges also have racial biases in their sentencing. So the question isn't "should we use the algorithm or the human?" The question is: which type of error are we more willing to accept, and from whom?
If an AI system and a human judge make the same type of error at the same rate, but the AI's error is documented and traceable and the human's error is invisible — is the AI safer? Or does transparency make the harm feel more deliberate? Who should be accountable when an algorithm is wrong?
Sit with that. It's not a trick question with a hidden right answer. The people who study AI safety professionally disagree about it. The reason it matters to you — right now, at your age — is that these systems are being deployed in schools, courts, hospitals, and hiring processes. By the time you're an adult, decisions shaped by AI will be woven into nearly every institution you interact with. The people deciding how those systems work need to hear from more than just engineers and executives.
That's why AI safety is everyone's problem. Not just because you'll be affected by it. Because the conversation about how to do it right is happening now, and most seats at that table are empty.
A school district has deployed an AI tool that flags students as "at risk of dropping out" based on attendance records, grade trends, and disciplinary history. The district says it's using the tool to help students get support earlier. But several parents and teachers are raising concerns.
Your job is to investigate. Talk to REMI — the AI assistant below — who has analyzed the system's documentation. Challenge REMI's reasoning, push back on assumptions, and work out what the real risks are.
In January 2012, researchers working with Facebook secretly altered the News Feeds of 689,003 users without their knowledge. For one week, some users saw fewer positive posts than usual; others saw fewer negative ones. The goal was to find out whether emotional content was contagious: whether seeing sad posts made you post sad things. The study was published in the journal PNAS in June 2014. When the public found out, the reaction was immediate and furious. People were horrified that a company had deliberately manipulated their emotional environment as an experiment, with no consent, no warning, and no way to opt out. Facebook's response, essentially, was: you agreed to our terms of service. Adam Kramer, the Facebook data scientist who led the study, later posted an apology on Facebook for the way the paper described the research and for any anxiety it caused. The lead academic researcher, Jeffrey Hancock of Cornell University, defended the work as important and ethical. Neither apology nor defense resolved the core question: who gave Facebook the right to decide what emotional content nearly 700,000 people would see?
Every AI system learns from data. The data is collected by someone, curated by someone, and labeled by someone. Each of those steps involves choices — and choices reflect values, priorities, and blind spots.
Take image recognition. In 2015, Google Photos launched a feature that automatically organized photos by what was in them: faces, places, objects. A Black software developer named Jacky Alciné discovered that the system had labeled photos of him and his friend as "gorillas." Google apologized. But as Wired reported in 2018, the fix wasn't to train the system to recognize Black faces better; it was to block the "gorilla" label, along with several other primate labels, entirely. The problem wasn't solved; it was hidden.
The reason this matters isn't just one bad label on one photo app. Image recognition is also the basis of facial recognition software used by law enforcement. In 2020, Robert Williams, a Black man in the Detroit area, was wrongfully arrested after a facial recognition system misidentified him from a grainy surveillance video. He was handcuffed in front of his children. Facial recognition systems have repeatedly been found to make more errors on darker-skinned faces, in part because the data used to train them has underrepresented those faces. Nobody who assembled that kind of training data decided to build a system that would wrongfully arrest people. But choices like theirs got Robert Williams handcuffed on his front lawn.
When a doctor makes a judgment call and it turns out to be wrong, there's usually a trail of reasoning. You can examine it. You can ask the doctor to explain. When an AI system makes a wrong call, there's often no trail at all — just an output. This is called the black box problem.
In 2019, a study published in the journal Science analyzed a healthcare algorithm used across the US to decide which patients needed extra medical care. Researchers Ziad Obermeyer and colleagues found the algorithm was significantly less likely to flag Black patients as needing care than equally sick white patients. The company that made it, Optum, had not intentionally built a racist system. The algorithm used healthcare costs as a proxy for healthcare needs. But because of systemic inequality in the US healthcare system, Black patients with the same level of illness historically generated lower healthcare costs (in part because they had less access to care). The algorithm interpreted "lower costs" as "healthier." The effect: sicker Black patients were being passed over for care they needed.
The researchers noted that commercial algorithms of this kind are applied to roughly 200 million people in the US every year. Nobody who designed this one had made a deliberate choice to disadvantage Black patients. They'd made a data choice that seemed reasonable in isolation and turned out to have enormous consequences.
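Here is a toy illustration of how a proxy can go wrong. It is not the algorithm from the study, and every number is invented: two hypothetical groups of patients, where group B spends less than group A at the same level of illness (standing in for less access to care). The count of chronic conditions stands in for true health need. Ranking by cost, the quantity the algorithm in the study was built to predict, fills a hypothetical three-place program with a different set of patients than ranking by need.

```python
from typing import Callable, NamedTuple

class Patient(NamedTuple):
    patient_id: str
    group: str
    chronic_conditions: int  # stand-in for true health need
    annual_cost: int         # the proxy: past healthcare spending, in dollars

# Invented data: group B spends less than group A at the same level of illness.
PATIENTS = [
    Patient("p1", "A", 5, 12_000),
    Patient("p2", "A", 3, 8_000),
    Patient("p3", "A", 2, 6_000),
    Patient("p4", "B", 5, 7_000),
    Patient("p5", "B", 4, 5_500),
    Patient("p6", "B", 2, 3_000),
]

def top_k(patients: list[Patient], key: Callable[[Patient], float], k: int) -> list[str]:
    """IDs of the k patients who rank highest under `key` (ties keep list order)."""
    ranked = sorted(patients, key=key, reverse=True)
    return [p.patient_id for p in ranked[:k]]

SLOTS = 3  # hypothetical number of places in the extra-care program
by_cost = top_k(PATIENTS, key=lambda p: p.annual_cost, k=SLOTS)
by_need = top_k(PATIENTS, key=lambda p: p.chronic_conditions, k=SLOTS)
passed_over = sorted(set(by_need) - set(by_cost))

print("selected by cost:", by_cost)          # ['p1', 'p2', 'p4']
print("selected by need:", by_need)          # ['p1', 'p4', 'p5']
print("sick but passed over:", passed_over)  # ['p5']
```

Patient p5 has more chronic conditions than p2 but loses the place, because p5's lower spending reads as better health.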
Every time someone says "the algorithm is neutral," you can ask: neutral to what? Every dataset embeds the world as it was, not the world as it should be. Every proxy measure (like "cost" for "health need") reflects an assumption. And most of the people affected by these systems never got to review those assumptions.
Back to Facebook's 2012 experiment. Here's what makes it particularly instructive for thinking about AI safety: the researchers didn't think they were doing anything wrong. They were studying real behavior in a real environment. They thought "terms of service" constituted consent. And in a narrow legal sense, maybe it did.
But there's a difference between legal consent and meaningful consent. When you click "I agree" on a 40-page terms-of-service document you haven't read (and that no one expects you to read), you haven't made a real choice. You've performed a ritual that provides legal cover for the company. The company knows this. You implicitly know this. And yet the ritual continues because it's convenient for everyone with power and inconvenient for everyone without it.
This pattern repeats throughout AI development. Data used to train large AI systems is often scraped from the internet — from photos people posted, text people wrote, art people made — without asking whether those people consented to train a commercial AI product. In 2023, artists including Sarah Andersen, Kelly McKernan, and Karla Ortiz filed a lawsuit against image-generation AI companies, arguing that their artwork had been used to train systems without consent or compensation. The legal questions are unresolved. The ethical question — whether you should be able to train a for-profit system on someone's creative work without asking — is one where reasonable people disagree intensely.
If an AI company trains on publicly available data — photos, writing, art that people chose to put on the internet — does that make it ethically acceptable? People made things public for one purpose; those things are now being used for a different purpose entirely. Is "it was public" the same as "it was available for any use"? And if not, how do you draw the line?
These questions are being argued in courts and legislatures right now. By the time you're working in any industry — not just tech — they will have shaped what AI systems exist, what they can do, and who owns the benefit of that. You don't have to wait until you're an adult to have an opinion about them. The decisions are being made now.
A health insurance company has built an AI to predict which policyholders will need expensive medical care in the next year, so they can offer "wellness programs" proactively. The company says the system uses only objective medical data: number of prior doctor visits, number of prior prescriptions, and prior emergency room visits.
REMI has access to background data on this system. Start by identifying which of those three inputs you think might be a problematic proxy — and why. REMI will test your reasoning and push you toward what the evidence actually shows.
In 2013, the Chicago Police Department began using an algorithm called the Strategic Subject List — also called the "heat list" — to predict which individuals were most likely to be involved in gun violence, either as perpetrators or victims. The list was meant to be used for "outreach." But what actually happened, as documented by reporting from the Chicago Tribune and later by academic researchers, was that people on the list were often subjected to increased police surveillance and stops. Being on the list increased your chances of being stopped by police. Being stopped by police increased the likelihood of an arrest — even for minor infractions. Arrests generated more data, which fed back into the algorithm and pushed people higher on the list. Some people on the list had been placed there partly because of prior police contact — contact that was itself a product of being over-policed. The algorithm was, in part, predicting the consequences of its own existence.
Chicago discontinued the Strategic Subject List in 2019. But the pattern it demonstrated — an AI system that shapes the behavior of people and institutions, which then generates the data the AI uses to make future predictions — is everywhere. It has a name: a feedback loop.
A feedback loop happens when an AI system's outputs influence the real world, and that changed world generates new data, which the AI uses to update its predictions. In a closed loop, the AI can end up amplifying whatever pattern it started with — whether that pattern was accurate or not.
Here's a simple version: imagine an AI that predicts which students will need tutoring. It flags certain students. Those students get tutoring. Their grades improve. The AI notes that its predictions were accurate — those students did struggle initially. But what about students who weren't flagged? They didn't get tutoring. Their grades stayed flat. The AI interprets this as confirmation that they didn't need help. The result: the AI learns to route resources to whoever it already thought needed resources, and systematically overlooks everyone else.
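The tutoring loop can be written out as a small Python sketch. The students, grades, cutoff, and tutoring effect are all hypothetical, and it assumes, for simplicity, that tutoring would raise any student's grade by the same amount. The key line is the one that measures results: the system only ever observes the effect of tutoring on the students it chose to help.

```python
# Hypothetical September grades and flagging cutoff, invented for illustration.
SEPTEMBER = {"ana": 58, "ben": 62, "cai": 67, "dee": 70}
CUTOFF = 65
# Assumption: tutoring would raise ANY student's grade by the same amount.
TUTORING_GAIN = 10

flagged = {name for name, grade in SEPTEMBER.items() if grade < CUTOFF}

# Only flagged students get tutoring; everyone else stays where they started.
june = {
    name: grade + (TUTORING_GAIN if name in flagged else 0)
    for name, grade in SEPTEMBER.items()
}

# The system can only see what tutoring did for the students it chose to help.
observed_gain = {name: june[name] - SEPTEMBER[name] for name in sorted(flagged)}
never_tested = sorted(set(SEPTEMBER) - flagged)

print("flagged:", sorted(flagged))                  # ['ana', 'ben']
print("observed gains:", observed_gain)             # {'ana': 10, 'ben': 10}
print("no evidence either way for:", never_tested)  # ['cai', 'dee']
```

The system sees a 10-point gain everywhere it intervened and concludes its flags were right. Because cai and dee never got tutoring, nothing in the data could ever show that they would have gained 10 points too.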
Now scale that up to criminal justice, hiring, lending, or healthcare. The same dynamic applies. And in each case, the people on the wrong end of the feedback loop often have no way to know the loop exists, let alone opt out of it.
Researcher Rashida Richardson and colleagues published a 2019 study documenting what they called "dirty data" in predictive policing systems. They found that cities including New Orleans, Los Angeles, and Chicago used historical crime data to train their AI tools — but that data had been generated by biased policing practices to begin with. The AI then directed more policing resources to the same areas, generating more arrests, which validated the original predictions. The loop was self-sustaining.
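A self-sustaining loop like this can be shown with a deliberately crude simulation. This is not Richardson and colleagues' method; their study examined real police departments' records and practices rather than running a simulation. It is a hypothetical toy built on two stated assumptions: (1) each day, the patrol goes wherever the records show the most incidents, and (2) incidents are recorded only where police are present. Both areas have the same true rate of incidents; the only difference is a small skew in the historical records.

```python
def simulate_patrols(true_daily_incidents: dict[str, int],
                     recorded: dict[str, int],
                     days: int) -> tuple[dict[str, int], list[str]]:
    """Each day, patrol the area with the most *recorded* incidents.

    Incidents only get recorded where the patrol is, so the other area's
    count never grows, no matter what is actually happening there.
    """
    recorded = dict(recorded)  # copy so the caller's data isn't changed
    history = []
    for _ in range(days):
        target = max(recorded, key=recorded.get)
        recorded[target] += true_daily_incidents[target]
        history.append(target)
    return recorded, history

# Hypothetical: both areas have the same true rate of incidents...
TRUE_DAILY_INCIDENTS = {"north": 3, "south": 3}
# ...but the historical records are slightly skewed toward "north".
HISTORICAL_RECORDS = {"north": 11, "south": 10}

final_records, history = simulate_patrols(TRUE_DAILY_INCIDENTS, HISTORICAL_RECORDS, days=30)
print(final_records)                                   # {'north': 101, 'south': 10}
print(history.count("north"), "of 30 patrols went north")  # 30 of 30 patrols went north
```

A one-incident difference in the starting data becomes a tenfold difference in the records, and the records now look like strong evidence that "north" is where crime happens.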
Feedback loops aren't only about policing. Every major recommendation system runs on them.
When you watch a YouTube video, the algorithm notes your engagement. When it recommends something similar and you watch that too, the system strengthens the connection between those categories and your profile. If you watch one video skeptical of climate science — not because you believe it, maybe just because the thumbnail was interesting — the algorithm may begin recommending more. Not because it wants to radicalize you. Because engagement creates a positive feedback signal, and clusters of related content tend to generate more engagement than random recommendations.
Researcher Manoel Horta Ribeiro and colleagues at EPFL published a 2020 study at the ACM Conference on Fairness, Accountability, and Transparency that audited what they called "radicalization pathways" on YouTube. Tracking millions of user comments over time, they found that users who commented on videos from milder channels, in communities the authors labeled "Alt-lite" and "Intellectual Dark Web," increasingly went on to comment on videos from more extreme, alt-right channels. The paper found significant migration between channels of escalating radicalism, consistent with the algorithmic recommendation structure.
YouTube disputes the characterization and has modified its recommendation algorithm repeatedly since 2019. But the underlying dynamic — a recommendation system that learns from engagement and therefore tends to push people toward more extreme versions of whatever they've already engaged with — is a structural feature, not a one-time bug. You can patch the specific channels. The incentive structure that produces the pattern remains.
This is where AI safety becomes a policy question, not just an engineering one. What rules should govern the feedback loops in widely used recommendation systems? Should companies be required to disclose what their algorithms optimize for? Should users have a right to see why they're being recommended something? These questions are being debated right now in the European Union, the US Congress, and the United Nations, and the answers are shaping the information environment you live in.
Feedback loops are worst when the AI is measuring something that its own behavior changes. In economics, there's a concept called Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." The moment you start optimizing for something, you change the behavior of the thing you're measuring.
In 2020, after COVID-19 cancelled in-person exams, England's exam regulator, Ofqual, used an algorithm to determine students' A-level grades. The algorithm used each school's historical performance to moderate the grades teachers had predicted for individual students. Students at historically lower-performing schools, disproportionately working-class and minority students, had their grades downgraded. Private schools, where small class sizes meant teachers' predictions were more often accepted as they were, saw the biggest rise in top grades. The outcry was immediate. Boris Johnson's government abandoned the algorithm within days and reverted to teacher-assessed grades. But thousands of students had already been rejected from university places on the basis of the algorithmic grades.
The algorithm had been measuring "what grades this school's students historically get" and using that to predict "what grade this specific student deserves." Those are not the same thing. Every individual student was being assessed not on their own work but on the aggregate history of their institution. The measurement wasn't measuring what it claimed to measure.
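A heavily simplified sketch of moderating by school history, in Python, assuming a hypothetical school where no past student ever earned an A. Ofqual's real model was more complicated (for example, it handled small classes differently), so treat this only as an illustration of the general idea: the teacher supplies a rank order, and the school's past grade distribution decides which grades are available to hand out.

```python
# Hypothetical share of students at this school who earned each grade in
# past years, best grade first. Nobody here has ever earned an A.
HISTORICAL_SHARE = {"A": 0.0, "B": 0.2, "C": 0.4, "D": 0.4}

def moderate(ranking: list[str], historical_share: dict[str, float]) -> dict[str, str]:
    """Hand out grades in rank order so this year's results match the school's past."""
    n = len(ranking)
    available: list[str] = []
    for grade, share in historical_share.items():
        available += [grade] * round(share * n)
    # If rounding left us short, fill with the lowest grade; if over, trim.
    lowest = list(historical_share)[-1]
    available = (available + [lowest] * n)[:n]
    return dict(zip(ranking, available))

# Teacher's predicted grades, listed in the teacher's rank order (strongest first).
TEACHER_PREDICTED = {"priya": "A", "sam": "B", "lee": "B", "jo": "C", "kit": "C"}

final = moderate(list(TEACHER_PREDICTED), HISTORICAL_SHARE)
for student, predicted in TEACHER_PREDICTED.items():
    print(f"{student}: teacher predicted {predicted}, algorithm awarded {final[student]}")
```

Priya's own work enters the calculation only through her rank. Because no student at her school had earned an A before, there is no A for the algorithm to give her.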
If human teachers also have biases — and research shows they do, rating students differently based on race and socioeconomic status — is an algorithm that's at least consistent more fair? Even if its consistency encodes historical disadvantage? Is predictable bias better or worse than unpredictable bias? And who has the right to make that call on behalf of students?
You now know how to look at any AI system and ask: does this system's output change the data it uses to make future predictions? If yes, what pattern is it reinforcing? And who benefits from that reinforcement — and who doesn't? Most people using these systems never think to ask those questions. You do now.
A large urban school district uses an AI to allocate tutoring resources. The system monitors students' grades weekly and flags those who are falling behind for extra support. Teachers see the flags and spend more time with flagged students. End-of-year scores improve for flagged students. The district reports the system is a success.
But a researcher has noticed something: students who were not flagged in September — perhaps because their early grades were average, not failing — received no additional support. By June, the gap between flagged and unflagged students had grown, and the AI interpreted this as confirmation that its original risk assessments were accurate.
Work with REMI to analyze this loop: where does it start, what does it reinforce, and what would you actually change to break it?
Between 2013 and 2019, the Dutch tax authority wrongly accused thousands of families of fraud involving childcare benefits. Its fraud detection relied partly on an automated risk-classification model that flagged applications for investigation, and families with immigrant backgrounds or dual nationality were disproportionately targeted. If you were flagged, your benefits could be halted and you could be ordered to repay what you had received, before any fraud was proven. Many families were pushed into debt. Some lost their homes. When the full scope became public, Prime Minister Mark Rutte's government resigned over the scandal in January 2021. Around the same time, a Dutch court ruled in 2020 that a separate government fraud-detection system, SyRI (System Risk Indication), was unlawful, finding that it violated human rights. SyRI combined data from multiple government sources, such as tax records, utility bills, and residency registration, to assign risk scores to individuals, and it was deployed in low-income neighborhoods. The damage to families had been accumulating for years. No one had been watching.
The Netherlands scandal illustrates something AI safety researchers call the human oversight problem: as AI systems make more decisions faster, the ability of humans to monitor and correct those decisions can break down. Not because anyone decided to stop watching — but because the volume and speed of automated decisions outstrips human capacity to review them.
Systems like SyRI ran for years before anyone stopped them, and thousands of families were harmed in the meantime. The people who were flagged had no right to see their risk score, no right to know which data points generated it, and no effective way to challenge it. These were black boxes used by the government against its own citizens, with no meaningful appeal process.
This is what happens when oversight fails at the institutional level. And it's not hypothetical — it happened in a wealthy, democratic country with a functioning legal system. The legal system eventually caught up. By then, the government had fallen and the harm was done.
A human bureaucrat making a mistake about your benefits affects one family. An automated system making the same mistake systematically affects thousands of families simultaneously, without any single error being visible enough to trigger review. Scale transforms individual mistakes into systemic harm.
AI safety researchers often talk about three properties that AI systems used for consequential decisions need to have. Each one matters, and they're related but distinct:
In 2018, the European Union's General Data Protection Regulation (GDPR) took effect. It gives people rights around automated decisions: a right to meaningful information about the logic behind automated decisions that significantly affect them, and, for decisions made solely by an algorithm, a right to have a human step in and to contest the decision. Legal scholars still debate whether this adds up to a full "right to explanation." Even so, it is one of the first major regulatory frameworks to try to enforce explainability. Whether it actually works in practice is still being tested. But it establishes the principle: consequential automated decisions need to be explainable, or they shouldn't be made.
In the US, no equivalent national standard exists. However, under existing fair-lending law, people denied credit must be given the specific reasons, and the Consumer Financial Protection Bureau has issued guidance making clear that this still applies when a lender uses a complex AI model. In other words, lenders using AI for credit decisions have to be able to produce an explanation. This doesn't solve the black box problem for most AI applications. But it shows that explainability is achievable when regulators require it.
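One common way for a simple scoring model to produce specific reasons is to measure how much each input pulled an applicant's score below a reference applicant, then report the biggest factors. The sketch below assumes a hypothetical linear credit score with invented weights, an invented reference applicant, and a helper named `reason_codes`; it is not any lender's actual method or a regulatory standard. It works cleanly only because the model is a simple weighted sum, which is part of why explaining complex models is so much harder.

```python
# Hypothetical linear credit score. Weights, reference values, and the
# applicant are all invented for illustration.
WEIGHTS = {
    "on_time_payment_rate": 50.0,    # points per unit (0.0 to 1.0)
    "debt_to_income": -40.0,         # points per unit (more debt lowers the score)
    "years_of_credit_history": 2.0,  # points per year
}
REFERENCE = {"on_time_payment_rate": 0.95, "debt_to_income": 0.30, "years_of_credit_history": 8}

def reason_codes(applicant: dict[str, float], weights: dict[str, float],
                 reference: dict[str, float], top_n: int = 2) -> list[str]:
    """Name the inputs that pulled this applicant's score furthest below the reference."""
    contributions = {
        name: weight * (applicant[name] - reference[name])
        for name, weight in weights.items()
    }
    harmful = [(name, c) for name, c in contributions.items() if c < 0]
    harmful.sort(key=lambda item: item[1])  # most negative contribution first
    return [name for name, _ in harmful[:top_n]]

APPLICANT = {"on_time_payment_rate": 0.80, "debt_to_income": 0.55, "years_of_credit_history": 10}
print(reason_codes(APPLICANT, WEIGHTS, REFERENCE))  # ['debt_to_income', 'on_time_payment_rate']
```

The applicant's longer credit history helped the score, so it is not reported as a reason; only the two factors that pushed the score down are.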
Here's the piece of AI safety that gets the least attention in mainstream coverage: the question of who gets to participate in deciding how AI systems are built in the first place.
In December 2020, Google pushed out AI ethics researcher Timnit Gebru, one of the most prominent researchers on AI bias, after a dispute over a research paper she had co-authored that was critical of large language models (the technology underlying systems like ChatGPT). Gebru says she was fired; Google said it had accepted her resignation. The incident sparked enormous controversy in the AI community. Gebru had been one of few Black women in a senior AI research role at a major tech company. Her research focused on the ways AI systems could encode and amplify harm for marginalized communities, which happened to be the communities most often missing from the rooms where AI systems were being designed.
This isn't just about one person. It reflects a structural issue: the people who have the most to lose from AI systems that embed bias or fail to account for marginalization are also the least represented in the rooms where those systems are built. This isn't a coincidence. It's a self-reinforcing pattern with real consequences for what gets built, what gets questioned, and what gets ignored.
You've covered the full picture now: AI systems fail not just because of technical errors, but because of choices about objectives, data, measurement, feedback, oversight, and voice. Every one of those failure modes is a place where more people — including people your age — should be asking questions. The field of AI safety is still being defined. The norms around transparency and accountability are being established right now. You are not too young to have informed opinions about this. You're exactly the right age to be developing them.
If you build an AI system that causes harm to a group of people, and you didn't intend that harm, and you were following standard industry practices — what is your responsibility? Is "I didn't know" a valid defense when the tools to anticipate the harm existed? Does it matter whether the people most likely to be harmed were given a chance to warn you? Who bears responsibility when an institution adopts an AI system whose flaws were documented and knowable — the company that built it, the institution that deployed it, or the regulators who allowed it?
That question has no clean answer. But it's the right question to be asking about every consequential AI deployment you encounter for the rest of your life. The fact that you're asking it at all puts you ahead of most people making decisions about these systems professionally. Use that.
A city government plans to deploy an AI system that recommends which families should receive priority access to affordable housing. The system uses income data, employment history, family size, and current housing conditions. The city says it will make the process "faster and fairer" than the current system, where decisions are made by individual caseworkers and wait times exceed two years.
You're on the independent oversight board that must approve or reject the deployment — or approve it with conditions. You have one conversation with REMI, who has studied comparable systems deployed in other cities. You need to decide: what conditions, if any, do you require before this system goes live?