In March 2023, thousands of researchers, engineers, and technologists signed an open letter calling for a six-month pause on training AI systems more powerful than GPT-4. The signatories included Yoshua Bengio, Stuart Russell, and Elon Musk, but also thousands of ordinary practitioners whose names carried no fame. The letter did not produce a pause. What it produced was something subtler: it moved the Overton window. Within weeks, the EU accelerated its AI Act negotiations, the White House convened a meeting with AI lab CEOs, and public coverage of AI risk became front-page news rather than niche speculation. Individual voices, aggregated and directed, changed the political weather.
A common response to learning about the alignment problem is paralysis: the challenges seem so vast, so technical, so entangled with geopolitical competition, that individual action feels meaningless. This reaction is understandable but mistaken. It confuses scale with causation. Large outcomes emerge from aggregated small decisions — which products get used, which norms get enforced at work, which politicians hear which concerns, which researchers get hired and which fields get funded.
History offers a useful corrective. The nuclear non-proliferation regime was built partly through the sustained advocacy of ordinary scientists like Leo Szilard, who organized the first petition against using atomic bombs on civilian targets in 1945. The petition failed — but Szilard's subsequent work helped create the Bulletin of the Atomic Scientists and the Pugwash Conferences, institutions that shaped arms control for decades. Individual initiative, even when it fails locally, can create infrastructure that matters globally.
On November 1, 2018, approximately 20,000 Google employees walked out of offices in 50 cities to protest the company's handling of sexual harassment — and to demand changes to how the company operated internally. Within days, Google announced it would end forced arbitration for harassment claims. The walkout demonstrated that employees inside technology companies hold substantial leverage: they possess specialized knowledge, public trust, and the ability to generate press attention that external protesters rarely command.
Regardless of your professional role or technical expertise, three levers are available to nearly every person engaging with AI systems:
Voice. What you say in workplaces, public forums, to elected representatives, and in communities shapes the normative environment in which AI is developed and deployed.
Choice. Which AI products you use, which companies you work for or invest in, and which terms of service you accept all send market signals that compound across millions of users.
Knowledge. Technical literacy, even at a conceptual level, lets you evaluate claims, participate in governance debates, and identify failures that others might miss or dismiss.
In 2020, researcher Timnit Gebru and colleagues prepared a paper, later published as "On the Dangers of Stochastic Parrots," documenting risks in large language models: specifically harms to marginalized communities and the difficulty of auditing such systems. After the paper went through Google's internal review ahead of an academic conference, management asked that it be withdrawn or that the Google-affiliated authors remove their names. Gebru's subsequent departure, which she and many colleagues describe as a firing, and the public outcry it provoked drew global attention to questions of research independence inside AI companies that had previously been invisible to most observers.
Gebru's leverage came from knowledge: she understood the systems well enough to articulate specific risks, and she had documented them rigorously. That knowledge, shared publicly, triggered regulatory attention, legislative hearings, and industry-wide conversations about AI ethics governance that continue today. You do not need to be Timnit Gebru to exercise knowledge leverage — but her case illustrates why understanding the alignment problem is itself a form of power.
The alignment problem is not only a technical problem. It is a social, political, and economic problem — which means it has social, political, and economic solutions. Every domain in which you already operate is a domain in which alignment-relevant choices exist.
The AI assistant will help you map the three levers — Voice, Choice, and Knowledge — onto your specific situation. Describe your role (student, professional, consumer, citizen) and the assistant will help you identify concrete, realistic actions you can take. Complete at least 3 exchanges to finish this lab.
In July 2018, the American Civil Liberties Union tested Amazon's Rekognition facial recognition tool by comparing photographs of every member of the U.S. Congress against a database of 25,000 arrest photos. The system falsely matched 28 sitting members of Congress to mugshots of people who had been arrested. The false matches fell disproportionately on legislators of color. Amazon defended its product. But what followed matters more: employee pressure from inside Amazon, combined with public documentation of the errors, contributed to Amazon announcing a one-year moratorium on police use of Rekognition in June 2020, later extended indefinitely. The tool had not changed. The social response to irresponsible deployment had.
Responsible use is not simply avoiding obviously harmful applications. It involves a set of active practices that require effort and, sometimes, friction with social or professional expectations:
Verify outputs before acting on them. In 2023, New York attorney Steven Schwartz submitted a legal brief citing six precedents generated by ChatGPT. All six cases were hallucinated — they did not exist. Schwartz faced sanctions and public humiliation. The failure was not that he used AI; it was that he did not verify. Every domain has verification requirements appropriate to its stakes.
Disclose AI involvement where it matters. In research, journalism, legal work, and education, undisclosed AI generation undermines the epistemic basis of those fields — the assumption that claims have a responsible human author who can be held accountable. Disclosure is not about stigma; it is about maintaining trust infrastructure.
Understand the terms you accept. Most AI service agreements grant companies broad rights to training data derived from your inputs. Inputting sensitive personal information, confidential business data, or medical records into commercial AI systems often transfers that information to third parties in ways users do not intend or expect.
In April 2023, news reports revealed that Samsung employees had inadvertently leaked confidential semiconductor source code and internal meeting notes by entering them into ChatGPT for assistance. Because the service could retain user inputs and use them for model training, Samsung could not reliably retrieve or delete the material once it had been submitted. Samsung subsequently banned the use of generative AI tools on internal networks. The incident illustrates a systemic risk: responsible use requires understanding not just what AI does, but what happens to what you give it.
There are contexts in which declining to use an AI system — or demanding that one not be used on your behalf — is the most alignment-relevant action available. These contexts share common features: high stakes, low AI reliability in the specific domain, absence of meaningful human oversight, or use against people who have not consented.
In 2019, the city of San Francisco became the first U.S. jurisdiction to ban government use of facial recognition technology, following advocacy by the Electronic Frontier Foundation and local civil society groups. The ban was enacted not because facial recognition never works, but because its error rates in high-stakes contexts — policing — were judged to create unacceptable risks. The decision to not deploy a technology is itself a governance decision, and citizens and workers can advocate for it.
At the individual level, refusal might mean: declining to use an employer's AI surveillance tool while raising concerns through legitimate channels; opting out of AI-based hiring screening where permitted; or refusing to use AI for tasks — medical diagnosis, legal advice, mental health support — where the cost of error falls on vulnerable people and no professional oversight is present.
Responsible use is not about technophobia. It is about matching the reliability and oversight of an AI system to the stakes of the task. Where that match fails, slowing down or refusing is not obstruction — it is appropriate caution.
Every interaction you have with an AI system generates data about what works, what fails, and what users accept. Flagging errors, using reporting mechanisms, writing reviews, and publicly documenting failures contributes to the feedback infrastructure that AI developers depend on. Reinforcement learning from human feedback (RLHF), a technique widely used to tune deployed AI assistants, relies on human judgments at scale. Some of that feedback comes from users, including you.
Describe an AI use case you encounter in your own life — at school, at work, or as a consumer — and the assistant will help you audit it against the responsible use principles from this lesson: verification, disclosure, data sovereignty, and appropriate stakes-matching. Complete at least 3 exchanges.
The European Union's AI Act, which entered into force in August 2024, is the world's first comprehensive AI regulatory framework. Its risk-tier structure, which prohibits some applications and requires conformity assessments for high-risk ones, emerged from years of public consultation in which civil society organizations, academic researchers, and individual citizens submitted formal comments that shaped the final text. The Act's restrictions on real-time remote biometric identification in publicly accessible spaces, a provision with direct alignment implications, were strengthened in part because of sustained advocacy from digital rights organizations including AlgorithmWatch and the European Digital Rights network (EDRi). Many of their members were not lawyers or engineers but informed citizens who had made AI governance their focus.
AI governance operates at multiple levels, each with different entry points for citizen participation. Understanding which level of governance is most relevant to a specific concern is half the work:
National. Legislative hearings, executive agency rulemaking processes (public comment periods), and direct contact with elected representatives. In the U.S., the NIST AI Risk Management Framework included a public comment process in 2022.
Local. City councils have enacted facial recognition bans (San Francisco 2019, Boston 2020, Portland 2020). Local government is often the most accessible point of entry for civic advocacy.
Institutional. Employers, universities, and professional associations adopt AI policies that govern members directly. These are often more changeable than law and more immediately personal.
International. The UN AI Advisory Body, OECD AI Principles, and G7 Hiroshima AI Process all accept civil society input. These processes shape global norms even without enforcement mechanisms.
If you work in a field that is adopting AI — which by 2024 means most fields — your professional context is a governance context. Several documented mechanisms allow employees to shape how AI is deployed inside organizations:
Ethics review processes. Many technology companies have established internal AI ethics boards or review processes. These processes depend on employees raising concerns. At DeepMind, an internal Ethics & Society team was established in 2017 partly in response to researcher concerns about the speed of deployment without ethical review. The team's existence did not prevent all problematic deployments, but it created a documented channel through which concerns could be raised and recorded.
Professional codes of conduct. Engineering societies including the ACM and IEEE have adopted AI ethics guidelines. In fields with licensing requirements (medicine, law, engineering), professional ethics boards can open investigations when AI deployments violate professional standards. In July 2024, the American Bar Association issued Formal Opinion 512 on lawyers' use of generative AI, clarifying how existing duties such as competence, confidentiality, and candor apply to attorneys who use these tools.
Whistleblower protections. In the United States, certain categories of AI-related harms — particularly those involving financial fraud or securities violations — are covered by existing whistleblower statutes that offer legal protection and financial reward. The Securities and Exchange Commission received its first AI-specific whistleblower complaint in 2023 regarding AI-generated investment advice.
Illinois's Biometric Information Privacy Act (BIPA), passed in 2008, requires companies to obtain consent before collecting biometric data, including scans of face geometry. The law was backed by civil liberties advocates, including the ACLU of Illinois, after the collapse of the fingerprint-payment company Pay By Touch raised questions about what would happen to stored biometric data. By 2023, BIPA had generated hundreds of millions of dollars in class action settlements against technology companies including Facebook (now Meta), whose $650 million settlement was approved in 2021. A state law driven by citizen advocacy became one of the most significant constraints on AI deployment in the United States.
Choose a specific AI governance issue that concerns you — facial recognition, algorithmic hiring, AI in healthcare, autonomous weapons, or another — and work with the assistant to design a realistic engagement plan. The assistant will help you identify the right governance level, relevant organizations, and concrete first steps. Complete at least 3 exchanges.
The career advice organization 80,000 Hours, which focuses on high-impact career paths, updated its guidance on AI safety careers in 2023 to explicitly note that non-technical roles are among the most talent-constrained in the field. The organization identified policy analysts, communications professionals, lawyers, organizational psychologists, and social scientists as particularly needed. According to its analysis, there was fewer than one policy professional working on related governance questions for every ten technical AI safety researchers, a severe imbalance that limits the field's ability to translate research into real-world safeguards.
The phrase "AI safety career" conjures images of machine learning researchers working on interpretability or reinforcement learning from human feedback (RLHF) at labs like Anthropic, DeepMind, or OpenAI. That image is incomplete. The alignment problem, understood broadly as ensuring AI systems do what is genuinely beneficial, requires contributions across at least six domains:
Technical research. Interpretability, robustness, RLHF, scalable oversight, formal verification. Requires ML expertise but also benefits from philosophy, cognitive science, and mathematics.
Policy and law. Drafting regulation, advising legislators, litigating AI-related harms, developing international governance frameworks. Currently one of the most talent-constrained areas.
Social science. Understanding how AI systems affect human behavior, institutions, and power structures. Essential for predicting and mitigating social harms at scale.
Journalism and communication. Translating technical findings into public understanding. Investigative AI journalism, exemplified by reporters at ProPublica, The Markup, and MIT Technology Review, shapes what the public demands from regulators.
Education. Teaching AI literacy at every level, from K-12 through graduate school. The long-run supply of informed citizens and professionals depends on education now.
Organizational practice. Building internal structures (ethics review boards, red teams, incident reporting systems) that make safety practices sustainable inside institutions.
In May 2016, ProPublica reporters Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner published "Machine Bias," an investigation into the COMPAS recidivism prediction algorithm used in U.S. criminal sentencing. They obtained proprietary risk scores for over 7,000 defendants in Broward County, Florida, and found that the algorithm was nearly twice as likely to falsely flag Black defendants as future criminals compared to white defendants, while white defendants were more likely to be incorrectly labeled lower risk.
The investigation triggered legislative hearings, academic debate about fairness metrics in machine learning, and reforms to how algorithmic tools are evaluated in criminal justice. The four were data journalists, not machine learning researchers: they understood enough about algorithms to ask the right questions, obtain the right data, and communicate findings to a general audience. This is a model for how expertise outside machine learning research can produce alignment-relevant impact.
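To make the comparison of error rates concrete, here is a minimal Python sketch of the kind of arithmetic behind a finding like "twice as likely to be falsely flagged." It computes a false positive rate (people flagged as high risk who did not reoffend) and a false negative rate (people labeled lower risk who did reoffend) separately for each group. The function names, the group labels "A" and "B", and every number in the toy data are invented for illustration. They are not ProPublica's data or code, and the sketch is not a reconstruction of their analysis.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute per-group false positive and false negative rates.

    `records` is an iterable of (group, flagged_high_risk, reoffended)
    tuples. This record layout is an assumption made for this sketch.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, flagged, reoffended in records:
        if flagged and not reoffended:
            counts[group]["fp"] += 1  # flagged, but did not reoffend
        elif not flagged and not reoffended:
            counts[group]["tn"] += 1  # not flagged, did not reoffend
        elif not flagged and reoffended:
            counts[group]["fn"] += 1  # not flagged, but did reoffend
        else:
            counts[group]["tp"] += 1  # flagged, and did reoffend

    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        rates[group] = {
            "false_positive_rate": (
                c["fp"] / did_not_reoffend if did_not_reoffend else None
            ),
            "false_negative_rate": (
                c["fn"] / did_reoffend if did_reoffend else None
            ),
        }
    return rates


def make_records(group, fp, tn, fn, tp):
    """Build toy records from outcome counts (hypothetical helper)."""
    return (
        [(group, True, False)] * fp
        + [(group, False, False)] * tn
        + [(group, False, True)] * fn
        + [(group, True, True)] * tp
    )


# Entirely made-up counts, chosen only so the arithmetic is easy to follow.
toy_records = (
    make_records("A", fp=4, tn=6, fn=3, tp=7)
    + make_records("B", fp=2, tn=8, fn=5, tp=5)
)

rates = error_rates_by_group(toy_records)
for group in sorted(rates):
    print(group, rates[group])
```

In this toy data, 4 of the 10 people in group A who did not reoffend were flagged, against 2 of 10 in group B, so group A's false positive rate is double group B's. Group B's false negative rate is the higher one (5 of 10 against 3 of 10). A tool can show both imbalances at once, which is part of why the choice of fairness metric became a subject of debate.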
In 2023, Anthropic published a "Responsible Scaling Policy" — a commitment to evaluate AI systems against safety benchmarks before deploying more powerful versions. The policy was developed with significant input from people outside pure ML research: ethicists, policy analysts, and communications professionals helped translate technical safety thresholds into commitments legible to regulators and the public. The document's existence, and its public nature, creates accountability that purely internal safety work does not.
If you are a student: The most accessible entry is developing AI literacy while pursuing whatever you are already studying. The Georgetown Center for Security and Emerging Technology, the Oxford Internet Institute, and the Harvard Berkman Klein Center all offer fellowships and research opportunities that combine domain expertise with AI policy work.
If you are already in a profession: The most immediate contribution is often internal — becoming the person in your organization who understands alignment risks well enough to raise them intelligently. Lawyers who understand AI liability, doctors who understand diagnostic AI limitations, teachers who understand AI in education: these people are needed everywhere and are currently rare.
If you want to shift careers toward alignment: The field has grown rapidly since 2022. The AI Safety Support organization offers career advising specifically for people transitioning into alignment-relevant roles. The Machine Intelligence Research Institute, the Center for Human-Compatible AI at UC Berkeley, and the Future of Humanity Institute at Oxford (which closed in April 2024) have all published accessible reading lists for people building foundational knowledge.
The alignment problem is not a problem for a generation of specialists to solve in isolation. It is a civilizational challenge that will be navigated — for better or worse — through the accumulated choices of many millions of people: what they build, what they use, what they refuse, what they demand, and what they understand. This course has been about giving you the understanding. The rest is yours.
This is the capstone lab. Working with the assistant, you will design a personal plan for contributing to AI alignment — grounded in your actual background, skills, and available time. The plan should include at least one near-term action (this month), one medium-term development (next year), and one longer-term aspiration. Complete at least 3 exchanges.