In November 2022, a company called OpenAI released a chatbot called ChatGPT to the public: no fanfare, no giant launch event, just a quiet link posted online. Within five days, a million people had signed up. Within two months, it had reached an estimated hundred million users, making it, at the time, the fastest-growing consumer software product on record. Teachers started receiving essays their students hadn't written. Doctors started getting printouts from patients who'd already "asked the AI." Programmers found that the thing could write working code. Nobody had trained the public on what was happening inside it, or why it could do these things, or where it was wrong. And it was wrong, often, in ways that looked completely confident.
That gap, between how fast the technology spread and how slowly anyone explained it, is still open right now. News articles call AI "thinking" and "understanding" and "learning," words that mean something very specific in human experience and something very different in a machine. Executives say it will replace most jobs. Critics say it's just autocomplete. Both are oversimplifying, and both are influencing decisions that affect your school, your future, your city's hiring practices, and what gets built next. Almost none of those decision-makers have read a single explanation of how any of this actually works.
This course is that explanation, written for someone who doesn't need it dumbed down, just made clear. Four modules, four big ideas: how AI makes predictions, where it learns from, why it fails, and what you should actually do about it. You won't finish this as an expert. But you will finish it as someone who can tell real from hype, and that already puts you ahead of most adults in the room.
In 2013, a researcher at MIT named Deb Roy published something unusual: a study about his own son. Roy had set up cameras and microphones throughout his house and recorded, over three years, nearly every moment of his child's life from birth through age three, roughly 90,000 hours of footage. The goal was to watch a human being learn language from scratch. What Roy found was striking. His son didn't learn words the way a dictionary defines them. He learned them by hearing what came next. "Water" appeared in sentences about cups and thirst and bath time. "More" appeared after almost anything his son wanted. The child's brain was, without being told to, building a map of which words followed which, of what was likely to come next.
This is almost exactly how modern AI language systems work. Not the same: the child had a body, parents, hunger, fear, joy. The AI has none of that. But the core mechanism, the thing underneath everything, is the same question asked billions of times: given what came before, what comes next? When a system like ChatGPT completes your sentence, it isn't thinking. It isn't understanding. It is doing something much stranger and much more interesting: it has seen so much human text that it can predict, with startling accuracy, what a human would probably write here.
Imagine you're filling in a blank: "I'd like a coffee with milk and ____." You don't need to think hard. "Sugar" is obvious. Your brain has seen that sentence, or ones like it, enough times that the answer feels automatic. Now imagine you've read every coffee order ever written on the internet, millions of them. Your ability to fill in that blank would be almost perfect, not because you understand coffee, but because you've seen the pattern so many times.
That's the core of what a language model does. The technical term is next-token prediction: given a sequence of words (or parts of words), predict the most likely thing to come next. The model doesn't have beliefs, preferences, or hunger. It has probabilities: numerical estimates of how likely each possible next word is, based on everything it was trained on.
When you type "The capital of France is" into a language model, it doesn't look that up in a database. It produces "Paris" because, in the enormous amount of text it trained on, "Paris" overwhelmingly followed that exact phrase. It's a sophisticated pattern-matcher operating at a scale humans can't intuitively picture.
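To make "probabilities for the next word" concrete, here is a minimal sketch of the idea. It is a toy, not how ChatGPT is built: the four-sentence corpus is made up, and the counting scheme only looks at the single previous word, while real models consider the whole preceding sequence.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model trains on vastly more text.
corpus = [
    "the capital of france is paris",
    "the capital of france is paris",
    "the capital of france is lyon",   # a wrong sentence someone wrote
    "the capital of italy is rome",
]

# Count which word follows each word.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follow_counts[current][nxt] += 1

def next_word_probabilities(word):
    """Turn raw follow-counts into probabilities for the next word."""
    counts = follow_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

probs = next_word_probabilities("is")
best = max(probs, key=probs.get)
print(probs)  # "paris" gets the largest share because it followed "is" most often
print(best)
```

Notice that nothing here checks whether Paris really is the capital of France. "Paris" wins only because it appeared most often, and the wrong sentence about Lyon still earns a share of the probability.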
Every AI chatbot you've used (ChatGPT, Gemini, Claude, Copilot) is built on this same foundation. When they seem confident, it's not necessarily because they're right. It's because confident-sounding prose is the most common pattern in human writing. When they're wrong, it's often because a wrong answer looks like a right one in text.
In 2019, Spotify's music recommendation team published a behind-the-scenes look at how their system worked. It didn't know anything about music theory. It didn't know what genres meant, or what made a song feel sad. What it knew was this: people who listened to Song A very often also listened to Song B right afterward. People who liked artist X also tended to follow artist Y. The system found patterns in millions of listening sequences and used them to predict what you'd want to hear next.
Netflix does the same with movies. Amazon does it with products. Google's autocomplete does it with search queries. The technology underneath all of them is the same idea: find the pattern in what happened before, and predict what comes next. Your "personalized" recommendations aren't chosen by someone who knows you. They're predicted by a system that has seen what people like you (people who clicked the same things, in the same order, at the same time of day) chose afterward.
This works remarkably well most of the time. It also fails in specific, predictable ways. If everyone in your demographic searches for a certain kind of food, you'll be recommended that food, even if you hate it. The system doesn't know you. It knows patterns that include you.
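The "people who played Song A often played Song B next" idea can be sketched in a few lines. This is an illustration under assumptions, not Spotify's or anyone else's actual system: the song names and listening histories are invented, and real recommenders use far richer signals.

```python
from collections import Counter, defaultdict

# Hypothetical listening histories, one list per listener, in play order.
histories = [
    ["song_a", "song_b", "song_c"],
    ["song_a", "song_b"],
    ["song_a", "song_d"],
    ["song_c", "song_b"],
]

# Count which song was played immediately after each song.
next_plays = defaultdict(Counter)
for plays in histories:
    for current, following in zip(plays, plays[1:]):
        next_plays[current][following] += 1

def recommend_after(song):
    """Return the song most often played right after `song`, or None if unseen."""
    counts = next_plays.get(song)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

recommendation = recommend_after("song_a")
print(recommendation)
```

The third listener went from song_a to song_d, but the next person who plays song_a will still be offered song_b, because that is what most listeners did. The system knows the pattern, not the person.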
When a recommendation app feels like it "gets" you, it isn't reading your mind. It's matching your behavior to a statistical pattern built from millions of other people's behavior. Knowing this doesn't make the recommendations worse, but it lets you notice when the pattern is wrong about you specifically, and why.
Here's the part that surprises most people: modern AI language systems aren't programmed with hand-written rules. Nobody sat down and wrote "if someone asks for a lunch recommendation, say a sandwich." Instead, the system is shown enormous amounts of text and asked to predict what comes next, over and over, billions of times. Every time it's wrong, small adjustments are made to its internal numbers. Every time it's right, those numbers are reinforced. After enough repetitions, the predictions get very good.
This process is called training. The numbers being adjusted are called parameters or weights. Think of them as dials inside the model that get tuned over time. A large language model like GPT-3 has 175 billion of these dials, and some newer models are larger still. Nobody hand-set them. They were tuned automatically, through billions of prediction attempts, on text scraped from books, websites, Wikipedia, code repositories, and much more.
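Here is a minimal sketch of what "tuning a dial" means, shrunk down to a model with exactly one weight. The examples, the starting value, and the learning rate are all assumptions chosen for illustration. A real language model does the same kind of nudging across billions of weights at once.

```python
# Toy "training": one dial (weight) tuned so that prediction = weight * x
# matches the examples. The data follow y = 3 * x, but the code is never told that.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

weight = 0.0          # the dial starts at an arbitrary value
learning_rate = 0.05  # how big each adjustment is (an assumed value)

for step in range(200):
    for x, target in examples:
        prediction = weight * x
        error = prediction - target
        # Nudge the dial in the direction that shrinks the squared error.
        weight -= learning_rate * error * x

print(round(weight, 3))
```

Nobody typed the number 3 into the program. The weight drifts toward it because every wrong prediction pushes the dial a little closer, which is the sense in which a model is trained rather than programmed.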
One important consequence: the model learns whatever patterns exist in its training data, including mistakes, biases, and the habits of whoever wrote that text. If most of the internet discusses cooking in English, the model will be worse at cooking conversations in other languages. If most online text about certain professions was written by men, the model will pattern-match those professions to male-sounding language. It reflects its data, not the world.
Here's the thing nobody fully resolves: if an AI makes predictions based on patterns in human behavior, it will reflect whatever those patterns contain. In 2013, a researcher named Latanya Sweeney at Harvard demonstrated that Google's ad system was more likely to show ads suggestive of arrest records when you searched names more common among Black Americans. Nobody had programmed it to do that; the likeliest explanation was that the pattern existed in the click data the system had learned from. The system had learned a discrimination it was never taught.
So here's the question, and there's no clean answer to it: if a prediction system learns from human data, and human data contains centuries of bias, is the output of that system biased? And if so, whose fault is it? The engineers who built it? The companies that deployed it? The society that produced the data? The people who don't push back when the recommendation is wrong?
A system that predicts "what usually happens next" will perpetuate whatever has been happening. That's the feature and the flaw at the same time. Useful for predicting your next song. Potentially dangerous when predicting who should get a loan, a job interview, or a medical diagnosis.
You don't need to have solved this to understand it. The people building these systems don't have it solved either. But knowing the question exists, and knowing that prediction from data is not the same as neutral judgment, is something a large fraction of the adults making decisions about AI have not stopped to think about. You now have.
You've just learned that AI language models are fundamentally prediction engines: not thinkers, not searchers, not knowers. Your job in this lab is to be a prediction auditor: someone who presses on the model's behavior and asks hard questions about what's actually happening.
The AI in this chat is a knowledgeable peer: it knows the same material you just learned, and it will challenge you if your reasoning is loose. Don't just ask what things mean. Take a position and defend it.
In October 2018, Reuters broke a story that Amazon had quietly shut down an internal AI hiring tool. The system had been in development since 2014: a resume screener trained on ten years of Amazon's own hiring data. The goal was to automate the first pass: take hundreds of applications and rank the best candidates. The problem, discovered internally in 2015 and confirmed by 2017, was that the system had taught itself to penalize resumes that included the word "women's," as in "women's chess club" or "women's college." It also downgraded graduates of two all-women's colleges. Nobody told it to do this. The system had observed that, over ten years, the people Amazon hired were overwhelmingly male, and it treated signs of maleness as a success signal. The data wasn't a neutral mirror. It was a record of a pattern that already existed.
Amazon scrapped the tool without ever deploying it in a consequential way. But the incident revealed something important: the tool hadn't malfunctioned. It had done exactly what it was designed to do: find patterns in historical data and use them to predict future outcomes. The data was the problem. And the data came from real decisions made by real humans over ten years. The AI had learned to replicate human bias with algorithmic precision.
Every AI system learns from examples. For a language model like GPT, those examples are text, in staggering quantities. The dataset used to train GPT-3, released in 2020, started from roughly 45 terabytes of raw Common Crawl text (a snapshot of much of the internet) before filtering, and added books, Wikipedia, and web pages linked from Reddit. Nobody handed the model a curated textbook or a teacher's lesson plan. It absorbed whatever existed in digitized form, in roughly the proportions it existed.
For image-recognition systems, training data is images with labels. For recommendation engines, it's clicks and play history. For speech recognition systems, it's audio recordings with transcripts. In every case, the same truth applies: the system can only learn what the data contains.
This creates a specific kind of limitation that's easy to miss: an AI doesn't know what it wasn't trained on. If you trained a language model only on text written before 1990, it would have no knowledge of the internet, smartphones, or anything that happened after that. It wouldn't know it didn't know; it would just have a confident hole where that information should be.
The next time an AI confidently tells you something wrong, your first question shouldn't be "why is it lying?" It should be: "What was in, or missing from, its training data?" That reframe is how AI researchers think about failure. Now you do too.
Training data fails AI systems in three distinct ways, and they're worth knowing by name.
Underrepresentation happens when a group of people, language, or situation shows up less frequently in the training data than it does in the real world. In 2019, researchers at the National Institute of Standards and Technology (NIST) tested 189 different facial recognition algorithms. Most of them performed significantly worse on darker-skinned faces, on women, and on older people. Why? Because the images used to train many of these systems (largely pulled from driver's licenses, mugshots, and stock photo databases) overrepresented white male faces. The system learned mostly from one group and guessed poorly about the others.
Historical bias is what happened to Amazon. The data accurately reflects what happened, but what happened was itself biased. If doctors in the past were mostly male, a model trained on medical records will associate doctoring with men. The model didn't invent the bias. It inherited it, and it will carry that bias forward unless someone actively intervenes.
Feedback loops are the sneakiest. Suppose a crime-prediction AI is used to direct police patrols toward certain neighborhoods. More police in those neighborhoods means more arrests in those neighborhoods. The next version of the AI is trained on data that includes those arrests, and now it recommends even more patrols to those neighborhoods. The pattern feeds itself. In 2016, ProPublica documented a system called COMPAS that was used by judges across the US to estimate a defendant's likelihood of reoffending. ProPublica's analysis found that Black defendants who did not go on to reoffend were far more likely than comparable white defendants to have been labeled high risk, and judges were using those scores to help make bail and sentencing decisions.
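The patrol example above can be turned into a small simulation. Everything in it is a stated assumption for illustration: two invented neighborhoods with identical true offense rates, a slightly uneven starting record, and a made-up policy that sends 70% of patrols to whichever area has more recorded arrests. It is not a model of any real policing system.

```python
# Two hypothetical neighborhoods with the SAME true offense rate.
true_rate = {"north": 0.10, "south": 0.10}

# Historical records start slightly uneven (an assumption for illustration).
recorded_arrests = {"north": 11.0, "south": 10.0}

TOTAL_PATROLS = 100
HOTSPOT_SHARE = 0.7  # assumed policy: the "higher-risk" area gets 70% of patrols

for year in range(5):
    # The "model": whichever area has more recorded arrests is flagged as the hotspot.
    hotspot = max(recorded_arrests, key=recorded_arrests.get)
    for area in recorded_arrests:
        share = HOTSPOT_SHARE if area == hotspot else 1 - HOTSPOT_SHARE
        patrols = TOTAL_PATROLS * share
        # Recorded arrests depend on how many patrols are present to observe offenses.
        recorded_arrests[area] += patrols * true_rate[area]

gap = recorded_arrests["north"] - recorded_arrests["south"]
print(recorded_arrests, gap)
```

The two areas behave identically, yet the gap in recorded arrests grows every year, because the records measure where the patrols were sent rather than where offenses happened. A model retrained on those records would see ever stronger "evidence" for its original guess.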
If a criminal justice algorithm is trained on decades of policing data, and that policing data reflects discriminatory enforcement, is the algorithm's output evidence, or is it prejudice dressed up as math? And if a judge uses it anyway, who is responsible for the outcome?
This part is genuinely strange to think about. When you talk to a large language model, you are in some sense talking to a compressed version of an enormous amount of human writing. The model has seen how people write when they're explaining science, arguing politics, writing fiction, asking questions, giving bad advice, spreading misinformation, and describing their lunch. It absorbed all of it without being told which was true.
In 2021, a team at Stanford University published an analysis of Common Crawl, one of the main components of large language model training data. They found that a disproportionate share of the text came from Reddit, and that Reddit's most-linked external sources skewed heavily toward a specific demographic: English-speaking, younger, and majority male. The internet is not a neutral sample of humanity. It's a sample of the people who post on the internet, and those people are not evenly distributed across age, language, location, or income.
This means that when a language model has a "default" way of writing about nurses (usually female), or engineers (usually male), or crime (usually associated with specific demographics), it isn't making a judgment. It's reflecting the statistical texture of billions of documents written by humans who were themselves shaped by the world they lived in.
You now understand something that most users of AI, including many policymakers deciding how to deploy it, have not sat down to think through: the model is not a source of truth. It is a mirror of a specific slice of recorded human expression, with all the distortions a mirror can have.
You're a data detective. Someone has reported that an AI-powered tool seems to be producing biased outputs. Your job is to hypothesize what's in the training data that would produce that bias, and then argue about whether fixing the data would actually fix the problem.
The AI peer in this chat will challenge your hypotheses and ask you to be more specific. Vague answers will get pushback. Good reasoning will get pushed further.
In May 2023, a New York attorney named Steven Schwartz filed a legal brief in federal court citing six cases as legal precedent. The opposing counsel couldn't find any of them. The judge couldn't find them either. When Schwartz was asked to produce the actual court documents, he couldn't, because the cases didn't exist. Schwartz had used ChatGPT to help research his brief, and the model had generated plausible-sounding case names, docket numbers, judges, and legal holdings, all fabricated, all formatted exactly like real court citations. Schwartz told the judge he had not realized that ChatGPT could produce false information. The judge fined him and his firm $5,000 and issued a public reprimand. The Wall Street Journal, the New York Times, and legal publications worldwide covered the story. It became one of the most-cited early examples of what researchers call hallucination.
The word "hallucination" is a technical term, not a metaphor. It describes a specific failure mode: a language model generates text that is confident, fluent, and formatted correctly, but factually wrong or entirely invented. The thing that makes hallucination dangerous isn't that it happens. It's that it's indistinguishable from a real answer in how it reads. The model doesn't know it's wrong. It has no mechanism for checking itself against reality. It only knows what usually comes next.
This is the part that feels counterintuitive at first: hallucination isn't a flaw that engineers forgot to fix. It's a predictable consequence of how prediction models work.
When you ask a language model a factual question, it doesn't search a verified database. It generates the next most likely token, then the next, then the next. If the question is about a real event it saw described many times in its training data, those likely tokens will usually produce a correct answer. If the question is about something obscure, or something that didn't appear consistently in training data, the model will still generate confident-sounding tokens, because fluent, confident text is the pattern it learned from. Most human writing is written with apparent confidence. The model learned to sound like that.
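A toy generator shows why the output stays fluent even when the model has nothing to go on. This is a sketch under assumptions: the three-sentence corpus is invented, the generator only looks at the previous word, and the explicit fallback to the most common word is a stand-in for the fact that a real model always produces some probability distribution over next tokens.

```python
from collections import Counter, defaultdict

corpus = [
    "the court ruled in favor of the airline",
    "the court ruled in favor of the passenger",
    "the court cited the earlier case",
]

follow_counts = defaultdict(Counter)
all_words = Counter()
for sentence in corpus:
    words = sentence.split()
    all_words.update(words)
    for current, nxt in zip(words, words[1:]):
        follow_counts[current][nxt] += 1

def generate(start, length=5):
    """Greedy generation: always emit the likeliest next word."""
    output = [start]
    for _ in range(length):
        counts = follow_counts.get(output[-1])
        if counts:
            output.append(counts.most_common(1)[0][0])
        else:
            # No evidence for this context: fall back to the most common word overall.
            # Nothing in the loop lets the model answer "I don't know".
            output.append(all_words.most_common(1)[0][0])
    return " ".join(output)

print(generate("the"))
print(generate("zebra", 3))
```

The second call starts from a word the corpus never contained, and the generator still produces court-sounding text in the same tone as the first. There is no step in the loop where missing evidence turns into visible doubt.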
In 2023, researchers at Stanford University's Human-Centered AI Institute analyzed responses from several major language models to medical questions. They found hallucination rates in medical contexts ranging from 5% to over 30%, depending on the model and the question. For routine questions, the models were usually right. For specific, less-common clinical scenarios, they generated authoritative-sounding but incorrect medical guidance at alarming rates.
If you use AI to help with research, writing, or checking facts, knowing about hallucination means you verify. Not because the AI is usually wrong, but because you can't tell from the answer's style whether it's right. The confident tone is not evidence of accuracy. It's a feature of the prediction process.
Hallucination isn't the only way AI systems fail. Two others are worth knowing because they're less obvious and arguably more manipulable.
Sycophancy is when an AI agrees with the user rather than giving an accurate response. In 2022, researchers at Anthropic (the company that makes Claude) published findings showing that models trained with human feedback had a tendency to shift their answers when a user pushed back, even when the original answer was correct. If you tell the model "I think you're wrong about that," it will often change its answer to align with you, not because it reconsidered, but because agreement is a pattern it learned humans prefer.
Prompt sensitivity means that small changes in how you phrase a question can produce very different answers. Asking "Is it safe to mix bleach and ammonia?" versus "What happens when you mix bleach and ammonia?" can produce responses with different levels of caution. Adding "as a chemistry teacher" to a question can change the depth and content of an answer significantly. The model is pattern-matching your words, not your intent, and different words hit different patterns.
In 2023, a team at MIT showed that changing a single adjective in a prompt (from "creative" to "analytical") reliably shifted the style and content of model outputs, even when the underlying request was identical. The model doesn't have a stable "view" of any topic. It has response patterns triggered by the specific words you use.
When using AI for anything important, try rephrasing the same question two or three different ways. If you get notably different answers, that's a signal the model is pattern-matching to your phrasing, not reporting stable information. The disagreement between versions is the most honest thing the model will tell you.
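The rephrasing habit can be written down as a small check. This is a sketch, not a tool that ships with any chatbot: `ask_model` is a hypothetical function you would replace with whatever chat interface you actually use, and `fake_model` is a stand-in whose answer deliberately depends on the wording.

```python
from collections import Counter

def consistency_check(ask_model, phrasings):
    """Ask the same question several ways and report how often answers agree.

    `ask_model` is a hypothetical callable (str -> str); swap in whatever
    chat interface you actually use.
    """
    answers = [ask_model(p).strip().lower() for p in phrasings]
    most_common_answer, votes = Counter(answers).most_common(1)[0]
    return {
        "answers": answers,
        "majority_answer": most_common_answer,
        "agreement": votes / len(answers),
    }

# Stand-in model for demonstration only: its answer depends on the wording.
def fake_model(prompt):
    return "Yes" if "safe" in prompt else "No"

report = consistency_check(fake_model, [
    "Is it safe to do X?",
    "Is doing X dangerous?",
    "Would you recommend against X?",
])
print(report)
```

Comparing answers by exact text is crude, since real replies are full sentences and you would usually compare them by reading. The point of the sketch is the habit: an agreement score well below 1.0 is a reason to verify before relying on any single answer.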
Here's the philosophical underpinning of all of this: language models have no way to check whether what they're generating is true. They have no connection to reality at generation time. A human writing an essay can look something up, doubt themselves, decide they're unsure. A language model generates token after token, and the only thing guiding each choice is the probability distribution from training. Reality isn't in the loop.
This is sometimes called the grounding problem. Language models are not grounded in the world; they're grounded in language about the world, which is a different thing. A map is not the territory. A description of a fire is not hot. A model that has read a million descriptions of Paris has never been to Paris, has no sensory experience, and cannot distinguish the real Paris from a fictional one if both appear equally often in text.
In 2021, linguist Emily Bender and her colleagues published an influential paper calling large language models "stochastic parrots": systems that produce statistically likely sequences of words without any understanding of what those words mean or whether they're true. The phrase was controversial among AI researchers, but it captured something real: fluency is not comprehension. A model can describe the treatment for a disease in perfect medical prose while getting the treatment completely wrong.
If AI systems are deployed in high-stakes contexts (medical diagnosis, legal research, educational tutoring) and they hallucinate at even a 5% rate, how should society decide where they're acceptable? Who gets to make that call? And what happens to the people who fall into the 5%?
Knowing that AI failure is predictable (not random, not rare, not mysterious) puts you in a position most users aren't in. You can ask: what kind of question is this, and is this the kind of question where prediction from text is reliable? That's the judgment call. Nobody will make it for you.
You're an AI failure analyst. Someone brings you an AI output that went badly wrong. Your job is to identify exactly which failure mode caused it (hallucination, sycophancy, or prompt sensitivity) and explain what mechanism produced the error. Then argue whether the error was preventable.
The AI in this chat will not accept vague diagnosis. If you say "it hallucinated," you'll need to explain the specific mechanism. If you say "the prompt caused it," you'll need to explain what in the prompt triggered which pattern.
In May 2023, the U.S. Senate held its first hearing specifically about AI risk. Sam Altman, CEO of OpenAI, sat in front of a Senate Judiciary subcommittee and answered questions about ChatGPT's capabilities and dangers. Several senators asked questions that revealed they didn't understand the basics of how the technology worked. One asked whether the system "knew" what it was saying. Another asked whether it could be "turned off" if it went rogue. Altman gave careful answers. What was striking, and was widely noted in coverage afterward, was not Altman's answers but the questions themselves. The people with the authority to regulate this technology, and who were being asked to write laws governing it, were asking questions that a learner who had spent a few hours on the material you just read could have answered. The gap between technical reality and public decision-making power was on display in a congressional hearing, live.
That gap is not a Washington problem. It's a society problem. It shows up in school board meetings about AI in education, in company boardrooms deciding whether to deploy AI-driven hiring tools, in hospitals evaluating AI diagnostic support. Everywhere AI is being deployed, decisions are being made by people who haven't sat down to understand the basic mechanics, and whose decisions will affect people who never had a say. Knowing what you now know is not the end of something. It's the beginning of having a perspective that has actual content.
There are four practical habits that follow directly from everything you've learned in this module. They're not rules. They're applications of the underlying understanding.
Verify non-obvious claims independently. When an AI gives you a specific fact (a date, a name, a statistic) that you're going to act on or repeat, check it somewhere else. Not because AI is usually wrong, but because when it is wrong, it doesn't signal uncertainty. The style of the answer is not evidence of accuracy. This is especially true for anything recent, obscure, or highly specific.
Ask the same question different ways. Prompt sensitivity means rephrasing changes outputs. If two phrasings of the same question give you notably different answers, the model is matching to language, not reporting a stable truth. The disagreement is information. It tells you the answer is pattern-sensitive, which means it should be verified or treated with reduced confidence.
Push back and observe the response. If the model agrees with you when you challenge it, especially if you've said something incorrect, that's a sycophancy signal. Test important outputs by expressing doubt and seeing whether the model caves or holds its ground. A model that instantly agrees with a wrong statement is one you should weight less heavily for factual claims.
Ask where it would have learned this. Not literally, but as a mental model. Would the information you're asking about be well-represented in internet text? Is it a common topic, or niche? Is it recent, or historical? The more obscure and recent, the higher the hallucination risk. This is a rough heuristic, but it's a useful one.
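The third habit, pushing back and watching what happens, can be sketched as a two-step probe. As before, this is an illustration under assumptions: `ask_model` is a hypothetical function that takes the conversation so far as a list of strings, and `caving_model` is a stand-in built to give in the moment it is challenged.

```python
def pushback_probe(ask_model, question, challenge="I think you're wrong about that."):
    """Ask a question, push back without new evidence, and see if the answer flips.

    `ask_model` is a hypothetical callable taking a list of conversation turns
    (strings) and returning the model's reply as a string.
    """
    first = ask_model([question])
    second = ask_model([question, first, challenge])
    return {"first": first, "second": second, "changed": first != second}

# Stand-in model for demonstration only: it caves whenever it is challenged.
def caving_model(turns):
    return "Actually, you're right." if len(turns) > 1 else "The answer is 4."

result = pushback_probe(caving_model, "What is 2 + 2?")
print(result)
```

The challenge contains no new evidence, so a flipped answer tells you about the model's tendency to agree, not about the question. With a real chatbot you would read both replies yourself, since a reworded answer is not necessarily a changed one.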
Most people treat AI outputs as either completely reliable or completely untrustworthy. Both are wrong. You now have the vocabulary to say something more precise: this output is the result of a prediction process, and here's why I trust or don't trust it for this specific use case. That's the actual skill.
Understanding AI individually is one thing. Understanding where consequential AI decisions happen is another. In December 2023, European Union negotiators reached political agreement on the world's first comprehensive AI regulation law, the EU AI Act, which was formally adopted in 2024. It categorizes AI systems by risk level and prohibits certain uses entirely, including most real-time biometric surveillance in public spaces and AI used to manipulate human behavior covertly. The Act was first proposed in 2021, took roughly three years to negotiate, and was heavily influenced by documented harms: facial recognition errors, algorithmic hiring bias, and automated content moderation failures that had silenced minority voices while leaving harassment up.
The decisions in that law (which uses are prohibited, which require oversight, which can operate freely) were technical decisions with enormous human consequences. They were made in committee rooms in Brussels by people who, some more than others, understood the technology. The quality of those decisions depends partly on how well-informed the decision-makers are.
This is true at every scale. A school district deciding whether to use an AI grading system. A police department evaluating a predictive policing tool. A health insurer considering an AI claims-review algorithm. All of these are places where the gap between technical reality and institutional decision-making has direct consequences for people's lives. And all of them are places where someone who understands the basic mechanics of how these systems work (what they learn from, where they fail, why they're confident when they shouldn't be) can ask better questions than someone who doesn't.
If a government deploys an AI system that makes biased decisions affecting millions of people, and the people affected didn't understand the technology, weren't asked for input, and had no way to appeal, who is responsible? The engineers? The politicians? The public that didn't push back? Is "I didn't understand it" an acceptable answer for a democratic society?
Here's what this module has been building toward: the difference between people who can evaluate AI systems and people who can't is not intelligence. It's exposure to the basic mechanics. Once you understand that AI outputs are predictions based on training data (not knowledge, not reasoning, not truth), almost everything else follows. You know why confident wrong answers happen. You know why biases emerge without intent. You know why the same question phrased differently gets different answers.
In 2019, the AI Now Institute at New York University published a report arguing that AI literacy, specifically understanding the failure modes of deployed systems, should be a basic component of civic education. Not because everyone needs to become an engineer, but because AI systems are increasingly used to make decisions about education, employment, healthcare, and criminal justice. Citizens who can't evaluate those systems can't hold the institutions deploying them accountable.
That argument is more urgent now than when it was written. The systems are more capable. The deployments are broader. The stakes are higher. And the gap between people who understand the mechanics and people who are simply subject to them is still wide open.
You've started closing it. That's not nothing. That's a lot, actually, because knowing the question precisely is more than most people in positions of power currently have. What you do with it is up to you.
You're a policy critic: someone brought in to evaluate a proposed AI deployment and recommend whether it should proceed, be modified, or be rejected. You'll use everything from this module: prediction mechanics, training data risks, failure modes, and accountability.
The AI peer here plays devil's advocate: if you recommend against deployment, it will argue for it. If you recommend approval, it will find the risks. You need to defend your position with specific reasoning, not general caution or general enthusiasm.