Module 4 · Lesson 1

The Spam Filter That Almost Worked

Two ways to teach a machine — and why one of them cracked under pressure

If you write down every rule you know, can you ever write enough?

By the spring of 2004, Gmail had not yet launched publicly. Engineers at Google were racing to solve a problem that had been quietly destroying email for years: spam. Not the occasional weird message — a tidal wave. By that year, researchers at Postini, a company that processed corporate email, reported that roughly 77% of all email sent on the internet was spam. More junk than real mail. Far more.

The first wave of spam filters worked by rules. Someone would sit down and write: if the subject line contains "FREE MONEY," block it. If the sender domain is from this list, block it. If there are more than three exclamation points, block it. Engineers called these rule-based filters, and for a while they worked. Then the spammers read the rules.

They started writing "FR€€ M0N€Y." They registered new domains every day. They pasted in legitimate-looking sentences copied from news articles so their messages looked like ordinary mail. Every time an engineer added a new rule, spammers found the gap around it. The rule list grew to thousands of entries. It could never grow fast enough.
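To make that brittleness concrete, here is a minimal sketch of a rule-based filter built from the three example rules described earlier. The function name `is_spam` and the domains in `BLOCKED_DOMAINS` are hypothetical, invented for illustration. Notice how a lightly disguised message from a new domain matches none of the rules.

```python
# Hypothetical block list, invented for illustration.
BLOCKED_DOMAINS = {"cheap-prizes.example", "win-big.example"}

def is_spam(subject: str, sender: str, body: str) -> bool:
    """Return True if any hand-written rule fires."""
    # Rule 1: a banned phrase in the subject line
    if "FREE MONEY" in subject.upper():
        return True
    # Rule 2: the sender's domain is on the block list
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in BLOCKED_DOMAINS:
        return True
    # Rule 3: more than three exclamation points
    if (subject + body).count("!") > 3:
        return True
    return False  # no rule matched, so the message gets through

# The original spam is caught...
print(is_spam("FREE MONEY inside", "promo@win-big.example", "Click now"))     # True
# ...but a disguised copy from a brand-new domain slips past every rule.
print(is_spam("FR€€ M0N€Y inside", "promo@win-big-2.example", "Click now."))  # False
```

Each fix an engineer adds (a new spelling, a new domain) is one more line that the next variation can route around.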

Then Google's engineers tried something different. Instead of writing rules, they fed their system millions of examples — emails humans had already labeled as spam or not spam — and let the system figure out its own patterns. The result was a filter that could catch spam that no one had ever seen before, because it had learned what spam felt like across thousands of subtle signals at once. That approach — learning from examples instead of following written rules — is the core of what we now call machine learning.

Section 1: What a Rule-Based System Actually Is

A rule-based system — sometimes called an expert system or a symbolic AI system — is exactly what it sounds like. A human expert thinks through every scenario they can imagine, writes down a rule for each one, and the program follows those rules. No exceptions. No interpretation. If the situation matches a rule, the system applies it. If no rule matches, the system is stuck.

This is powerful in narrow, well-defined situations. The rules that control a traffic light — green for 45 seconds, yellow for 5, red for 40 — never need to learn anything. The situation is fully understood. But the moment the world gets complicated and unpredictable, rules alone start to crack.

Rule-based system: A program that makes decisions by checking a list of specific "if this, then that" instructions written by humans. It can only handle situations the rules were designed for.

Expert system: A type of rule-based AI from the 1970s–1990s where the rules were written by interviewing human experts in a field (doctors, lawyers, engineers) and encoding their knowledge directly.

In the 1980s, an expert system called XCON, built by Digital Equipment Corporation (DEC) and Carnegie Mellon University, became famous for configuring computer orders automatically using about 2,500 rules. It saved DEC an estimated $25 million per year by 1986, and it was celebrated as proof that AI had finally arrived. But when computer hardware changed faster than engineers could update the rules, XCON started making errors. The rules couldn't keep up with a world that kept moving.

Section 2: What a Learning System Actually Does

A machine learning system doesn't start with rules. It starts with data — thousands or millions of labeled examples — and finds patterns that humans might never think to write down. Instead of someone saying "spam emails often mention prizes," the system is shown 10 million emails and discovers on its own that spam tends to have certain word combinations, certain sender patterns, certain timing signals, all weighted together in ways too complex to describe as a simple rule.
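One classic way to learn from labeled examples is a naive Bayes word-count model. It appears here only as an illustration: the lesson does not say which technique Google used, and production filters weigh far more signals than words. The six training emails are invented toy data, and `train`, `log_score`, and `classify` are hypothetical names. No rule about prizes or meetings is written anywhere; the model scores new text purely from word statistics it counted.

```python
import math
from collections import Counter

# Toy labeled data, invented for illustration: (email text, label).
examples = [
    ("win a free prize now", "spam"),
    ("claim your free money today", "spam"),
    ("exclusive prize offer click now", "spam"),
    ("meeting moved to friday", "ham"),
    ("lunch today with the team", "ham"),
    ("please review the attached report", "ham"),
]

def train(examples):
    """Count how often each word appears in spam vs. legitimate ("ham") email."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def log_score(model, text, label):
    """Log-probability of the text under one label (naive Bayes)."""
    word_counts, doc_counts, vocab = model
    total_words = sum(word_counts[label].values())
    # Start from how common the label is among the training emails.
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    for word in text.lower().split():
        # Add-one smoothing keeps unseen words from zeroing out the score.
        score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
    return score

def classify(model, text):
    return max(("spam", "ham"), key=lambda label: log_score(model, text, label))

model = train(examples)
print(classify(model, "claim a prize now"))                # spam
print(classify(model, "friday meeting about the report"))  # ham
```

Neither test sentence appears in the training data; the model generalizes from the patterns it counted.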

Machine learning: A way of building AI where the system is trained on large amounts of example data and discovers its own internal patterns, instead of being given explicit rules by a programmer.

The key difference: in a rule-based system, a human has to understand the problem well enough to write every rule. In a machine learning system, the machine discovers patterns from examples even when humans can't fully articulate what those patterns are.

That's powerful — but it also means the system can find the wrong patterns. If most of your spam training examples happen to be written in a particular language, the system might associate that language with spam — not because the language is the problem, but because of an accident in your data. The machine learned, but it learned something you didn't intend.

Ethical Question

If a machine learning system finds a pattern you didn't put there — and that pattern turns out to be biased against a group of people — who is responsible? The programmer who built it? The company that deployed it? The person who collected the data? There's no clean answer here. This question is actively debated by researchers, courts, and governments right now.

Section 3: Why the Difference Matters

When you hear someone say "AI," they almost always mean a machine learning system — not a rule-based one. But rule-based systems haven't disappeared. They're still inside your GPS (the routing algorithm has explicit rules about traffic laws), inside airplane autopilots (hard-coded rules for certain emergencies), and inside most financial trading systems (rules for what a bank is legally allowed to do).

The choice between rules and learning is a real engineering decision with real consequences. Rules are transparent — you can read them and understand why the system decided what it decided. Learning systems are often opaque — the pattern exists as millions of tiny numerical weights inside the model, and no one can read them the way you'd read a sentence.

This is why courts, hospitals, and governments often push for rule-based or otherwise explainable logic in high-stakes decisions: they need to be able to explain why a decision was made. A judge can't just say "the algorithm said so." Neither can a doctor. But a spam filter? Nobody needs a legal explanation for why their email got flagged.

You Now See What Most People Miss

Every time someone says "AI made a mistake" or "AI is biased," the useful question is: is this a rule-based system or a learning system? If it's rule-based, someone wrote a bad rule. If it's learning-based, something went wrong in the data or the training process. The fix is completely different in each case — and most people reporting on AI don't know which one they're talking about.

Section 4: Two Systems, One World

Modern AI systems often combine both approaches. A self-driving car might use machine learning to recognize pedestrians and road signs — because no human could write enough rules to describe every possible visual — but use explicit rules to decide what happens when a pedestrian is detected: brake. That rule is written in code. It will not be overridden by a learning system. Engineers decided that some decisions need to be locked.
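A minimal sketch of that split, assuming the learned perception model reports a single pedestrian probability. `Perception`, `PEDESTRIAN_THRESHOLD`, and the placeholder policy are hypothetical names, and a real vehicle stack is vastly more complex. What matters is the ordering: the locked rule runs before any learned suggestion is consulted.

```python
from dataclasses import dataclass

# Hypothetical threshold, invented for illustration.
PEDESTRIAN_THRESHOLD = 0.5

@dataclass
class Perception:
    pedestrian_probability: float  # what a learned vision model would output (stubbed here)

def learned_driving_policy(perception: Perception) -> str:
    """Placeholder for a learned component that suggests what to do next."""
    return "continue"

def decide(perception: Perception) -> str:
    # Hand-written rule checked first: if a pedestrian is detected, brake.
    # The learned policy is never consulted in that case.
    if perception.pedestrian_probability >= PEDESTRIAN_THRESHOLD:
        return "brake"
    return learned_driving_policy(perception)

print(decide(Perception(pedestrian_probability=0.92)))  # brake
print(decide(Perception(pedestrian_probability=0.03)))  # continue
```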

When Tesla's Autopilot system was investigated after accidents in 2016 and again in 2021, regulators had to determine whether the failure was in the learning part (the perception of the road) or the rule part (what the car was supposed to do once a hazard was detected). The distinction mattered enormously for who was responsible and how to fix it.

Understanding which kind of system you're dealing with is no longer a techie detail. It's the kind of thing that affects accident investigations, insurance claims, medical diagnosis, hiring decisions, loan approvals, and criminal sentencing. Every one of those domains has AI in it now. Every one of them has the rules-vs-learning question sitting just beneath the surface.

Lesson 1 Quiz

Rules vs. Learning — test your reasoning, not your memory

1. The early rule-based spam filters failed primarily because:

Correct. Rules are brittle when the adversary can see them. The spammers reverse-engineered the rules and changed their messages to slip past them. This is the core weakness of rule-based systems in adversarial environments.

Not quite. The fundamental problem was that the rules were fixed and visible, so anyone who wanted to work around them could. The spammers adapted faster than the engineers could add new rules.

2. A hospital wants to build a system that flags high-risk patients. Which approach would be easier to use in a legal proceeding to explain why a patient was flagged?

Correct. Rule-based decisions are explainable because every step is traceable. Machine learning models store patterns as numerical weights, not readable logic, making explanation difficult. This is called the "explainability problem" in AI.

Consider what "explaining" a decision means in court. If someone asks why the system flagged patient X, "because the model assigned a higher weight to those features" isn't really an explanation a judge or patient can act on. Rules, by contrast, can be read aloud.

3. DEC's XCON expert system eventually started failing. What does this tell you about rule-based systems?

Correct. XCON's rules described DEC's computer hardware as it existed in the early 1980s. When the hardware changed, the rules became outdated and required expensive human updates. A machine learning system retrained on new data could adapt; rules cannot update themselves.

The issue wasn't speed or rule count. The issue was that the world changed faster than humans could update the rules. The rules were a snapshot of knowledge at one point in time.

4. A new social media platform wants to detect hate speech. Engineers are deciding whether to use rules or machine learning. What is the most significant risk of using only machine learning?

Correct. This is a real, documented problem. Academic studies published in 2019 found that widely used hate-speech detection models flagged posts written in African American English as offensive at disproportionately high rates compared with other posts. The models had learned patterns from biased labeled data.

Machine learning systems can certainly process text and do so at very high speed. The more serious concern is what patterns the system learns — especially when the training data reflects existing human biases.

5. Modern self-driving cars use machine learning to recognize pedestrians but explicit rules to decide what to do when a pedestrian is detected. Why not use machine learning for both?

Correct. This is the transparency and safety argument. A rule that says "always brake when a pedestrian is detected" is guaranteed. A learned policy might brake 99.7% of the time but occasionally do something unexpected. For safety-critical decisions, guaranteed behavior matters more than average accuracy.

The reason is about certainty and accountability, not speed or data availability. Engineers want some behaviors to be locked — not subject to whatever pattern a model happened to learn from data.

Lab 1: The Rule Auditor

You're auditing a content moderation system. Figure out what kind of AI it is — and what that means.

Your Role: Content Systems Auditor

A mid-size social platform called Tessera has just been sued because their moderation system incorrectly banned thousands of users over three months in early 2023. The company claims the system "uses AI to enforce community standards." Your job is to investigate whether their system is rule-based, learning-based, or a combination — and what that means for who is responsible for the errors.

Your AI contact is a fellow investigator who knows the technical side. They won't lecture you — they'll push back on weak arguments and ask you to defend your reasoning.

Start by telling your investigator what questions you'd ask the platform's engineers to figure out which kind of AI system they're using. Then work through what the answer changes about who's responsible.

Investigator Kai

Lab 1

Alright, you've got Tessera's engineering team in a conference room tomorrow. Before we walk in there, tell me: what are the first two questions you'd ask to figure out whether their moderation system is rule-based or learning-based? Don't just say "ask them which one it is" — that's too easy to dodge. What would actually reveal the truth?

Module 4 · Lesson 2

How a Machine Actually Learns

ImageNet changed everything — and nobody fully predicted how

What does it mean to "learn" if you have no idea what you're doing?

In 2009, computer scientist Fei-Fei Li and her colleagues published something that looked, on the surface, like a very large spreadsheet. It was called ImageNet: a database of photographs that would grow to more than 14 million images, each one labeled by humans. A photo of a golden retriever: labeled "dog." A photo of a Boeing 747: labeled "aircraft." An apple: labeled "fruit." Li had spent three years on the project, crowdsourcing the labeling work to tens of thousands of people through Amazon's Mechanical Turk platform and paying pennies per image.

Most computer vision researchers at the time were still hand-designing the features a system should look for to recognize a dog: edge patterns, ear shapes, fur textures, snout proportions. They were making slow progress. Then in 2012, a team from the University of Toronto (Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton) entered an annual competition called the ImageNet Large Scale Visual Recognition Challenge. Their system, called AlexNet, had not been given any description of what a dog looks like. It had been shown more than a million labeled images and trained to adjust its internal numbers until its error rate dropped.

AlexNet won the 2012 competition with a top-5 error rate of 15.3%. The second-best entry had an error rate of 26.2%. It wasn't a close race; it was a rupture. In a single competition, a system that learned its own features had crushed every approach built on features designed by hand. Within three years, nearly every computer vision lab in the world had switched approaches. The thing Fei-Fei Li had quietly assembled, millions of labeled examples, turned out to be the fuel that the learning systems needed.

Section 1: Training — The Core Process

Here's the fundamental loop that makes machine learning work. It has four steps, and they repeat thousands or millions of times.

Step 1 — Make a guess. The system looks at an example — say, a photo — and produces an output: "I think this is a cat." At the start of training, these guesses are essentially random, because the system's internal numbers haven't been adjusted yet.

Step 2 — Check the guess. The correct answer is known (because a human labeled the training data). The system compares its guess to the right answer and calculates how wrong it was. This measure of wrongness has a technical name: loss.

Loss: A number that measures how far a model's prediction is from the correct answer. High loss = very wrong. Low loss = close to right. The goal of training is to drive loss down.

Step 3 — Adjust. The system uses the loss to work out which internal numbers should shift, and by how much, to make the next guess slightly less wrong. The method for working this out is called backpropagation (usually shortened to "backprop"). The shifts are tiny: fractions of a fraction.

Step 4 — Repeat. Do this for millions of examples, over and over, and the system's guesses get progressively better. The internal numbers settle into a configuration that captures the patterns in the data.

Backpropagation: The algorithm used to work out how a neural network's internal numbers should change after a guess. It works backward through the network, figuring out how much each number contributed to the error, so that each one can be nudged toward a better value.
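Here is the four-step loop at its smallest possible scale: a hypothetical model with a single internal number `w`, trying to learn that each correct answer is twice its input. With only one number, the backpropagation step collapses to a single derivative of the loss; a real network does the same kind of calculation backward through many layers and millions of weights. The data, learning rate, and number of passes are illustrative choices, not recommended settings.

```python
import random

# Toy labeled data: the "right answer" for each input x is 2 * x.
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]

random.seed(0)
w = random.uniform(-1.0, 1.0)  # one internal number, starting at a random value
learning_rate = 0.01           # how big each nudge is (illustrative choice)

for _ in range(200):                   # Step 4: repeat
    total_loss, grad = 0.0, 0.0
    for x, y_true in data:
        y_pred = w * x                 # Step 1: make a guess
        error = y_pred - y_true
        total_loss += error ** 2       # Step 2: measure wrongness (squared error)
        grad += 2 * error * x          # Step 3: how the loss changes as w changes
    total_loss /= len(data)            # mean squared error over the examples
    grad /= len(data)
    w -= learning_rate * grad          # nudge w in the direction that lowers the loss

print(f"learned w = {w:.3f}")
```

After 200 passes `w` settles at about 2.0, the pattern hidden in the data, even though the code never states that rule.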

Section 2: What the Machine Is Actually Storing

After training, the machine hasn't stored a list of rules or a gallery of photos. It has stored a set of numbers — millions or billions of them — called weights. These weights define the patterns the system learned. When you show the trained system a new photo it has never seen, it runs that photo through its weights and produces a prediction.

Weights: The numbers stored inside a trained neural network. They represent the patterns the network learned during training. Changing the weights changes what the model "knows."

No one can look at those weights and read them the way you'd read a book. Researchers can analyze what patterns certain parts of the network seem to respond to — the first few layers of image classifiers often respond to edges and colors, deeper layers to shapes, even deeper layers to whole objects — but this analysis is never complete. The knowledge is distributed across millions of numbers in a way that has no clean human-readable translation.

This is what people mean when they say AI is a "black box." It's not that the math is secret. It's that the knowledge is stored in a form that humans can't directly read. You can see the input, you can see the output, but the middle is opaque.

Ethical Question

If a system's knowledge can't be read or explained in human terms, should it be allowed to make decisions that affect people's lives? Medical diagnosis. Loan approval. Parole decisions. All of these now have AI components. At what point does "the model says so" become an acceptable answer — and who decides where that line is?

Section 3: The Role of Data — And Why It's Not Neutral

Training data is not neutral. It is a snapshot of some part of the world, collected by specific people, in specific places, for specific purposes. Whatever biases exist in that collection get absorbed into the model's weights.

In 2018, researcher Joy Buolamwini at MIT, working with Timnit Gebru, published a study called "Gender Shades." They tested commercial facial-analysis systems from IBM, Microsoft, and Face++ on how accurately they classified the gender of faces across different skin tones. The systems were highly accurate overall, but not equally. Error rates were as low as 0.8% for light-skinned men and as high as 34.7% for dark-skinned women. The systems hadn't been told to be worse on darker-skinned faces. They had most likely been trained on data that contained far more light-skinned faces, and the patterns they learned reflected that imbalance.
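A disparity like this only becomes visible when error rates are broken down by group instead of reported as one average. A minimal sketch of that disaggregated check follows; `error_rates_by_group` is a hypothetical helper, and the records are invented toy numbers, not the study's data.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """records: (group, true_label, predicted_label) tuples -> {group: error rate}."""
    totals, errors = defaultdict(int), defaultdict(int)
    for group, truth, prediction in records:
        totals[group] += 1
        if prediction != truth:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Invented toy results: 98 of 100 correct for group A, 70 of 100 correct for group B.
records = ([("A", "female", "female")] * 98 + [("A", "female", "male")] * 2
           + [("B", "female", "female")] * 70 + [("B", "female", "male")] * 30)

overall = sum(truth != pred for _, truth, pred in records) / len(records)
print(f"overall error: {overall:.0%}")  # overall error: 16%
print(error_rates_by_group(records))    # {'A': 0.02, 'B': 0.3}
```

A single 16% error rate hides the fact that this toy model is wrong fifteen times as often on group B as on group A.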

This is a direct consequence of how machine learning works. The machine learned faithfully from its data. The problem was in what that data contained — and didn't contain.

You Now See What Most People Miss

When a company says "our AI is objective because it's just math," you now know that isn't quite right. The math is objective. But the data the math was trained on — and the choices about what to include and exclude — are human decisions with human biases baked in. The model is only as fair as the data it learned from.

Section 4: Generalization — Can It Handle What It's Never Seen?

One of the most important things a machine learning system has to do is generalize — perform well on new data it wasn't trained on. A spam filter trained on 2020 spam emails needs to catch spam written in 2024. An image classifier trained on photos from Europe needs to recognize objects in photos from other parts of the world.

When a model works beautifully on its training data but fails on new data, it has overfit — it memorized the training examples rather than learning general patterns. This is like a student who memorizes last year's exact test questions and then fails when the questions are slightly different.

Overfitting: When a model learns its training data too precisely, memorizing specific examples rather than general patterns, so it performs poorly on new, unseen data.

Preventing overfitting is one of the central challenges in machine learning. It's the reason researchers keep aside a "test set" — data the model never sees during training — to evaluate how well the model generalizes. If it scores 98% on training data but 71% on the test set, something has gone wrong in the learning process.
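The train/test gap is easy to reproduce on a toy problem. In this sketch (invented data, hypothetical model names), the hidden pattern is "label 1 when x is at least 50." One model memorizes its training examples; the other learns a single general cut-off. Only the held-out test set reveals which one actually learned.

```python
def true_label(x):
    return 1 if x >= 50 else 0  # the hidden pattern the models should discover

train = [(x, true_label(x)) for x in range(0, 100, 2)]  # even numbers
test = [(x, true_label(x)) for x in range(1, 100, 2)]   # odd numbers, never seen in training

# Model 1: memorizes every training example and guesses 0 for anything new.
memory = dict(train)

def memorizer(x):
    return memory.get(x, 0)

# Model 2: learns one general pattern, a cut-off halfway between the two classes.
highest_zero = max(x for x, y in train if y == 0)
lowest_one = min(x for x, y in train if y == 1)
cutoff = (highest_zero + lowest_one) / 2

def threshold_model(x):
    return 1 if x > cutoff else 0

def accuracy(model, dataset):
    return sum(model(x) == y for x, y in dataset) / len(dataset)

for name, model in [("memorizer", memorizer), ("threshold", threshold_model)]:
    print(f"{name}: train {accuracy(model, train):.0%}, test {accuracy(model, test):.0%}")
```

Both models score 100% on the training data. On the test set, the memorizer drops to 50% while the threshold model stays at 100%: the classic signature of overfitting.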

When you hear news stories about AI systems that "performed brilliantly in testing but failed in the real world," you're often reading about a generalization or overfitting problem. The real world is always more varied than any training set can capture.

Lesson 2 Quiz

How a Machine Learns — apply what you know

1. In the AlexNet story, the most important thing Fei-Fei Li contributed to the 2012 breakthrough was:

Correct. ImageNet was the data. AlexNet was the algorithm. Together they were unstoppable — but Li's dataset was the enabler that made the algorithmic leap possible. Without labeled data at that scale, no learning system could have been trained.

AlexNet's core innovation was its learning architecture, but that architecture needed something to learn from. The key enabler was Fei-Fei Li's 14-million-image labeled dataset. Data is the fuel; algorithms are the engine.

2. A model trained to detect spam achieves 99% accuracy on its training emails but only 68% accuracy on new emails it has never seen. This is most likely an example of:

Correct. When performance is high on training data but dramatically lower on new data, the model has overfit. It learned the specific quirks of its training examples — not the general patterns that would apply to new emails.

The gap between training performance and real-world performance is the classic signature of overfitting. The model memorized rather than generalized.

3. Joy Buolamwini's Gender Shades research showed error rates of up to 34.7% for dark-skinned women versus 0.8% for light-skinned men when commercial systems classified gender from faces. What does this tell you about how the systems were trained?

Correct. Machine learning systems reflect their training data. If the training data is imbalanced, the model's accuracy will be imbalanced too. No programmer needed to "intend" bias — it emerged from the data distribution.

The bias wasn't deliberately programmed. It emerged from what the data contained. If most training images show one type of face, the model learns patterns that work best for that type — and works less well for underrepresented groups.

4. Why do machine learning researchers keep a "test set" of data that the model never sees during training?

Correct. The test set is the reality check. High accuracy on training data is easy — the model may have simply memorized it. Only performance on data it has never seen tells you whether it has actually learned something useful and generalizable.

The test set's purpose is to measure generalization. It's intentionally kept separate so the model can't "study" it before the test — just like you can't memorize the answers if you've never seen the specific questions.

5. A company claims their hiring algorithm is "objective" because it uses pure mathematics. Based on Lesson 2, what's the most important question to ask them?

Correct. "Objective math" can encode subjective data. If the training data reflects historical hiring patterns that disadvantaged certain groups, the algorithm will learn to replicate those patterns — while appearing mathematically neutral.

The math itself is neutral. But what the math was trained on matters enormously. The critical question is always: what did the model learn from, and what biases were in that data?

Lab 2: The Training Data Detective

A hiring algorithm is flagged for bias. You need to figure out where the bias came from.

Your Role: Independent Bias Investigator

It's 2023. A large logistics company called Meridian Freight has been using an AI system to screen job applications since 2019. A watchdog group has filed a complaint: the system approved far fewer applications from candidates who attended historically Black colleges and universities (HBCUs) compared to candidates from other schools with similar average GPAs. Meridian says the system "just learned from historical hiring data." Your contact has deep technical knowledge and is here to help you figure out exactly where the bias entered the system.

Start by explaining to your contact what you think happened in the training process — what specific aspect of "learning from historical data" could produce this result. Then work through what you'd need to examine to confirm it.

Analyst Priya

Lab 2

Okay, so Meridian's defense is "the AI just learned from what worked in the past." That's worth unpacking carefully, because it might be true in a way that actually confirms the complaint rather than refuting it. Walk me through your theory: if the historical hiring data produced an AI that systematically disadvantages HBCU graduates, what does that tell you about what "worked in the past" actually meant at Meridian?

Module 4 · Lesson 3

When Rules Win

The Therac-25 disaster and why some decisions must never be learned

Are there some choices that should never be left to a machine that learns?

Between June 1985 and January 1987, a radiation therapy machine called the Therac-25 gave six patients massive radiation overdoses. At least three of them died. The machine was built by a Canadian company called Atomic Energy of Canada Limited. It was used in hospitals across the United States and Canada to deliver precisely calibrated doses of radiation to cancer patients.

The Therac-25 was, in its time, sophisticated. Earlier models had physical interlocks: mechanical switches that literally could not allow the high-power beam to fire if certain conditions weren't met. The Therac-25 eliminated those hardware locks and relied on software to check the same conditions. But the software had a race condition, a bug whose outcome depends on the timing of events: if an operator typed a command too quickly after correcting an entry, the safety check would complete before the correction was registered, and the machine would fire the full beam at a patient who was supposed to receive a lower dose.

The machine wasn't learning. There was no AI involved. But the Therac-25 disaster became the foundational case study that engineers use when they talk about when explicit, locked, auditable rules must be used — and why some decisions should never be handed to any system, AI or otherwise, that cannot be fully read and verified by humans.

Section 1: The Case for Rules in Safety-Critical Systems

The Therac-25 wasn't a machine learning failure. But the lessons it produced are directly applicable to AI. The core principle that emerged from the investigation — published by Nancy Leveson and Clark Turner in 1993 — was this: in systems where a failure can kill someone, safety must be enforced by mechanisms that are verifiable, auditable, and independent of the primary system.

For AI, this translates directly. A machine learning model that recommends a radiation dose cannot be the final authority on whether that dose gets delivered. A rule-based safety check — one that can be read, tested, and certified — must sit between the model and the real-world action.
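A minimal sketch of that arrangement, assuming the model's output is a single recommended dose. `MAX_DOSE`, `InterlockError`, and the other names are hypothetical, and the cap is an arbitrary illustrative number, not clinical guidance. The interlock never consults the model, so it stays readable and testable on its own.

```python
# Hypothetical cap in arbitrary dose units; not clinical guidance.
MAX_DOSE = 2.0

class InterlockError(Exception):
    """Raised when the safety layer refuses an action."""

def model_recommend_dose(case: dict) -> float:
    """Stand-in for a learned model; here it just returns a stored suggestion."""
    return case["suggested_dose"]

def safety_interlock(dose: float) -> float:
    """Explicit rule that is readable, testable, and independent of the model."""
    if not 0.0 < dose <= MAX_DOSE:
        raise InterlockError(f"refused dose {dose}: allowed range is (0, {MAX_DOSE}]")
    return dose

def deliver(case: dict) -> str:
    dose = safety_interlock(model_recommend_dose(case))  # no path around the check
    return f"delivering {dose} units"

print(deliver({"suggested_dose": 1.8}))  # delivering 1.8 units
try:
    deliver({"suggested_dose": 25.0})
except InterlockError as err:
    print(err)                            # refused dose 25.0: allowed range is (0, 2.0]
```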

Safety interlock: A hard check (often implemented as explicit rules or physical mechanisms) that prevents a dangerous action regardless of what any other part of the system says. It's a last-resort guarantee, not a suggestion.

Regulators have moved in this direction. Many of the AI-powered radiology tools the FDA cleared in the 2010s were cleared as aids to a physician rather than as replacements: a human physician reviews the AI's output before any treatment decision, because the learned model, no matter how accurate in testing, cannot bear final responsibility.

Section 2: Transparency vs. Power — The Real Trade-Off

Here's the tension that engineers and policymakers are wrestling with right now. Machine learning systems are often more powerful at the actual task — better at spotting tumors, catching fraud, predicting failures — than rule-based systems. But they are less transparent. You can't read their decision logic. You can't audit their reasoning for a specific case.

Rule-based systems are the opposite. They're transparent — every decision can be traced. But they're limited by what humans could think to write down. And they require constant human maintenance as the world changes.

There is no solution that gives you both. This is a genuine trade-off, not a problem waiting for a clever fix. The choice of which approach to use — or how to combine them — is a design decision with moral weight. Getting it wrong in medicine means people die. Getting it wrong in hiring means careers are derailed. Getting it wrong in criminal justice means people go to prison who shouldn't.

Ethical Question

In 2016, ProPublica investigated a tool called COMPAS, used by courts in multiple U.S. states to assess the likelihood that a defendant would commit another crime. COMPAS used a model learned from historical data. ProPublica's analysis found that Black defendants who did not go on to re-offend were nearly twice as likely as white defendants who did not re-offend to be labeled higher risk. The company that made COMPAS said its algorithm was proprietary and could not be released, so a defendant's lawyer couldn't see how the score was calculated. Should a person be sentenced based on a score from a system whose logic is a trade secret? There is no clean answer. Courts are still deciding.

Section 3: The Regulatory Response — When Governments Write Rules About Rules

In 2021, the European Commission proposed the AI Act, the world's first comprehensive legal framework for regulating AI systems. The Act, which became law in 2024, is built around a risk classification system. AI applications are sorted into categories based on the potential harm of a failure:

Unacceptable risk: Banned outright. Examples include social scoring systems like China's and, with narrow exceptions, real-time facial recognition in public spaces by law enforcement.

High risk: Allowed but heavily regulated. Medical devices, hiring tools, credit scoring, criminal justice tools — these must be transparent, must have human oversight built in, must be tested for bias before deployment.

Limited risk: Must disclose that the user is interacting with AI. Chatbots, deepfake-generating tools.

Minimal risk: AI in games, spam filters — essentially unregulated.

Notice what the EU's framework is doing: it's turning the rules-vs-learning question into a legal question. High-risk AI systems must be transparent enough for the people operating them to interpret their output, which can mean rule-based components or, at minimum, methods for explaining what a learning system decided. The law is forcing transparency into systems that would otherwise be black boxes.

You Now See What Most People Miss

When you hear debates about "AI regulation," they are fundamentally about this: should AI systems that make consequential decisions be required to use approaches that can be audited and explained? That's the rules-vs-learning question in policy form. Knowing the technical distinction gives you the ability to actually understand what's being argued — not just the politics of it.

Section 4: What Good Design Looks Like

Good AI system design doesn't pit rules against learning — it layers them deliberately. A well-designed medical diagnostic AI might use a deep learning model to flag potential abnormalities in a scan (the learning part, powerful and accurate), pass those flags to a rule-based system that checks whether they meet clinical criteria for follow-up (transparent and auditable), and then route the result to a human physician who makes the final decision (the human override layer).
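A minimal sketch of the three layers, assuming the learned model has already produced an abnormality score for a scan. `Scan`, `rule_layer`, `route`, and both thresholds are hypothetical and invented for illustration, not clinical criteria. The rule layer returns a reason anyone can audit, and every path ends with a physician.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical criteria, invented for illustration; not clinical guidance.
SCORE_THRESHOLD = 0.7
MIN_LESION_MM = 6.0

@dataclass
class Scan:
    abnormality_score: float  # layer 1 output: what a learned model would produce (stubbed)
    lesion_size_mm: float     # a measurement the rule layer can check

def rule_layer(scan: Scan) -> Tuple[bool, str]:
    """Layer 2: a readable rule that returns both its decision and its reason."""
    if scan.lesion_size_mm >= MIN_LESION_MM:
        return True, f"lesion {scan.lesion_size_mm} mm meets the {MIN_LESION_MM} mm criterion"
    return False, f"lesion {scan.lesion_size_mm} mm is below the {MIN_LESION_MM} mm criterion"

def route(scan: Scan) -> str:
    """Layer 3 is a person: the pipeline only prepares the case for a physician."""
    if scan.abnormality_score < SCORE_THRESHOLD:
        return "not flagged; routine physician review"
    meets, reason = rule_layer(scan)
    if meets:
        return f"flagged for follow-up; physician decides ({reason})"
    return f"flagged but criteria not met; physician may still review ({reason})"

print(route(Scan(abnormality_score=0.91, lesion_size_mm=8.0)))
```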

Each layer has a job. The learning model does what rules can't — recognize subtle, complex patterns across thousands of variables. The rule layer does what the model can't — provide a traceable, certifiable decision path. The human layer does what neither can — take responsibility and adapt to individual circumstances the system wasn't designed for.

This isn't a perfect setup. It's slower. It's more expensive. It sometimes means the human overrides a correct AI recommendation because they didn't trust it. But in high-stakes domains, that cost is considered worth it — because the alternative is a system that can fail in ways nobody can explain or fix.

Lesson 3 Quiz

When Rules Win — apply the trade-offs

1. The Therac-25 disaster is used as a case study for AI safety even though it didn't involve machine learning. Why is it still relevant?

Correct. The Therac-25 accidents produced a design principle: critical safety checks must be independently verifiable. For AI systems, this translates to requiring transparent, auditable layers between a model's output and a real-world action that could harm someone.

The lesson isn't about software vs. hardware in general. It's about the need for verifiable, auditable safety mechanisms in any system where failure can cause serious harm — a principle that directly applies to AI.

2. Under the EU AI Act (2024), which application would face the MOST regulatory scrutiny?

Correct. Mortgage approval is a high-risk application under the EU AI Act. It directly affects people's financial lives, has potential for bias, and requires explainability and human oversight. Spam filters and chatbots are minimal or limited risk.

The EU AI Act classifies risk by the potential impact of a failure. Spam filters and chatbots cause minimal harm if wrong. A mortgage decision can determine whether a family buys a home — and if the AI is biased, whole communities can be affected.

3. The COMPAS recidivism tool used a proprietary model to produce risk scores that informed criminal sentencing. A defense lawyer argued this violated their client's due process rights. What is the strongest version of that argument?

Correct. Due process includes the right to confront and challenge evidence. If the evidence is a score from a black-box model and the logic is trade-secret, the defendant cannot examine how it was produced — which legal scholars argue violates fundamental fairness principles.

The argument isn't about computers in general or profit motives. It's specifically about the right to examine and challenge evidence. A score you can't audit is evidence you can't challenge.

4. A hospital's AI diagnostic system is very accurate at detecting cancer from X-rays, but a radiologist can sometimes tell the AI is wrong even when the AI's confidence score is high. What does good system design do with this information?

Correct. The radiologist's ability to catch AI errors — even at high AI confidence — is exactly why human oversight matters in high-stakes AI systems. Good design uses AI to amplify human judgment, not replace it.

The radiologist catching AI errors is a feature, not a problem. In medical AI design, the human's ability to override the model is a critical safety layer — especially in cases where the model is confidently wrong.

5. You're designing an AI system for an autonomous ship. The ship uses machine learning to navigate around other vessels and obstacles, but you want to add a rule-based override for collision avoidance. What is the primary reason for the override?

Correct. The rule-based override exists to guarantee a specific behavior in an extreme case. Machine learning outputs are probabilistic — they produce the most likely answer based on training. In a rare, critical scenario the model has never seen, that isn't enough.

The issue isn't speed or accuracy in typical cases. It's about what happens in the edge case — the extremely dangerous scenario that may never have appeared in training data. Rules guarantee behavior; models approximate it.

Lab 3: The System Designer

Design an AI-assisted parole review system — and justify every decision.

Your Role: AI Policy Architect

A state department of corrections has asked your team to design an AI-assisted parole review system. The goal is to help parole board members process more cases without increasing error rates. The department wants to use a machine learning model that predicts the likelihood of reoffending. Your job is to design the safeguards, human oversight layers, and rule-based components that must accompany the model — and justify why each piece is necessary.

Your contact is a senior policy engineer who will push back on every design choice that seems underjustified. They believe AI in criminal justice deserves the highest level of scrutiny of any application domain.

Start by describing your overall architecture: what does the AI do, what rules govern its output, and where does human judgment enter the process? Be specific about each layer and why it's there.

Policy Engineer Samir

Lab 3

Before you tell me what you'd build, I want you to sit with the stakes for a second. We're talking about a system that influences whether a real person stays in prison or goes home to their family. COMPAS already showed us what happens when we get this wrong. So: what's your first design principle — the one thing you'd refuse to compromise on — before you've even thought about which algorithm to use?

Module 4 · Lesson 4

The New Frontier: Learning That Learns to Learn

GPT-3 changed the question — and nobody is sure what the answer is yet

What happens when a system is so good at learning that it can do things nobody trained it to do?

In May 2020, OpenAI released a research paper describing a language model called GPT-3. The model had been trained on roughly 300 billion tokens (words and pieces of words) of text, much of it drawn from a filtered 570-gigabyte crawl of the internet, and it had 175 billion internal weights. It had been trained to do one thing: predict the next word in a sequence.

Then researchers started testing it. GPT-3 could write short pieces of Python code without having been specifically trained on code. It could translate between languages without having been trained as a translation system. It could answer arithmetic questions, write legal summaries, and compose poetry in the style of specific poets. None of these were the task it was trained on. Predicting the next word, done at sufficient scale, turned out to produce a system that could do things that no one had written rules for, and that no one had specifically labeled training data for.

Researchers called this emergent capability: abilities that appear in a model not because they were trained directly, but because the model became so good at its core task that related abilities emerged as a side effect. For many researchers it was one of the most surprising results in the history of machine learning, and it broke the clean story in which a model gets good at roughly what it was trained for and nothing more.
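The training objective itself is easy to state. The sketch below shows next-word prediction at its absolute smallest: a bigram model that counts which word follows which in a tiny corpus and predicts the most frequent follower. It is a deliberate simplification, not how GPT-3 works: GPT-3 is a transformer network that conditions on long stretches of preceding text and learns its 175 billion weights by gradient descent rather than by counting. What the two share is only the objective.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict:
    """Count, for every word, which words follow it in the corpus."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict, word: str):
    """Return the most frequent follower of `word`, or None if unseen.

    Ties are broken by first occurrence, which is how Counter.most_common
    orders equal counts.
    """
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts: dict, start: str, n: int) -> str:
    """Greedily append up to n predicted words, stopping at a dead end."""
    out = [start]
    for _ in range(n):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


counts = train_bigram("the cat sat on the mat the cat ate the fish")
print(generate(counts, "on", 3))
```

A counting model like this can only reproduce word pairs it has already seen. The surprise with GPT-3 was that the same objective, pursued by a vastly larger and more flexible model on vastly more text, produced abilities nobody had written into the objective.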

Section 1: Emergent Capabilities — When Learning Surprises Its Creators

Classical machine learning is relatively predictable. You train a model to classify spam, it gets better at classifying spam. You train a model to recognize cats, it gets better at recognizing cats. The capability you get out is roughly the capability you trained for. This is the regime where the rules-vs-learning framework is cleanest.

Large language models (the kind that power ChatGPT, Claude, Gemini, and their cousins) operate differently. Because they are trained on enormous, diverse datasets and have billions of parameters, they develop capabilities that were not explicitly trained in. A 2022 paper led by Google researchers, titled "Emergent Abilities of Large Language Models," catalogued dozens of capabilities, from multi-step arithmetic to benefiting from chain-of-thought prompting, that appeared abruptly in models above a certain scale and were close to random chance below it.

Emergent capability: An ability that appears in a large AI model without being explicitly trained for; it arises from the model becoming sufficiently powerful at a more general task. These are often unpredicted before they appear.

This creates a new problem: if you can't fully predict what a model will be capable of, how do you design rules and safeguards around it? Rule-based safety depends on being able to enumerate what a system can do. Emergent capabilities, by definition, resist enumeration.
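Enumeration is easy to demonstrate for a rule-based system with a small, discrete input space. The triage rules below are hypothetical, invented for illustration: they have only nine possible inputs, so a safety property such as "every high-severity case is marked urgent" can be checked against every one of them. A large language model's inputs are arbitrary text, so no comparable exhaustive loop exists.

```python
from itertools import product

# Hypothetical, illustrative input space: 3 x 3 = 9 possible cases.
SEVERITIES = ["low", "medium", "high"]
AGE_BANDS = ["child", "adult", "senior"]


def triage(severity: str, age_band: str) -> str:
    """A small rule-based system: every decision path is readable."""
    if severity == "high":
        return "urgent"
    if severity == "medium" and age_band in ("child", "senior"):
        return "urgent"
    return "standard"


def buggy_triage(severity: str, age_band: str) -> str:
    """The same rules with a deliberate mistake, to show the audit catching it."""
    if severity == "high" and age_band == "adult":
        return "standard"  # deliberate bug
    return triage(severity, age_band)


def audit(rules) -> list:
    """Check one safety property against EVERY possible input."""
    violations = []
    for severity, age_band in product(SEVERITIES, AGE_BANDS):
        if severity == "high" and rules(severity, age_band) != "urgent":
            violations.append((severity, age_band))
    return violations


print(audit(triage), audit(buggy_triage))
```

With nine inputs the loop is trivial; what matters is that it terminates and covers everything. That guarantee of coverage is exactly what an emergent capability takes away.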

Section 2: What "Learning to Learn" Actually Means

Modern large language models exhibit something researchers call in-context learning. You don't need to retrain the model to give it a new task. You just describe the task in the prompt — show it a few examples — and it performs the task. This is fundamentally different from classical machine learning, where new tasks require new training runs.

In-context learning: The ability of a large language model to pick up and perform a new task just from examples shown in the current conversation, without any update to its internal weights.

This blurs the rules-vs-learning line in an interesting way. A developer can now write a few examples in a prompt — effectively writing rules — and the model learns to apply them to new cases. It's neither pure rules nor pure training. It's something in between: a system that uses its learned general capability to follow specific instructions that look like rules but are interpreted flexibly.

This is why large language models can feel simultaneously like they're following rules (they respond consistently to instructions) and like they're doing something no one programmed (they generalize beyond the examples in ways that are sometimes surprising and sometimes wrong).
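In-context learning happens entirely through the text of the prompt. The sketch below only assembles a few-shot prompt as a string; it does not call any model, and the "Input:/Output:" layout is a common convention assumed here for illustration, not a format any particular model requires. Nothing in it touches model weights: the "rules" live in the examples.

```python
def build_few_shot_prompt(instruction: str, examples: list, new_input: str) -> str:
    """Assemble an instruction, worked examples, and a new case into one prompt.

    examples: list of (input_text, output_text) pairs shown to the model.
    The model is expected to continue the text after the final "Output:".
    """
    parts = [instruction, ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)


examples = [
    ("Claim your FREE MONEY now!!!", "spam"),
    ("Agenda for Thursday's project meeting", "not spam"),
    ("FR€€ M0N€Y waiting for you", "spam"),
]
prompt = build_few_shot_prompt(
    "Label each email subject as spam or not spam.",
    examples,
    "Your package delivery is delayed",
)
print(prompt)
```

Swapping in different examples changes the task with no retraining run.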

Section 3: The Control Problem — Governing What You Can't Fully Enumerate

Before releasing GPT-4 in March 2023, OpenAI ran red-team evaluations, bringing in outside experts to try to make the model do harmful things. They found the model could help synthesize information that might be useful for creating dangerous substances, could impersonate individuals in convincing ways, and could be prompted to behave in ways that violated its intended guidelines if the prompt was constructed cleverly enough.

OpenAI addressed this through a combination of rule-based filters (hard blocks on certain output categories), additional training on what to refuse (reinforcement learning from human feedback — RLHF), and ongoing monitoring after deployment. But their own documentation acknowledged something significant: they could not guarantee that all harmful behaviors had been found, because the model's capability space was too large and too complex to enumerate completely.

This is the modern form of the rules-vs-learning problem. It's no longer "do we use rules or learning?" It's "how do we govern a system whose capabilities we can't fully know in advance?"

Ethical Question

If a company releases an AI system knowing that it has capabilities they haven't fully mapped, and one of those unmapped capabilities causes harm, what is their moral responsibility? This is being debated in legislatures, law schools, and philosophy departments simultaneously. There is no consensus, and the law hasn't caught up. You are now thinking about a question at the frontier of the real conversation.

Section 4: What This Means for You — Reading the World

You now have the full framework. You can look at any AI system and ask: Is it rule-based or learning-based? If it's learning-based, what was it trained on, and what biases might that data contain? Does it have verifiable safety interlocks for high-stakes decisions? Is it the kind of large, general model that might have capabilities its creators didn't anticipate?

These aren't exotic questions. They're the questions that regulators and lawmakers at the EU, the FDA, the FTC, and the U.S. Congress are asking right now, often without the technical grounding to answer them well. Understanding the rules-vs-learning distinction, and what it implies at every level from spam filters to GPT-4, puts you in a position to understand what's actually at stake in those debates.

The AI industry in 2024 is attracting hundreds of billions of dollars of investment, influencing elections, changing medicine, and reshaping labor markets. Most of the people making decisions in those domains (investors, politicians, executives, journalists) are working from a much shallower understanding of how these systems work than you now have. That is not a small thing.

You Now See What Most People Miss

The rules-vs-learning debate isn't a technical detail. It is the central argument inside every AI governance question being discussed anywhere in the world right now. Should AI systems that make decisions about people be required to explain themselves? That's a rules question. Can they be required to explain themselves, given how they work? That's a learning question. You now understand both sides well enough to actually engage with the debate — not just observe it.

Lesson 4 Quiz

The New Frontier — reason through emergent AI

1. GPT-3 was trained primarily to predict the next word in text. What made its performance on tasks like translation and coding surprising?

Correct. The emergence of translation, coding, and reasoning from a next-word prediction model was not anticipated at that level of proficiency. It's what makes large language models different from earlier narrow AI systems — their general training produces specific capabilities no one explicitly designed for.

The surprise isn't that translation involves words — it's that a system trained only to predict the next word became this competent at tasks requiring semantic understanding, translation, and logical reasoning. That wasn't confidently predicted before GPT-3 demonstrated it.

2. A researcher writes a prompt to a large language model that says: "Here are three examples of legal briefs. Now write one for this new case." This is an example of:

Correct. In-context learning is when a model uses examples in the current conversation to understand and perform a task — without any change to its underlying parameters. The "learning" happens in the context window, not in the weights.

The model's weights aren't being updated. No retraining is occurring. The model is using its existing capability to generalize from the examples you've given it in the conversation itself — that's in-context learning.

3. OpenAI ran red-team evaluations before releasing GPT-4 and found potentially harmful capabilities they then tried to limit. However, they acknowledged they couldn't guarantee all harmful behaviors were found. What does this tell you about governing large language models compared to governing rule-based systems?

Correct. This is the core governance challenge for large models. Rule-based systems have a fixed, readable decision space. Large models have capability spaces that emerge from training in ways that even their creators can't fully enumerate in advance — making comprehensive pre-deployment safety audits structurally difficult.

The issue isn't that the models are categorically unsafe — it's that their capability spaces can't be fully enumerated before release in the way a rule-based system's logic can be audited. This makes governance fundamentally harder.

4. A government wants to require all AI systems that make decisions affecting citizens to be "fully explainable." Based on this module, which type of AI would be most difficult to make compliant with this requirement?

Correct. Rule-based systems, decision trees, and simple classifiers can be made explainable because their decision logic is readable and traceable. Large models store knowledge as billions of distributed weights — explaining why a specific output was produced requires approximations, not true transparency.

Rule-based systems are already inherently explainable. Decision trees are designed to be explainable. Keyword spam filters are trivially explainable. Large language models are the hard case — their decision-making is distributed across billions of numbers in a way that resists straightforward human-readable explanation.

5. You're advising a city government that wants to use AI to help predict where water pipe failures are likely to occur, so they can do preventive maintenance. Which design approach is most appropriate?

Correct. This layered approach uses machine learning's strength — pattern recognition across complex historical data — while keeping human engineers in the loop for final decisions. Pipe failure prediction isn't as immediately life-critical as medical decisions, but human oversight ensures the AI's recommendations are contextualized with local knowledge.

Neither pure rules nor pure learning alone is optimal here. The best design uses machine learning to flag likely failures (pattern recognition at scale) and human engineers to make decisions (contextual judgment and accountability). This is the layered design principle from Lesson 3.

Lab 4: The Capability Mapper

A company wants to deploy a large language model. You have to map what could go wrong — before it does.

Your Role: AI Risk Analyst

It's 2024. A mid-size insurance company called Arbor Insurance wants to deploy a large language model to handle initial customer claims conversations — collecting information, asking follow-up questions, and producing a preliminary claim summary that human adjusters then review. The CTO says: "The model has passed all our internal tests. It's ready to go." Your job is to conduct a pre-deployment capability mapping — figure out what the model might be capable of doing that wasn't tested, and what safeguards are needed before it touches real customers.

Your contact is an AI safety researcher who has done this kind of analysis for financial institutions. They won't tell you what the risks are — they'll ask you to find them yourself.

Start by identifying two capabilities that a large language model used for insurance claims intake might have that weren't specifically trained for — and explain why each one creates a risk in this deployment context.

Researcher Yael

Lab 4

The CTO says the model passed all internal tests. Here's my first question for you: what does "passed all internal tests" actually tell us about a large language model? Think about what we know about emergent capabilities before you answer — and then tell me whether you'd feel confident deploying based on that assurance alone.

Module 4 Test

15 questions · Score 80% or above to pass · Rules vs. Learning: The Big Difference

1. The core weakness of rule-based AI systems in adversarial environments (like spam filtering) is:

Correct. Rules are static and readable. Once an adversary knows the rules, they can engineer around them — which is exactly what spammers did to early keyword filters.

The fundamental issue is that rules can be read and gamed by anyone who wants to work around them — not a resource or data problem.

2. In the AlexNet story (2012 ImageNet competition), AlexNet's top-5 error rate was 15.3% while the second-place team's was 26.2%. The most important implication was:

Correct. The gap was so large — more than 10 percentage points — that it was recognized immediately as a phase transition. Within years, the entire computer vision field had reorganized around deep learning.

The win was decisive enough to signal that the fundamental approach had changed — not that a small improvement had been made. This is what made it a historical turning point.

3. "Loss" in machine learning training refers to:

Correct. Loss is the error signal. The training loop exists to reduce loss — to make the model's predictions progressively closer to the correct answers in the training data.

Loss is the measure of how wrong the model is on any given prediction. Minimizing it over many training examples is the core of how models learn.

4. Joy Buolamwini's Gender Shades (2018) study is most directly evidence that:

Correct. The bias wasn't programmed — it emerged from training datasets that underrepresented darker-skinned faces. The models learned faithfully from biased data and produced biased results.

The study's finding is about data representation, not deliberate programming. When training data skews toward certain groups, model performance skews toward those groups too.

5. A model achieves 97% accuracy on its training data but 61% on the held-out test set. This is called:

Correct. The high training accuracy combined with dramatically lower test accuracy is the classic signature of overfitting. The model learned the training examples too specifically.

The large gap between training and test performance is the classic sign of overfitting: the model memorized instead of generalizing.

6. The Therac-25 medical radiation device (1985–1987) is used in AI safety education because it demonstrated that:

Correct. The Therac-25 replaced hardware safety interlocks with software checks that contained race-condition bugs, and there was no independent fallback. The lesson is that safety must be built into verifiable, independent layers in any life-critical system.

The Therac-25 wasn't an AI failure. The lesson it provides is architectural: safety in critical systems requires independent, verifiable mechanisms — not reliance on the primary system behaving correctly.

7. The EU AI Act (2024) classifies AI applications into risk categories. A parole recommendation system would fall under which category?

Correct. Criminal justice applications are explicitly listed as high-risk in the EU AI Act. They require human oversight, explainability, and pre-deployment bias evaluation — for exactly the reasons the COMPAS controversy illustrated.

Parole systems directly affect people's liberty. The EU AI Act puts criminal justice AI squarely in the high-risk category, requiring significant safeguards before deployment.

8. The COMPAS recidivism tool controversy centered on which specific technical/legal tension?

Correct. COMPAS's opacity was the legal issue. When a black-box score influences a sentence and its logic is trade-secret, defendants have no way to meaningfully challenge its accuracy or fairness — a potential due process violation.

The controversy was specifically about a non-explainable model being used in a context — criminal sentencing — where defendants have a right to examine and challenge evidence against them.

9. "In-context learning" in large language models means:

Correct. In-context learning is a property of large language models: they can adapt to new tasks from examples in a prompt, without any change to their weights. The "learning" is in how they use their existing capabilities, not a change to the model itself.

No weights are updated during in-context learning. The model uses its trained general capability to generalize from examples you place in the conversation itself.

10. An "emergent capability" in a large AI model is best described as:

Correct. Emergent capabilities are unpredicted — they appear when models exceed a certain scale threshold. This is what makes governing large models different from governing classical machine learning systems.

Emergent means unexpected and untrained-for. The capability appears as a side effect of the model becoming generally powerful — it wasn't a goal of the training process.

11. A doctor reviewing an AI diagnostic system's recommendation and overriding it based on clinical experience serves which function in the overall system design?

Correct. Human oversight isn't a sign of AI failure — it's a deliberate design feature. The doctor's review layer catches edge cases, provides accountability, and handles situations the AI wasn't trained for. It's the safety interlock in the layered system design.

The human override layer is intentional and valuable. Even a highly accurate AI model will have cases where it's wrong — the human layer ensures those cases are caught by someone who can take responsibility for the decision.

12. Which of the following is the best real-world example of a system that appropriately combines rule-based and learning-based components?

Correct. This is the layered design principle: use learning where the problem is too complex for rules (visual recognition), and use locked rules where behavior must be guaranteed (braking near people). Neither approach alone handles both requirements well.

Good hybrid design uses each approach where it's strongest — learning for complex pattern recognition, rules for safety-critical guaranteed behaviors.

13. A company defends their hiring AI by saying "it is objective because it is mathematical." Based on this module, what is the most accurate response?

Correct. The objectivity of the math doesn't make the outcome objective. If the training data encodes historical patterns of who was hired and promoted — patterns that may reflect past discrimination — the model learns to replicate those patterns while appearing neutral.

The math itself may be neutral, but a model trained on biased data produces biased outputs. The math doesn't sanitize the data.

14. Why does the concept of "weights" make machine learning models difficult to audit compared to rule-based systems?

Correct. A rule like "if age > 65, recommend lower dosage" is readable and auditable. A set of 175 billion floating-point numbers that together encode medical knowledge is not — even though both systems might make the same recommendation. The form of storage determines what can be audited.

The issue isn't secrecy or hardware — it's that knowledge distributed across billions of numerical values doesn't translate into human-readable logic the way written rules do.

15. OpenAI's pre-deployment red-teaming of GPT-4 found harmful capabilities even after extensive testing — and acknowledged some unknown capabilities may remain. This most directly illustrates which key challenge of governing large language models?

Correct. This is the core governance challenge that separates large language models from earlier AI systems. You can enumerate all the rules in a rule-based system. You cannot enumerate all the capabilities that may emerge from training a model of sufficient scale on diverse data — which fundamentally changes how safety must be approached.

The point isn't about the timeline or the method. It's about structural limits: emergent capabilities mean you cannot guarantee you've found all risks before deployment, in a way that wasn't true for rule-based systems.