By the spring of 2004, Gmail had not yet launched publicly. Engineers at Google were racing to solve a problem that had been quietly destroying email for years: spam. Not the occasional weird message, but a tidal wave. By that year, researchers at Postini, a company that processed corporate email, reported that roughly 77% of all email sent on the internet was spam. More junk than real mail. Far more.
The first wave of spam filters worked by rules. Someone would sit down and write: if the subject line contains "FREE MONEY," block it. If the sender domain is from this list, block it. If there are more than three exclamation points, block it. Engineers called these rule-based filters, and for a while they worked. Then the spammers read the rules.
They started writing "FR€€ M0N€Y." They registered new domains every day. They added legitimate-looking sentences copied from news articles to fool the exclamation-point detector. Every time an engineer added a new rule, spammers found the gap around it. The rule list grew to thousands of entries. It could never grow fast enough.
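The filter and the evasion described above can be sketched in a few lines. This is a minimal illustration, not any real product's filter: the phrase list, the domain `spam-offers.example`, and the function name `is_spam_by_rules` are all hypothetical.

```python
# Hypothetical hand-written rules, mirroring the three rules described above.
BLOCKED_PHRASES = ["free money"]
BLOCKED_DOMAINS = {"spam-offers.example"}
MAX_EXCLAMATIONS = 3


def is_spam_by_rules(subject: str, sender_domain: str) -> bool:
    """Return True if any hand-written rule matches."""
    lowered = subject.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return True
    if sender_domain in BLOCKED_DOMAINS:
        return True
    if subject.count("!") > MAX_EXCLAMATIONS:
        return True
    return False


print(is_spam_by_rules("FREE MONEY inside", "mail.example"))   # True
print(is_spam_by_rules("FR€€ M0N€Y inside", "mail.example"))   # False: the rule was sidestepped
```

The second call shows the weakness: the rule matches exact text, so a trivial respelling slips through until someone writes yet another rule.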
Then Google's engineers tried something different. Instead of writing rules, they fed their system millions of examples (emails humans had already labeled as spam or not spam) and let the system figure out its own patterns. The result was a filter that could catch spam that no one had ever seen before, because it had learned what spam felt like across thousands of subtle signals at once. That approach, learning from examples instead of following written rules, is the core of what we now call machine learning.
A rule-based system, sometimes called an expert system or a symbolic AI system, is exactly what it sounds like. A human expert thinks through every scenario they can imagine, writes down a rule for each one, and the program follows those rules. No exceptions. No interpretation. If the situation matches a rule, the system applies it. If no rule matches, the system is stuck.
This is powerful in narrow, well-defined situations. The rules that control a traffic light (green for 45 seconds, yellow for 5, red for 40) never need to learn anything. The situation is fully understood. But the moment the world gets complicated and unpredictable, rules alone start to crack.
In the 1980s, a system called XCON, built by Digital Equipment Corporation and Carnegie Mellon University, became famous for configuring computer orders automatically using about 2,500 rules. It saved DEC an estimated $25 million per year by 1986. It was celebrated as proof that AI had finally arrived. But when computer hardware changed faster than engineers could update the rules, XCON started making errors. The rules couldn't keep up with a world that kept moving.
A machine learning system doesn't start with rules. It starts with data (thousands or millions of labeled examples) and finds patterns that humans might never think to write down. Instead of someone saying "spam emails often mention prizes," the system is shown 10 million emails and discovers on its own that spam tends to have certain word combinations, certain sender patterns, certain timing signals, all weighted together in ways too complex to describe as a simple rule.
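To make "discovers on its own" concrete, here is a toy learned filter. It is a sketch under heavy assumptions: six made-up emails stand in for millions, and the scoring method (per-word log-odds with add-one smoothing, in the style of a naive Bayes classifier) is one simple choice for illustration, not a description of what Google actually used.

```python
import math
from collections import Counter

# Tiny made-up training set; a real filter would use millions of labeled emails.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("cheap pills free shipping", "spam"),
    ("meeting notes for monday", "ham"),
    ("lunch on monday with the team", "ham"),
    ("notes from the project meeting", "ham"),
]


def train(examples):
    """Count how often each word appears in spam and in non-spam ("ham")."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts


def spam_score(text, counts):
    """Sum of per-word log-odds; a positive score means the words look more like spam."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    spam_total = sum(counts["spam"].values()) + len(vocab)
    ham_total = sum(counts["ham"].values()) + len(vocab)
    score = 0.0
    for word in text.split():
        # Add-one smoothing so unseen words do not produce log(0).
        p_spam = (counts["spam"][word] + 1) / spam_total
        p_ham = (counts["ham"][word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score


counts = train(TRAINING)
print(spam_score("claim your free prize", counts) > 0)   # True
print(spam_score("notes for the meeting", counts) > 0)   # False
```

Nobody wrote a rule about the word "prize." The word counts play the role of learned weights, and they change whenever the training examples change.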
The key difference: in a rule-based system, a human has to understand the problem well enough to write every rule. In a machine learning system, the machine discovers patterns from examples even when humans can't fully articulate what those patterns are.
That's powerful, but it also means the system can find the wrong patterns. If most of your spam training examples happen to be written in a particular language, the system might associate that language with spam, not because the language is the problem, but because of an accident in your data. The machine learned, but it learned something you didn't intend.
If a machine learning system finds a pattern you didn't put there, and that pattern turns out to be biased against a group of people, who is responsible? The programmer who built it? The company that deployed it? The person who collected the data? There's no clean answer here. This question is actively debated by researchers, courts, and governments right now.
When you hear someone say "AI," they almost always mean a machine learning system, not a rule-based one. But rule-based systems haven't disappeared. They're still inside your GPS (the routing algorithm has explicit rules about traffic laws), inside airplane autopilots (hard-coded rules for certain emergencies), and inside most financial trading systems (rules for what a bank is legally allowed to do).
The choice between rules and learning is a real engineering decision with real consequences. Rules are transparent: you can read them and understand why the system decided what it decided. Learning systems are often opaque: the pattern exists as millions of tiny numerical weights inside the model, and no one can read them the way you'd read a sentence.
This is why courts, hospitals, and governments often require rule-based logic for high-stakes decisions: they need to be able to explain why a decision was made. A judge can't just say "the algorithm said so." A doctor can't either. But a spam filter? Nobody needs a legal explanation for why their email got flagged.
Every time someone says "AI made a mistake" or "AI is biased," the useful question is: is this a rule-based system or a learning system? If it's rule-based, someone wrote a bad rule. If it's learning-based, something went wrong in the data or the training process. The fix is completely different in each case, and most people reporting on AI don't know which one they're talking about.
Modern AI systems often combine both approaches. A self-driving car might use machine learning to recognize pedestrians and road signs (because no human could write enough rules to describe every possible visual) but use explicit rules to decide what happens when a pedestrian is detected: brake. That rule is written in code. It will not be overridden by a learning system. Engineers decided that some decisions need to be locked.
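The split between a learned perception step and a locked rule might look roughly like the sketch below. Everything here is hypothetical: `detect_pedestrian` merely stands in for a trained model, and the 0.5 threshold is an assumed value, not one taken from any real vehicle.

```python
def detect_pedestrian(camera_frame: dict) -> float:
    """Stand-in for a learned perception model.

    A real system would run a trained neural network on pixels; here the
    "frame" is a dict that already carries a hypothetical confidence value.
    """
    return camera_frame["pedestrian_confidence"]


# Hypothetical threshold chosen by engineers, not learned from data.
BRAKE_THRESHOLD = 0.5


def decide_action(camera_frame: dict) -> str:
    confidence = detect_pedestrian(camera_frame)   # learning part: opaque
    if confidence >= BRAKE_THRESHOLD:              # rule part: fixed and readable
        return "brake"
    return "continue"


print(decide_action({"pedestrian_confidence": 0.92}))   # brake
print(decide_action({"pedestrian_confidence": 0.03}))   # continue
```

An investigator can read `decide_action` line by line. The same is not true of whatever produced the confidence number, which is why the two parts fail, and get fixed, in different ways.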
When Tesla's Autopilot system was investigated after accidents in 2016 and again in 2021, regulators had to determine whether the failure was in the learning part (the perception of the road) or the rule part (what the car was supposed to do once a hazard was detected). The distinction mattered enormously for who was responsible and how to fix it.
Understanding which kind of system you're dealing with is no longer a techie detail. It's the kind of thing that affects accident investigations, insurance claims, medical diagnosis, hiring decisions, loan approvals, and criminal sentencing. Every one of those domains has AI in it now. Every one of them has the rules-vs-learning question sitting just beneath the surface.
A mid-size social platform called Tessera has just been sued because their moderation system incorrectly banned thousands of users over three months in early 2023. The company claims the system "uses AI to enforce community standards." Your job is to investigate whether their system is rule-based, learning-based, or a combination, and what that means for who is responsible for the errors.
Your AI contact is a fellow investigator who knows the technical side. They won't lecture you; they'll push back on weak arguments and ask you to defend your reasoning.
In 2009, a Stanford professor named Fei-Fei Li published something that looked, on the surface, like a very large spreadsheet. It was called ImageNet: a database that would grow to more than 14 million photographs, each one labeled by humans. A photo of a golden retriever: labeled "dog." A photo of a Boeing 747: labeled "aircraft." An apple: labeled "fruit." Li had spent three years on the project, crowdsourcing the labeling work to tens of thousands of people through Amazon's Mechanical Turk platform, paying pennies per image.
Most computer vision researchers at the time were still trying to write rules for what a dog looks like: rules about ear shapes, fur textures, snout proportions. They were making slow progress. Then in 2012, a team from the University of Toronto (Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton) entered an annual competition called the ImageNet Large Scale Visual Recognition Challenge. Their system, called AlexNet, had not been given any rules about what a dog looks like. It had been shown millions of labeled images and trained to adjust its internal numbers until its error rate dropped.
AlexNet won the 2012 competition with an error rate of 15.3%. The second-place team had an error rate of 26.2%. It wasn't a close race; it was a rupture. In a single competition, a learning-based system had destroyed every hand-engineered approach that had come before. Within three years, nearly every computer vision lab in the world had switched approaches. The thing Fei-Fei Li had quietly assembled, millions of labeled examples, turned out to be the fuel that the learning systems needed.
Here's the fundamental loop that makes machine learning work. It has four steps, and they repeat thousands or millions of times.
Step 1: Make a guess. The system looks at an example, say a photo, and produces an output: "I think this is a cat." At the start of training, these guesses are essentially random, because the system's internal numbers haven't been adjusted yet.
Step 2: Check the guess. The correct answer is known (because a human labeled the training data). The system compares its guess to the right answer and calculates how wrong it was. This measure of wrongness has a technical name: loss.
Step 3: Adjust. The system uses the loss to figure out which internal numbers should be shifted, and by how much, to make the next guess slightly less wrong. The standard method for working out those shifts is called backpropagation (usually shortened to "backprop"); an optimizer such as gradient descent then applies them. The shifts are tiny: fractions of a fraction.
Step 4: Repeat. Do this for millions of examples, over and over, and the system's guesses get progressively better. The internal numbers settle into a configuration that captures the patterns in the data.
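The four steps can be sketched with the smallest possible model: a single weight. This is an illustrative toy, with made-up data and an assumed learning rate. With only one weight the gradient can be written by hand; backpropagation is the bookkeeping that does the same calculation across many layers.

```python
# Toy data: the hidden pattern is y = 3 * x. The model has to discover the 3.
EXAMPLES = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

weight = 0.0          # the model's single internal number, starting uninformed
LEARNING_RATE = 0.01  # how large each adjustment is (an assumed value)

for epoch in range(200):                      # Step 4: repeat
    for x, target in EXAMPLES:
        guess = weight * x                    # Step 1: make a guess
        error = guess - target
        loss = error ** 2                     # Step 2: measure how wrong it was
        gradient = 2 * error * x              # slope of the loss with respect to the weight
        weight -= LEARNING_RATE * gradient    # Step 3: nudge the weight downhill

print(round(weight, 3))   # 3.0
```

Nothing in the code mentions the number 3. The loop finds it because that is the value that makes the loss smallest on the examples it was shown.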
After training, the machine hasn't stored a list of rules or a gallery of photos. It has stored a set of numbers, millions or billions of them, called weights. These weights define the patterns the system learned. When you show the trained system a new photo it has never seen, it runs that photo through its weights and produces a prediction.
No one can look at those weights and read them the way you'd read a book. Researchers can analyze what patterns certain parts of the network seem to respond to (the first few layers of image classifiers often respond to edges and colors, deeper layers to shapes, even deeper layers to whole objects), but this analysis is never complete. The knowledge is distributed across millions of numbers in a way that has no clean human-readable translation.
This is what people mean when they say AI is a "black box." It's not that the math is secret. It's that the knowledge is stored in a form that humans can't directly read. You can see the input, you can see the output, but the middle is opaque.
If a system's knowledge can't be read or explained in human terms, should it be allowed to make decisions that affect people's lives? Medical diagnosis. Loan approval. Parole decisions. All of these now have AI components. At what point does "the model says so" become an acceptable answer β and who decides where that line is?
Training data is not neutral. It is a snapshot of some part of the world, collected by specific people, in specific places, for specific purposes. Whatever biases exist in that collection get absorbed into the model's weights.
In 2018, researcher Joy Buolamwini at MIT published a study called "Gender Shades." She tested commercial facial analysis systems from IBM, Microsoft, and Face++, which classify a person's gender from a photo, on a set of faces representing different skin tones and genders. The systems were highly accurate, but not equally. Error rates were as low as 0.8% for light-skinned men and as high as 34.7% for dark-skinned women. The systems hadn't been told to be worse at classifying darker-skinned faces. They had been trained on data that contained more light-skinned faces, and the patterns they learned reflected that imbalance.
This is a direct consequence of how machine learning works. The machine learned faithfully from its data. The problem was in what that data contained, and didn't contain.
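The kind of audit that exposes such a gap is simple in principle: compute the error rate separately for each group instead of reporting one overall number. The sketch below uses made-up records and placeholder group names; it does not reproduce the Gender Shades data.

```python
from collections import defaultdict

# Made-up evaluation records: (group, model_was_correct). Not real study data.
RESULTS = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]


def error_rate_by_group(results):
    """Return the fraction of wrong predictions for each group separately."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, correct in results:
        totals[group] += 1
        if not correct:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


overall = sum(not ok for _, ok in RESULTS) / len(RESULTS)
rates = error_rate_by_group(RESULTS)
print(overall)   # 0.5
print(rates)     # {'group_a': 0.25, 'group_b': 0.75}
```

The single overall figure hides the fact that one group sees three times the error rate of the other, which is why a headline accuracy number on its own says little about fairness.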
When a company says "our AI is objective because it's just math," you now know that isn't quite right. The math is objective. But the data the math was trained on, and the choices about what to include and exclude, are human decisions with human biases baked in. The model is only as fair as the data it learned from.
One of the most important things a machine learning system has to do is generalize: perform well on new data it wasn't trained on. A spam filter trained on 2020 spam emails needs to catch spam written in 2024. An image classifier trained on photos from Europe needs to recognize objects in photos from other parts of the world.
When a model works beautifully on its training data but fails on new data, it has overfit: it memorized the training examples rather than learning general patterns. This is like a student who memorizes last year's exact test questions and then fails when the questions are slightly different.
Preventing overfitting is one of the central challenges in machine learning. It's the reason researchers set aside a "test set" (data the model never sees during training) to evaluate how well the model generalizes. If it scores 98% on training data but 71% on the test set, something has gone wrong in the learning process.
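A deliberately extreme toy shows why the held-out test set matters. The "memorizer" below is a lookup table of training examples, the most overfit model possible; the "generalizer" stands in for a model that captured the underlying pattern. Both functions and the even/odd task are invented for illustration.

```python
def label(n: int) -> str:
    """Ground truth for this toy task: is the number even or odd?"""
    return "even" if n % 2 == 0 else "odd"


train_set = [(n, label(n)) for n in range(0, 20)]
test_set = [(n, label(n)) for n in range(20, 40)]   # never seen during training

# "Memorizer": stores every training example and guesses "even" for anything else.
memory = dict(train_set)


def memorizer(n: int) -> str:
    return memory.get(n, "even")


def generalizer(n: int) -> str:
    """Stand-in for a model that learned the pattern rather than the examples."""
    return "even" if n % 2 == 0 else "odd"


def accuracy(model, data) -> float:
    return sum(model(n) == y for n, y in data) / len(data)


print(accuracy(memorizer, train_set), accuracy(memorizer, test_set))       # 1.0 0.5
print(accuracy(generalizer, train_set), accuracy(generalizer, test_set))   # 1.0 1.0
```

On training data the two are indistinguishable. Only the test set reveals that the memorizer is guessing.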
When you hear news stories about AI systems that "performed brilliantly in testing but failed in the real world," you're almost always reading about a generalization or overfitting problem. The real world is always more varied than any training set can capture.
It's 2023. A large logistics company called Meridian Freight has been using an AI system to screen job applications since 2019. A watchdog group has filed a complaint: the system approved far fewer applications from candidates who attended historically Black colleges and universities (HBCUs) compared to candidates from other schools with similar average GPAs. Meridian says the system "just learned from historical hiring data." Your contact has deep technical knowledge and is here to help you figure out exactly where the bias entered the system.
Between June 1985 and January 1987, a radiation therapy machine called the Therac-25 gave six patients massive radiation overdoses. At least three of them died. The machine was built by a Canadian company called Atomic Energy of Canada Limited. It was used in hospitals across the United States and Canada to deliver precisely calibrated doses of radiation to cancer patients.
The Therac-25 was, in its time, sophisticated: it replaced earlier models that used hardware safety locks with software-only safety checks. The previous machines had physical interlocks: mechanical switches that literally could not allow the high-power beam to fire if certain conditions weren't met. The Therac-25 eliminated those hardware locks and replaced them with software. The software was supposed to check the same conditions. But it had a race condition: a bug where if an operator typed a command too quickly after correcting an entry, the safety check would complete before the correction was registered, and the machine would fire the full beam at a patient who was supposed to receive a lower dose.
The machine wasn't learning. There was no AI involved. But the Therac-25 disaster became the foundational case study that engineers use when they talk about when explicit, locked, auditable rules must be used, and why some decisions should never be handed to any system, AI or otherwise, that cannot be fully read and verified by humans.
The Therac-25 wasn't a machine learning failure. But the lessons it produced are directly applicable to AI. The core principle that emerged from the investigation, published by Nancy Leveson and Clark Turner in 1993, was this: in systems where a failure can kill someone, safety must be enforced by mechanisms that are verifiable, auditable, and independent of the primary system.
For AI, this translates directly. A machine learning model that recommends a radiation dose cannot be the final authority on whether that dose gets delivered. A rule-based safety check, one that can be read, tested, and certified, must sit between the model and the real-world action.
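In code, such an interlock might be sketched as below. This is a teaching toy only: the limits are arbitrary numbers in arbitrary units with no clinical meaning, and `model_recommended_dose` is a hypothetical stand-in for a learned model.

```python
# Hypothetical limits in arbitrary units, for illustration only; not clinical values.
MIN_DOSE = 0.0
MAX_DOSE = 2.0


def model_recommended_dose(patient_features: dict) -> float:
    """Stand-in for a learned model; it simply returns the number it was handed."""
    return patient_features["suggested_dose"]


def safety_check(dose: float) -> bool:
    """Readable, testable rule that does not depend on the model in any way."""
    return MIN_DOSE < dose <= MAX_DOSE


def deliver(patient_features: dict) -> str:
    dose = model_recommended_dose(patient_features)
    if not safety_check(dose):
        return "blocked: dose outside permitted range"
    return f"approved for clinician review: {dose}"


print(deliver({"suggested_dose": 1.5}))    # approved for clinician review: 1.5
print(deliver({"suggested_dose": 150.0}))  # blocked: dose outside permitted range
```

The point of the design is independence: `safety_check` can be audited and tested without knowing anything about how the model arrived at its number.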
Today, regulators such as the FDA expect medical AI systems to include this kind of verifiable safety layer. When the agency cleared early AI-powered radiology tools in the 2010s, the intended use typically kept a human physician in the loop to review AI recommendations before any treatment decision, because the learned model, no matter how accurate in testing, cannot bear final responsibility.
Here's the tension that engineers and policymakers are wrestling with right now. Machine learning systems are often more powerful at the actual task (better at spotting tumors, catching fraud, predicting failures) than rule-based systems. But they are less transparent. You can't read their decision logic. You can't audit their reasoning for a specific case.
Rule-based systems are the opposite. They're transparent: every decision can be traced. But they're limited by what humans could think to write down. And they require constant human maintenance as the world changes.
There is no solution that gives you both. This is a genuine trade-off, not a problem waiting for a clever fix. The choice of which approach to use, or how to combine them, is a design decision with moral weight. Getting it wrong in medicine means people die. Getting it wrong in hiring means careers are derailed. Getting it wrong in criminal justice means people go to prison who shouldn't.
In 2016, ProPublica investigated a tool called COMPAS, used by courts in multiple U.S. states to assess the likelihood that a defendant would commit another crime. COMPAS used a learned model. Researchers found it rated Black defendants as higher risk than white defendants at similar rates of actual re-offense. The company that made COMPAS said their algorithm was proprietary and could not be released. A defendant's lawyer couldn't see how the score was calculated. Should a person be sentenced based on a score from a system whose logic is a trade secret? There is no clean answer. Courts are still deciding.
In 2021, the European Union proposed the AI Act, the world's first comprehensive legal framework for regulating AI systems. One of its central ideas, which became law in 2024, is a risk classification system. AI applications are sorted into categories based on the potential harm of a failure:
Unacceptable risk: Banned outright. Social scoring systems like China's and real-time facial recognition in public spaces by law enforcement are prohibited.
High risk: Allowed but heavily regulated. Medical devices, hiring tools, credit scoring, and criminal justice tools must be transparent, must have human oversight built in, and must be tested for bias before deployment.
Limited risk: Must disclose that the user is interacting with AI. Chatbots, deepfake-generating tools.
Minimal risk: AI in games, spam filters. Essentially unregulated.
Notice what the EU's framework is doing: it's making the rules-vs-learning question into a legal question. High-risk AI systems must have explainability built in, which means rule-based components, or at minimum, methods for explaining what a learning system decided. The law is forcing transparency into systems that would otherwise be black boxes.
When you hear debates about "AI regulation," they are fundamentally about this: should AI systems that make consequential decisions be required to use approaches that can be audited and explained? That's the rules-vs-learning question in policy form. Knowing the technical distinction gives you the ability to actually understand what's being argued, not just the politics of it.
Good AI system design doesn't pit rules against learning; it layers them deliberately. A well-designed medical diagnostic AI might use a deep learning model to flag potential abnormalities in a scan (the learning part, powerful and accurate), pass those flags to a rule-based system that checks whether they meet clinical criteria for follow-up (transparent and auditable), and then route the result to a human physician who makes the final decision (the human override layer).
Each layer has a job. The learning model does what rules can't: recognize subtle, complex patterns across thousands of variables. The rule layer does what the model can't: provide a traceable, certifiable decision path. The human layer does what neither can: take responsibility and adapt to individual circumstances the system wasn't designed for.
This isn't a perfect setup. It's slower. It's more expensive. It sometimes means the human overrides a correct AI recommendation because they didn't trust it. But in high-stakes domains, that cost is considered worth it, because the alternative is a system that can fail in ways nobody can explain or fix.
A state department of corrections has asked your team to design an AI-assisted parole review system. The goal is to help parole board members process more cases without increasing error rates. The department wants to use a machine learning model that predicts the likelihood of reoffending. Your job is to design the safeguards, human oversight layers, and rule-based components that must accompany the model, and justify why each piece is necessary.
Your contact is a senior policy engineer who will push back on every design choice that seems underjustified. They believe AI in criminal justice deserves the highest level of scrutiny of any application domain.
In May 2020, OpenAI released a research paper describing a language model called GPT-3. The model had been trained on roughly 570 gigabytes of text from the internet, processing about 300 billion tokens (words and word fragments) during training, and it had 175 billion internal weights. It had been trained to do one thing: predict the next word in a sequence.
Then researchers started testing it. GPT-3 could write simple Python code without having been specifically trained as a coding system. It could translate between languages without having been trained as a translation system. It could answer arithmetic questions, write legal summaries, and compose poetry in the style of specific poets. None of these were the task it was trained on. Predicting the next word, done at sufficient scale, turned out to produce a system that could do things that no one had written rules for, and that no one had specifically labeled training data for.
Researchers called this emergent capability: abilities that appear in a model not because they were trained directly, but because the model became so powerful at its core task that related abilities emerged as a side effect. It was one of the most surprising results in machine learning history. And it broke every clean story about how learning systems work.
Classical machine learning is relatively predictable. You train a model to classify spam, it gets better at classifying spam. You train a model to recognize cats, it gets better at recognizing cats. The capability you get out is roughly the capability you trained for. This is the regime where the rules-vs-learning framework is cleanest.
Large language models (the kind that power ChatGPT, Claude, Gemini, and their cousins) operate differently. Because they are trained on enormous, diverse datasets and have billions of parameters, they develop capabilities that were not explicitly trained in. A 2022 paper from Google Brain, titled "Emergent Abilities of Large Language Models," catalogued more than a hundred capabilities, from multi-step arithmetic to chain-of-thought reasoning, that appeared suddenly in models above a certain scale and were absent below it.
This creates a new problem: if you can't fully predict what a model will be capable of, how do you design rules and safeguards around it? Rule-based safety depends on being able to enumerate what a system can do. Emergent capabilities, by definition, resist enumeration.
Modern large language models exhibit something researchers call in-context learning. You don't need to retrain the model to give it a new task. You just describe the task in the prompt, perhaps with a few examples, and it performs the task. This is fundamentally different from classical machine learning, where new tasks require new training runs.
This blurs the rules-vs-learning line in an interesting way. A developer can now write a few examples in a prompt, effectively writing rules, and the model learns to apply them to new cases. It's neither pure rules nor pure training. It's something in between: a system that uses its learned general capability to follow specific instructions that look like rules but are interpreted flexibly.
This is why large language models can feel simultaneously like they're following rules (they respond consistently to instructions) and like they're doing something no one programmed (they generalize beyond the examples in ways that are sometimes surprising and sometimes wrong).
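What "writing a few examples in a prompt" looks like in practice can be sketched as plain string assembly. The helper name `build_few_shot_prompt` and the exact layout are assumptions for illustration; no particular model or vendor API is implied, and the sketch stops before any model is called.

```python
def build_few_shot_prompt(examples, new_input):
    """Format labeled examples plus one unlabeled input as a single prompt string."""
    lines = ["Classify each message as spam or not spam.", ""]
    for text, label in examples:
        lines.append(f"Message: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The final item has no label; the model is expected to continue the pattern.
    lines.append(f"Message: {new_input}")
    lines.append("Label:")
    return "\n".join(lines)


examples = [
    ("Claim your free prize now", "spam"),
    ("Notes from Monday's meeting", "not spam"),
]
prompt = build_few_shot_prompt(examples, "You have won a cruise")
print(prompt)
```

No weights change when this prompt is sent. The two examples act like rules, yet how they are applied to the third message depends entirely on the model's learned behavior.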
In 2023, the developers of GPT-4 ran an internal red-team evaluation (hiring human testers to try to make the model do harmful things) before releasing it. They found the model could help synthesize information that might be useful for creating dangerous substances, could impersonate individuals in convincing ways, and could be prompted to behave in ways that violated its intended guidelines if the prompt was constructed cleverly enough.
OpenAI addressed this through a combination of rule-based filters (hard blocks on certain output categories), additional training on what to refuse (reinforcement learning from human feedback, or RLHF), and ongoing monitoring after deployment. But their own documentation acknowledged something significant: they could not guarantee that all harmful behaviors had been found, because the model's capability space was too large and too complex to enumerate completely.
This is the modern form of the rules-vs-learning problem. It's no longer "do we use rules or learning?" It's "how do we govern a system whose capabilities we can't fully know in advance?"
If a company releases an AI system knowing that it has capabilities they haven't fully mapped, and one of those unmapped capabilities causes harm, what is their moral responsibility? This is being debated in legislatures, law schools, and philosophy departments simultaneously. There is no consensus. The law hasn't caught up. You are now thinking about this at the frontier of where the real conversation is happening.
You now have the full framework. You can look at any AI system and ask: Is it rule-based or learning-based? If it's learning-based, what was it trained on, and what biases might that data contain? Does it have verifiable safety interlocks for high-stakes decisions? Is it the kind of large, general model that might have capabilities its creators didn't anticipate?
These aren't exotic questions. They're the questions that regulators at the EU, the FDA, the FTC, and the U.S. Congress are asking right now, often without the technical grounding to answer them well. Understanding the rules-vs-learning distinction, and what it implies at every level from spam filters to GPT-4, puts you in a position to understand what's actually at stake in those debates.
The AI industry in 2024 is generating hundreds of billions of dollars of investment, influencing elections, changing medicine, and reshaping labor markets. Most of the people making decisions in those domains (investors, politicians, executives, journalists) are working from a much shallower understanding of how these systems work than you now have. That is not a small thing.
The rules-vs-learning debate isn't a technical detail. It is the central argument inside every AI governance question being discussed anywhere in the world right now. Should AI systems that make decisions about people be required to explain themselves? That's a rules question. Can they be required to explain themselves, given how they work? That's a learning question. You now understand both sides well enough to actually engage with the debate, not just observe it.
It's 2024. A mid-size insurance company called Arbor Insurance wants to deploy a large language model to handle initial customer claims conversations: collecting information, asking follow-up questions, and producing a preliminary claim summary that human adjusters then review. The CTO says: "The model has passed all our internal tests. It's ready to go." Your job is to conduct a pre-deployment capability mapping: figure out what the model might be capable of doing that wasn't tested, and what safeguards are needed before it touches real customers.
Your contact is an AI safety researcher who has done this kind of analysis for financial institutions. They won't tell you what the risks are; they'll ask you to find them yourself.