In 2009, Google engineers reported in Nature that they could estimate flu activity across the United States by analyzing which search queries people typed, terms like "fever," "aching muscles," and "flu medication." Their model tracked 45 search queries and matched CDC flu reports with startling accuracy. The media called it a triumph of big-data pattern matching. Then, in the 2012-2013 flu season, the system overestimated flu prevalence by roughly a factor of two. Researchers who examined the failure argued that Google's own changes to its search product, including which suggestions appeared as users typed, had altered the very data stream the model depended on. The patterns it had learned were tied to a platform's behavior, not just human biology.
Every AI model that learns from experience — rather than following hand-coded rules — begins with a dataset. A dataset is a structured collection of examples: images paired with labels, sentences paired with translations, medical scans paired with diagnoses, user clicks paired with purchase outcomes.
The model studies these examples statistically, extracting recurring patterns. It never truly "understands" the examples the way a person might; it finds mathematical regularities. So the dataset is everything — it is the entire reality the AI gets to see during training. Whatever biases, gaps, or errors exist in that data will shape the model's behavior long after training ends.
Modern AI datasets draw from several major sources:
The open web. Common Crawl is a nonprofit that has archived petabytes of web pages since 2008. GPT-3 and many later large language models draw heavily from filtered versions of it. GPT-2 was trained on a separate corpus, WebText, built from web pages linked in Reddit posts that received at least three karma, effectively letting Reddit's voting system act as a quality filter.
Books. The Books1 and Books2 corpora used in GPT-3 training, whose exact contents OpenAI has not published, supplied tens of billions of tokens of book text, giving language models exposure to long-form coherent prose.
Wikipedia. At several billion words in English alone, Wikipedia is clean, structured, and edited, making it a high-quality anchor in many training mixes.
Human-labeled data. For tasks requiring precision — medical imaging, legal document review, handwriting recognition — teams of human annotators label examples manually. Amazon Mechanical Turk and similar platforms have employed millions of workers globally to create labeled training sets.
Princeton researcher Fei-Fei Li and her team used Amazon Mechanical Turk workers in 167 countries to label 14 million images across 21,841 categories. The result, ImageNet, became the benchmark that triggered the deep learning revolution when Alex Krizhevsky's AlexNet model slashed the ImageNet error rate from 26% to 15.3% in 2012. The data collection effort itself took over two years.
The Google Flu Trends failure illustrated a principle engineers call distribution shift — when the data the model encounters after deployment differs from the data it trained on. The training patterns were real but fragile: they depended on a stable relationship between search queries and flu behavior that changed when the platform changed.
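The effect can be sketched with synthetic numbers. The following Python example is a minimal illustration, not a reconstruction of Google's model: the 10x and 20x search-to-case ratios, the noise range, and the function names are all assumptions chosen for the demonstration.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope w for the model y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mean_ratio(predicted, actual):
    """Average of predicted / actual; 1.0 means unbiased on average."""
    return sum(p / a for p, a in zip(predicted, actual)) / len(actual)

rng = random.Random(0)

# Training period (assumed): searches run at about 10x the number of cases.
train_cases = [rng.uniform(100, 1000) for _ in range(200)]
train_searches = [10 * c * rng.uniform(0.9, 1.1) for c in train_cases]
w = fit_slope(train_searches, train_cases)

# After deployment (assumed): the platform now prompts more flu queries,
# so the same number of cases produces about 20x searches.
new_cases = [rng.uniform(100, 1000) for _ in range(200)]
new_searches = [20 * c * rng.uniform(0.9, 1.1) for c in new_cases]

train_ratio = mean_ratio([w * s for s in train_searches], train_cases)
shifted_ratio = mean_ratio([w * s for s in new_searches], new_cases)

print(f"predicted/actual before the shift: {train_ratio:.2f}")
print(f"predicted/actual after the shift:  {shifted_ratio:.2f}")
```

The model itself never changes. Only the relationship between its input and the quantity it predicts changes, and the estimates roughly double.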
A more systemic form of this problem is historical bias. In 2018, researchers at MIT and Microsoft published a study showing that commercial gender classification systems from IBM, Microsoft, and Face++ had error rates as high as 34.7% for darker-skinned women, compared with under 1% for lighter-skinned men. The likely cause was face datasets that skewed heavily lighter-skinned and male, so the models learned patterns that reflected that skew.
An AI model cannot be fairer, more accurate, or more representative than the data it was trained on. "Garbage in, garbage out" is the oldest rule in computing — in machine learning, it applies at civilizational scale.
You've learned that training data shapes everything. Now interrogate those ideas. Ask about specific datasets, data collection methods, bias origins, or the Google Flu Trends / ImageNet cases. Dig into anything from this lesson.
In October 2012, a doctoral student named Alex Krizhevsky submitted an entry to ImageNet's annual Large Scale Visual Recognition Challenge. His neural network, later called AlexNet, had been trained for about a week on two NVIDIA GTX 580 graphics cards (consumer gaming hardware) using 1.2 million labeled images. AlexNet achieved a top-5 error rate of 15.3%, crushing the second-place entry at 26.2%. Every image it had trained on came with a human-supplied label. Every time it guessed wrong during training, an algorithm called backpropagation computed how its internal weights should change to make a better guess next time. It did this for roughly 90 epochs, meaning 90 complete passes through the entire dataset, before it was ready.
Supervised learning is training with labeled examples. Each training example consists of an input — an image, a sentence, a set of sensor readings — paired with the correct output a human has assigned. The model makes a prediction, compares it to the correct answer, measures its error, and adjusts its internal parameters to reduce that error. Repeat millions or billions of times.
The algorithm that performs the adjustment is typically called gradient descent: mathematically walking downhill on an error landscape, step by step, toward lower and lower mistake rates. The process that calculates which direction is "downhill" through all the model's layers is called backpropagation, popularized in its modern form by David Rumelhart, Geoffrey Hinton, and Ronald Williams in 1986.
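The predict, measure, adjust loop can be sketched for the simplest possible model, a straight line with two parameters. This is a minimal illustration under assumed values: the data, the learning rate, and the step count are made up, and the gradients are written out by hand. Backpropagation computes the same kind of gradients automatically through many layers.

```python
# Labeled examples: each input x is paired with the correct output y = 3x + 2.
data = [(x, 3.0 * x + 2.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

w, b = 0.0, 0.0          # model parameters, starting from a poor guess
learning_rate = 0.05     # assumed step size

def mean_squared_error(w, b):
    """Average squared gap between predictions w*x + b and the labels."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

initial_loss = mean_squared_error(w, b)

for step in range(2000):
    # Gradient of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    # Step downhill: move each parameter against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mean_squared_error(w, b)
print(f"w = {w:.3f}, b = {b:.3f}, loss {initial_loss:.2f} -> {final_loss:.6f}")
```

The loop recovers a slope near 3 and an intercept near 2 without ever being told the rule, only the labeled examples.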
Classification. The model learns to assign inputs to categories. Is this email spam or not-spam? Does this X-ray show a tumor? Which digit is this handwritten number? The training label is a category name.
Regression. The model learns to predict a continuous value. What will this house sell for? How many units will we ship next quarter? The label is a number, and the model minimizes the gap between its guess and the true value.
One of the earliest and most consequential supervised learning deployments was email spam filtering. In 2002, Paul Graham published "A Plan for Spam," describing a Bayesian probabilistic classifier trained on personal email — legitimate messages labeled "ham," junk labeled "spam." The model learned that certain word combinations had high spam probability. Users who trained the filter on their own mail got personalized accuracy.
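A word-probability spam filter of this kind can be sketched in a few lines. The example below is a simplified naive Bayes classifier with add-one smoothing, not Graham's exact scoring scheme, and the six training messages are invented for illustration.

```python
import math
from collections import Counter

# Tiny hypothetical training set: (message, label).
training = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("claim your free prize", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch tomorrow at noon", "ham"),
    ("please review the notes", "ham"),
]

word_counts = {"spam": Counter(), "ham": Counter()}
doc_counts = Counter()
for text, label in training:
    doc_counts[label] += 1
    word_counts[label].update(text.split())

vocab = set(word_counts["spam"]) | set(word_counts["ham"])

def log_score(text, label):
    """log P(label) plus the sum of log P(word | label), with add-one smoothing."""
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    for word in text.split():
        count = word_counts[label][word]   # 0 for words never seen with this label
        score += math.log((count + 1) / (total_words + len(vocab)))
    return score

def classify(text):
    """Pick whichever label gives the message the higher score."""
    return max(("spam", "ham"), key=lambda label: log_score(text, label))

print(classify("free cash prize"))
print(classify("notes from the meeting"))
```

Because the counts come from whatever mail the filter is trained on, two users with different inboxes end up with different filters, which is the personalization the paragraph above describes.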
By the mid-2000s, large email providers such as Microsoft's Hotmail and Google's Gmail were training statistical classifiers on very large volumes of labeled messages. The "label" was implicit: messages users marked as spam or moved back to the inbox. This is sometimes called implicit supervision, because users' actions become labels without anyone explicitly sitting down to annotate data.
Created in 1998 by Yann LeCun, Corinna Cortes, and Christopher Burges at Bell Labs, MNIST is a dataset of 70,000 handwritten digits (0–9) with human-supplied labels. It became the standard benchmark for testing new supervised learning algorithms. As of 2024, the best models achieve error rates below 0.2% on MNIST — essentially matching human performance on an extremely clean task. The dataset's clarity made it useful precisely because the labels were unambiguous: a "7" is a 7.
Supervised learning has a fundamental cost: someone has to label everything. For ImageNet's 14 million images, that took two years of crowdsourced labor. For medical AI systems, labels must come from certified specialists — making them expensive. For rare diseases or unusual events, labeled examples may simply not exist in sufficient quantity.
When a model is trained to a very high accuracy on training data but fails on new examples, we call this overfitting — the model has memorized the specific examples rather than learning the general pattern. Preventing overfitting while maximizing accuracy on unseen data is the central challenge of supervised learning engineering.
Imagine studying for an exam by memorizing the exact questions from last year's test — verbatim. You'd score 100% on that old exam, but fail completely if the teacher changed any question. Overfitting is the mathematical equivalent: perfect memory, poor generalization.
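The exam analogy translates directly into code. The sketch below is a deliberately extreme toy, with an assumed rule (numbers of 50 or more are labeled 1) and two hypothetical models: one that memorizes the training examples and one that learns a single threshold.

```python
def true_label(x):
    """The underlying pattern (assumed for this toy): 1 if x >= 50, else 0."""
    return 1 if x >= 50 else 0

train = [(x, true_label(x)) for x in range(0, 100, 2)]   # even numbers
test = [(x, true_label(x)) for x in range(1, 100, 2)]    # unseen odd numbers

# Model 1: pure memorization. Perfect recall, no rule for unseen inputs.
memory = dict(train)

def memorizer(x):
    return memory.get(x, 0)   # guesses 0 whenever x was never seen

# Model 2: learn the one threshold that best separates the training labels.
def fit_threshold(examples):
    candidates = sorted(x for x, _ in examples)
    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in examples)
    return min(candidates, key=errors)

threshold = fit_threshold(train)

def rule(x):
    return 1 if x >= threshold else 0

def accuracy(model, examples):
    return sum(model(x) == y for x, y in examples) / len(examples)

print("memorizer: train", accuracy(memorizer, train), "test", accuracy(memorizer, test))
print("rule:      train", accuracy(rule, train), "test", accuracy(rule, test))
```

Both models are perfect on the training set. Only the held-out test set reveals that one of them learned nothing general, which is why evaluation always uses examples the model has not seen.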
Supervised learning is the backbone of most deployed AI. Ask about how backpropagation works, why overfitting happens, how many training epochs AlexNet needed, or how spam filters use implicit labels. Challenge the AI to explain gradient descent in plain language.
In October 2015, DeepMind's AlphaGo defeated the European Go champion Fan Hui five games to zero — the first time a computer had beaten a professional Go player on a full 19×19 board without handicap. AlphaGo was not given a list of rules for good Go strategy. Instead, it was trained first on a dataset of 30 million human expert moves — supervised learning — and then underwent reinforcement learning by playing millions of games against copies of itself, receiving a positive signal for winning and a negative signal for losing. The Go board's near-infinite configuration space made rule-based programming impossible. The algorithm had to discover strategy on its own through consequence.
In unsupervised learning, the model receives data with no labels at all. Its task is to find structure, clusters, or patterns on its own. There is no correct answer to compare against; the model must discover organization the data already contains.
Clustering algorithms, like k-means, group data points that are mathematically similar. Netflix's earliest recommendation work used collaborative filtering, a related idea that groups users by similar rating and viewing behavior without anyone labeling user "types."
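A bare-bones k-means loop shows how groups emerge without labels. This is a minimal sketch: the six points are invented, and the starting centroids are picked by hand to keep the run deterministic, whereas real implementations usually choose them randomly or with a smarter seeding scheme.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    labels = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        labels = []
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            nearest = distances.index(min(distances))
            labels.append(nearest)
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return labels, centroids

# Unlabeled 2-D points (hypothetical): nothing says which group each belongs to.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The algorithm is only told how many groups to look for. Which points belong together is discovered from distances alone.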
Dimensionality reduction techniques like PCA (Principal Component Analysis) and t-SNE compress high-dimensional data into lower-dimensional representations, revealing structure invisible in raw numbers. Biologists use t-SNE to visualize which gene expression patterns cluster together across thousands of cells.
Clustering groups similar data points together. It is used in customer segmentation, document topic modeling, and genetic analysis: anywhere you want to discover natural groupings humans haven't pre-defined.
Autoencoders are neural networks trained to compress data into a small representation and then reconstruct it. The bottleneck forces the model to learn what matters most. They are used for anomaly detection: an input that reconstructs poorly is probably unusual.
Reinforcement learning (RL) takes a different approach entirely. An agent operates in an environment, takes actions, and receives rewards or penalties. Through trial and error across many episodes, it learns a policy — a strategy for choosing actions that maximize cumulative reward over time.
Unlike supervised learning, there is no teacher providing correct answers. Unlike unsupervised learning, there is feedback — but it comes from the environment's rules, not human labels on data.
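The agent, environment, reward, and policy vocabulary can be made concrete with tabular Q-learning, one common RL algorithm, on a toy problem. Everything here is an assumption made for illustration: a five-cell corridor, a reward only at the rightmost cell, purely random exploration, and hand-picked learning settings.

```python
import random

N_STATES, GOAL = 5, 4            # corridor cells 0..4; reward only at cell 4
ACTIONS = (-1, +1)               # step left or step right
alpha, gamma = 0.5, 0.9          # learning rate and discount factor (assumed)

def step(state, action):
    """Environment rules: move, stay inside the corridor, reward at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

rng = random.Random(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    state = 0
    for _ in range(100):                      # cap on episode length
        action = rng.choice(ACTIONS)          # explore by acting at random
        next_state, reward, done = step(state, action)
        best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
        # Q-learning update: nudge the estimate toward
        # reward + discounted value of the best next action.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
        if done:
            break

# The learned policy: in each cell, take the action with the highest value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

No example ever says "move right." The preference for moving right appears only because that action leads, eventually, to reward.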
OpenAI Five played roughly 45,000 years of Dota 2 against itself over ten months of training; a 2018 version of the system ran on 256 GPUs and 128,000 CPU cores. The reward signal combined winning or losing with in-game outcomes such as kills, deaths, and resources gained. No human ever labeled a good move or a bad move. In April 2019, OpenAI Five defeated the reigning world champion team OG two games to zero. The agents discovered team coordination, resource management, and sacrifice strategies through accumulated consequence: millions of games of trial, error, and reward.
Today's most powerful AI systems blend all three paradigms. GPT-4 was first pretrained on web text using a form of self-supervised learning (predicting the next word — which is unsupervised in spirit). It was then fine-tuned with supervised learning on human-written examples. Finally, it was shaped by Reinforcement Learning from Human Feedback (RLHF) — human raters scored outputs, and those scores became a reward signal for further training.
RLHF was introduced by researchers at OpenAI and DeepMind in a 2017 paper and became central to making ChatGPT helpful rather than merely technically accurate. In practice, a reward model trained on the raters' preferences stands in for the "environment" giving rewards, making real people part of the reinforcement learning loop.
You've learned that AI can discover patterns without labels and learn strategy from nothing but win/loss signals. Ask the AI to explain any of these concepts more deeply — RLHF, how AlphaGo Zero worked differently from AlphaGo, what a policy is in RL, or how clustering reveals hidden groups in data.
In 2017, researchers at Stanford University published a study in Nature showing that a deep learning skin cancer classifier performed impressively on test images, identifying malignant melanomas at dermatologist-level accuracy. When members of the team later examined what the model was actually responding to, they reported that it was more likely to call a lesion malignant when the photograph contained a ruler. Dermatologists routinely place rulers next to suspicious lesions to document size, meaning malignant-lesion images happened to contain rulers more often than benign-lesion images. The model had picked up a spurious correlation alongside the clinical pattern. Explanation techniques such as LIME (Local Interpretable Model-Agnostic Explanations), introduced by University of Washington researchers in 2016, are designed to surface exactly this kind of shortcut: high test accuracy achieved partly for the wrong reasons.
When a model learns something that correlates with the right answer in training data but doesn't causally explain it, the result is a spurious correlation. The model works — sometimes brilliantly — until it encounters data where the spurious feature and the real answer come apart.
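The moment where the spurious feature and the real answer come apart can be reproduced with a toy dataset. The counts and feature names below are invented for illustration, and "training" is reduced to picking whichever single feature scores best on the training set.

```python
# Each example: ({"irregular_border": 0 or 1, "ruler": 0 or 1}, label); label 1 = malignant.
def examples(label, irregular, ruler, count):
    return [({"irregular_border": irregular, "ruler": ruler}, label)] * count

# Training clinic (assumed): rulers appear only in photos of malignant lesions.
train = (examples(1, 1, 1, 40) + examples(1, 0, 1, 10) +
         examples(0, 0, 0, 40) + examples(0, 1, 0, 10))

# Deployment clinic (assumed): every photo includes a ruler.
deploy = (examples(1, 1, 1, 40) + examples(1, 0, 1, 10) +
          examples(0, 0, 1, 40) + examples(0, 1, 1, 10))

def accuracy(feature, data):
    """Accuracy of the one-feature rule 'predict malignant when feature == 1'."""
    return sum(x[feature] == y for x, y in data) / len(data)

# "Training": keep whichever single feature scores best on the training set.
chosen = max(["irregular_border", "ruler"], key=lambda f: accuracy(f, train))

print("feature chosen in training:", chosen)
print("training accuracy:", accuracy(chosen, train))
print("deployment accuracy:", accuracy(chosen, deploy))
print("clinical feature at deployment:", accuracy("irregular_border", deploy))
```

The ruler wins during training because it is the better predictor there. The clinically meaningful feature is less accurate in training but keeps its accuracy when the photography habits change.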
In 2020, University of Washington researchers examined COVID-19 diagnostic AI models that had been trained on chest X-rays pooled from different hospitals. The models had learned to rely on cues such as image-edge markers, patient positioning, and other source-specific artifacts that happened to correlate with COVID status in their training sets, because positive and negative cases came from different data sources. These patterns had nothing to do with the virus, and accuracy fell sharply when the models were tested on data from a different hospital.
Reuters reported in October 2018 that Amazon had quietly abandoned a machine learning recruiting tool after discovering it systematically downgraded resumes from women. The model had been trained on ten years of Amazon's own hiring decisions — predominantly male hires, reflecting the tech industry at the time. The model learned that male-associated terms and features correlated with successful hiring, so it penalized resumes that included the word "women's" (as in "women's chess club") and downgraded graduates of two all-women colleges. Amazon scrapped the project rather than deploy a system it could not make gender-neutral.
In 2014, Ian Goodfellow, Jonathon Shlens, and Christian Szegedy at Google demonstrated that adding imperceptible noise to an image — changes invisible to human eyes — could cause a neural network to completely change its classification. A panda image with carefully computed pixel noise was classified as a gibbon with 99.3% confidence. The modified image looked identical to humans.
This revealed something profound: neural networks learn patterns that are mathematically real but perceptually alien — they pick up on statistical features in pixel space that don't map to how humans see the world. This isn't a bug in one model; it appears to be a structural property of gradient-descent-trained classifiers.
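The attack in that paper, the fast gradient sign method, can be sketched on a toy model. The example below is an illustration under stated assumptions, not the network from the paper: a hypothetical linear classifier over 100 "pixels" with hand-set weights, and an input built to sit on the class-1 side.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

# Hypothetical trained linear classifier: P(class 1) = sigmoid(w . x).
w = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]

def predict(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# A clean input the model assigns to class 1 (true label y = 1).
x = [0.5 + 0.02 * wi for wi in w]
y = 1.0

# Gradient of the cross-entropy loss with respect to the input;
# for this model, d(loss)/dx_i = (p - y) * w_i.
p = predict(x)
grad_x = [(p - y) * wi for wi in w]

# Fast gradient sign step: move every pixel by epsilon in the direction
# that increases the loss.
epsilon = 0.05
x_adv = [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]

largest_change = max(abs(a - b) for a, b in zip(x_adv, x))
print(f"clean input:     P(class 1) = {predict(x):.3f}")
print(f"perturbed input: P(class 1) = {predict(x_adv):.3f}")
print(f"largest change to any pixel: {largest_change:.3f}")
```

No single pixel moves by more than 0.05, yet the changes all push in the same direction, so their effect adds up across 100 dimensions. Real images have far more dimensions, which is part of why the perturbation can be invisible.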
In 2017, researchers showed that stickers placed on a stop sign could cause a road sign classifier to misread it as a speed limit sign across a range of distances and angles. Physical adversarial perturbations, not just digital noise, could fool vision models of the kind used in driving systems.
Models exploit the fastest statistical path to reducing training error, not the most meaningful one. A "horse" classifier might learn to detect watermarks from a specific photo archive rather than actual horses — because every horse image happened to come from that archive.
When test data leaks into training data, models appear to perform better than they actually do. GPT models' benchmark scores have been questioned because pre-training on the web may include the benchmark questions themselves.
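One basic check for this kind of leakage is to search the training text for the benchmark items themselves. The sketch below uses exact substring matching after light normalization, which is an assumption made for simplicity. It catches only verbatim copies, and the documents and questions are invented.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences don't hide a match."""
    return " ".join(text.lower().split())

def find_leaks(benchmark_items, training_docs):
    """Return benchmark items whose normalized text appears inside any training document."""
    normalized_docs = [normalize(doc) for doc in training_docs]
    return [
        item for item in benchmark_items
        if any(normalize(item) in doc for doc in normalized_docs)
    ]

# Hypothetical scraped training documents and benchmark questions.
training_docs = [
    "Forum post: What is the capital of  Australia? I always forget it's Canberra.",
    "Recipe blog: how to bake sourdough bread at home.",
]
benchmark_items = [
    "What is the capital of Australia?",
    "Which planet has the most moons?",
]

leaked = find_leaks(benchmark_items, training_docs)
print(leaked)
```

A model that saw the first question and its answer during training can score well on it without any reasoning ability, so scores on leaked items say little about performance on new questions.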
The skin cancer classifier failure was caught because researchers asked why the model made its predictions — not just whether it was accurate on a test set. This field is called explainable AI (XAI) or interpretability.
Techniques like LIME and SHAP (SHapley Additive exPlanations) try to attribute a model's decision to specific input features. Attention visualization shows which words a language model attends to most when generating an answer. None of these techniques is perfect — all are approximations — but they have caught real errors before deployment.
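The core idea of attribution, asking how much each input contributed to a decision, can be shown with a much simpler stand-in. The sketch below is occlusion-style attribution, not LIME or SHAP: it replaces one feature at a time with a baseline value and records how far the prediction moves. The model, its weights, and the feature names are all hypothetical.

```python
import math

def model(features):
    """Hypothetical black-box classifier returning P(malignant)."""
    z = (-2.0
         + 0.5 * features["irregular_border"]
         + 0.03 * features["diameter_mm"]
         + 3.0 * features["ruler"])
    return 1 / (1 + math.exp(-z))

def occlusion_attribution(model, features, baseline):
    """For each feature, swap in its baseline value and measure
    how much the model's output drops."""
    full = model(features)
    scores = {}
    for name in features:
        occluded = dict(features)
        occluded[name] = baseline[name]
        scores[name] = full - model(occluded)
    return scores

photo = {"irregular_border": 1, "diameter_mm": 6, "ruler": 1}
baseline = {"irregular_border": 0, "diameter_mm": 0, "ruler": 0}

scores = occlusion_attribution(model, photo, baseline)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:18s} {score:+.3f}")
```

In this toy, removing the ruler moves the prediction far more than removing either clinical feature, which is the kind of signal that would prompt a closer look before deployment.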
In 2021, the European Commission proposed the EU's AI Act, which includes mandatory conformity assessments for "high-risk" AI systems, such as those used in medicine, hiring, and law enforcement, requiring documentation of training data, testing procedures, and bias audits before deployment. The ruler-in-the-photo problem is precisely what such audits are designed to catch.
Accuracy on a test set is necessary but not sufficient. A model can score 95% and be learning completely the wrong thing. Understanding what patterns an AI has actually learned — not just how often it's right — is the frontier challenge of machine learning safety.
You've seen how AI can learn with high accuracy while being completely wrong about why. Now probe those ideas. Ask the AI to explain how LIME works, what SHAP values reveal, how physical adversarial patches fool self-driving cars, or how the Amazon recruiting failure could have been caught earlier.