Module 6 · Lesson 1

What a Classifier Actually Does

Every AI decision starts with a question that has a limited set of answers.

How does a machine look at something and put it into a category — and why does that process go wrong in ways that matter?

In the summer of 2015, Google Photos shipped a new feature it was proud of: automatic photo tagging. You upload pictures, the system scans them, and it assigns labels — "beach," "birthday party," "dog." The engineers tested it on thousands of images. The accuracy numbers were high. They shipped it.

Then, on June 28, 2015, a software developer named Jacky Alciné opened his Google Photos app and found that the system had automatically sorted photos of him and a friend, both of them Black, into an album it had labeled "Gorillas."

It was not a fringe case. The classifier had looked at the images, extracted features from the pixel data, and matched those features to the category they most resembled. In a narrow sense, it had done what it was built to do. The problem was not a bug in the traditional sense. The problem was in how the whole system had been built from the start.

The Core Job: Sorting the World Into Buckets

A classifier is any system — machine or human — that looks at something and assigns it to a category. That's the whole job. Is this email spam or not spam? Is this tumor malignant or benign? Is this photo a cat, a dog, or something else? Classifier. Classifier. Classifier.

What makes machine classifiers interesting — and dangerous — is that they don't reason their way to a category the way you might. They don't think, "Hmm, this looks like a dog because it has four legs, fur, and is panting." Instead, they identify features — numerical measurements extracted from the input — and use those features to decide which category the input belongs to.

A feature is just a number that captures something measurable about the input. For an image, features might include the average brightness of a region, the frequency of certain colors, or the angle of edges detected in the image. For an email, features might include how many times the word "free" appears, whether the sender address has numbers in it, or the length of the subject line. The machine never "sees" the image or "reads" the email the way you do. It sees a list of numbers and runs a calculation.

Classifier: A system that takes an input and assigns it to one of a fixed set of categories based on patterns in that input's features.

Feature: A single measurable property of the input, expressed as a number that the classifier uses as evidence for or against each category.
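
To make the idea of a feature concrete, here is a minimal sketch that turns one email into the kinds of numbers described above. The function name `extract_features` and the exact feature definitions are illustrative assumptions, not a description of how any real spam filter works.

```python
import re

def extract_features(sender, subject, body):
    """Turn one email into a list of numbers (hypothetical feature set)."""
    words = re.findall(r"[a-z']+", body.lower())
    free_count = words.count("free")                      # how often "free" appears
    sender_has_digits = 1 if any(ch.isdigit() for ch in sender) else 0
    subject_length = len(subject)                         # characters in the subject
    return [free_count, sender_has_digits, subject_length]

features = extract_features(
    sender="deals4u@example.com",
    subject="CLAIM YOUR PRIZE NOW",
    body="Free gift! Click now for your free prize.",
)
print(features)  # [2, 1, 20]
```

Everything the classifier does afterward operates on a list like `[2, 1, 20]`. The original wording of the email is gone by that point.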

The Decision Boundary: Drawing a Line in Feature Space

Imagine you're trying to sort fruit. You measure two things about each piece of fruit: its weight in grams and its redness on a scale of 0 to 100. Apples tend to be moderately heavy and very red. Lemons tend to be light and not red at all. If you plotted every fruit on a chart — weight on one axis, redness on the other — you'd see clusters. Apples over here, lemons over there.

A classifier learns where to draw a line between those clusters. That line is called the decision boundary. Any fruit whose measurements land on one side of the line gets called an apple. Any fruit on the other side gets called a lemon. The machine doesn't know what fruit is. It just knows which side of the line the numbers fall on.

Now here's what's important: that line was drawn based on the training data — the specific fruits the system was shown before it was tested. If all the training apples were Granny Smith (pale green, not particularly red), the line might be drawn in completely the wrong place for red apples. The classifier learned a line that works for its training data. Whether that line works for reality depends entirely on how well the training data represents reality.
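
The fruit example can be sketched in a few lines. This version assumes a very simple learning rule (nearest centroid: remember the average position of each cluster and pick the closer one), and every measurement in it is invented for illustration. The decision boundary is the set of points equally far from both cluster centers.

```python
import math

def centroid(points):
    """Average position of a cluster of (weight_g, redness) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def train(apples, lemons):
    """'Training' here just means remembering where each cluster's center is."""
    return {"apple": centroid(apples), "lemon": centroid(lemons)}

def classify(model, fruit):
    """Pick the label whose cluster center is closest to this fruit."""
    return min(model, key=lambda label: math.dist(model[label], fruit))

# Invented training measurements: (weight in grams, redness 0 to 100)
lemons = [(100, 5), (110, 10), (90, 8)]
green_apples = [(170, 12), (180, 15), (175, 10)]   # Granny Smith only
red_apples = [(130, 88), (125, 90), (140, 80)]

skewed_model = train(green_apples, lemons)               # never saw a red apple
broader_model = train(green_apples + red_apples, lemons)

small_red_apple = (120, 85)
print(classify(skewed_model, small_red_apple))   # lemon
print(classify(broader_model, small_red_apple))  # apple
```

The same fruit gets two different labels. Nothing about the fruit changed; only the training examples that positioned the boundary did.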

Feature Evidence Weights — Example: Email Spam Classifier

Feature            Weight
Word "free"        0.87
Caps ratio         0.74
Link count         0.58
Known sender       0.21
Subject length     0.13

Why the Google Photos Failure Was Structural, Not Random

Back to Jacky Alcine. When Google's classifier looked at his photos, it was doing exactly what it was trained to do: match visual features to the category in its training data that those features most resembled. The problem was that the system's training data severely underrepresented images of dark-skinned people. The features that distinguish human faces from non-human faces were, for many skin tones, poorly learned — because the system had barely seen them.

Google's response was, in its own words, imperfect: it removed the "gorilla" label from the system entirely. In 2023, a New York Times test found that Google Photos still wouldn't label images of gorillas, chimpanzees, or several other primates. The company had blocked those categories rather than fix the underlying representation problem, and the block was still in place eight years later.

This is what makes classifiers consequential beyond test scores. A system can have 95% accuracy overall and still systematically misclassify specific groups of people. The accuracy number hides whose accuracy it is.
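
A short calculation shows how one headline number can hide a gap. The group names and counts below are hypothetical, chosen so that the overall figure comes out to 95%.

```python
from collections import defaultdict

# Hypothetical evaluation log: (group, was the prediction correct?)
results = (
    [("A", True)] * 882 + [("A", False)] * 18
    + [("B", True)] * 68 + [("B", False)] * 32
)

def accuracy(records):
    """Fraction of records where the prediction was correct."""
    return sum(correct for _, correct in records) / len(records)

def accuracy_by_group(records):
    """Same calculation, but done separately for each group."""
    by_group = defaultdict(list)
    for group, correct in records:
        by_group[group].append((group, correct))
    return {group: accuracy(rows) for group, rows in by_group.items()}

overall = accuracy(results)
per_group = accuracy_by_group(results)
print(f"overall: {overall:.0%}")            # overall: 95%
for group, acc in sorted(per_group.items()):
    print(f"group {group}: {acc:.0%}")      # group A: 98%, then group B: 68%
```

Group B makes up only a tenth of the test examples here, so its 68% barely moves the overall number.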

Ethical Question — No Clean Answer

If a medical imaging classifier is 97% accurate on patients of European descent and 82% accurate on patients of African descent, should it be approved for use? Who gets to make that call — the company, a government regulator, the hospital, or the patients themselves? And if you say "fix it first," who pays for the additional data collection, and does that delay cost lives in the meantime?

What You Can Now See

When you hear that an AI system "classifies" something — a loan application, a medical scan, a social media post for moderation — you now know what that means at a mechanical level. The system is measuring features, comparing those measurements to a learned decision boundary, and outputting a category. It is not reasoning. It is not understanding. It is sorting numbers.

That means three things are always worth asking: What features was it measuring? Whose data drew the decision boundary? And whose experience gets averaged out by the accuracy number? Most people reading headlines about AI never think to ask those questions. You do now.

You Can Now See What Most People Miss

Every time you encounter a story about an AI system making a mistake — a wrong arrest, a misdiagnosis, a biased hiring decision — the underlying mechanism is almost always a classifier that generalized badly from its training data. Knowing this, you can ask the right questions instead of just being surprised.

Quiz — Lesson 1

What a Classifier Actually Does

1. What is a "feature" in the context of a machine classifier?

Correct. Features are numerical measurements — pixel brightness, word counts, frequency ratios — that the classifier uses as evidence. The machine never perceives the input directly; it operates on these numbers.

Not quite. A feature is a numerical measurement extracted from the input. Rules are written by humans; features are measured from data.

2. A hospital deploys a skin-cancer classifier that achieves 96% accuracy overall. A dermatologist notices it misses melanoma in darker-skinned patients at twice the rate. What does this most likely reveal about how the system was built?

Exactly right. High overall accuracy can coexist with systematic failure on underrepresented groups. The decision boundary was learned from data that skewed toward lighter skin tones, so it works poorly where the training data was thin.

Think about training data. If the training examples were mostly one type of patient, the decision boundary learned from them may not generalize to other patients — even if the overall accuracy number looks fine.

3. What is a decision boundary?

Correct. The decision boundary is the dividing line in the space of all possible feature combinations. Everything on one side gets one label; everything on the other side gets the other label.

The decision boundary is the dividing line in feature space. Inputs on one side get one label; inputs on the other side get a different label.

4. Google Photos responded to the 2015 gorilla misclassification primarily by removing the "gorilla" label entirely. What does this approach NOT solve?

Right. Deleting a label is a patch, not a fix. The training data still doesn't represent certain groups well, which means other misclassifications with other labels remain possible — and the system's accuracy is still unequal across groups.

Think about what removing a label does and doesn't change. It prevents that specific word from appearing, but it doesn't change the underlying data or how well the system learned to represent different groups of people.

5. You're designing a spam filter. Which feature would MOST likely help distinguish spam from legitimate email?

Good reasoning. Spam emails often use ALL CAPS for emphasis in subject lines ("CLAIM YOUR PRIZE NOW"). A high caps ratio is a measurable, consistent pattern. The other features are either too variable or too easy to game.

Think about what patterns actually differ between spam and real email in a consistent, measurable way. Some features are too noisy or too easy to fake to be useful.

Lab 1 — Feature Detective

You're auditing a classifier. Your job is to find its blind spots.

Your Role: Independent Auditor

A company has built an automated content moderation classifier. It scans social media posts and labels them either "safe" or "harmful." The company says it's 94% accurate. You've been asked to evaluate whether it should be deployed. Your lab partner will push back on your reasoning — that's the point.

Start by describing two specific features the classifier might be using to detect "harmful" content — and then explain why each feature could cause problems for certain groups of users. Take a position. Your partner will challenge it.

Lab Partner — CIPHER Auditor Mode

94% accuracy sounds pretty good, right? Before you start poking holes, tell me: what two features do you think this content classifier is probably using, and why would those features cause problems? Be specific — I'm not accepting vague answers.

Module 6 · Lesson 2

Training Data — The World You Show the Machine

A classifier learns from examples. Which examples you choose is everything.

If you could hand-pick every example a machine learned from, what would you be tempted to leave out — and who would pay for that choice?

In October 2018, Reuters reported that Amazon had quietly scrapped a recruiting AI tool its engineers had been building since 2014. The system was designed to do something Amazon desperately wanted: automatically review resumes and score candidates on a scale of one to five stars, filtering out the weak ones before a human recruiter ever looked at them.

The system had been trained on ten years of resumes submitted to Amazon and the hiring decisions made on them. It learned from what Amazon had actually done. The problem: most of those resumes came from men, a reflection of how male-dominated the tech industry was during that decade. The classifier learned, very efficiently, that certain patterns correlated with being hired. It downgraded graduates of all-women's colleges. It penalized resumes that contained the word "women's" anywhere, as in "Captain of Women's Chess Club" or "Women in Engineering scholarship recipient."

The system wasn't programmed to discriminate. Nobody wrote a rule that said "penalize women." It discovered the correlation on its own, from data that reflected a decade of human hiring decisions that had, themselves, systematically disadvantaged women. The machine learned the bias because the bias was in the data.

What Training Data Actually Is

When you train a classifier, you show it a collection of labeled examples: "this is spam," "this is not spam," "this resume got hired," "this one didn't." That collection is the training data. The classifier's entire understanding of the world comes from these examples. It has no other source of information. It cannot look outside the dataset. It cannot apply common sense. It learns exactly what the data teaches it — nothing more, nothing less.

This means training data is not neutral. It is a record of decisions, measurements, and observations made by people — people who had biases, made errors, worked in institutions with particular histories. When you hand that data to a machine and ask it to learn from it, you are asking it to replicate the patterns in those decisions, including the bad ones.

Amazon's data reflected ten years of human bias in tech hiring. The machine faithfully compressed that history into a scoring function. When Amazon's engineers found out, they tried to fix it — telling the system to ignore certain signals. But because gender correlates with so many other things (school names, club names, volunteer descriptions, writing style patterns), they couldn't fully remove the signal. They shut the tool down instead.

The Core Principle

A classifier trained on biased data doesn't become unbiased just because a computer is running it. Automation amplifies the patterns in training data. It makes them faster, more consistent, and harder to notice — which can make bias worse, not better.

Three Ways Training Data Goes Wrong

Training data fails in predictable patterns. Once you know them, you'll see them everywhere.

1. Historical bias. The data reflects past human decisions that were themselves unfair. Amazon's case is a textbook example. When you train on past outcomes in domains where discrimination existed — lending, hiring, criminal justice — you bake that discrimination into the model.

2. Representation bias. Some groups appear rarely or not at all in the training data. Recall Google Photos: when your training dataset has thousands of images of light-skinned faces and hundreds of dark-skinned faces, the decision boundary for "human face" gets drawn much more precisely for one group than the other. The system isn't trying to discriminate — it's just uncertain where it hasn't seen much data.

3. Measurement bias. The labels in the training data were assigned in a way that wasn't consistent across groups. Imagine a classifier trained to predict "creditworthiness" using historical loan data. Now suppose banks historically monitored repayment by Black borrowers more aggressively than repayment by white borrowers. Then missed payments by white borrowers would more often have gone undetected and unrecorded. The data looks like one group is riskier, when really the measurement process was uneven.

Historical Bias

Data reflects past decisions made under conditions of unfairness. The classifier learns to reproduce those decisions.

Representation Bias

Some groups appear so rarely that the model's decision boundary is poorly defined for them. It works well on who it saw most.

Measurement Bias

Labels were assigned less consistently for some groups. The data looks like real signal but actually reflects inconsistent observation.

The Harder Problem: You Can't Always Fix It by Adding Data

A natural response to representation bias is: just add more data from underrepresented groups. Sometimes that works. But it doesn't work for historical bias, because the additional data you have access to was generated under the same unfair conditions. If you add more hiring decisions from 2015–2020 to fix Amazon's classifier, and those decisions were also made in an industry with gender imbalances, you've added more data with the same historical bias baked in.

And there's a deeper issue: sometimes the "correct" label is itself contested. Classifiers trained to predict recidivism (whether a convicted person will commit another crime) use data about who got re-arrested. But re-arrest rates are affected by how intensely police patrol different neighborhoods. The data reflects policing practices, not just individual behavior. You can't fix that with more data — you'd need different data collected under different conditions.
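
The arithmetic behind that point is simple enough to write out. In this sketch, two hypothetical neighborhoods have identical behavior, and the only difference is how often a re-offense leads to an arrest. All percentages are invented for illustration.

```python
def recorded_rearrests(people, reoffense_pct, detection_pct):
    """How many re-arrests end up in the dataset.

    reoffense_pct: percent of people who actually re-offend
    detection_pct: percent of re-offenses that lead to an arrest
    """
    reoffenders = people * reoffense_pct // 100
    return reoffenders * detection_pct // 100

# Same underlying behavior (20% re-offend), different policing intensity.
heavily_policed = recorded_rearrests(1000, reoffense_pct=20, detection_pct=80)
lightly_policed = recorded_rearrests(1000, reoffense_pct=20, detection_pct=40)

print(heavily_policed, lightly_policed)  # 160 80
```

A classifier trained on these labels would see one neighborhood as twice as risky, even though the true re-offense rate was set to be the same in both.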

This is why the question of training data is not just a technical problem. It is a political and ethical one. The decisions about what data to collect, whose outcomes to treat as "ground truth," and what the training labels actually mean — those are all human choices that happen before the algorithm runs.

Institutional-Level Stakes

In 2019, the U.S. Department of Housing and Urban Development charged Facebook with violating the Fair Housing Act, arguing that its ad-targeting system functioned as an illegal housing discrimination tool. The algorithm hadn't been designed to discriminate. But it had learned, from user behavior data, that certain ads correlated with certain demographic groups, and it used that to target or exclude users from seeing housing ads. The training data was the world's existing segregation patterns. The algorithm made them faster and more efficient. Courts and regulators are still working out how the Fair Housing Act applies to machine classifiers. You are living in the era where these rules are being written.

Ethical Question — No Clean Answer

If a company can't build a hiring classifier that doesn't reproduce historical discrimination, should they use one at all — even if human recruiters are also biased, just more slowly and less consistently? Is a biased algorithm better or worse than a biased human, and does the answer change depending on who's asking?

Quiz — Lesson 2

Training Data — The World You Show the Machine

1. Amazon's recruiting AI penalized resumes that mentioned "women's" organizations. The engineers did NOT program this rule. Where did it come from?

Correct. The system discovered the correlation itself because the training data — real Amazon hiring decisions over ten years — underrepresented women who were hired. The bias wasn't programmed; it was learned.

Nobody programmed a rule. The classifier found the pattern in the training data automatically. That data reflected a decade of human hiring decisions made in an industry with documented gender imbalances.

2. A school district wants to use a classifier to predict which students are at risk of dropping out, trained on five years of attendance and grade data. What is the MOST important question to ask before deploying it?

Exactly. The most important question is whether the training data fairly represents the conditions different students faced. If certain groups had barriers (housing instability, language barriers, school resource gaps) that attendance records don't capture, the model will misread those students.

Technical performance questions matter, but the most consequential question is about the training data itself — specifically, whether the historical dropout patterns reflect systemic conditions that affect certain groups differently.

3. What is "measurement bias" in training data?

Correct. Measurement bias happens when the process of generating the labels wasn't applied equally. If one group was scrutinized more intensely, their data will show more "failures" — not because they fail more, but because they were watched more closely.

Measurement bias is about the labeling process, not the size or units of the data. It occurs when the same behavior gets labeled differently depending on who is doing the behavior or who is doing the observing.

4. Why might simply "adding more data" from underrepresented groups fail to fix historical bias?

Right. If the historical conditions that produced the bias are still operating when the new data is collected, the additional examples carry the same patterns. More data of the same biased type isn't a solution.

The issue isn't quantity — it's quality and conditions. If new data about underrepresented groups was collected under the same unfair circumstances (same biased hiring managers, same unequal policing patterns), adding it just reinforces the problem.

5. A recidivism classifier (predicting who will re-offend) is trained on re-arrest data from neighborhoods with very different levels of police presence. This is an example of which type of bias?

Correct. Re-arrest is a measurement that depends on police presence. In heavily policed areas, the same behavior generates more arrests. The data looks like it reflects individual behavior but actually reflects surveillance patterns — that's measurement bias.

Think about what the label "re-arrested" actually measures. Is it the same thing as "re-offended"? In heavily policed neighborhoods, the same behavior generates more arrests — so the label isn't applied consistently across areas.

Lab 2 — Training Data Auditor

You're deciding whether a training dataset is safe to use.

Your Role: Data Ethics Reviewer

A city wants to build a classifier to predict which neighborhoods need additional social services — food assistance, mental health support, youth programs. They propose to train it on five years of 911 call data, school suspension rates, and emergency room visits by zip code. They say it's objective because it's all real data. You've been asked to evaluate whether this training data should be used.

Identify at least one specific type of bias that could be embedded in this training data and explain how it would affect which neighborhoods the classifier recommends for services. Then take a position: is this dataset usable, fixable, or should it be abandoned? Defend your answer.

Lab Partner — CIPHER Data Ethics Mode

Interesting problem. Before you decide whether the dataset is usable, I need you to name a specific type of bias — historical, representation, or measurement — and explain exactly how it shows up in this particular data. Don't be abstract. Use the 911 call data, the suspension rates, or the ER visits specifically.

Module 6 · Lesson 3

How a Classifier Learns — Weights, Errors, and Iteration

Learning is not a moment. It is a process of making and correcting mistakes — thousands of times.

What actually happens inside a machine when it "learns" to classify something — and who decides when it's learned enough?

In the fall of 2012, a team of researchers led by Alex Krizhevsky and supervised by Geoffrey Hinton at the University of Toronto entered a computer vision competition called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The competition had been running since 2010. The best competing systems that year were making top-5 errors (the correct label was missing from the system's five best guesses) on about 26% of images. Krizhevsky's system, a deep neural network now known as AlexNet, achieved a top-5 error rate of 15.3%. The second-place team's error rate was 26.2%. AlexNet hadn't just won. It had cut the error rate by more than 40%.

The result sent shockwaves through the field. Within two years, deep neural networks had replaced almost every other approach to image classification. The approach wasn't new — Hinton had been pushing it for decades. What changed in 2012 was that the hardware had finally caught up: AlexNet ran on two consumer graphics cards that could perform the billions of calculations required for training in a reasonable amount of time. The same training job that would have taken weeks now took days.

But what had AlexNet actually learned? Nobody could fully say. It had adjusted millions of numerical values — called weights — through a process that looked at each image, made a guess, measured how wrong the guess was, and nudged each weight in a direction that would make future guesses less wrong. Repeat that 1.2 million times per training pass, for dozens of passes. That was the learning.

Weights: What Gets Adjusted When a Classifier Learns

A weight is a number that controls how much influence one feature has on the final classification decision. Think of it as a volume knob on a mixing board. You have dozens of knobs — one for each feature. Turn a knob up, and that feature gets more say in the classification. Turn it down, and it barely matters.

When a classifier is initialized (created fresh), its weights are usually set to small random numbers. The system doesn't know anything yet. Then training begins: you show it a labeled example, it makes a prediction, and you calculate how wrong the prediction was. That wrongness is called the loss. The training algorithm then adjusts every single weight slightly — in the direction that would have produced less loss on that example. Then you show it the next example and do it again. This process, called gradient descent, repeats across the entire training dataset, often hundreds of times over.

By the end, the weights have settled into values that produce reasonably small loss across the training examples. The system has learned — not by understanding anything, but by nudging a million dials until the numbers worked out.

Weight: A number that controls how much a given feature influences the classification output. Weights are what get adjusted during training.

Loss: A measure of how wrong the classifier's prediction was on a given example. Training minimizes the average loss across all examples.

Gradient Descent: The process of adjusting weights in the direction that reduces loss, one small step at a time, over thousands of iterations.
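
The predict, measure, nudge loop can be sketched with a tiny spam classifier. This is a minimal illustration that assumes one particular model (logistic regression, updated one example at a time); the data, the learning rate, and the number of passes are all made up, and real systems use far more features and examples.

```python
import math
import random

def predict(weights, bias, features):
    """Weighted sum of the features, squashed into a 0-to-1 spam probability."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def average_loss(weights, bias, examples):
    """Log loss: large when the model is confidently wrong, small when right."""
    total = 0.0
    for features, label in examples:
        p = min(max(predict(weights, bias, features), 1e-12), 1.0 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(examples)

def train(examples, learning_rate=0.5, passes=200, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.01, 0.01) for _ in examples[0][0]]  # small random start
    bias = 0.0
    history = [average_loss(weights, bias, examples)]
    for _ in range(passes):
        for features, label in examples:
            error = predict(weights, bias, features) - label  # how wrong, and which way
            # Nudge each weight in the direction that shrinks the loss on this example.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
        history.append(average_loss(weights, bias, examples))
    return weights, bias, history

# Hypothetical features per email: [scaled count of "free", caps ratio]; label 1 = spam
examples = [
    ([1.0, 0.9], 1), ([0.8, 0.7], 1), ([0.9, 0.4], 1),
    ([0.0, 0.1], 0), ([0.1, 0.0], 0), ([0.2, 0.3], 0),
]
weights, bias, history = train(examples)
print(f"loss before training: {history[0]:.3f}, after: {history[-1]:.3f}")
```

Nothing in `train` knows what spam is. It only repeats the same small correction until the loss stops being large.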

Overfitting: Learning the Wrong Lesson

Here is the most counterintuitive problem in machine learning: a classifier can get worse by learning too well from its training data. This is called overfitting.

Imagine you're studying for a test by memorizing every practice problem — not the underlying concept, just the exact answers. When the real test shows up with slightly different questions, you're lost. You learned the surface patterns of the practice problems, not the general principle. A classifier does the same thing when it adjusts its weights so precisely to the training examples that it starts capturing the noise — the random, specific quirks of those particular examples — rather than the general patterns that would apply to new data.

An overfit classifier has very low training error and very high real-world error. It looks brilliant on the data it was trained on and fails on anything else. This is why you always need to test a classifier on data it has never seen before — called a test set. If the performance gap between training data and test data is large, the model has overfit. It learned the training set, not the world.

The practical fix is usually to use simpler models (fewer weights), use more training data, or use techniques like regularization that deliberately penalize the classifier for having weights that are too extreme. All of these push the model toward learning general patterns rather than specific quirks.
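
Here is a toy version of the memorize-the-practice-problems failure. It compares a "classifier" that simply memorizes its training examples with a hand-picked general rule. The data is invented (one number per email, say a caps percentage), and the threshold of 50 is an assumption chosen for illustration rather than something learned.

```python
# (feature value, label) pairs; label 1 = spam. The pair (40, 1) is a quirky outlier.
train_set = [(10, 0), (20, 0), (35, 0), (40, 1), (60, 1), (75, 1), (90, 1)]
test_set = [(15, 0), (30, 0), (45, 0), (55, 1), (70, 1), (85, 1)]

memory = dict(train_set)

def memorizer(x):
    """Perfect recall of training examples; guesses 'not spam' for anything new."""
    return memory.get(x, 0)

def simple_rule(x):
    """One general pattern; it ignores the outlier instead of memorizing it."""
    return 1 if x > 50 else 0

def accuracy(classifier, examples):
    return sum(classifier(x) == label for x, label in examples) / len(examples)

for name, clf in [("memorizer", memorizer), ("simple rule", simple_rule)]:
    print(f"{name}: train {accuracy(clf, train_set):.0%}, test {accuracy(clf, test_set):.0%}")
# memorizer: train 100%, test 50%
# simple rule: train 86%, test 100%
```

The memorizer looks better on the training data and worse everywhere else, which is the train/test gap described above.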

For the Curious — What Gradient Descent Actually Computes

Imagine the loss as a landscape: a hilly surface where the height at any point represents how wrong the classifier is with those particular weight values. Gradient descent searches for a low valley in that landscape by always stepping in the downhill direction. The "gradient" is just the slope at your current position. It points in the direction of steepest ascent, so each step moves the opposite way. You keep stepping until you stop getting lower. That's the trained model. The valley you end up in is not guaranteed to be the lowest one in the whole landscape.
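
For a landscape with a single weight, the whole procedure fits in a few lines. This sketch assumes a made-up loss, (w - 3)^2, whose lowest point is at w = 3, and an arbitrary step size of 0.1.

```python
def loss(w):
    """A one-weight 'landscape' with its lowest point at w = 3."""
    return (w - 3.0) ** 2

def slope(w):
    """Derivative of the loss: positive means uphill is to the right."""
    return 2.0 * (w - 3.0)

w = 0.0            # arbitrary starting point
step_size = 0.1    # assumed; too large a step can overshoot the valley
for _ in range(100):
    w -= step_size * slope(w)   # move downhill, against the slope

print(round(w, 4))  # 3.0
```

A real classifier does the same thing with millions of weights at once, and its landscape can have many valleys rather than one.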

Confidence Scores: The Classifier Is Always Making a Bet

Most classifiers don't just output a category — they output a probability. "I'm 91% confident this is spam." "I'm 63% confident this mole is benign." That number is not a guarantee. It's the classifier's internal accounting of where the input lands relative to its decision boundary. Inputs far from the boundary get high confidence scores. Inputs near the boundary get low confidence scores.

This matters enormously in deployment. A hospital that uses a cancer-detection classifier without checking confidence scores will treat a 91%-confident result the same as a 54%-confident result. The classifier is nearly guessing on the second one, but the output looks the same: "BENIGN." Many deployed systems show only the final label, hiding the confidence score. This is a design choice — and a consequential one.
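
One way a deployment could surface that difference is to pass the score along with the label and flag low-confidence cases for a second look. The sketch below is a hypothetical policy: the function name `triage` and the 0.75 cutoff are assumptions, not a clinical standard.

```python
def triage(p_benign, review_below=0.75):
    """Return (label, confidence, needs_review) for one case.

    p_benign: the classifier's estimated probability that the case is benign.
    review_below: hypothetical cutoff under which a human should take a second look.
    """
    label = "BENIGN" if p_benign >= 0.5 else "MALIGNANT"
    confidence = max(p_benign, 1.0 - p_benign)
    return label, confidence, confidence < review_below

print(triage(0.91))  # ('BENIGN', 0.91, False)
print(triage(0.54))  # ('BENIGN', 0.54, True)
```

Both cases carry the same label. Only the second and third fields show that one of them was close to a coin flip.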

It also matters that confidence doesn't equal calibration. A classifier that outputs "90% confident" should be right about 90% of the time on those cases. But many classifiers are overconfident — they say 90% when they're actually right only 70% of the time. Calibration — matching confidence scores to real accuracy — is a separate property from accuracy, and it requires separate evaluation.
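
Checking calibration comes down to grouping predictions by their stated confidence and counting how often each group was actually right. The prediction log below is hypothetical.

```python
def observed_accuracy(predictions, low, high):
    """Among predictions whose stated confidence falls in [low, high),
    return the fraction that were actually correct (None if the bin is empty)."""
    in_bin = [correct for confidence, correct in predictions if low <= confidence < high]
    if not in_bin:
        return None
    return sum(in_bin) / len(in_bin)

# Hypothetical log of (stated confidence, was the prediction correct?)
predictions = (
    [(0.9, True)] * 7 + [(0.9, False)] * 3
    + [(0.6, True)] * 6 + [(0.6, False)] * 4
)

print(observed_accuracy(predictions, 0.85, 0.95))  # 0.7
print(observed_accuracy(predictions, 0.55, 0.65))  # 0.6
```

In this made-up log, the "90% confident" predictions are right only 70% of the time (overconfident), while the "60% confident" ones are right 60% of the time (well calibrated).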

What You Can Now See That Most People Miss

When a company says their AI is "94% accurate," you now know that number hides several things: whether that accuracy is equal across groups, what happens on inputs near the decision boundary, whether confidence scores are calibrated, and whether the model was tested on data that truly resembles the real world it will encounter. You have the framework to ask every question that number is designed to prevent you from asking.

Ethical Question — No Clean Answer

If a medical classifier outputs a confidence score of 57% but the label says "benign," should the doctor see the score? Some argue that showing uncertainty causes doctors to second-guess themselves unnecessarily. Others argue that hiding it is a form of deception that removes the doctor's ability to make informed decisions. Who should control what information a deployed AI shows its users?

Quiz — Lesson 3

How a Classifier Learns — Weights, Errors, and Iteration

1. What is the role of "weights" in a classifier?

Correct. Weights are the dials that control each feature's influence. Training adjusts them incrementally to reduce prediction error across the training examples.

Weights are not about the input data — they're about how the classifier processes that data. Each weight controls how much a given feature "matters" in making the final prediction, and those values are adjusted during training.

2. A classifier gets 99% accuracy on its training data but only 72% accuracy when tested on new data. What has most likely happened?

Correct. A large gap between training accuracy and test accuracy is the signature of overfitting. The model learned the quirks of the training examples rather than the general pattern that would apply to new data.

When a model performs dramatically better on training data than on new test data, it's almost always overfitting — learning the surface patterns of the specific training examples rather than the general signal that applies to new cases.

3. What does "gradient descent" accomplish during classifier training?

Right. Gradient descent is the mechanism by which weights get updated. At each step, the algorithm calculates which direction to adjust each weight to reduce loss, and takes a small step in that direction. Repeat thousands of times.

Gradient descent is specifically about weight adjustment. It's the process of searching for weight values that produce low prediction error on the training data, by repeatedly stepping downhill against the mathematical slope (gradient) of the error surface.

4. A medical classifier outputs "BENIGN" with a confidence score of 55%. A doctor using the system only sees the label "BENIGN," not the score. What is the most significant risk this design creates?

Exactly. A 55% confidence score means the input was very close to the decision boundary — the classifier is barely leaning toward "benign." Without that information, the doctor has no reason to flag it for additional review.

Think about what the confidence score is telling you. 55% means the classifier is essentially flipping a coin. If the doctor can't see that, they can't make an informed decision about whether to seek a second opinion or run more tests.

5. What does it mean for a classifier to be well-calibrated?

Correct. Calibration means confidence scores are trustworthy as probability estimates. An overconfident classifier might say "90% sure" when it's only right 65% of the time on those cases — which is a dangerous gap in high-stakes applications.

Calibration is specifically about whether confidence scores are accurate probability estimates. A well-calibrated model's "80% confident" predictions should be correct about 80% of the time. A poorly calibrated model might be systematically overconfident or underconfident.

Lab 3 — Confidence Score Investigator

You're deciding how a deployed classifier should communicate uncertainty.

Your Role: System Design Critic

A company is deploying a loan approval classifier. It makes a binary decision — approve or deny — and outputs a confidence score. Leadership wants to hide the confidence score from loan officers, arguing that it "simplifies the process" and "prevents second-guessing." You've been asked whether this is acceptable.

Take a position on whether hiding confidence scores from loan officers is acceptable. Use at least one concept from Lesson 3 — calibration, overfitting, or loss — to support your argument. Then consider: does your answer change if the classifier is well-calibrated versus overconfident? Defend your reasoning.

Lab Partner — CIPHER System Design Mode

Interesting position to have to defend. Before I push back: what does hiding the confidence score actually do to the loan officer's ability to make a decision? And does it matter whether the classifier is well-calibrated or overconfident — or is that irrelevant to your argument? Be specific about the mechanism, not just the outcome.

Module 6 · Lesson 4

Building Your Own — From Scratch, In Your Head

You now have enough to design a real classifier — and to see exactly where it could go wrong.

If you had to build a classifier that affected real people's lives, what decisions would you make — and what would you be responsible for?

In January 2020, the city of New York began piloting a system called Ava — an AI tool developed by a company called Palantir — to help the Administration for Children's Services predict which families were at high risk of child abuse or neglect. The system pulled in data from dozens of city agencies: public housing records, family court filings, benefits enrollment, prior ACS contacts. It classified families and generated risk scores.

Critics, including researchers at New York University, pointed out a fundamental problem: the training data was overwhelmingly composed of families who had already had contact with ACS — which in New York City meant predominantly Black and Hispanic families in low-income neighborhoods. Families in wealthier neighborhoods who had similar dynamics but had never been flagged were absent from the training data entirely. The classifier had no way to learn from cases it had never seen.

A child welfare professor named Dorota Wiszniewski put it plainly: "The model can't predict what it hasn't been trained on. And what it hasn't been trained on is the rest of the city." The system was quietly suspended after the mayor who championed it left office. But the design decisions that led to its problems were not unusual. They were, in fact, the standard approach. That's what makes this worth understanding.

The Five Decisions You Make When You Build a Classifier

Building a classifier isn't just writing code. It's a series of design decisions, each of which shapes what the system can and can't do, who it will work well for, and who it might harm. Here they are in order.

1. Define the output categories. What labels will the system assign? This sounds obvious, but it's one of the most consequential choices. "At risk" versus "not at risk" sounds clean, but "at risk of what, exactly, on what timeline, as measured how?" Every ambiguity in the category definition gets inherited by everyone downstream who reads the output.
2. Choose your features. What measurable properties of the input will you use as evidence? This choice determines what the classifier pays attention to and what it ignores. Choosing "neighborhood" as a feature in a risk classifier encodes geography as a proxy for behavior. Choosing "prior agency contact" in a child welfare classifier encodes historical surveillance patterns as evidence of future risk.
3. Collect and audit your training data. What labeled examples will you train on? Who labeled them, under what conditions, with what instructions? This is where historical bias, representation bias, and measurement bias either get caught or get baked in. This step deserves more time than every other step combined.
4. Choose and train your model. How complex should the model be? What loss function are you minimizing? How will you prevent overfitting? This is the step most people think of as "the AI part," but by this point, the most important decisions have already been made.
5. Decide how to deploy the output. Will the classifier's output be one input to a human decision, or will it automate the decision entirely? Will confidence scores be shown? What happens when the system is wrong? Who bears the consequences: the people making the decision, or the people the decision is about?
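
The deployment decision is the easiest of the five to picture in code. Here is one possible sketch, assuming the model produces a probability for the positive class. The `Decision` type, the `package_output` function, the labels, and the `review_band` width of 0.2 are all hypothetical choices, not a recommended standard.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    label: str
    confidence: float
    needs_human_review: bool

def package_output(prob_positive, review_band=0.2, positive="FLAG", negative="NO FLAG"):
    """Return the label with its confidence, and mark near-boundary cases for review."""
    label = positive if prob_positive >= 0.5 else negative
    confidence = max(prob_positive, 1 - prob_positive)
    # Anything within review_band of the 0.5 boundary goes to a person.
    near_boundary = abs(prob_positive - 0.5) < review_band
    return Decision(label, confidence, near_boundary)

borderline = package_output(0.55)
clear_case = package_output(0.95)
print(borderline)
print(clear_case)
```

The design choice here is that the label never travels alone. Whoever reads the output sees how close the case was to the boundary, and borderline cases are routed to a human instead of being decided automatically.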

Precision vs. Recall: You Can't Always Have Both

Here's a real engineering tension that has direct ethical consequences. When you're tuning a classifier, adjusting the decision boundary trades off two competing kinds of errors.

False positives happen when the classifier says "yes" but the truth is "no." In a disease classifier: telling a healthy person they're sick. In a fraud detector: blocking a legitimate transaction. In a child welfare system: flagging a family that's actually fine.

False negatives happen when the classifier says "no" but the truth is "yes." In a disease classifier: telling a sick person they're healthy. In a fraud detector: letting fraud through. In a child welfare system: missing a family that genuinely needs intervention.

Precision measures how often the classifier is right when it says yes. If it flags 10 cases and 9 are actually problems, precision is 90%. Recall measures how many of the actual problems the classifier caught. If there were 15 real problems and the classifier caught 9, recall is 60%.
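
The two numbers above come straight from counts of true positives, false positives, and false negatives. A minimal sketch, with function names chosen just for illustration:

```python
def precision(true_positives, false_positives):
    """Of everything the classifier flagged, what fraction was a real problem?"""
    return true_positives / (true_positives + false_positives)

def recall(true_positives, false_negatives):
    """Of all the real problems, what fraction did the classifier catch?"""
    return true_positives / (true_positives + false_negatives)

# Flagged 10 cases, 9 were real: 9 true positives, 1 false positive.
example_precision = precision(true_positives=9, false_positives=1)
# 15 real problems, 9 caught: 9 true positives, 6 false negatives.
example_recall = recall(true_positives=9, false_negatives=6)
print(example_precision, example_recall)
```

Notice that the two formulas share a numerator but divide by different things. Precision is measured against what the classifier said. Recall is measured against what was actually true.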

Here's the tension: moving the decision boundary to catch more true positives (higher recall) almost always increases false positives too (lower precision). You can't maximize both simultaneously without infinite data and a perfect model. So which errors are acceptable? That is not a technical question. It is an ethical one — and the answer differs depending on whose false positives and false negatives you're counting.
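
You can watch the tradeoff happen by sliding the threshold over a fixed set of scores. The scores and labels below are invented for illustration, and `precision_recall_at` is a hypothetical helper, not a library function.

```python
def precision_recall_at(threshold, scores, labels):
    """Flag every case scoring at or above the threshold, then measure both metrics."""
    flagged = [label for score, label in zip(scores, labels) if score >= threshold]
    true_positives = sum(flagged)
    # Precision is undefined when nothing is flagged, so report None in that case.
    precision = true_positives / len(flagged) if flagged else None
    recall = true_positives / sum(labels)
    return precision, recall

# Made-up risk scores and true outcomes (1 = real case, 0 = not a real case).
scores = [0.95, 0.9, 0.8, 0.7, 0.65, 0.6, 0.45, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

results = {t: precision_recall_at(t, scores, labels) for t in (0.25, 0.5, 0.75)}
for threshold, (p, r) in results.items():
    print(f"threshold {threshold}: precision {p:.2f}, recall {r:.2f}")
```

With this data, the lowest threshold catches every real case but more than four in ten flags are wrong. The highest threshold makes no false accusations but misses two of the five real cases. Nothing about the model changed between those rows. Only the choice of threshold did.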

Precision / Recall Tradeoff: Child Welfare Risk Classifier

High Recall: catches more real cases, but flags more innocent families
Balanced: some of each type of error
High Precision: rarely flags innocent families, but misses more real cases

Responsibility: Who Is Accountable When a Classifier Is Wrong?

This is the question that no technical manual covers, but it's the one that matters most once a system is deployed. When New York's child welfare classifier generated a false positive and a family was investigated unnecessarily — who was responsible? The company that built the model? The city agency that deployed it? The caseworker who used the score without questioning it? The policymaker who approved the procurement?

Right now, the legal and regulatory answers to that question are genuinely unsettled. In the European Union, the AI Act (which became law in 2024) requires that high-risk AI systems — including those used in employment, education, social services, and law enforcement — meet standards for transparency, human oversight, and accuracy. In the United States, there is no equivalent federal law. The rules are being written now. Not in ten years. Now.

Every engineer, product manager, policy analyst, and journalist who understands how classifiers actually work — what features they use, where their training data came from, what their confidence scores mean, and where their decision boundaries sit — has a role in how those rules get written. You understand all of that now. Most people making these decisions professionally don't.

Ethical Question — No Clean Answer

If you were the engineer who built New York's child welfare classifier, and you knew your training data was biased toward already-surveilled families, would you ship the system? What if you were told it would still catch some cases that would otherwise be missed — cases where real children were harmed? Does the possibility of catching even one genuine case justify deploying a biased system? Who gets to make that calculation?

What You Now Understand

You have gone from "AI classifies things" to understanding the full chain: features are measured from inputs, weights are learned from labeled training data through gradient descent, decision boundaries are drawn in feature space, confidence scores quantify proximity to those boundaries, and all of it is shaped by choices made before the algorithm ran. That is the complete picture. Most people who talk about AI publicly — including many who are paid to — don't have it. You do.

Quiz — Lesson 4

Building Your Own — From Scratch, In Your Head

1. Why was New York's child welfare classifier (piloted in 2020) structurally unable to predict risk for families across the whole city?

Correct. A classifier can only learn patterns that are present in its training data. Families who had never been flagged by ACS — including many in wealthier neighborhoods — were invisible to the model. It learned what "risk looks like" only from families the system had already been watching.

The core problem was the training data's coverage, not the algorithm's complexity or who built it. A classifier that has only ever seen families flagged by ACS has no model of what "unflagged but at-risk" looks like — because that type of case was never in its training set.
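
A coverage gap like this can be caught with a very simple audit before training: compare who is in the training set with who the system will be used on. The sketch below is an assumption-laden illustration. The group names, the counts, the population shares, and the `min_ratio` cutoff are all invented, and `coverage_gaps` is a hypothetical helper.

```python
from collections import Counter

def coverage_gaps(training_groups, population_share, min_ratio=0.5):
    """List groups whose share of the training data falls far below their real share."""
    counts = Counter(training_groups)
    total = sum(counts.values())
    gaps = {}
    for group, expected in population_share.items():
        observed = counts.get(group, 0) / total
        if observed < min_ratio * expected:
            gaps[group] = (observed, expected)
    return gaps

# Made-up numbers: 95% of training examples are families with prior agency contact,
# but such families are assumed to be only 20% of the population being scored.
training_groups = ["prior_contact"] * 95 + ["no_prior_contact"] * 5
population_share = {"prior_contact": 0.2, "no_prior_contact": 0.8}

gaps = coverage_gaps(training_groups, population_share)
print(gaps)
```

An audit like this doesn't fix the problem, but it puts a number on it before anyone trains a model. Here it reports that the families who make up most of the population are almost entirely missing from the training data.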

2. A fraud detection classifier has high recall but low precision. In practice, this means:

Right. High recall means it catches a high proportion of actual fraud cases (few false negatives). Low precision means many of its positive flags are wrong (many false positives) — legitimate transactions get blocked alongside the fraud.

Recall measures how many real positives were caught. Precision measures how often a positive prediction was correct. High recall, low precision: catches most real fraud, but also raises many false alarms about legitimate transactions.

3. You're building a classifier to detect wildfires from satellite imagery. Which of the five design steps would have the MOST impact on whether the system works reliably across different climate zones?

Correct. A classifier trained only on fires in California chaparral will have a decision boundary poorly suited to fires in boreal forests or Mediterranean scrubland. Data diversity is the foundation; everything else builds on it.

Apply what you've learned about training data. A model can only learn patterns it has seen. If the training data is geographically narrow, the decision boundary will be drawn for that specific context — and may fail badly in other climate zones.

4. The EU AI Act (2024) classifies systems used in child welfare, employment screening, and law enforcement as "high-risk." What is the most likely reasoning behind singling out these applications?

Correct. The EU's reasoning is about consequence severity and power asymmetry. A false positive in a spam filter is inconvenient. A false positive in a child removal algorithm, a hiring screen, or a criminal risk score changes a person's life in ways they may never be able to challenge.

Think about what makes these applications different from, say, a music recommendation system. The errors have severe real-world consequences for specific people, and those people typically have little ability to understand, appeal, or correct the classifier's decision.

5. When choosing features for a classifier, using "zip code" or "neighborhood" as a feature is risky primarily because:

Exactly. Geographic features often encode demographic information because of historical patterns of residential segregation. Including zip code can introduce racial and economic proxies into the classifier's decision-making without anyone explicitly programming them there.

The problem isn't data quality or model complexity — it's what geography encodes. Because of decades of redlining and residential segregation, where someone lives is correlated with their race and income. A classifier using zip code is indirectly using those demographics whether or not they're explicitly listed as features.
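
You can test whether a feature is acting as a proxy by checking how well it predicts the attribute you meant to leave out. In this sketch the zip code labels, the group labels, and the counts are all made up, and `group_share_by_zip` is a hypothetical helper.

```python
from collections import defaultdict

def group_share_by_zip(records):
    """For each zip code, the fraction of records belonging to each demographic group."""
    counts = defaultdict(lambda: defaultdict(int))
    for zip_code, group in records:
        counts[zip_code][group] += 1
    shares = {}
    for zip_code, by_group in counts.items():
        total = sum(by_group.values())
        shares[zip_code] = {group: n / total for group, n in by_group.items()}
    return shares

# Made-up records of (zip code, demographic group).
records = (
    [("ZIP_1", "group_a")] * 9 + [("ZIP_1", "group_b")] * 1
    + [("ZIP_2", "group_a")] * 2 + [("ZIP_2", "group_b")] * 8
)

shares = group_share_by_zip(records)
print(shares)
```

If knowing the zip code lets you guess the group correctly most of the time, as it does in this toy data, then a classifier that uses zip code has access to the group too, even though that column was never included.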

Lab 4 — Design Your Own Classifier

You're the designer. You make the calls. Then defend them.

Your Role: System Architect

You've been asked to design a classifier for a real use case of your choice. It could be a school attendance risk predictor, a disease detection system, a content moderation filter, a loan approval system, or something you invent. Walk through all five design decisions: output categories, feature selection, training data plan, model choices, and deployment constraints. Your lab partner will pressure-test every decision.

Start by stating what you want to build and what your output categories are. Then name two features you'd use and explain why you chose them — and one feature you'd deliberately avoid and why. Your partner will push back on at least one of your choices.

Lab Partner — CIPHER Architect Mode

You're the designer now — that means you're responsible for every choice. Tell me what you're building, what the output categories are, two features you'd use, and one you'd exclude. I'll accept your reasoning only if it holds up. What are you building?

Module Test — Build a Mini Classifier Yourself

15 questions · Pass at 80% (12/15 correct)

1. What is the fundamental job of a classifier?

Correct.

A classifier assigns inputs to categories — spam/not-spam, cat/dog, approve/deny — based on measured features of the input.

2. In the 2015 Google Photos incident, what was the root cause of the misclassification?

Correct.

The cause was representation bias in the training data — not a bug, not intentional, but a failure to include diverse enough examples when drawing the decision boundary for human faces.

3. A decision boundary that was learned from training data primarily collected from wealthy urban neighborhoods will most likely fail on:

Correct. The decision boundary was learned from one context. When the input comes from a different context whose patterns weren't represented in training, the boundary is poorly positioned for those inputs.

Decision boundaries are only drawn well for the distribution of data that shaped them. Inputs from underrepresented contexts (different geography, income level, or demographic background) land in regions of feature space where the boundary was never shaped by relevant examples.

4. Amazon's recruiting AI downgraded resumes mentioning "women's" organizations because:

Correct. No rule was written. The correlation was discovered automatically from training data that reflected a decade of human discrimination in tech hiring.

No explicit rule existed. The system found the statistical correlation in training data — women were hired less in tech, so features correlated with female applicants were weighted negatively by the learning process.

5. "Measurement bias" refers to which of the following?

Correct.

Measurement bias is about the labeling process — when some groups are monitored more intensively than others, their data appears to show more "failures," even if actual behavior is the same.

6. What does "loss" measure during classifier training?

Correct. Loss is the error signal. The training algorithm minimizes it by adjusting weights in the direction that would have produced a smaller loss on that example.

Loss is the measure of prediction error. It's what the training algorithm tries to minimize by nudging the model's weights: the more wrong a prediction is, the higher the loss, and the bigger the weight adjustment it triggers.

7. An overfit classifier is characterized by:

Correct. Overfitting means the model learned the specific examples rather than the general pattern — brilliant on training data, poor on anything new.

Overfitting produces a model that performs well on what it trained on (low training error) but fails on new data (high test error) because it learned quirks rather than general patterns.

8. Saying a classifier is "well-calibrated" means:

Correct. Calibration is about whether the probability outputs are trustworthy estimates, not just whether the final binary prediction is accurate.

Calibration means confidence scores correspond to actual correctness rates. An overconfident model is miscalibrated: it might say "90% confident" on cases where it's only right 60% of the time.

9. In the precision/recall tradeoff, raising the classification threshold to reduce false positives will typically:

Correct. Moving the boundary to reduce false positives (higher precision) means the classifier only flags high-confidence cases — so it will miss more real positives (lower recall).

Precision and recall trade off. A stricter threshold means fewer false alarms (better precision) but also misses more true positives (worse recall). You can't optimize one without a cost to the other.

10. The New York child welfare classifier was "unable to predict risk for families across the whole city" primarily because:

Correct. The classifier's world was bounded by the cases it had seen. Families outside that surveillance history were outside its model of what "risk" looks like.

The issue was training data coverage. The model only learned from families already in the system — which skewed heavily toward lower-income communities of color. Families in wealthier, less-surveilled communities were absent from the training set.

11. Using "zip code" as a feature in a loan approval classifier creates risk primarily because:

Correct. Proxy features that correlate with protected characteristics introduce the discrimination through a side door — technically "race-neutral" while functionally race-correlated.

The problem is correlation: because of historical redlining and segregation, where someone lives correlates with their race and income. A classifier using location is indirectly using those demographics even if they were never explicitly included as features.

12. Google's response to the gorilla misclassification — removing the "gorilla" label entirely — was inadequate because:

Correct. Deleting a label prevents that specific output from appearing but leaves the underlying representation failure in place. As of 2023, Google Photos still would not label any gorillas, chimps, or similar primates — evidence that the root cause was never addressed.

Removing the label doesn't fix the fact that the training data underrepresented dark-skinned people. That gap in the training data means other misclassifications remain possible — the gorilla label was just the most visible symptom.

13. AlexNet's 2012 ImageNet breakthrough was significant primarily because:

Correct. AlexNet's 15.3% error rate versus the field's 26.2% was not a marginal improvement — it was a signal that the whole approach to image classification should change, and within two years, it had.

AlexNet cut the error rate by about 40 percent, from ~26% to 15.3%. That gap was so large it redirected an entire research field, demonstrating that deep neural networks running on consumer GPUs were a fundamentally superior approach.

14. The EU AI Act (2024) requires human oversight for "high-risk" AI applications. What concept from this module best explains why human oversight is especially important near the decision boundary?

Correct. High-confidence inputs, those far from the decision boundary, are where the classifier is most reliable and automation adds the most value. Low-confidence inputs, those near the boundary, are where the classifier is uncertain and human judgment is most needed.

Think about what the decision boundary means. Inputs near it produce low confidence scores — the classifier is barely leaning one way. That's precisely where a human reviewer's judgment could change the outcome and where automation is riskiest.

15. A journalist writes: "The AI approved 91% of loan applications from Neighborhood A but only 67% from Neighborhood B." Based on what you've learned, which follow-up question is MOST important?

Excellent. The approval rate gap is the symptom. The question that gets to the cause is about training data representation and whether geographic proxies for race and income shaped the decision boundary. That's the question worth asking.

The most consequential question connects the symptom (approval rate gap) to the mechanism (training data and feature selection). Did the training data equally represent both neighborhoods? Is neighborhood — or a correlated proxy — embedded in the features? That's where discrimination enters the system.