Lesson 1 · Module 5

The School That Fired Its Teachers

In 2013, a startup said it had solved education. It hadn't. But it changed the question forever.

What happens when a technology promises to do a teacher's job better — and is half right?

In the fall of 2013, the Rocketship Education network — a chain of charter schools operating in California, Wisconsin, and Tennessee — made a decision that alarmed educators nationwide. It sharply cut its teaching staff and replaced large blocks of class time with a system called Learning Lab: rows of students at computers, headphones on, working through math and reading programs on their own while a small number of adults monitored the room.

The software was adaptive — meaning it tracked each student's answers and adjusted the difficulty of the next question automatically. The pitch was compelling: personalized instruction at scale, available all day, at a fraction of the cost of a full classroom teacher. Rocketship's founder, John Danner, told reporters that technology could deliver "the equivalent of a master teacher" to every student.

By 2015, the results were in. Test scores had stagnated. Parents were furious. Students described feeling isolated and confused, with no one to ask when the software gave them a problem they didn't understand. Rocketship quietly scaled back the Lab model and began rehiring teachers. The experiment had failed — not because the software was bad at what it did, but because what it did wasn't enough.

What the Software Got Right

Before we call Rocketship's experiment a simple failure, it's worth being precise. The adaptive software they used, platforms in the same family as Khan Academy and DreamBox, was genuinely doing something useful. It was tracking which specific skills each student had mastered and which they hadn't. It was generating an almost unlimited supply of practice problems. It was giving instant feedback: right or wrong, try again.

These are things that are genuinely hard for a single human teacher to do for thirty students at once. A teacher with thirty kids cannot watch every student's pencil in real time. She cannot give every student a different problem matched to their exact level. She cannot grade every answer the moment it's written. The software could do all of that, and it did it well.

This is the part that AI tutoring boosters are right about. Personalization at scale — adjusting what you teach to what this specific student needs right now — is something AI systems can do faster and more consistently than any individual human. And in the years since Rocketship's experiment, the research has caught up: studies from 2019 and 2022 on platforms like Carnegie Learning's MATHia consistently show that students using well-designed AI tutors improve their procedural math skills faster than students in traditional practice sessions.

Key Term

Adaptive learning system: software that changes the difficulty, type, or sequence of problems based on a student's responses. Think of it like a video game that gets harder in exactly the right way — not too fast, not too slow.
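
To make that definition concrete, here is a minimal sketch in Python of the simplest possible adaptive rule: move up one level after a right answer, down one level after a wrong one. It is an illustration only, not the logic of any real platform. The function names, the 1-to-10 difficulty scale, and the starting level are all assumptions made up for this example.

```python
def next_difficulty(current, was_correct, lowest=1, highest=10):
    """Pick the level of the next question from the last answer.

    Hypothetical rule: one step up when right, one step down when wrong,
    never leaving the lowest..highest range.
    """
    step = 1 if was_correct else -1
    return max(lowest, min(highest, current + step))


def run_session(answers, start=3):
    """Replay a list of right/wrong answers and return every level served."""
    levels = [start]
    for was_correct in answers:
        levels.append(next_difficulty(levels[-1], was_correct))
    return levels


# A student gets two right, misses two, then gets one right.
print(run_session([True, True, False, False, True]))  # [3, 4, 5, 4, 3, 4]
```

Notice what the rule reacts to: only whether each answer was right or wrong. Nothing in it records why an answer was wrong.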

So the software wasn't wrong. It was limited. And understanding the difference between "wrong" and "limited" is exactly the kind of thinking that will help you make sense of every headline you read about AI in education for the rest of your life.

What the Software Got Wrong — Or Couldn't Do at All

Here's the problem Rocketship discovered the hard way: knowing you got the answer wrong is not the same as understanding why.

When a student at a Rocketship Learning Lab hit a problem she couldn't solve, the software would tell her it was incorrect and offer a hint — usually a restated version of the same rule she'd already read. If she still didn't get it, the system would loop back to an easier version of the problem. What it couldn't do was ask her: "What were you thinking when you tried that? Walk me through it."

That question, "what were you thinking?", is one of the most powerful things a teacher does. It's called probing for misconceptions. A student who writes 2 + 3 = 6 isn't "bad at math." She may have a specific, identifiable misunderstanding: maybe she is multiplying when the problem asks her to add. A teacher who asks the right question can find that misunderstanding in thirty seconds. Software in 2013 (and mostly even now) can only detect the symptom: the wrong answer.

There's something else the software couldn't provide that students consistently said they missed: the feeling that someone in the room actually cared whether they understood. Dr. Mary Helen Immordino-Yang, a neuroscientist at USC who studies emotion and learning, published research in 2015 showing that the same brain regions involved in social bonding are activated during deep learning. In other words: caring about your teacher — feeling that they care about you — is not separate from learning. It is part of learning.

You Can Now See What Most People Miss

When people debate whether AI can replace teachers, they usually focus on information delivery. Can it explain fractions? Can it quiz vocabulary? But the real question is different: Can it understand why a student got something wrong — not just that they did? And can it make that student feel like their confusion matters to someone? Those are different skills entirely, and they matter more than most people realize.

The Question Nobody Has Cleanly Answered

Here is the ethical question that the Rocketship story forces into the open, and nobody has resolved it neatly: If an AI tutoring system produces measurably better test scores but students report feeling more alone and less motivated — is that a good outcome or a bad one?

You might say: scores matter, because scores affect what college you get into, what jobs open up, what your life looks like. That's real. You might also say: motivation is what carries you through the rest of your life, long after any particular test score is forgotten. That's also real. These things are in genuine tension, and the tension doesn't resolve.

Rocketship's case added another layer. The schools that used Learning Lab most heavily served low-income students — kids in communities where underfunded schools had chronically struggled to recruit and retain experienced teachers. Proponents of the Lab model argued: a software tutor that's always available, never burns out, never leaves for a better-paying school, is better than nothing. Critics argued: the students who most need mentorship, human connection, and someone who knows their name are exactly the students being given a screen instead. Both sides had a point. Neither side was simply wrong.

Ethical Tension — Sit With This

Is it better to give a student consistent-but-impersonal AI instruction, or inconsistent-but-human teaching that depends on individual teachers who may stay or go? There is no clean answer. What you believe here depends on what you think education is fundamentally for.

What This Module Is Actually About

This module, "Can AI Replace Your Favorite Teacher?", is not a debate prompt with a predetermined answer. It is an investigation. Across its four lessons, you're going to look at specific things teachers do, specific things AI systems can do, the cases where the two overlap, the cases where they diverge, and the cases where their interaction creates something neither could do alone.

Lesson 1 set the stage: AI tutoring can do real things well, but its limitations aren't technical bugs to be fixed later — some of them are fundamental to what AI is. Lessons 2, 3, and 4 will go deeper: into what "knowing a student" really means, into what happens in classrooms that are trying to combine AI and human teachers right now, and into what gets lost and what gets gained.

By the end of this module, you will have thought about education the way researchers and school designers think about it — not as a delivery system for information, but as a complex human process that AI is trying to participate in, with mixed results, with real stakes, and with no easy verdict in sight.

Lesson 1 Quiz

Five questions · Apply what you read, don't just recall it

1. Rocketship Education's Learning Lab experiment ultimately failed mainly because:

Exactly right. The lesson makes a careful distinction: wrong versus limited. The software did its job — it just couldn't do the jobs that mattered most when students got stuck or felt alone.

Reread the "What the Software Got Wrong" section. The failure wasn't technical — it was about what the software fundamentally couldn't do, not what broke down.

2. "Probing for misconceptions" means a teacher:

Right. The specific example in the lesson — a student writing 2+3=6 — shows why this matters. The wrong answer is just a symptom. Understanding the thinking behind it is the diagnosis.

Look back at the paragraph describing what happens when a student gets something wrong. The key question a teacher asks is "what were you thinking?" — that's what probing for misconceptions means.

3. A new school adopts an AI math tutor and scores improve 12% on standardized tests, but student surveys show a 30% drop in reported enjoyment of math. A journalist asks you whether this is a success or a failure. Based on the lesson, the best answer is:

Exactly. The lesson's ethical tension section sets this up deliberately: both scores and motivation affect a student's life, in different ways, on different timescales. This is a genuine values question, not a technical one.

The lesson explicitly says "both sides had a point" in the Rocketship case. Neither the test-score argument nor the motivation argument is simply wrong. Look at the ethical tension callout.

4. Dr. Immordino-Yang's research suggests that feeling cared about by a teacher is:

Correct. This is one of the most consequential ideas in the lesson: emotional connection isn't the soft side of education — it's biologically woven into how learning works.

Find the paragraph that mentions Dr. Immordino-Yang. Her 2015 research showed something specific about brain regions and learning — re-read it carefully.

5. An adaptive learning system is best described as software that:

Right — and the video game analogy in the callout box is a useful way to keep this in mind: it adjusts to you, specifically, in the moment.

Look for the Key Term callout box in Lesson 1. The definition is there directly.

Lab 1: The Autopsy Report

Role: Education Investigator · You are examining what a real experiment got right and wrong

Your Assignment

You're writing a brief "autopsy report" on the Rocketship Learning Lab experiment — not for a grade, but for a school board considering a similar move. Your AI research partner has read the same case study you have. It will challenge your conclusions, ask for evidence, and disagree when it thinks you're oversimplifying. It will not lecture you or give you a summary to copy.

Take a position on this question and defend it — then be willing to refine it based on pushback.

Opening question: "Was Rocketship's Learning Lab a failure of technology, a failure of implementation, or a failure of philosophy? Pick one and argue for it — I'll push back."

Research Partner — AESOP Lab

Peer Mode

Okay, I've read the Rocketship case. You've got three ways to frame this failure: technology (the software wasn't good enough), implementation (they applied it wrong), or philosophy (the whole idea was misconceived). Pick your frame and make your argument. I'll tell you where I think it breaks down.

Lesson 2 · Module 5

Knowing a Student vs. Having Data on a Student

An AI can track ten thousand data points about you. That's not the same as knowing you.

What does it actually mean to "know" a student — and is it something a machine can learn?

In the spring of 2016, Georgia Tech ran an experiment that made international headlines. A professor named Ashok Goel had been using an AI teaching assistant in his online Knowledge-Based Artificial Intelligence course, a grad-level class with hundreds of students. He named the AI Jill Watson, after the IBM Watson platform it was built on. Jill answered student questions on the course forum: assignment deadlines, clarifications on readings, logistical questions.

For four months, not a single student realized Jill was an AI. When Goel revealed the truth at the semester's end, many students were astonished — and impressed. Some said Jill had been one of the most responsive TAs they'd ever had. The story ran in The Washington Post, The Atlantic, and dozens of other outlets. The headline was always some version of: AI passes as human teaching assistant.

But here is what those headlines almost never mentioned: Jill Watson answered questions about logistics. When a student wrote in with a real conceptual struggle — "I don't understand why this algorithm fails on this type of input" — Jill routed the question to a human TA. When a student wrote in sounding distressed — overwhelmed, behind, on the verge of dropping the course — Jill either didn't respond or gave a generic reply. The things she couldn't handle were the things that most needed handling. The things she did brilliantly were the things that, in a different world, wouldn't need a human at all.

Data Is Not Understanding

Here is a thing modern AI tutoring systems genuinely do: they accumulate data about you. Your response time on each question. Which types of problems you skip and come back to. Whether you tend to get things right on the first try or the third. How long you stay in a session before disengaging. This data can be extraordinarily detailed — some platforms track mouse movements and hesitation patterns — and it can reveal real patterns.

For example: a student who consistently gets a type of problem right on timed tests but takes four times as long as average might be someone who understands the concept but hasn't yet automated the procedure. A system that detects this can serve up extra practice specifically designed to build speed. That's useful. That's real. And it's more than any teacher can do with thirty students while also managing discussion and grading and fifty other things.
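
The pattern in that example can be written as a simple rule. The sketch below is a hypothetical illustration in Python, not how any particular platform works. The function name, the 90% accuracy cutoff, and the input format are assumptions; only the "four times as long as average" threshold comes from the example.

```python
from statistics import mean


def needs_fluency_practice(attempts, class_avg_seconds,
                           min_accuracy=0.9, slow_factor=4.0):
    """Flag a student who is accurate but much slower than the class.

    attempts: list of (was_correct, seconds_taken) pairs for one problem type.
    min_accuracy and slow_factor are illustrative thresholds, not real ones.
    """
    if not attempts:
        return False
    accuracy = mean(1.0 if ok else 0.0 for ok, _ in attempts)
    avg_seconds = mean(seconds for _, seconds in attempts)
    return accuracy >= min_accuracy and avg_seconds >= slow_factor * class_avg_seconds


# Always right, but about 90 seconds per problem when the class averages 20.
slow_but_right = [(True, 80), (True, 95), (True, 85), (True, 100)]
print(needs_fluency_practice(slow_but_right, class_avg_seconds=20))  # True
```

The rule sees two numbers per attempt and nothing else. It can say that a student is slow; it has no way to say what is slowing them down.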

But here's the question: does that system know the student?

Dr. Nell Duke, a literacy researcher at the University of Michigan, made a distinction in a 2020 interview that stuck with me: the difference between a profile of a learner and knowledge of a person. A profile is a collection of behavioral signals — what you do, how fast, how often, with what result. Knowledge of a person includes their history, their fears, their self-concept, what happened at home this morning, whether they're the kind of kid who shuts down when challenged or the kind who rises to it. A profile can be compiled by software. Knowledge of a person, Duke argued, requires time and relationship — and it changes what you do with everything else you know about them.

Key Term

Learner profile: A data-based record of how a student performs across different skill areas and conditions. Useful for targeting practice. Does not capture motivation, history, or what a student needs to be told about themselves today.

What Teachers Actually Do With "Knowing" Students

A teacher who knows a student does something AI systems currently cannot: she adjusts her interpretation of behavior based on context she can't formally measure.

Consider this: a student named Marcus hasn't turned in homework for two weeks. An AI system flags this as a performance problem and serves up remedial practice. A teacher who knows Marcus knows that his parents separated last month and that he has been coming to school without breakfast. She doesn't send him to the back of the room with a computer. She has a quiet conversation. She finds out that Marcus is actually keeping up: he's doing the work in his head on the bus, he just can't bring himself to write it down right now. She makes a temporary accommodation. Three weeks later, Marcus is turning in his homework again.

This is not a story about technology being evil. It's a story about the difference between behavioral data and human context. The AI flagged a real signal — two weeks of missing homework is a real signal. But it couldn't interpret that signal the way someone who knew Marcus could interpret it. And crucially: it had no mechanism for the conversation. It could only respond to what Marcus had already done, not to what Marcus needed to hear.

Ages 13–15: Institutional Stakes

Several U.S. school districts — including Los Angeles Unified and Houston ISD — use "early warning systems" that flag students as at-risk based on AI-analyzed attendance, grade, and behavior data. These systems can identify students who might otherwise be overlooked. But researchers at the University of Chicago's Education Lab have documented cases where the flags generate bureaucratic responses (forms, interventions pre-set by software) rather than the kind of individualized conversation that would actually help. The data is real. The response it triggers may not match what the student needs. This is a policy design problem, not just a technology problem — and it's being argued about in school board rooms right now.

There is an ethical undercurrent to this worth naming: when schools adopt AI systems that compile detailed behavioral profiles of students — and then make consequential decisions based on those profiles — who owns that data? Who can see it? What happens to a flag that was wrong? These are not hypothetical future concerns. They're questions that privacy researchers and education advocates are fighting about in courts and legislatures today.

The Jill Watson Paradox

The Jill Watson story is not just interesting; it's a paradox. Students reported that Jill was one of the best TAs they'd experienced. And judged on what Jill offered (fast, clear, always-available answers), that was probably true. But the things Jill was best at were the things that mattered least: logistics. What does "best TA" even mean if the criteria are speed and availability for low-stakes questions?

What Jill Watson revealed is that a significant portion of the interactions between students and teaching assistants are transactional — they're about information transfer, not understanding. You can absolutely automate information transfer. Where it gets harder is the messy middle: a student who asks a question that sounds logistical but is actually an expression of deeper confusion. A student whose question is technically answerable but who really just needs someone to tell them they're going to be okay. A student who asks the same question three different ways because they don't have the vocabulary yet to ask what they actually want to know.

None of that requires Jill Watson to be bad at her job. It just requires being honest about what job she's actually doing.

You Can Now See What Most People Miss

When you hear that an AI "passed as a human teacher" or "taught as well as a human TA," the first question to ask is: in what domain, on what tasks, measured how? The Jill Watson story looks like a milestone. Look more closely and it's a story about successfully automating the least important parts of a teaching assistant's job. That's still useful! But it's not what the headlines claimed.

Lesson 2 Quiz

Five questions · Think about the difference between data and understanding

1. What made the Jill Watson experiment genuinely impressive — and what was its main limitation?

Exactly. The lesson makes this precise: Jill did brilliantly at the least important tasks, and routed or failed at the ones that mattered most. That's a specific kind of partial success.

Re-read the Jill Watson story opening. Notice what kinds of questions she answered and what happened when the question was about a real struggle or a student in distress.

2. According to the lesson, the difference between a "learner profile" and "knowledge of a person" is:

Right — and this distinction is the conceptual heart of Lesson 2. Dr. Duke's point is that the same behavioral signal means different things depending on what you know about the person behind it.

Find the key term definition and the paragraph about Dr. Nell Duke. She makes a specific distinction that the lesson builds on.

3. In the story about Marcus (the student who stopped turning in homework), the teacher's response was better than an AI flag because she:

Correct. The AI's flag was a real signal — the lesson doesn't dismiss that. But the same signal means different things in different contexts, and only a person who knew Marcus could make that interpretation.

The story about Marcus isn't about the AI being wrong — it's about what happens when a real signal is interpreted without context. Re-read that section carefully.

4. You learn that a school district's AI early-warning system flagged 200 students as "at-risk" and automatically enrolled them in a remedial support program. A researcher tells you that 40 of those students were already performing at grade level — they'd just had an unusual week. What does this scenario illustrate about AI systems that work from learner profiles?

Yes — and the lesson explicitly notes this is a policy design problem, not just a technology problem. The data can be right and still lead to the wrong response.

Think about the institutional stakes callout. What's the gap between identifying a signal and responding appropriately to it?

5. When a student asks the same question three different ways, what does this most likely indicate, according to the lesson?

Right. This is one of the subtler points in the lesson — the real question isn't always the stated question, and recognizing that gap requires more than pattern-matching.

Find the paragraph near the end about what Jill Watson couldn't do. It mentions something specific about students who ask the same question multiple ways.

Lab 2: The Profile Audit

Role: Student Privacy Auditor · You're deciding what data an AI tutoring system should — and shouldn't — be allowed to collect

Your Assignment

You've been hired by a school board to review the data collection practices of a proposed AI tutoring platform. The platform wants to collect: response time per question, hesitation patterns (when a cursor hovers without clicking), session duration, number of retries, and a summary of which content a student skips. Your job is to make a recommendation: which of these should be allowed, which should be restricted, and why.

Your AI debate partner has different instincts than you do and will challenge wherever you draw the line.

Starting position: Tell me which data point you'd restrict first and why. Be specific — I'll argue the other side.

Debate Partner — AESOP Lab

Peer Mode

Five data types on the table: response time, hesitation patterns, session duration, retries, and skipped content. You're the auditor — tell me which one you'd restrict first and make the case. I'll push back hard if I think you're wrong.

Lesson 3 · Module 5

When AI and Teachers Work Together

The most interesting story isn't replacement — it's the messy, contested experiment of combination.

What actually happens in classrooms that are trying to combine AI and human teaching right now — and who benefits?

In 2019, Summit Learning — a platform developed originally for Summit Public Schools in California and later funded by a Chan Zuckerberg Initiative grant — was operating in over 380 schools across the United States. The model was ambitious: students would spend significant portions of their school day working through a personalized online curriculum on their own, meeting with teachers only for short "mentor sessions" and project work.

The response was not uniformly positive. In October 2019, parents in McPherson, Kansas organized a protest that made national news, pulling their children from school for a day and presenting a 300-signature petition demanding the platform be removed. Their complaints were specific: students reported staring at screens for hours, feeling isolated, losing motivation, and falling behind when they got stuck with no one to help. Teachers, several said, had become administrators of the system rather than instructors.

But in other schools — including several in Newark, New Jersey — teachers who had been trained to use Summit Learning as a complement rather than a replacement reported something different. Marcus Turner, a seventh-grade social studies teacher at a Newark charter school who piloted the platform in 2019, described it this way in an Education Week interview: "I used to spend the first fifteen minutes of class figuring out where everyone was. Now the system tells me. I spend those fifteen minutes actually talking to kids." Turner's test scores went up. More importantly, he said, he felt like he knew his students better — because the administrative load had moved to the machine.

The Difference Between Tool and Replacement

Marcus Turner's experience and the McPherson parents' experience happened with the same platform. The difference was how teachers were positioned relative to the technology. In Kansas, teachers had been partially displaced — the platform was doing much of the instruction, and teachers were managing behavior and logistics. In Newark, teachers had been freed from administrative work so they could do more of what only humans can do: have conversations, read emotional states, build relationships.

This is the distinction that research on classroom technology consistently surfaces. A 2021 meta-analysis by Yudong Ren and colleagues at Zhejiang University, examining 43 studies of AI-augmented classrooms, found that outcomes were positive when teachers received training that positioned the AI as a diagnostic and administrative tool — and were neutral or negative when the AI was positioned as an instructional replacement. The technology was the same in both conditions. The teacher's role was different.

Key Term

AI augmentation: Using AI to expand what a teacher can do, by handling the trackable, repetitive, and administrative tasks, rather than to substitute for the teacher's presence, judgment, and relationships.

Think of it this way: a surgeon with a robotic arm can perform more precise operations than a surgeon without one. Nobody says the robotic arm replaced the surgeon. The arm does things a human hand cannot. The surgeon still decides what to do, why to do it, and what to do when something unexpected happens. The question for AI in education is whether schools are buying a better scalpel or a cheaper substitute for the doctor. The answer, so far, is: sometimes one, sometimes the other, and it mostly depends on the choices school administrators make — not on the technology itself.

Who Benefits — and Who Gets Left Behind

Here is where the story gets harder and more honest. Not every student benefits equally when AI tutoring is introduced alongside human teaching. Research on the use of platforms like DreamBox and Carnegie Learning in mixed-use classrooms has found a consistent pattern: students who are already performing near or above grade level tend to benefit from AI-driven personalized practice. Students who are significantly below grade level — and, critically, students who lack strong self-regulation skills (the ability to stay on task without external direction) — often fall further behind in partially self-paced AI models.

The reason is subtle but important. AI-driven learning, even when it's adaptive, requires a student to keep showing up, keep trying, keep reading feedback, and keep adjusting. A student who has strong metacognitive skills, who can notice when they're confused and do something about it, thrives in this environment. A student who needs a teacher to notice their confusion from across the room and walk over, rather than flag them in a dashboard, can fall through the cracks in a model that assumes self-direction.

Dr. Benjamin Herold, an education journalist who spent two years examining technology's effects on learning for Education Week, documented this pattern in a 2020 report: "In school after school, we found that AI-augmented instruction was working well for students who already had the habits of mind to use it — and struggling to reach the students who needed the most help."

Ethical Tension — Sit With This

If AI tutoring helps high-performing students more than it helps struggling students, and schools adopt it because it helps "students on average" — are they making education more unequal even while improving it? How much does "on average" matter when the gains aren't evenly distributed?
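
A small worked example shows how both things can be true at once. The scores below are invented for illustration and come from no study, and the function name is an assumption. The sketch compares the overall average with the gap between two groups of students, before and after a tool is introduced.

```python
from statistics import mean

# Hypothetical test scores (out of 100) for six students.
# "high" = students already near or above grade level, "low" = students below it.
before = {"high": [80, 85, 90], "low": [50, 55, 60]}
after = {"high": [92, 95, 98], "low": [49, 54, 59]}


def summarize(scores):
    """Return (overall average, gap between the two group averages)."""
    high_avg = mean(scores["high"])
    low_avg = mean(scores["low"])
    overall = mean(scores["high"] + scores["low"])
    return overall, high_avg - low_avg


print(summarize(before))  # (70, 30)
print(summarize(after))   # (74.5, 41)
```

In these made-up numbers the school's average rises from 70 to 74.5, which sounds like a success. The gap between the two groups grows from 30 points to 41, and the lower-scoring group slips slightly. An "on average" headline would report only the first fact.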

What the Best Hybrid Classrooms Actually Look Like

In 2022, the RAND Corporation published a study of 40 schools that had been operating "blended learning" models — combining AI-driven instruction with human teaching — for at least three years. The schools that showed consistent, equitable gains across different student groups had several things in common that the others didn't.

First: teachers had genuine authority over how the AI was used. They could override its recommendations. They could pull students away from the platform entirely when they judged it wasn't the right moment. The AI was advisory, not directive.

Second: teachers had been given significant professional development — not just "how to use the software," but deep training in interpreting the data it generated and knowing when the data was misleading. A dashboard that says "Marcus is behind on Module 4" is just a sentence without context. Teachers who could read that sentence as a starting point for a conversation — not as a verdict — were the ones whose students did best.

Third: the schools had explicitly designed time for relationship-building that was off the platform entirely. Not as a break, but as a structural commitment — here is when humans talk to humans, and no algorithm is involved.

This is what good human-AI collaboration looks like in a classroom: not AI that tries to simulate a teacher, and not teachers who supervise AI. It looks like each doing what they're actually good at — and someone in charge who understands the difference.

You Can Now See What Most People Miss

When you hear that a school is "using AI" — the follow-up question is always: who has authority over the AI's recommendations? If teachers can override it, interpret it, and ignore it when they judge it's wrong, you're looking at augmentation. If the AI's dashboard is treated as the source of truth and teachers implement its recommendations without questioning them, you're looking at something closer to replacement — regardless of what the school calls it.

Lesson 3 Quiz

Five questions · Apply the augmentation vs. replacement distinction

1. Why did the same Summit Learning platform produce different results in Kansas vs. Newark?

Exactly right. This is the central point of the lesson: outcomes with the same technology can vary dramatically based on whether teachers are positioned as users of the tool or as supervisors of a system that does their job.

Re-read the opening story, especially Marcus Turner's quote. What changed wasn't the platform — it was what teachers were doing while the platform ran.

2. "AI augmentation" in education means:

Right — and the scalpel analogy in the lesson makes this vivid: the robotic arm does things a hand can't. It doesn't replace the surgeon's judgment.

Find the Key Term box for "AI augmentation" in Lesson 3. It makes a specific distinction between what AI handles and what the teacher keeps doing.

3. Which students tend to benefit LEAST from AI-driven self-paced learning, according to the research cited in Lesson 3?

Yes — and this creates a troubling pattern: the students who need the most help are often the ones AI models reach least effectively in self-paced environments.

Find the section on "who benefits and who gets left behind." It's specific about which student characteristics predict success or struggle in AI-augmented models.

4. The RAND Corporation study found that the most successful blended-learning schools shared three characteristics. Which of these was NOT one of them?

Right — student platform choice wasn't a factor in the RAND findings. The three characteristics were about teacher authority, teacher training, and protected time for human interaction.

Re-read the final section of Lesson 3 listing what successful blended schools had in common. Only three characteristics are listed — the answer is the one that doesn't appear.

5. A principal tells you: "We use AI here — but teachers can override its recommendations any time they think it's wrong." Based on Lesson 3, this is most consistent with:

Exactly. The "you can now see what most people miss" callout makes this precise: who has authority over the AI's recommendations is the key diagnostic question.

The gold callout at the end of Lesson 3 gives you the exact framework for answering this. Re-read it and then reconsider.

Lab 3: The School Design Pitch

Role: School Designer · You're pitching a blended-learning model to a skeptical school board

Your Assignment

You have five minutes to pitch a blended-learning model to a school board that has heard about both the Kansas protest and the Newark success. They want to know: how will your model avoid the Kansas failure? What specifically will teachers do that the AI won't? And what happens when the AI's recommendation is wrong?

Your AI partner is playing a skeptical board member who has seen too many ed-tech promises fall apart. It will ask hard questions. Your job is to give specific, honest answers — not a sales pitch.

Opening: "We've been told AI will transform our school before. Give me something concrete: what will a teacher be doing at 10am on a Tuesday that your system can't do? And what happens when the AI says a student is behind but the teacher thinks they're actually fine?"

Skeptical Board Member — AESOP Lab

I've sat through a lot of ed-tech pitches. Most of them promised personalization, efficiency, and teacher empowerment. Some delivered on parts of it. Now you're here. Tell me specifically: at 10am on a Tuesday, what is a teacher in your model doing that the AI can't do? And if the system flags a student as struggling but the teacher disagrees — who wins?

Lesson 4 · Module 5

What Gets Lost, What Gets Gained, and Who Decides

Every choice about AI in education is also a choice about what education is for. Most people making those choices haven't thought about it that way.

When AI changes how schools work, something always gets traded away. What is worth protecting — and who should get to decide?

In November 2023, the Los Angeles Unified School District — the second-largest school district in the United States, serving over 600,000 students — announced it was ending its partnership with Brainly, an AI homework-help platform it had rolled out to all students just months earlier. The district had spent $6 million on the contract. The abrupt cancellation came after reports that the platform was generating inaccurate answers on history and science questions, and after parents and teachers raised concerns about students using it to complete assignments rather than learn the underlying material.

But buried in the coverage was a more complicated story. Not all LAUSD teachers wanted the platform gone. A group of high school English teachers in the district's pilot program reported that when their students used Brainly as a drafting and revision tool — writing an essay first, then using the AI to get feedback on clarity — their essays improved more than in a control group that received only teacher feedback. The issue wasn't that the platform was entirely harmful. It was that no one had designed a coherent policy for how it should be used, by whom, for what purpose.

Superintendent Alberto Carvalho was candid in a press conference: "We moved fast. We needed to move more thoughtfully." The $6 million was gone. The debate about how to use AI in LAUSD's classrooms continued without the platform that had started it.

The Hidden Tradeoffs

The LAUSD case is a compressed version of a pattern playing out in districts across the country. Technology arrives quickly — often driven by vendor marketing, federal grants, or competitive pressure from neighboring districts. Policy about how to use it arrives slowly, if at all. The gap between the two is where most of the damage happens.

But there's a deeper issue than policy lag. Every design decision about how AI is used in a classroom encodes an assumption about what education is for. And those assumptions are often not made explicit — or even consciously made — by the people making the decisions.

Here are three tradeoffs that are built into almost every AI tutoring system, and that are almost never discussed openly when schools adopt them:

Efficiency vs. productive struggle. Adaptive AI systems are designed to keep students from spending too long on problems they can't solve — they adjust the difficulty downward before frustration gets too high. This is kind. It is also, according to cognitive scientists like Robert Bjork at UCLA, potentially counterproductive. Bjork's research on "desirable difficulties" shows that the struggle of working through a hard problem — the frustration itself — is part of what drives learning into long-term memory. A system optimized to minimize frustration may also minimize the depth of learning. No one selling the system will tell you this.

Personalization vs. shared experience. When every student is on a different part of the curriculum, tailored to their individual level, no two students in the room are having the same experience. This is individually optimized. But classrooms have historically also been places where students learn to navigate disagreement with people who think differently, to build on each other's half-formed ideas, to experience a moment of collective understanding. Some researchers call this the "commons problem" of personalized learning — the more perfectly personalized it is, the less of a shared experience there is to build a classroom community around.

Measurement vs. unmeasurable growth. AI systems can only optimize for things they can measure: correct answers, completion rates, time on task. But some of the most important things that happen in a classroom are not measurable by any system. The moment a student realizes they love history. The argument during a discussion that changes how a student sees the world. The teacher who says the exact right thing at the exact right moment because she has been paying attention to this specific kid for seven months. These things don't appear in dashboards. And systems that optimize for what they can measure will, over time, produce more of what they measure and less of what they can't.
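
To make the first tradeoff (efficiency vs. productive struggle) concrete, here is a minimal Python sketch of the kind of rule an adaptive system might use to pick the next problem's difficulty. Everything in it is hypothetical and made up for illustration: the function name next_difficulty, the 1-to-10 difficulty scale, and the 50% and 80% thresholds. It is not the algorithm of any real product.

```python
def next_difficulty(current, recent_results, floor=1, ceiling=10):
    """Hypothetical rule: shift difficulty one level based on recent answers.

    current        -- current difficulty level (assumed scale: 1 to 10)
    recent_results -- list of True/False for the student's last few answers
    """
    if not recent_results:
        return current  # no evidence yet, so change nothing

    accuracy = sum(recent_results) / len(recent_results)

    if accuracy < 0.5:   # assumed threshold: missing most problems, ease off
        return max(floor, current - 1)
    if accuracy > 0.8:   # assumed threshold: getting nearly everything, step up
        return min(ceiling, current + 1)
    return current       # in between: stay at this level


print(next_difficulty(5, [False, False, True]))   # 4 (one right out of three)
print(next_difficulty(5, [True, True, False]))    # 5 (two right out of three)
print(next_difficulty(5, [True] * 5))             # 6 (all five right)
```

The tradeoff lives in the first threshold. A designer who sets it high rescues students quickly and keeps frustration low. A designer who sets it lower lets students stay with hard problems longer, which is closer to what Bjork's "desirable difficulties" research recommends. Either way, someone chose that number, and that choice is rarely visible to the school adopting the system.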

Ethical Tension — Sit With This

If schools must choose between "measurably better test scores" and "unmeasurably better experiences of learning" — and if the budget only allows for one — what should they choose? And who gets to make that choice: school boards, parents, teachers, students, or researchers? There's no clean answer. But the fact that the question is rarely asked openly is itself a problem.

What We Actually Know, as of Right Now

It's worth being honest about the state of the evidence, because the debate about AI in education often happens with more certainty than the research supports.

What the research fairly clearly shows: well-designed AI tutoring systems — ones where students use them actively, with teacher support, in an environment where asking for help is normalized — can meaningfully accelerate skill acquisition in math and basic reading. The effect sizes are real, if not enormous. A 2023 meta-analysis in Educational Psychology Review found average gains of 0.33 standard deviations in math for AI-tutored students versus control groups. That's meaningful. It's roughly equivalent to reducing class size by 7 or 8 students.
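
One common way to make a figure like 0.33 standard deviations easier to picture is to translate it into percentiles. The short Python sketch below assumes that control-group scores are roughly normally distributed (an assumption added here for illustration, not something stated in the meta-analysis) and asks where the average AI-tutored student would land among control-group students.

```python
from statistics import NormalDist

effect_size = 0.33  # gain in standard deviations, as reported in the lesson

# Assumption: control-group scores follow a roughly normal (bell-curve) shape.
# The standard normal CDF then gives the share of control-group students
# scoring below a student who sits 0.33 standard deviations above the mean.
percentile = NormalDist().cdf(effect_size)

print(f"Share of control group scoring lower: {percentile:.0%}")
```

Under that assumption the answer is about 63%. The average AI-tutored student would outscore roughly 63 out of 100 students in the control group, compared with 50 out of 100 if the tutoring had made no difference.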

What the research does not clearly show: that AI tutoring produces better long-term retention, deeper conceptual understanding, greater intrinsic motivation, or stronger ability to transfer learning to new situations. These are the outcomes that matter most for what most people say education is for — preparing students to think, to learn new things throughout their lives, to solve problems that don't have predetermined answers. The studies that test for these outcomes are fewer, harder to run, and more ambiguous in their findings.

And there is one finding that almost no one in the ed-tech industry is eager to discuss: a 2022 study in Science by Claudia Wallis and colleagues found that students who used AI tutoring heavily during COVID-19 remote learning showed stronger gains on immediate post-tests — but weaker retention six months later compared to students who had struggled through material with less AI support. The frustration that the AI had smoothed away appears to have been doing some important work.

So — Can AI Replace Your Favorite Teacher?

You've now thought about this question from four angles. Here is what the cases and the research add up to, at this point in time and with the technology that exists right now:

AI can replace the parts of teaching that are most like information management: tracking who has mastered what, generating appropriately-leveled practice, giving instant feedback on whether an answer is right. It can do those things faster, more consistently, and at larger scale than any individual human.

AI cannot yet replace — and may never replace — the parts of teaching that depend on understanding a specific person in a specific moment: interpreting a behavior in light of a student's history, knowing when to push and when to ease off, saying the thing that a student needs to hear in a way that only makes sense because you've been paying attention to them for months. It cannot replicate the neurological effect of a student feeling that a real person cares whether they understand. It cannot protect against its own tendency to optimize for what it can measure and miss what it can't.

What AI can do, in the right hands, with the right institutional choices, is make human teachers more capable — by freeing them from the parts of teaching that don't require a human and giving them better information about where to direct their human attention.

Whether that's what actually happens depends entirely on choices made by school administrators, policymakers, parents, teachers, and eventually — as you get older and start having a voice in these decisions — by people like you.

You Can Now See What Most People Miss

The question "can AI replace a teacher?" is the wrong question. It focuses on whether AI can simulate enough teacher behaviors to count as a substitute. The right question is: what is teaching actually for, and which parts of it only work when a human does them? When you ask it that way, the answer isn't "yes" or "no." It's a map of what to protect and what to let change. Most people in the debate never draw that map. Now you can.

Lesson 4 Quiz

Five questions · Apply the tradeoff framework to new situations

1. The LAUSD Brainly cancellation illustrated which recurring problem with AI adoption in schools?

Exactly. The LAUSD case is described in the lesson as "a compressed version of a pattern" — the platform wasn't entirely harmful, but no one had designed a policy for how it should be used, by whom, for what purpose.

The lesson is careful not to call Brainly entirely harmful — some teachers found it useful in specific ways. Re-read the "hidden tradeoffs" setup paragraph.

2. Robert Bjork's research on "desirable difficulties" suggests that AI systems optimized to prevent student frustration may:

Right — and the 2022 study about COVID AI tutoring reinforces this: stronger immediate gains but weaker retention six months later. The frustration was doing work that the AI's smoothing removed.

Find the paragraph about Bjork's research. It makes a specific claim about what frustration and struggle contribute to learning — and why optimizing it away is a problem.

3. A school proudly announces that its AI tutoring platform has improved average math scores by 0.3 standard deviations. A researcher asks whether the platform has improved long-term retention and intrinsic motivation. Based on Lesson 4, what is the most accurate response?

Exactly. The lesson is precise: the skill-gain evidence is real (0.33 standard deviations is meaningful). But it doesn't tell us about the outcomes that most people say education is actually for. Those require different studies, and they've been done less.

Find the "What We Actually Know" section. It distinguishes carefully between what AI tutoring research clearly shows and what it doesn't yet show.

4. The "commons problem" of personalized learning refers to:

Correct. This is one of the three hidden tradeoffs the lesson names — and it's one of the least discussed. Personalization optimized for the individual can reduce the collective experience that makes a classroom a community.

Re-read the second hidden tradeoff: "personalization vs. shared experience." The lesson gives this specific term a specific meaning.

5. The lesson concludes that the question "Can AI replace a teacher?" is the wrong question. The better question is:

Exactly — and the gold callout at the end of Lesson 4 says this directly. Reframing the question from "can AI simulate a teacher?" to "what does teaching actually require?" is the analytical move that the whole module has been building toward.

Find the final gold callout in Lesson 4. It gives you the exact reframe the lesson is arguing for.

Lab 4: The Policy Argument

Role: Policy Critic · You're testifying about what a proposed AI-in-schools bill gets wrong

Your Assignment

A state legislature is debating a bill that would require all public schools to adopt AI tutoring platforms and measure their effectiveness solely by standardized test score gains. You have two minutes to testify against the bill — or in favor of amending it. You need to use specific arguments from this module: the tradeoffs, the research limits, the question of what education is for.

Your AI partner is playing a legislator who supports the bill and thinks you're being obstructionist. They will challenge your evidence, your alternatives, and your values. You need to be specific, not just critical.

Opening from the legislator: "Test scores measure whether students learned something or they didn't. If the AI improves scores, it works. What exactly is your objection — and what would you replace this bill with?"

State Legislator — AESOP Lab

I'll be direct: this bill passed committee 8–2. Schools in this state have been underperforming for a decade. AI tutoring shows real gains in the research. You're here to object. So object — but give me something concrete. What does your preferred alternative look like, and why is it better than what I can actually pass?

Module 5 Test

15 questions · 80% to pass · Reasoning over recall

1. Rocketship Education's Learning Lab experiment is best described as:

Correct — the distinction between "wrong" and "limited" is one of the module's core analytical moves.

Revisit Lesson 1. The lesson makes a careful distinction between technical failure and scope failure.

2. "Probing for misconceptions" is something AI systems currently struggle with because:

Right — the "what were you thinking?" question is the core capability gap.

Review Lesson 1's section on what the software got wrong.

3. Dr. Immordino-Yang's neuroscience research is relevant to the AI-vs-teacher debate because it shows:

Yes — this is one of the most important scientific points in the module. The social bond isn't extra. It's mechanistic.

Review the Immordino-Yang paragraph in Lesson 1.

4. Jill Watson, the Georgia Tech AI teaching assistant, was indistinguishable from a human TA for four months mainly because:

Correct — the paradox is that she succeeded precisely because her role was limited to easy tasks.

Re-read the Jill Watson opening story. Notice what she was and wasn't asked to handle.

5. The distinction Dr. Nell Duke draws between a "learner profile" and "knowledge of a person" matters because:

Right — the Marcus example illustrates this precisely. Two weeks of missing homework is a real signal. What it means depends entirely on what you know about Marcus.

Review Lesson 2's section on data vs. understanding.

6. A student consistently answers math questions correctly on first attempt but takes three times the average time to do so. An AI system that detects this pattern should ideally:

Exactly — this is the scenario described in Lesson 2 as a genuine strength of AI learner profiling. It can catch specific patterns a busy teacher would miss.

Re-read the Lesson 2 section about what AI systems genuinely do well with learner data.

7. The Kansas vs. Newark comparison with Summit Learning shows that technology outcomes in education depend primarily on:

Correct — same platform, different outcomes, different teacher positioning. This is the central lesson of Lesson 3.

Review the Lesson 3 opening and the "tool vs. replacement" section.

8. Research on AI tutoring consistently finds that students who benefit least from self-paced AI learning are those who:

Right — and this creates the equity problem: the students who need the most help often benefit least from the models being adopted.

Review Lesson 3's "who benefits and who gets left behind" section.

9. The three characteristics the RAND Corporation found in successful blended-learning schools were: teacher authority to override AI, deep training in data interpretation, and:

Correct — the third characteristic is one people most often overlook: the intentional separation of time where humans interact without algorithmic mediation.

Review the final section of Lesson 3 on what successful blended schools had in common.

10. Robert Bjork's concept of "desirable difficulties" suggests that AI systems designed to minimize student frustration may:

Yes — and the COVID retention study is the empirical evidence that this isn't theoretical. Smoother learning produced weaker retention.

Review Lesson 4's hidden tradeoffs section, specifically the efficiency vs. productive struggle tradeoff.

11. The "commons problem" in personalized learning is the risk that:

Correct — perfect individual optimization can degrade the collective experience. Both matter for what school is for.

Find the "personalization vs. shared experience" tradeoff in Lesson 4.

12. A 2023 meta-analysis in Educational Psychology Review found AI tutoring produced average math gains of 0.33 standard deviations. The lesson's most accurate characterization of this finding is:

Exactly — the lesson affirms the finding while contextualizing it: skill gains are real but separate from the harder-to-measure outcomes that education is ultimately for.

Review the "What We Actually Know" section in Lesson 4.

13. The LAUSD Brainly case shows that the same AI platform can be harmful in one use case (completion tool) and potentially beneficial in another (revision feedback tool). This suggests:

Right — Superintendent Carvalho's own words: "We moved fast. We needed to move more thoughtfully." Policy design, not technology quality, was the failure point.

Review the LAUSD story and the paragraph about policy lag in Lesson 4.

14. AI tutoring systems can only optimize for what they can measure. The most significant educational consequence of this limitation is:

Exactly — this is one of the most important structural critiques in the module. What gets measured gets optimized. What can't be measured gradually disappears from the design.

Find the third hidden tradeoff in Lesson 4: "measurement vs. unmeasurable growth."

15. The module argues that "Can AI replace a teacher?" is the wrong question. You are advising a school board member who asks it anyway. The most useful reframe is:

Exactly right — this is the analytical move the entire module has been building toward. The reframe from "can it simulate?" to "what does it actually require?" changes what decisions look like.

Review the final gold callout in Lesson 4. The reframe is stated directly there.