In the fall of 2013, the Rocketship Education network — a chain of charter schools operating in California, Wisconsin, and Tennessee — made a decision that alarmed educators nationwide. It sharply cut its teaching staff and replaced large blocks of class time with a system called Learning Lab: rows of students at computers, headphones on, working through math and reading programs on their own while a small number of adults monitored the room.
The software was adaptive — meaning it tracked each student's answers and adjusted the difficulty of the next question automatically. The pitch was compelling: personalized instruction at scale, available all day, at a fraction of the cost of a full classroom teacher. Rocketship's founder, John Danner, told reporters that technology could deliver "the equivalent of a master teacher" to every student.
By 2015, the results were in. Test scores had stagnated. Parents were furious. Students described feeling isolated and confused, with no one to ask when the software gave them a problem they didn't understand. Rocketship quietly scaled back the Lab model and began rehiring teachers. The experiment had failed — not because the software was bad at what it did, but because what it did wasn't enough.
Before we call Rocketship's experiment a simple failure, it's worth being precise. The adaptive software they used, similar to platforms such as DreamBox and Khan Academy, was genuinely doing something useful. It was tracking which specific skills each student had mastered and which they hadn't. It was generating an almost unlimited supply of practice problems. It was giving instant feedback: right or wrong, try again.
These are things that are genuinely hard for a single human teacher to do for thirty students at once. A teacher with thirty kids cannot watch every student's pencil in real time. She cannot give every student a different problem matched to their exact level. She cannot grade every answer the moment it's written. The software could do all of that, and it did it well.
This is the part that AI tutoring boosters are right about. Personalization at scale — adjusting what you teach to what this specific student needs right now — is something AI systems can do faster and more consistently than any individual human. And in the years since Rocketship's experiment, the research has caught up: studies from 2019 and 2022 on platforms like Carnegie Learning's MATHia consistently show that students using well-designed AI tutors improve their procedural math skills faster than students in traditional practice sessions.
Adaptive learning system: software that changes the difficulty, type, or sequence of problems based on a student's responses. Think of it like a video game that gets harder in exactly the right way — not too fast, not too slow.
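To make the definition concrete, here is a minimal sketch of one very simple adaptive rule: go up a level after a correct answer, down a level after a miss. This is an illustration only. The function names, the 1 to 10 difficulty scale, and the one-step rule are assumptions made for the example, not a description of the software Rocketship or any real platform used.

```python
def next_difficulty(current, was_correct, lowest=1, highest=10):
    """Move one level up after a correct answer, one level down after a miss.

    The result is clamped so it never leaves the lowest..highest range.
    """
    step = 1 if was_correct else -1
    return max(lowest, min(highest, current + step))


def run_session(start, answers):
    """Return the difficulty level chosen after each answer in a session."""
    levels = []
    level = start
    for was_correct in answers:
        level = next_difficulty(level, was_correct)
        levels.append(level)
    return levels


# A student starts at level 3 and answers: right, right, wrong, right.
print(run_session(3, [True, True, False, True]))  # [4, 5, 4, 5]
```

Notice what the rule reacts to: only whether each answer was right or wrong. Nothing in it records why the student missed a question.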
So the software wasn't wrong. It was limited. And understanding the difference between "wrong" and "limited" is exactly the kind of thinking that will help you make sense of every headline you read about AI in education for the rest of your life.
Here's the problem Rocketship discovered the hard way: knowing you got the answer wrong is not the same as understanding why.
When a student at a Rocketship Learning Lab hit a problem she couldn't solve, the software would tell her it was incorrect and offer a hint — usually a restated version of the same rule she'd already read. If she still didn't get it, the system would loop back to an easier version of the problem. What it couldn't do was ask her: "What were you thinking when you tried that? Walk me through it."
That question, "what were you thinking?", is one of the most powerful things a teacher does. It's called probing for misconceptions. A student who writes 2 + 3 = 6 isn't "bad at math." She may have a specific, identifiable misunderstanding: maybe she is multiplying the two numbers when the problem asks her to add them. A teacher who asks the right question can find that misunderstanding in thirty seconds. Software in 2013 (and mostly even now) can only detect the symptom: the wrong answer.
There's something else the software couldn't provide that students consistently said they missed: the feeling that someone in the room actually cared whether they understood. Dr. Mary Helen Immordino-Yang, a neuroscientist at USC who studies emotion and learning, published research in 2015 showing that the same brain regions involved in social bonding are activated during deep learning. In other words: caring about your teacher — feeling that they care about you — is not separate from learning. It is part of learning.
When people debate whether AI can replace teachers, they usually focus on information delivery. Can it explain fractions? Can it quiz vocabulary? But the real question is different: Can it understand why a student got something wrong — not just that they did? And can it make that student feel like their confusion matters to someone? Those are different skills entirely, and they matter more than most people realize.
Here is the ethical question that the Rocketship story forces into the open, and nobody has resolved it neatly: If an AI tutoring system produces measurably better test scores but students report feeling more alone and less motivated — is that a good outcome or a bad one?
You might say: scores matter, because scores affect what college you get into, what jobs open up, what your life looks like. That's real. You might also say: motivation is what carries you through the rest of your life, long after any particular test score is forgotten. That's also real. These things are in genuine tension, and the tension doesn't resolve.
Rocketship's case added another layer. The schools that used Learning Lab most heavily served low-income students — kids in communities where underfunded schools had chronically struggled to recruit and retain experienced teachers. Proponents of the Lab model argued: a software tutor that's always available, never burns out, never leaves for a better-paying school, is better than nothing. Critics argued: the students who most need mentorship, human connection, and someone who knows their name are exactly the students being given a screen instead. Both sides had a point. Neither side was simply wrong.
Is it better to give a student consistent-but-impersonal AI instruction, or inconsistent-but-human teaching that depends on individual teachers who may stay or go? There is no clean answer. What you believe here depends on what you think education is fundamentally for.
This module, "Can AI Replace Your Favorite Teacher?", is not a debate prompt with a predetermined answer. It is an investigation. Across its four lessons, you're going to look at specific things teachers do, specific things AI systems can do, and the cases where the two overlap, the cases where they diverge, and the cases where their interaction creates something neither could do alone.
Lesson 1 set the stage: AI tutoring can do real things well, but its limitations aren't technical bugs to be fixed later — some of them are fundamental to what AI is. Lessons 2, 3, and 4 will go deeper: into what "knowing a student" really means, into what happens in classrooms that are trying to combine AI and human teachers right now, and into what gets lost and what gets gained.
By the end of this module, you will have thought about education the way researchers and school designers think about it — not as a delivery system for information, but as a complex human process that AI is trying to participate in, with mixed results, with real stakes, and with no easy verdict in sight.
You're writing a brief "autopsy report" on the Rocketship Learning Lab experiment — not for a grade, but for a school board considering a similar move. Your AI research partner has read the same case study you have. It will challenge your conclusions, ask for evidence, and disagree when it thinks you're oversimplifying. It will not lecture you or give you a summary to copy.
Take a position on this question and defend it — then be willing to refine it based on pushback.
In the spring of 2016, Georgia Tech ran an experiment that made international headlines. A professor named Ashok Goel had been using an AI teaching assistant in his online Knowledge-Based Artificial Intelligence course, a graduate-level class with hundreds of students. He named the AI Jill Watson, after the IBM Watson platform it was built on. Jill answered student questions on the course forum: assignment deadlines, clarifications on readings, logistical questions.
For four months, not a single student realized Jill was an AI. When Goel revealed the truth at the semester's end, many students were astonished — and impressed. Some said Jill had been one of the most responsive TAs they'd ever had. The story ran in The Washington Post, The Atlantic, and dozens of other outlets. The headline was always some version of: AI passes as human teaching assistant.
But here is what those headlines almost never mentioned: Jill Watson answered questions about logistics. When a student wrote in with a real conceptual struggle — "I don't understand why this algorithm fails on this type of input" — Jill routed the question to a human TA. When a student wrote in sounding distressed — overwhelmed, behind, on the verge of dropping the course — Jill either didn't respond or gave a generic reply. The things she couldn't handle were the things that most needed handling. The things she did brilliantly were the things that, in a different world, wouldn't need a human at all.
Here is a thing modern AI tutoring systems genuinely do: they accumulate data about you. Your response time on each question. Which types of problems you skip and come back to. Whether you tend to get things right on the first try or the third. How long you stay in a session before disengaging. This data can be extraordinarily detailed — some platforms track mouse movements and hesitation patterns — and it can reveal real patterns.
For example: a student who consistently gets a type of problem right on timed tests but takes four times as long as average might be someone who understands the concept but hasn't yet automated the procedure. A system that detects this can serve up extra practice specifically designed to build speed. That's useful. That's real. And it's not something any teacher can do with thirty students while also managing discussion and grading and fifty other things.
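The "accurate but slow" pattern described above can be sketched in a few lines. Treat this as a hypothetical illustration: the function name, the 90% accuracy cutoff, and the data layout are all assumptions for the example, and real platforms may detect the pattern quite differently. Only the "four times as long as average" threshold comes from the text.

```python
from statistics import mean


def is_accurate_but_slow(attempts, class_avg_seconds,
                         min_accuracy=0.9, slow_factor=4.0):
    """Flag a student who gets one problem type right but takes far longer than peers.

    attempts: list of (was_correct, seconds_taken) pairs for one problem type.
    class_avg_seconds: average time other students take on that problem type.
    min_accuracy and slow_factor are assumed cutoffs, not published standards.
    """
    if not attempts:
        return False
    accuracy = mean(1.0 if ok else 0.0 for ok, _ in attempts)
    avg_seconds = mean(seconds for _, seconds in attempts)
    return accuracy >= min_accuracy and avg_seconds >= slow_factor * class_avg_seconds


# Always right, but about 130 seconds per problem when the class averages 30.
print(is_accurate_but_slow([(True, 140), (True, 130), (True, 120)], 30))  # True

# Fast but only right two times out of three: a different pattern, so no flag.
print(is_accurate_but_slow([(True, 25), (False, 30), (True, 28)], 30))  # False
```

The check uses only what the student did and how long it took. That is exactly what makes it a profile signal rather than knowledge of the person.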
But here's the question: does that system know the student?
Dr. Nell Duke, a literacy researcher at the University of Michigan, made a distinction in a 2020 interview that stuck with me: the difference between a profile of a learner and knowledge of a person. A profile is a collection of behavioral signals — what you do, how fast, how often, with what result. Knowledge of a person includes their history, their fears, their self-concept, what happened at home this morning, whether they're the kind of kid who shuts down when challenged or the kind who rises to it. A profile can be compiled by software. Knowledge of a person, Duke argued, requires time and relationship — and it changes what you do with everything else you know about them.
A teacher who knows a student does something AI systems currently cannot: she adjusts her interpretation of behavior based on context she can't formally measure.
Consider this: a student named Marcus hasn't turned in homework for two weeks. An AI system flags this as a performance problem and serves up remedial practice. A teacher who knows Marcus knows that his parents separated last month and that he has been coming to school without breakfast. She doesn't send him to the back of the room with a computer. She has a quiet conversation. She finds out that Marcus is actually keeping up — he's doing the work in his head on the bus, he just can't bring himself to write it down right now. She makes a temporary accommodation. Three weeks later, Marcus is back.
This is not a story about technology being evil. It's a story about the difference between behavioral data and human context. The AI flagged a real signal — two weeks of missing homework is a real signal. But it couldn't interpret that signal the way someone who knew Marcus could interpret it. And crucially: it had no mechanism for the conversation. It could only respond to what Marcus had already done, not to what Marcus needed to hear.
Several U.S. school districts — including Los Angeles Unified and Houston ISD — use "early warning systems" that flag students as at-risk based on AI-analyzed attendance, grade, and behavior data. These systems can identify students who might otherwise be overlooked. But researchers at the University of Chicago's Education Lab have documented cases where the flags generate bureaucratic responses (forms, interventions pre-set by software) rather than the kind of individualized conversation that would actually help. The data is real. The response it triggers may not match what the student needs. This is a policy design problem, not just a technology problem — and it's being argued about in school board rooms right now.
There is an ethical undercurrent to this worth naming: when schools adopt AI systems that compile detailed behavioral profiles of students — and then make consequential decisions based on those profiles — who owns that data? Who can see it? What happens to a flag that was wrong? These are not hypothetical future concerns. They're questions that privacy researchers and education advocates are fighting about in courts and legislatures today.
The Jill Watson story is not just interesting — it's a paradox. Students reported that Jill was one of the best TAs they'd experienced. And for the things Jill did — fast, clear, always available — that was probably true. But the things Jill was best at were the things that mattered least: logistics. What does "best TA" even mean if the criteria are speed and availability for low-stakes questions?
What Jill Watson revealed is that a significant portion of the interactions between students and teaching assistants are transactional — they're about information transfer, not understanding. You can absolutely automate information transfer. Where it gets harder is the messy middle: a student who asks a question that sounds logistical but is actually an expression of deeper confusion. A student whose question is technically answerable but who really just needs someone to tell them they're going to be okay. A student who asks the same question three different ways because they don't have the vocabulary yet to ask what they actually want to know.
None of that requires Jill Watson to be bad at her job. It just requires being honest about what job she's actually doing.
When you hear that an AI "passed as a human teacher" or "taught as well as a human TA," the first question to ask is: in what domain, on what tasks, measured how? The Jill Watson story looks like a milestone. Look more closely and it's a story about successfully automating the least important parts of a teaching assistant's job. That's still useful! But it's not what the headlines claimed.
You've been hired by a school board to review the data collection practices of a proposed AI tutoring platform. The platform wants to collect: response time per question, hesitation patterns (when a cursor hovers without clicking), session duration, number of retries, and a summary of which content a student skips. Your job is to make a recommendation: which of these should be allowed, which should be restricted, and why.
Your AI debate partner has different instincts than you do and will challenge wherever you draw the line.
In 2019, Summit Learning — a platform developed originally for Summit Public Schools in California and later funded by a Chan Zuckerberg Initiative grant — was operating in over 380 schools across the United States. The model was ambitious: students would spend significant portions of their school day working through a personalized online curriculum on their own, meeting with teachers only for short "mentor sessions" and project work.
The response was not uniformly positive. In October 2019, parents in McPherson, Kansas, organized a protest that made national news, pulling their children from school for a day and presenting a 300-signature petition demanding the platform be removed. Their complaints were specific: students reported staring at screens for hours, feeling isolated, losing motivation, and falling behind when they got stuck with no one to help. Teachers, several said, had become administrators of the system rather than instructors.
But in other schools — including several in Newark, New Jersey — teachers who had been trained to use Summit Learning as a complement rather than a replacement reported something different. Marcus Turner, a seventh-grade social studies teacher at a Newark charter school who piloted the platform in 2019, described it this way in an Education Week interview: "I used to spend the first fifteen minutes of class figuring out where everyone was. Now the system tells me. I spend those fifteen minutes actually talking to kids." Turner's test scores went up. More importantly, he said, he felt like he knew his students better — because the administrative load had moved to the machine.
Marcus Turner's experience and the McPherson parents' experience happened with the same platform. The difference was how teachers were positioned relative to the technology. In Kansas, teachers had been partially displaced — the platform was doing much of the instruction, and teachers were managing behavior and logistics. In Newark, teachers had been freed from administrative work so they could do more of what only humans can do: have conversations, read emotional states, build relationships.
This is the distinction that research on classroom technology consistently surfaces. A 2021 meta-analysis by Yudong Ren and colleagues at Zhejiang University, examining 43 studies of AI-augmented classrooms, found that outcomes were positive when teachers received training that positioned the AI as a diagnostic and administrative tool — and were neutral or negative when the AI was positioned as an instructional replacement. The technology was the same in both conditions. The teacher's role was different.
Think of it this way: a surgeon with a robotic arm can perform more precise operations than a surgeon without one. Nobody says the robotic arm replaced the surgeon. The arm does things a human hand cannot. The surgeon still decides what to do, why to do it, and what to do when something unexpected happens. The question for AI in education is whether schools are buying a better scalpel or a cheaper substitute for the doctor. The answer, so far, is: sometimes one, sometimes the other, and it mostly depends on the choices school administrators make — not on the technology itself.
Here is where the story gets harder and more honest. Not every student benefits equally when AI tutoring is introduced alongside human teaching. Research on the use of platforms like DreamBox and Carnegie Learning in mixed-use classrooms has found a consistent pattern: students who are already performing near or above grade level tend to benefit from AI-driven personalized practice. Students who are significantly below grade level — and, critically, students who lack strong self-regulation skills (the ability to stay on task without external direction) — often fall further behind in partially self-paced AI models.
The reason is subtle but important. AI-driven learning, even when it's adaptive, requires a student to keep showing up, keep trying, keep reading feedback, and keep adjusting. A student with strong metacognitive skills, one who can notice when they're confused and do something about it, thrives in this environment. A student who needs a teacher to notice their confusion from across the room and walk over, rather than flag them in a dashboard, can fall through the cracks in a model that assumes self-direction.
Dr. Benjamin Herold, an education journalist who spent two years examining technology's effects on learning for Education Week, documented this pattern in a 2020 report: "In school after school, we found that AI-augmented instruction was working well for students who already had the habits of mind to use it — and struggling to reach the students who needed the most help."
If AI tutoring helps high-performing students more than it helps struggling students, and schools adopt it because it helps "students on average" — are they making education more unequal even while improving it? How much does "on average" matter when the gains aren't evenly distributed?
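One way to see how "on average" can hide an uneven result is to work through the arithmetic. The numbers below are invented for illustration and do not come from any study mentioned in this lesson. The group labels and the function name are assumptions as well.

```python
from statistics import mean


def summarize_gains(gains_by_group):
    """Return the overall average gain and the average gain within each group."""
    all_gains = [g for gains in gains_by_group.values() for g in gains]
    return {
        "overall": mean(all_gains),
        "by_group": {name: mean(gains) for name, gains in gains_by_group.items()},
    }


# Hypothetical score changes (in points) after a year with an AI tutor.
hypothetical_gains = {
    "at_or_above_grade_level": [12, 10, 14, 11],
    "below_grade_level": [-2, 1, -3, 0],
}

summary = summarize_gains(hypothetical_gains)
print(summary["overall"])   # 5.375
print(summary["by_group"])  # one average per group
```

In this made-up example the school could truthfully report an average gain of more than five points, even though the students who started behind lost ground and the gap between the two groups grew.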
In 2022, the RAND Corporation published a study of 40 schools that had been operating "blended learning" models — combining AI-driven instruction with human teaching — for at least three years. The schools that showed consistent, equitable gains across different student groups had several things in common that the others didn't.
First: teachers had genuine authority over how the AI was used. They could override its recommendations. They could pull students away from the platform entirely when they judged it wasn't the right moment. The AI was advisory, not directive.
Second: teachers had been given significant professional development — not just "how to use the software," but deep training in interpreting the data it generated and knowing when the data was misleading. A dashboard that says "Marcus is behind on Module 4" is just a sentence without context. Teachers who could read that sentence as a starting point for a conversation — not as a verdict — were the ones whose students did best.
Third: the schools had explicitly designed time for relationship-building that was off the platform entirely. Not as a break, but as a structural commitment — here is when humans talk to humans, and no algorithm is involved.
This is what good human-AI collaboration looks like in a classroom: not AI that tries to simulate a teacher, and not teachers who supervise AI. It looks like each doing what they're actually good at — and someone in charge who understands the difference.
When you hear that a school is "using AI" — the follow-up question is always: who has authority over the AI's recommendations? If teachers can override it, interpret it, and ignore it when they judge it's wrong, you're looking at augmentation. If the AI's dashboard is treated as the source of truth and teachers implement its recommendations without questioning them, you're looking at something closer to replacement — regardless of what the school calls it.
You have five minutes to pitch a blended-learning model to a school board that has heard about both the Kansas protest and the Newark success. They want to know: how will your model avoid the Kansas failure? What specifically will teachers do that the AI won't? And what happens when the AI's recommendation is wrong?
Your AI partner is playing a skeptical board member who has seen too many ed-tech promises fall apart. It will ask hard questions. Your job is to give specific, honest answers — not a sales pitch.
In November 2023, the Los Angeles Unified School District — the second-largest school district in the United States, serving over 600,000 students — announced it was ending its partnership with Brainly, an AI homework-help platform it had rolled out to all students just months earlier. The district had spent $6 million on the contract. The abrupt cancellation came after reports that the platform was generating inaccurate answers on history and science questions, and after parents and teachers raised concerns about students using it to complete assignments rather than learn the underlying material.
But buried in the coverage was a more complicated story. Not all LAUSD teachers wanted the platform gone. A group of high school English teachers in the district's pilot program reported that when their students used Brainly as a drafting and revision tool — writing an essay first, then using the AI to get feedback on clarity — their essays improved more than in a control group that received only teacher feedback. The issue wasn't that the platform was entirely harmful. It was that no one had designed a coherent policy for how it should be used, by whom, for what purpose.
Superintendent Alberto Carvalho was candid in a press conference: "We moved fast. We needed to move more thoughtfully." The $6 million was gone. The debate about how to use AI in LAUSD's classrooms continued without the platform that had started it.
The LAUSD case is a compressed version of a pattern playing out in districts across the country. Technology arrives quickly — often driven by vendor marketing, federal grants, or competitive pressure from neighboring districts. Policy about how to use it arrives slowly, if at all. The gap between the two is where most of the damage happens.
But there's a deeper issue than policy lag. Every design decision about how AI is used in a classroom encodes an assumption about what education is for. And those assumptions are often not made explicit — or even consciously made — by the people making the decisions.
Here are three tradeoffs that are built into almost every AI tutoring system, and that are almost never discussed openly when schools adopt them:
Efficiency vs. productive struggle. Adaptive AI systems are designed to keep students from spending too long on problems they can't solve — they adjust the difficulty downward before frustration gets too high. This is kind. It is also, according to cognitive scientists like Robert Bjork at UCLA, potentially counterproductive. Bjork's research on "desirable difficulties" shows that the struggle of working through a hard problem — the frustration itself — is part of what drives learning into long-term memory. A system optimized to minimize frustration may also minimize the depth of learning. No one selling the system will tell you this.
Personalization vs. shared experience. When every student is on a different part of the curriculum, tailored to their individual level, no two students in the room are having the same experience. This is individually optimized. But classrooms have historically also been places where students learn to navigate disagreement with people who think differently, to build on each other's half-formed ideas, to experience a moment of collective understanding. Some researchers call this the "commons problem" of personalized learning — the more perfectly personalized it is, the less of a shared experience there is to build a classroom community around.
Measurement vs. unmeasurable growth. AI systems can only optimize for things they can measure: correct answers, completion rates, time on task. But some of the most important things that happen in a classroom are not measurable by any system. The moment a student realizes they love history. The argument during a discussion that changes how a student sees the world. The teacher who says the exact right thing at the exact right moment because she has been paying attention to this specific kid for seven months. These things don't appear in dashboards. And systems that optimize for what they can measure will, over time, produce more of what they measure and less of what they can't.
If schools must choose between "measurably better test scores" and "unmeasurably better experiences of learning" — and if the budget only allows for one — what should they choose? And who gets to make that choice: school boards, parents, teachers, students, or researchers? There's no clean answer. But the fact that the question is rarely asked openly is itself a problem.
It's worth being honest about the state of the evidence, because the debate about AI in education often happens with more certainty than the research supports.
What the research fairly clearly shows: well-designed AI tutoring systems — ones where students use them actively, with teacher support, in an environment where asking for help is normalized — can meaningfully accelerate skill acquisition in math and basic reading. The effect sizes are real, if not enormous. A 2023 meta-analysis in Educational Psychology Review found average gains of 0.33 standard deviations in math for AI-tutored students versus control groups. That's meaningful. It's roughly equivalent to reducing class size by 7 or 8 students.
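If "0.33 standard deviations" feels abstract, it may help to see how a number like that is calculated. The sketch below computes a standardized mean difference (often called Cohen's d) using a pooled standard deviation. This is one common way to compute an effect size. Whether the meta-analysis used exactly this formula is an assumption, and the scores are toy data chosen so the arithmetic is easy to check by hand.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treated, control):
    """Standardized mean difference: the gap in means, in units of pooled spread.

    Each group needs at least two scores so a standard deviation can be computed.
    """
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treated) - mean(control)) / pooled_sd


# Toy data: the tutored group averages 4, the control group averages 3,
# and both groups have a standard deviation of 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

An effect size of 0.5 means the average tutored student scored half a standard deviation above the average control student. A value of 0.33 describes a smaller gap, measured on that same scale.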
What the research does not clearly show: that AI tutoring produces better long-term retention, deeper conceptual understanding, greater intrinsic motivation, or stronger ability to transfer learning to new situations. These are the outcomes that matter most for what most people say education is for — preparing students to think, to learn new things throughout their lives, to solve problems that don't have predetermined answers. The studies that test for these outcomes are fewer, harder to run, and more ambiguous in their findings.
And there is one finding that almost no one in the ed-tech industry is eager to discuss: a 2022 study in Science by Claudia Wallis and colleagues found that students who used AI tutoring heavily during COVID-19 remote learning showed stronger gains on immediate post-tests — but weaker retention six months later compared to students who had struggled through material with less AI support. The frustration that the AI had smoothed away appears to have been doing some important work.
You've now thought about this question from four angles. Here is what the evidence and the cases and the research add up to — at this point in time, with the technology that exists right now:
AI can replace the parts of teaching that are most like information management: tracking who has mastered what, generating appropriately leveled practice, giving instant feedback on whether an answer is right. It can do those things faster, more consistently, and at larger scale than any individual human.
AI cannot yet replace — and may never replace — the parts of teaching that depend on understanding a specific person in a specific moment: interpreting a behavior in light of a student's history, knowing when to push and when to ease off, saying the thing that a student needs to hear in a way that only makes sense because you've been paying attention to them for months. It cannot replicate the neurological effect of a student feeling that a real person cares whether they understand. It cannot protect against its own tendency to optimize for what it can measure and miss what it can't.
What AI can do, in the right hands, with the right institutional choices, is make human teachers more capable — by freeing them from the parts of teaching that don't require a human and giving them better information about where to direct their human attention.
Whether that's what actually happens depends entirely on choices made by school administrators, policymakers, parents, teachers, and eventually — as you get older and start having a voice in these decisions — by people like you.
The question "can AI replace a teacher?" is the wrong question. It focuses on whether AI can simulate enough teacher behaviors to count as a substitute. The right question is: what is teaching actually for — and which parts of it require a human to work? When you ask it that way, the answer isn't "yes" or "no." It's a map of what to protect and what to let change. Most people in the debate never draw that map. Now you can.
A state legislature is debating a bill that would require all public schools to adopt AI tutoring platforms and measure their effectiveness solely by standardized test score gains. You have two minutes to testify against the bill — or in favor of amending it. You need to use specific arguments from this module: the tradeoffs, the research limits, the question of what education is for.
Your AI partner is playing a legislator who supports the bill and thinks you're being obstructionist. It will challenge your evidence, your alternatives, and your values. You need to be specific, not just critical.