In October 2023, radiologists at Mass General Brigham in Boston began working alongside an AI system trained to flag potential lung nodules in CT scans. The radiologists did not lose their jobs. Their interpretation speed increased by roughly 30 percent and they caught more clinically significant findings than before. The AI handled the mechanical pass; the physicians handled the judgment call. This is augmentation — not replacement.
Researchers at MIT's Work of the Future task force distinguish three broad modes of human-AI task division. At one end sits full automation — the AI completes a task end-to-end with no human in the loop. At the other end sits full human control — AI provides no direct help. Between those poles lies the most consequential territory: augmented collaboration, where AI handles specific sub-tasks while humans retain authority over context, judgment, and final decisions.
The key insight from MIT's 2022 "The Work of the Future" report is that the boundary between these modes is not fixed by technology — it is set by institutional design choices. Organizations that consciously place the dividing line tend to get better outcomes than those that let it drift.
Based on documented deployments across healthcare, law, finance, and manufacturing, four recurring archetypes emerge: the Screener, the Amplifier, the Monitor, and the Coach.
A 2023 study by Erik Brynjolfsson and colleagues at Stanford's Digital Economy Lab examined 5,179 customer service agents at a large U.S. software company using a generative AI tool. Productivity, measured as issues resolved per hour, rose 14% on average. Crucially, the gains were concentrated among less-experienced workers, who learned from the AI's suggestions. Experienced workers improved only marginally, and one group, those who stopped engaging critically with AI outputs, saw quality decline. The finding: augmentation works best when the human continues to learn, not when they delegate judgment entirely.
This aligns with research from the Wharton School's Ethan Mollick and Lilach Mollick showing that AI "tutors" that explain their reasoning produce more durable skill gains than AI tools that simply provide answers. The mode of collaboration shapes what the human becomes.
Among 5,179 customer service agents at a software company, access to a generative AI assistant raised productivity 14% on average, with the largest gains among less-experienced workers. Experienced workers gained little, and those who accepted AI outputs without critical review showed some quality regression. Critical engagement is not optional; it is the mechanism of augmentation.
The most durable human-AI collaboration models are designed around what the human learns, not just what the AI produces. If the collaboration does not build human capability over time, you have automation with extra steps.
You will describe a real or hypothetical workplace workflow to the AI lab assistant. Together, you will identify which collaboration archetype it fits (Screener, Amplifier, Monitor, or Coach), explain the task decomposition logic, and discuss where the HITL checkpoint should be placed.
Complete at least 3 exchanges to finish this lab.
In 2005, an unusual chess tournament called Freestyle Chess allowed any combination of human and computer players. The winners were not grandmasters. They were not supercomputers. They were two amateur players who had developed an exceptional process for switching fluidly between human intuition and engine calculation at precisely the right moments. Garry Kasparov, who observed the tournament, coined the term "centaur" for this model: a hybrid entity more powerful than either component alone.
In the centaur model, human and AI alternate control based on task type. The human drives when contextual judgment, ethical reasoning, or relational intelligence is required. The AI drives when pattern-matching, data synthesis, or computational precision is required. The handoff between them is explicit and deliberate.
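To make the handoff rule concrete, here is a minimal Python sketch of centaur-style routing. It is illustrative only: the capability labels, the `SubTask` and `Driver` types, and the precedence rule (any need for human judgment gives the human control) are assumptions made for this example, not a method from the research discussed here.

```python
from dataclasses import dataclass
from enum import Enum


class Driver(Enum):
    HUMAN = "human"
    AI = "ai"


# Hypothetical capability labels drawn from the centaur description above.
HUMAN_CAPABILITIES = {"contextual_judgment", "ethical_reasoning", "relational_intelligence"}
AI_CAPABILITIES = {"pattern_matching", "data_synthesis", "computational_precision"}


@dataclass
class SubTask:
    name: str
    needs: set[str]  # capabilities this sub-task requires


def route(task: SubTask) -> Driver:
    """Decide who drives one sub-task in a centaur workflow (illustrative rule)."""
    if task.needs & HUMAN_CAPABILITIES:
        # Any need for human judgment hands control to the human.
        return Driver.HUMAN
    if task.needs and task.needs <= AI_CAPABILITIES:
        # Every requirement is AI-suited, so the AI drives.
        return Driver.AI
    # Empty or unrecognized requirements: default to the human.
    return Driver.HUMAN


def plan(tasks: list[SubTask]) -> list[tuple[str, Driver]]:
    """Make every handoff explicit as an ordered list of (sub-task, driver)."""
    return [(t.name, route(t)) for t in tasks]


if __name__ == "__main__":
    workflow = [
        SubTask("summarize 200 interview transcripts", {"data_synthesis"}),
        SubTask("decide which client concerns to prioritize", {"contextual_judgment"}),
        SubTask("model pricing scenarios", {"computational_precision"}),
        SubTask("deliver difficult news to the client", {"relational_intelligence"}),
    ]
    for name, driver in plan(workflow):
        print(f"{driver.value}: {name}")
```

Defaulting unrecognized sub-tasks to the human is a deliberate choice in this sketch: it keeps the handoff explicit rather than letting the dividing line drift toward the AI.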
The centaur pattern showed up at scale in a 2023 controlled experiment run by Boston Consulting Group with a team of academic researchers, involving 758 consultants using GPT-4. On tasks within the AI's capabilities, consultants with GPT-4 access completed 12.2% more tasks, finished 25.1% faster, and produced results rated more than 40% higher in quality than a no-AI control group. Among the AI users, the researchers observed two distinct working styles: "centaurs," who deliberately divided tasks between themselves and the AI, and "cyborgs," who wove AI into their work continuously.
758 BCG consultants, GPT-4 access. Three groups: no AI, GPT-4 access, and GPT-4 access plus a prompt-engineering overview. On tasks inside the AI's frontier, AI users: +12.2% tasks completed, 25.1% faster, 40%+ higher quality ratings. Among AI users, two working styles emerged: centaurs (explicit human/AI task division) and cyborgs (continuous integration).
In the cyborg model, human and AI work simultaneously and continuously, with no hard handoff. The AI is embedded in the human's workflow as a persistent cognitive layer — autocompleting, suggesting, flagging, and translating in real time while the human continues to act. The human does not "hand off" to AI; they think with it.
GitHub Copilot operating inside VS Code is the canonical example. Developers do not stop coding to consult the AI; suggestions appear inline as they type. A 2023 GitHub study showed developers accepted approximately 30% of Copilot suggestions and, in a controlled experiment, completed a coding task 55% faster. But the key design feature is continuous presence, not periodic consultation.
Microsoft's integration of Copilot into Word, Excel, and Outlook follows the same cyborg logic: the AI is present in every document, every spreadsheet, every email, available without context-switching.
Centaur fits best: high-stakes decisions where errors are costly; complex projects requiring clear accountability; tasks where human ethical or relational judgment is non-negotiable; legal, medical, and financial domains.
Cyborg fits best: high-volume knowledge work where speed matters; creative and writing tasks; software development; any workflow where context-switching between AI and human tools creates friction that degrades output quality.
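The two profiles above can be read as a rough decision rule for a whole workflow. The sketch below assumes a hypothetical `Workflow` record holding yes/no answers to each criterion and resolves conflicts in favor of the centaur handoff. Both the fields and that tie-break are choices made for illustration, not a published rubric.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    # Hypothetical yes/no answers to the criteria listed above.
    name: str
    high_stakes: bool              # errors are costly or hard to reverse
    needs_accountability: bool     # a named person must own each decision
    judgment_non_negotiable: bool  # ethical or relational judgment must stay human
    high_volume: bool              # many similar items where speed matters
    switching_friction: bool       # moving between tools degrades output


def choose_strategy(w: Workflow) -> str:
    """Return "centaur" or "cyborg" for a workflow (illustrative rule)."""
    if w.high_stakes or w.needs_accountability or w.judgment_non_negotiable:
        # Centaur criteria win ties: an explicit handoff keeps accountability visible.
        return "centaur"
    if w.high_volume or w.switching_friction:
        return "cyborg"
    # Neither profile clearly applies; default to the explicit handoff.
    return "centaur"


if __name__ == "__main__":
    loan = Workflow("approve a small-business loan", True, True, True, False, False)
    notes = Workflow("draft release notes", False, False, False, True, True)
    print(choose_strategy(loan))   # centaur
    print(choose_strategy(notes))  # cyborg
```

A team that tolerates more risk might reasonably reverse the tie-break; the point of writing the rule down is that the choice becomes deliberate rather than habitual.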
The BCG study also contained a cautionary finding. When consultants were given a task that fell outside GPT-4's actual competence, but inside its confident-sounding output range, those using AI were 19 percentage points less likely to reach a correct solution than the no-AI group. Lead author Fabrizio Dell'Acqua and colleagues call this the "jagged technological frontier." The problem is that AI capability is uneven, but AI confidence is uniform. Workers who had not internalized where the frontier lay trusted AI outputs even when they should not have.
This produced what the researchers called an "AI halo effect" — the illusion of competence conferred by fluent AI-generated text. The implication for collaboration design is direct: centaur and cyborg strategies require workers to maintain a calibrated mental model of where AI is reliable and where it is not. That calibration is itself a skill that must be developed and maintained.
Centaur and cyborg are not competing strategies — they are tools for different contexts. Skilled practitioners learn which to deploy and develop accurate maps of the jagged frontier in their domain. The BCG data suggests this calibration skill matters more than AI access itself.
The assistant will present you with two real-world task scenarios. For each, you will decide whether a centaur or cyborg strategy is more appropriate, justify your choice, and identify where the jagged frontier risk is highest. The goal is to build calibration — knowing when and why each model fits.
Complete at least 3 exchanges to finish this lab.
Amazon spent three years building an AI recruiting tool designed to rate job candidates on a scale of one to five stars. By 2018, the company had quietly abandoned the project. The system had learned to penalize résumés that included the word "women's" (as in "women's chess club") and to downgrade graduates of two all-women's colleges. The training data reflected Amazon's historical hiring patterns, which were male-dominated. The AI had learned and replicated discrimination. No single human had decided this; no single human had been watching. The accountability gap was designed in.
When AI systems make consequential decisions — or support humans making them — two failure modes emerge. The first is diffusion of responsibility: because multiple parties (data scientists, product managers, deploying managers, end users) all touched the system, no one feels fully responsible for an outcome. The second is automation bias: humans defer to AI outputs even when their own judgment would be better, precisely because the AI creates a sense of institutional legitimacy.
Research by Madeleine Clare Elish, published in 2019 in the journal Engaging Science, Technology, and Society, identified a third mode: the "moral crumple zone." In automated systems, when something goes wrong, accountability collapses onto the last human in the chain, often the least powerful person, regardless of where the actual failure originated. The human becomes the crumple zone that absorbs the impact of a system-level failure.
The U.S. Department of Defense's AI ethical principles, which anchor its Responsible AI implementation guidance, require AI to be responsible, equitable, traceable, reliable, and governable. Explainability, the ability to understand why the AI produced a given output, is closely tied to traceability and has the most direct impact on human-AI collaboration quality.
Studies of radiologists working with AI diagnostic tools find that when AI systems provide explanation (e.g., highlighting which regions of an image drove a classification) rather than just conclusions, radiologists are better calibrated — more likely to override AI errors and less likely to override correct AI findings. Explanation enables accurate rather than blanket trust.
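Calibration in this sense can be measured when ground truth is available. The sketch below assumes each reviewed case records whether the AI's finding was actually correct and whether the human overrode it (both hypothetical fields). It reports how often humans overrode AI errors and how often they overrode correct AI findings.

```python
from dataclasses import dataclass


@dataclass
class Case:
    ai_correct: bool      # hypothetical ground-truth label for the AI's finding
    human_overrode: bool  # did the reviewer reject the AI's finding?


def calibration(cases: list[Case]) -> dict[str, float]:
    """Compare overrides of wrong AI findings with overrides of correct ones.

    error_catch_rate: share of AI errors the human overrode (higher is better).
    false_override_rate: share of correct AI findings the human overrode
    (lower is better). Each rate is 0.0 when there are no cases of that kind.
    """
    errors = [c for c in cases if not c.ai_correct]
    correct = [c for c in cases if c.ai_correct]
    catch_rate = sum(c.human_overrode for c in errors) / len(errors) if errors else 0.0
    false_rate = sum(c.human_overrode for c in correct) / len(correct) if correct else 0.0
    return {"error_catch_rate": catch_rate, "false_override_rate": false_rate}


if __name__ == "__main__":
    # 5 AI errors (4 caught) and 20 correct findings (2 wrongly overridden).
    cases = (
        [Case(ai_correct=False, human_overrode=True)] * 4
        + [Case(ai_correct=False, human_overrode=False)]
        + [Case(ai_correct=True, human_overrode=True)] * 2
        + [Case(ai_correct=True, human_overrode=False)] * 18
    )
    print(calibration(cases))
    # {'error_catch_rate': 0.8, 'false_override_rate': 0.1}
```

Blanket trust pushes both rates toward zero and blanket distrust pushes both toward one; calibrated trust shows up as a wide gap between them.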
NASA aviation safety research documents that pilots who understand why an autopilot system made a given decision maintain higher situation awareness and catch system errors earlier than pilots who treat autopilot as a black box. Explained AI and explained autopilot produce the same pattern: understanding the reasoning, not just the output, is the mechanism of calibrated trust.
Three structural practices are documented to reduce accountability gaps in deployed human-AI systems:
Trust in a human-AI system is not a property of the AI — it is a property of the system design, including the humans, processes, and accountability structures around the AI. Trustworthy AI requires trustworthy institutional infrastructure.
The assistant will present a short description of a deployed human-AI system. You will identify the accountability gaps — diffusion of responsibility, automation bias risks, or moral crumple zone conditions — and propose specific structural remedies (red-team review, outcome audit, override culture design, or others).
Complete at least 3 exchanges to finish this lab.
In February 2024, Klarna announced that its AI assistant, built on OpenAI's technology, was handling the equivalent work of 700 customer service agents in its first month of deployment. By mid-2024, CEO Sebastian Siemiatkowski said the company's workforce had shrunk from about 5,000 to 3,800 employees and could eventually fall to 2,000. But the story had a second chapter: by 2025, Klarna was recruiting human customer service agents again, with Siemiatkowski acknowledging that the focus on cost had produced lower-quality service. The company had moved too fast toward full automation and was walking back to augmentation.
Klarna's experience illustrates what researchers at the Oxford Internet Institute call the "automation overshoot" pattern: organizations automate too fast, discover degraded quality on complex tasks, then rebuild human capacity — often at higher cost than the original workforce. The pattern is well-documented in call centers, content moderation, and financial services.
The organizational playbook that avoids overshoot has three phases, documented across successful deployments at organizations including Spotify, Unilever, and the U.S. Air Force's AI integration programs:
A 2023 McKinsey Global Institute report on AI and the workforce identifies a structural risk in rapid AI adoption: skill atrophy. When humans stop performing a task because AI has taken it over, the human capacity to perform that task — and to catch AI errors in it — degrades. This creates a fragile dependency: if the AI fails, no one can fill the gap.
Unilever's AI governance framework addresses this explicitly. For each task classified as "AI primary," the company maintains a designated human who continues to perform the task at reduced frequency, solely to preserve the organizational competence to audit the AI and recover if needed. They call these positions "skill anchors."
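The source does not describe how Unilever tracks its skill anchors. As a hypothetical illustration, an organization could keep a task inventory and audit it for AI-primary tasks that no one still performs by hand. The `Task` fields and mode labels below are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    mode: str  # hypothetical labels: "ai_primary", "augmented", "human_primary"
    # People who still perform the task by hand at reduced frequency.
    skill_anchors: list[str] = field(default_factory=list)


def missing_skill_anchors(tasks: list[Task]) -> list[str]:
    """Return AI-primary tasks that no designated human still performs."""
    return [t.name for t in tasks if t.mode == "ai_primary" and not t.skill_anchors]


if __name__ == "__main__":
    inventory = [
        Task("invoice matching", "ai_primary", ["accounts-payable lead"]),
        Task("supplier risk triage", "ai_primary"),
        Task("contract negotiation", "human_primary"),
    ]
    print(missing_skill_anchors(inventory))  # ['supplier risk triage']
```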
The organizational framework scales down to the individual. The Wharton School's Ethan Mollick recommends that individual professionals build a personal "AI map" covering: (1) which of their tasks AI currently performs reliably, (2) which it performs unreliably, (3) which judgment calls they must never delegate, and (4) which skills they must maintain through deliberate practice regardless of AI capability (their personal skill anchors).
This personal map should be updated quarterly, because the jagged frontier moves. AI capabilities in specific domains improve rapidly and unevenly. The professional who mapped their frontier in early 2023 and stopped updating it found by late 2024 that large sections had shifted — sometimes dramatically. Calibration is not a one-time exercise; it is an ongoing discipline.
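A personal AI map of this kind fits in a small data structure. This minimal sketch uses hypothetical field names for the four questions above and treats "quarterly" as a 91-day review interval (an assumption). It lists entries due for re-checking because the frontier may have moved, along with the tasks marked as skill anchors.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "quarterly" is treated as a 91-day review interval.
REVIEW_INTERVAL = timedelta(days=91)


@dataclass
class MapEntry:
    task: str
    ai_reliable: bool     # questions 1 and 2: does AI perform this reliably?
    never_delegate: bool  # question 3: a judgment call I keep
    skill_anchor: bool    # question 4: practiced regardless of AI capability
    last_reviewed: date


def due_for_review(entries: list[MapEntry], today: date) -> list[str]:
    """Tasks whose classification is older than one review interval."""
    return [e.task for e in entries if today - e.last_reviewed > REVIEW_INTERVAL]


def skill_anchors(entries: list[MapEntry]) -> list[str]:
    """Tasks to keep practicing by hand."""
    return [e.task for e in entries if e.skill_anchor]


if __name__ == "__main__":
    my_map = [
        MapEntry("first-draft status reports", True, False, False, date(2024, 9, 15)),
        MapEntry("reviewing financial models", False, True, True, date(2024, 3, 1)),
    ]
    print(due_for_review(my_map, today=date(2024, 10, 1)))  # ['reviewing financial models']
    print(skill_anchors(my_map))  # ['reviewing financial models']
```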
Effective human-AI collaboration is not a technology problem — it is an institutional design problem. The organizations and individuals that perform best with AI are those that map the frontier honestly, choose collaboration archetypes deliberately, build accountability structures proactively, and update their maps continuously. The tools are means; the design is the discipline.
In this final lab, you will work with the assistant to build a personal AI collaboration map for your own role. You will identify sub-tasks, classify them by archetype and reliability, flag your personal skill anchors, and choose your centaur vs. cyborg strategy for each major workflow.
Complete at least 3 exchanges to finish this lab and unlock the Module Test.