Module 4 · Lesson 1

The Augmentation Spectrum

From automation to amplification — mapping how AI and humans divide labor

When does AI replace human effort — and when does it multiply it?

In October 2023, radiologists at Mass General Brigham in Boston began working alongside an AI system trained to flag potential lung nodules in CT scans. The radiologists did not lose their jobs. Their interpretation speed increased by roughly 30 percent and they caught more clinically significant findings than before. The AI handled the mechanical pass; the physicians handled the judgment call. This is augmentation — not replacement.

Defining the Spectrum

Researchers at MIT's Work of the Future task force distinguish three broad modes of human-AI task division. At one end sits full automation — the AI completes a task end-to-end with no human in the loop. At the other end sits full human control — AI provides no direct help. Between those poles lies the most consequential territory: augmented collaboration, where AI handles specific sub-tasks while humans retain authority over context, judgment, and final decisions.

The key insight from MIT's 2022 "The Work of the Future" report is that the boundary between these modes is not fixed by technology — it is set by institutional design choices. Organizations that consciously place the dividing line tend to get better outcomes than those that let it drift.

Augmentation: Using AI to extend or enhance human capability, not substitute for it. The human retains decision authority; AI provides faster or more complete information.

Task Decomposition: Breaking a job into sub-tasks so each can be assigned to whoever (human or AI) can do it best. Essential first step in designing any collaboration model.

Human-in-the-Loop (HITL): A system architecture in which a human must review or approve AI outputs before they take effect. Balances speed with accountability.
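
As a rough illustration of the human-in-the-loop idea, the sketch below holds an AI proposal until a reviewer approves it. The names `AIOutput`, `hitl_gate`, and `cautious_reviewer` are hypothetical and chosen only for this example; the reviewer function stands in for a real person's decision.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AIOutput:
    """A proposed action produced by an AI system (hypothetical structure)."""
    task: str
    proposal: str


def hitl_gate(output: AIOutput, reviewer: Callable[[AIOutput], bool]) -> Optional[str]:
    """Return the proposal only if the human reviewer approves it.

    Nothing takes effect without approval: a rejected proposal yields None.
    """
    approved = reviewer(output)
    return output.proposal if approved else None


def cautious_reviewer(output: AIOutput) -> bool:
    """Illustrative stand-in for a human: rejects anything that mentions a refund."""
    return "refund" not in output.proposal.lower()


draft = AIOutput(task="reply to customer", proposal="Offer a full refund.")
result = hitl_gate(draft, cautious_reviewer)  # rejected, so nothing is released
```

The design point is that the checkpoint sits between the AI output and its real-world effect, so the gate, not the AI, decides what is released.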

The Four Collaboration Archetypes

Based on documented deployments across healthcare, law, finance, and manufacturing, four recurring archetypes emerge:

Screener

AI filters; human decides. AI processes large volumes and surfaces candidates, anomalies, or risks. Human makes the final call. Example: JPMorgan Chase's COIN system in 2017 reviewed 12,000 commercial credit agreements per year in seconds — work that had taken 360,000 lawyer-hours annually — but lawyers still approved terms.

Amplifier

AI drafts; human refines. AI generates a first version — text, code, analysis — that the human critiques and improves. GitHub Copilot users at Microsoft in 2023 reported completing tasks 55% faster without reducing code quality; the human remained the final editor and architect.

Monitor

AI watches; human intervenes. AI provides continuous surveillance of systems, patients, or processes and alerts humans when thresholds are crossed. ICU monitoring at Emory Healthcare uses AI to flag deterioration 12 hours earlier than traditional methods, but nurses and physicians still direct interventions.

Coach

AI teaches; human grows. AI provides real-time feedback to improve human performance over time. Duolingo's internal teams and Georgia Tech's AI teaching assistant "Jill Watson" — which answered 40% of student questions in 2016 without students detecting it was AI — both fit this model when the goal is human skill development.

Why Augmentation Often Beats Automation

A 2023 study by Erik Brynjolfsson and colleagues at Stanford's Digital Economy Lab examined 5,179 customer service agents at a large U.S. software company using a generative AI tool. Productivity rose 14% on average, but the gains were concentrated among less-experienced workers who learned from AI suggestions. Experienced workers improved only marginally, and one group, those who stopped engaging critically with AI outputs, saw quality decline. The finding: augmentation works best when the human continues to learn, not when they delegate judgment entirely.

This aligns with research from Harvard Business School's Ethan Mollick and Lilach Mollick showing that AI "tutors" that explain their reasoning produce more durable skill gains than AI tools that simply provide answers. The mode of collaboration shapes what the human becomes.

Key Finding — Brynjolfsson et al., 2023

Among 5,179 customer service agents at a software company, productivity rose 14% on average, with the gains concentrated among less-experienced agents who engaged critically with AI suggestions. Those who accepted AI outputs without review showed no significant gain and some quality regression. Critical engagement is not optional; it is the mechanism of augmentation.

Design Principle

The most durable human-AI collaboration models are designed around what the human learns, not just what the AI produces. If the collaboration does not build human capability over time, you have automation with extra steps.

Lesson 1 Quiz

The Augmentation Spectrum · 5 questions

1. In the Brynjolfsson 2023 study of customer service agents, which group showed the greatest productivity gains from AI assistance?

Correct. The 14% average productivity gain was concentrated among less-experienced workers, who engaged critically with AI suggestions and in effect learned from them. Experienced workers improved only marginally.

Not quite. The Brynjolfsson study found that less-experienced workers who engaged critically with AI outputs showed the greatest gains, because they were effectively learning from the AI's suggestions.

2. JPMorgan Chase's COIN system is an example of which collaboration archetype?

Correct. COIN reviewed 12,000 commercial credit agreements, surfacing candidates for human review. The lawyers retained final decision authority — the classic Screener model.

Not quite. COIN fits the Screener archetype: it processed massive volumes of contracts and surfaced results, but lawyers retained authority over final terms.

3. What does "Human-in-the-Loop" (HITL) mean in a collaboration architecture?

Correct. HITL means the human reviews or approves outputs before they have real-world effect — balancing AI speed with human accountability.

Not quite. HITL specifically means a human must review or approve AI outputs before they take effect — it is a checkpoint in the workflow, not just prompt writing.

4. Georgia Tech's "Jill Watson" AI teaching assistant best exemplifies which archetype?

Correct. Jill Watson answered 40% of student questions to support their learning, fitting the Coach archetype — AI teaches, human grows.

Not quite. Jill Watson answered 40% of student questions — a coaching function aimed at developing student knowledge, not filtering or monitoring.

5. According to MIT's Work of the Future research, what primarily determines where the boundary between human and AI work is set?

Correct. MIT's 2022 report emphasizes that the boundary is set by institutional design choices, not fixed by technology — organizations that consciously place the line achieve better outcomes.

Not quite. MIT's key finding is that the boundary is not fixed by technology — it is determined by institutional design choices. Organizations that consciously set it outperform those that let it drift.

Lab 1: Mapping Collaboration Archetypes

Identify which archetype fits a given workflow — and why

Your Lab Task

You will describe a real or hypothetical workplace workflow to the AI lab assistant. Together, you will identify which collaboration archetype it fits (Screener, Amplifier, Monitor, or Coach), explain the task decomposition logic, and discuss where the HITL checkpoint should be placed.

Complete at least 3 exchanges to finish this lab.

Start by describing a workflow from your own field or industry — or ask the assistant to give you an example scenario to analyze. Then work through: What sub-tasks could AI handle? Which archetype applies? Where should the human checkpoint be?

AI Lab Assistant · Collaboration Archetypes

Welcome to Lab 1. We're going to practice mapping workflows onto collaboration archetypes. Describe a workflow from your field — or I can give you a scenario to analyze. Which would you prefer?

Module 4 · Lesson 2

Centaur and Cyborg Strategies

Two proven team-design patterns for combining human and machine intelligence

Should humans and AI alternate roles — or fuse into a single continuous workflow?

In 2005, an unusual chess tournament called Freestyle Chess allowed any combination of human and computer players. The winners were not grandmasters. They were not supercomputers. They were two amateur players who had developed an exceptional process for switching fluidly between human intuition and engine calculation at precisely the right moments. Garry Kasparov, who observed the tournament, coined the term "centaur" for this model: a hybrid entity more powerful than either component alone.

The Centaur Model

In the centaur model, human and AI alternate control based on task type. The human drives when contextual judgment, ethical reasoning, or relational intelligence is required. The AI drives when pattern-matching, data synthesis, or computational precision is required. The handoff between them is explicit and deliberate.
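
One way to picture the explicit handoff is as a routing table from task type to owner. The sketch below is a minimal illustration under that assumption; the `ROUTING` table, `route`, and `plan_handoffs` are hypothetical names, and defaulting unknown task types to the human is a design choice made for this example, not a rule from any cited study.

```python
from enum import Enum


class Owner(Enum):
    HUMAN = "human"
    AI = "ai"


# Task types are taken from the paragraph above; the mapping itself is an
# illustrative assumption.
ROUTING = {
    "contextual_judgment": Owner.HUMAN,
    "ethical_reasoning": Owner.HUMAN,
    "relational_intelligence": Owner.HUMAN,
    "pattern_matching": Owner.AI,
    "data_synthesis": Owner.AI,
    "computational_precision": Owner.AI,
}


def route(task_type: str) -> Owner:
    """Return who drives a task; unknown task types default to the human."""
    return ROUTING.get(task_type, Owner.HUMAN)


def plan_handoffs(task_types: list[str]) -> list[tuple[str, Owner]]:
    """Produce an explicit, ordered list of (task type, owner) pairs."""
    return [(t, route(t)) for t in task_types]


plan = plan_handoffs(["data_synthesis", "ethical_reasoning", "unknown_step"])
```

Writing the plan down before work starts is what makes the handoff deliberate: every step has a named owner, and anything unclassified falls back to human control.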

This model was documented at scale when Boston Consulting Group ran a controlled experiment in 2023 involving 758 consultants using GPT-4. Consultants using a centaur strategy — dividing tasks between AI and human work — outperformed both those who used AI for everything and those who used no AI. The centaur group completed 12.2% more tasks, did so 25.1% faster, and produced results rated 40% higher in quality by independent evaluators.

BCG Study · Fabrizio Dell'Acqua et al., 2023

758 BCG consultants, GPT-4 access. Three groups: no AI, AI for everything, deliberate centaur strategy (explicit human/AI task division). Centaur group: +12.2% tasks completed, +25.1% speed, +40% quality ratings. The deliberate division of labor — not just AI access — drove the gains.

The Cyborg Model

In the cyborg model, human and AI work simultaneously and continuously, with no hard handoff. The AI is embedded in the human's workflow as a persistent cognitive layer — autocompleting, suggesting, flagging, and translating in real time while the human continues to act. The human does not "hand off" to AI; they think with it.

GitHub Copilot operating inside VS Code is the canonical example. Developers do not stop coding to consult the AI; suggestions appear inline as they type. A 2023 GitHub study showed developers accepted approximately 30% of Copilot suggestions and completed coding tasks 55% faster — but the key design feature is continuous presence, not periodic consultation.

Microsoft's integration of Copilot into Word, Excel, and Outlook follows the same cyborg logic: the AI is present in every document, every spreadsheet, every email, available without context-switching.

Centaur — Best For

High-stakes decisions where errors are costly. Complex projects requiring clear accountability. Tasks where human ethical or relational judgment is non-negotiable. Legal, medical, and financial domains.

Cyborg — Best For

High-volume knowledge work where speed matters. Creative and writing tasks. Software development. Any workflow where context-switching between AI and human tools creates friction that degrades output quality.

The Dell'Acqua Warning: "AI Halos"

The BCG study also contained a cautionary finding. When consultants were given tasks that fell outside GPT-4's actual competence — but inside its confident-sounding output range — those who relied most heavily on the AI performed worse than the no-AI group. Dell'Acqua called this the "jagged frontier" problem: AI capability is uneven, but AI confidence is uniform. Workers who had not internalized where the frontier lay trusted AI outputs even when they should not have.

This produced what the researchers called an "AI halo effect" — the illusion of competence conferred by fluent AI-generated text. The implication for collaboration design is direct: centaur and cyborg strategies require workers to maintain a calibrated mental model of where AI is reliable and where it is not. That calibration is itself a skill that must be developed and maintained.

Jagged Frontier: The uneven boundary of AI competence. AI can perform some very hard tasks excellently and some apparently easy tasks poorly, with no reliable external signal to distinguish them.

AI Halo Effect: The tendency to over-trust AI outputs because they are fluently expressed, even when the underlying content is incorrect or outside the model's actual competence.

Strategic Takeaway

Centaur and cyborg are not competing strategies — they are tools for different contexts. Skilled practitioners learn which to deploy and develop accurate maps of the jagged frontier in their domain. The BCG data suggests this calibration skill matters more than AI access itself.

Lesson 2 Quiz

Centaur and Cyborg Strategies · 5 questions

1. Garry Kasparov coined the term "centaur" after observing which event?

Correct. The 2005 Freestyle Chess tournament revealed that amateur players with good human-AI switching processes outperformed both grandmasters and supercomputers alone.

Not quite. The centaur concept emerged from the 2005 Freestyle Chess tournament, where amateurs with an excellent process for switching between human intuition and engine calculation beat grandmasters.

2. In the BCG 2023 consultant experiment, what distinguished the highest-performing group?

Correct. The deliberate centaur strategy — not simply AI access — drove the +12.2% tasks, +25.1% speed, +40% quality gains in the BCG study.

Not quite. The key variable was deliberate task division — the centaur strategy. Access to AI alone did not produce the gains; it was the explicit allocation of which tasks went to AI and which to humans.

3. What is the defining feature of the cyborg model compared to the centaur model?

Correct. The cyborg model means continuous co-presence — AI is always active within the human's workflow, not consulted in discrete handoffs.

Not quite. The cyborg model's defining feature is continuous co-presence — the AI is always embedded in the workflow, like GitHub Copilot suggesting inline as developers type.

4. What does Dell'Acqua's "jagged frontier" concept describe?

Correct. The jagged frontier is AI's uneven competence — hard tasks done brilliantly, easy-seeming tasks done poorly — with uniformly confident output that makes the frontier invisible.

Not quite. The jagged frontier describes AI's uneven capability — brilliant on some hard tasks, poor on some easy ones — with AI confidence uniform across both, making it hard for workers to know when to trust outputs.

5. What is the "AI halo effect" as identified in the BCG study?

Correct. The AI halo effect is the tendency to over-trust fluent, confident-sounding AI output even when it is outside the AI's reliable range — a critical risk in both centaur and cyborg models.

Not quite. The AI halo effect is specifically about over-trusting AI outputs because they sound fluent and confident, even when they are wrong — the BCG study found this caused consultants to perform worse than the no-AI group on out-of-range tasks.

Lab 2: Centaur vs. Cyborg Design

Choose and justify a collaboration strategy for a real task scenario

Your Lab Task

The assistant will present you with two real-world task scenarios. For each, you will decide whether a centaur or cyborg strategy is more appropriate, justify your choice, and identify where the jagged frontier risk is highest. The goal is to build calibration — knowing when and why each model fits.

Complete at least 3 exchanges to finish this lab.

Ask the assistant for your first scenario, then argue your case for centaur or cyborg. Be ready to defend against the frontier risk question.

AI Lab Assistant · Centaur / Cyborg Strategy

Ready for Lab 2. I'll give you two scenarios and we'll work through whether centaur or cyborg strategy fits each — and where the AI halo risk is highest. Type "give me scenario 1" to begin, or describe a task from your own work you'd like to analyze.

Module 4 · Lesson 3

Designing for Trust and Accountability

How organizations build human-AI teams that are reliable, auditable, and fair

When AI makes a consequential mistake, who is responsible — and how do you design to prevent it?

Amazon spent three years building an AI recruiting tool designed to rate job candidates on a scale of one to five stars. By 2018, the company had quietly abandoned the project. The system had learned to penalize résumés that included the word "women's" — as in "women's chess club" — and to downgrade graduates of all-women's colleges. The training data reflected Amazon's historical hiring patterns, which were male-dominated. The AI had learned and replicated discrimination. No single human had decided this; no single human had been watching. The accountability gap was designed in.

The Accountability Gap

When AI systems make consequential decisions — or support humans making them — two failure modes emerge. The first is diffusion of responsibility: because multiple parties (data scientists, product managers, deploying managers, end users) all touched the system, no one feels fully responsible for an outcome. The second is automation bias: humans defer to AI outputs even when their own judgment would be better, precisely because the AI creates a sense of institutional legitimacy.

Research by Madeleine Clare Elish, published in 2019 in the journal Engaging Science, Technology, and Society, identified a third mode: the "moral crumple zone." In automated systems, when something goes wrong, accountability collapses onto the last human in the chain, often the least powerful person, regardless of where the actual failure originated. The human becomes the crumple zone that absorbs the impact of a system-level failure.

Automation Bias: The tendency of humans to favor AI-generated suggestions over their own judgment, even when the AI is wrong. Documented extensively in aviation, radiology, and financial trading contexts.

Moral Crumple Zone: The human actor who absorbs accountability for an automated system's failure, regardless of their actual control over the outcome. Coined by Madeleine Clare Elish, 2019.

Algorithmic Auditing: Systematic review of AI system outputs, training data, and decision logic to identify disparate impacts, errors, or drift over time. Required practice in responsible deployment.

Trust Calibration in Practice

The U.S. Department of Defense's 2023 "Responsible AI" implementation guidelines identify five properties required for trustworthy human-AI collaboration: reliability, security, explainability, traceability, and governability. Of these, explainability — the ability to understand why the AI produced a given output — has the most direct impact on human-AI collaboration quality.

Studies of radiologists working with AI diagnostic tools find that when AI systems provide explanation (e.g., highlighting which regions of an image drove a classification) rather than just conclusions, radiologists are better calibrated — more likely to override AI errors and less likely to override correct AI findings. Explanation enables accurate rather than blanket trust.

Research Finding — Aviation Context

NASA aviation safety research documents that pilots who understand why an autopilot system made a given decision maintain higher situation awareness and catch system errors earlier than pilots who treat autopilot as a black box. Explained AI and explained autopilot produce the same pattern: understanding the reasoning, not just the output, is the mechanism of calibrated trust.

Designing Accountability Into the Workflow

Three structural practices are documented to reduce accountability gaps in deployed human-AI systems:

Red-Team Reviews

Adversarial testing before deployment. A designated team attempts to find failure modes, biases, and edge cases. Microsoft's Responsible AI team and the UK's AI Safety Institute both mandate red-teaming before any high-stakes deployment.

Outcome Auditing

Systematic review of deployed decisions. The audit covers not only inputs and logic but actual outcomes, broken down by demographic, context, and time period. The EU AI Act (2024) requires outcome monitoring for high-risk AI systems as a condition of continued operation.

Override Culture

Normalizing human disagreement with AI. Organizations where overriding AI recommendations is penalized — formally or informally — produce worse outcomes than those where override is treated as healthy judgment. Cleveland Clinic's clinical AI governance framework explicitly tracks and reviews human override patterns as a quality signal.
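
Outcome auditing and override tracking both reduce to simple measurements over a decision log. The sketch below assumes a hypothetical `Decision` record and invented sample data; it computes an overall override rate and a per-group approval rate, which is the kind of breakdown an audit might start from. It is not drawn from any of the frameworks named above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Decision:
    """One logged decision (hypothetical record layout)."""
    group: str              # e.g. a demographic or context label
    ai_recommendation: str  # what the AI suggested
    final_decision: str     # what the human actually decided


def override_rate(decisions: list[Decision]) -> float:
    """Fraction of decisions where the human departed from the AI."""
    if not decisions:
        return 0.0
    overrides = sum(d.ai_recommendation != d.final_decision for d in decisions)
    return overrides / len(decisions)


def approval_rate_by_group(decisions: list[Decision]) -> dict[str, float]:
    """Share of 'approve' outcomes per group, for a simple outcome audit."""
    totals: dict[str, int] = defaultdict(int)
    approvals: dict[str, int] = defaultdict(int)
    for d in decisions:
        totals[d.group] += 1
        if d.final_decision == "approve":
            approvals[d.group] += 1
    return {g: approvals[g] / totals[g] for g in totals}


# Invented sample log, for illustration only.
log = [
    Decision("A", "approve", "approve"),
    Decision("A", "reject", "approve"),
    Decision("B", "reject", "reject"),
    Decision("B", "approve", "reject"),
]
```

A large gap between groups in `approval_rate_by_group`, or an override rate near zero, would each be a prompt for closer review rather than a conclusion in itself.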

Principle

Trust in a human-AI system is not a property of the AI — it is a property of the system design, including the humans, processes, and accountability structures around the AI. Trustworthy AI requires trustworthy institutional infrastructure.

Lesson 3 Quiz

Trust and Accountability · 5 questions

1. What flaw caused Amazon to abandon its AI recruiting tool by 2018?

Correct. The tool was trained on Amazon's historical hiring data, which was male-dominated. It learned to penalize résumés associated with women — a documented case of algorithmic discrimination through biased training data.

Not quite. The system learned to penalize résumés mentioning women's organizations and all-women's colleges, replicating discrimination embedded in the historical training data.

2. What is Madeleine Clare Elish's "moral crumple zone"?

Correct. The moral crumple zone is the often-low-power human who becomes the target of accountability when an automated system fails, regardless of where the actual error originated.

Not quite. The moral crumple zone is the last human in the chain who absorbs accountability for a system failure — often the least powerful person — regardless of their actual control over the outcome.

3. What does research on radiologists and AI diagnostic tools show about explainability?

Correct. Explanation (showing which image regions drove a classification) enables accurate calibration: radiologists override AI errors more often and defer to correct AI findings more appropriately.

Not quite. Explainability enables calibration: when radiologists see why the AI made a call, they become more accurate at deciding when to trust and when to override — not uniformly more trusting or more skeptical.

4. Cleveland Clinic's clinical AI governance framework treats human overrides of AI recommendations as:

Correct. Cleveland Clinic tracks override patterns as a quality signal — normalizing human disagreement with AI and using those disagreements to improve the system.

Not quite. Cleveland Clinic's framework tracks overrides as a quality signal — treating human disagreement with AI as healthy judgment that should be analyzed, not penalized.

5. The EU AI Act (2024) requires which practice for high-risk AI systems as a condition of continued operation?

Correct. The EU AI Act mandates outcome monitoring for high-risk AI — not just pre-deployment testing, but ongoing review of actual decisions and their impacts by demographic and context.

Not quite. The EU AI Act requires outcome monitoring — systematic post-deployment review of actual decisions and their impacts — as a condition of continued operation for high-risk AI systems.

Lab 3: Accountability Gap Analysis

Identify and remediate accountability failures in a human-AI system

Your Lab Task

The assistant will present a short description of a deployed human-AI system. You will identify the accountability gaps — diffusion of responsibility, automation bias risks, or moral crumple zone conditions — and propose specific structural remedies (red-team review, outcome audit, override culture design, or others).

Complete at least 3 exchanges to finish this lab.

Type "give me a scenario" and the assistant will describe a deployed AI system with accountability weaknesses. Diagnose the gaps and propose fixes — then defend your recommendations.

AI Lab Assistant · Accountability Design

Welcome to Lab 3. I'll describe a deployed human-AI system and you'll diagnose its accountability gaps and propose remedies. Ready when you are — type "give me a scenario" to start, or describe a system from your own context.

Module 4 · Lesson 4

Building Your Collaboration Playbook

A practical framework for choosing, implementing, and evolving human-AI partnerships at work

How do you move from knowing the models to actually implementing them in your organization?

In February 2024, Klarna announced that its AI assistant — built on OpenAI's technology — was handling the equivalent work of 700 customer service agents in its first month of deployment. By May 2024, CEO Sebastian Siemiatkowski announced the company had cut its workforce from 5,000 to 3,800 employees and would reduce to 2,000. But the story had a second chapter: by September 2024, Klarna was publicly advertising to rehire human customer service agents, with Siemiatkowski acknowledging that AI had performed worse than expected on complex, nuanced customer issues. The company had moved too fast to full automation and was walking back to augmentation.

The Klarna Lesson: Sequencing Matters

Klarna's experience illustrates what researchers at the Oxford Internet Institute call the "automation overshoot" pattern: organizations automate too fast, discover degraded quality on complex tasks, then rebuild human capacity — often at higher cost than the original workforce. The pattern is well-documented in call centers, content moderation, and financial services.

The organizational playbook that avoids overshoot has three phases, documented across successful deployments at organizations including Spotify, Unilever, and the U.S. Air Force's AI integration programs:

Phase 1 · Map

Task decomposition before any AI deployment. Break every role into sub-tasks. Classify each on two dimensions: (1) AI reliability in this domain and (2) cost of error. High reliability + low error cost = automate. Low reliability or high error cost = augment or leave human. This produces a task map that guides all subsequent decisions.

Phase 2 · Pilot

Augmentation before automation. Deploy AI in augmentation mode — human in the loop — even for tasks where full automation might eventually work. Collect real performance data in your specific context. The BCG and Brynjolfsson studies both found that the jagged frontier varies significantly by organizational context. Generic benchmarks do not predict local performance.

Phase 3 · Calibrate

Adjust the human-AI boundary based on evidence. Move tasks toward greater automation only as performance data justifies it. Build in systematic outcome reviews at fixed intervals. The U.S. Air Force's AI integration guidelines require quarterly "boundary reviews" — formal assessment of which tasks should shift between human and AI based on accumulated performance data.
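
The Phase 1 classification rule can be sketched in a few lines. In this minimal example, `SubTask`, `classify`, and the two threshold constants are hypothetical; the lesson gives the rule (high reliability plus low error cost means automate) but no numeric cut-offs, so the values below are assumptions to be replaced with figures from your own pilot data.

```python
from dataclasses import dataclass


@dataclass
class SubTask:
    name: str
    ai_reliability: float  # 0.0 to 1.0, measured in your own context
    error_cost: float      # 0.0 to 1.0, relative cost of a mistake


# Thresholds are illustrative assumptions; the lesson does not specify values.
RELIABILITY_THRESHOLD = 0.9
ERROR_COST_THRESHOLD = 0.3


def classify(task: SubTask) -> str:
    """Apply the Phase 1 rule: automate only when reliability is high
    and error cost is low; otherwise augment or keep the task human."""
    high_reliability = task.ai_reliability >= RELIABILITY_THRESHOLD
    low_error_cost = task.error_cost <= ERROR_COST_THRESHOLD
    if high_reliability and low_error_cost:
        return "automate"
    return "augment_or_human"


# Invented sub-tasks and scores, for illustration only.
task_map = {t.name: classify(t) for t in [
    SubTask("tag incoming tickets", 0.95, 0.1),
    SubTask("approve refunds", 0.95, 0.8),
    SubTask("draft legal summary", 0.6, 0.2),
]}
```

In Phase 3, re-running the same classification with updated reliability measurements is one simple way to see which tasks have earned a move across the boundary.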

The Skill Inventory Problem

A 2023 McKinsey Global Institute report on AI and the workforce identifies a structural risk in rapid AI adoption: skill atrophy. When humans stop performing a task because AI has taken it over, the human capacity to perform that task — and to catch AI errors in it — degrades. This creates a fragile dependency: if the AI fails, no one can fill the gap.

Unilever's AI governance framework addresses this explicitly. For each task classified as "AI primary," the company maintains a designated human who continues to perform the task at reduced frequency, solely to preserve the organizational competence to audit the AI and recover if needed. They call these positions "skill anchors."

55% of tasks in knowledge work roles contain at least one sub-task with high AI automation potential (McKinsey, 2023)

30% of those same roles have a core decision sub-task that AI performs at below-human accuracy in real deployments

3–5× cost multiplier when organizations must rebuild human capacity after automation overshoot (Oxford Internet Institute estimate)

The Personal Playbook

The organizational framework scales down to the individual. Harvard Business School's Ethan Mollick recommends that individual professionals build a personal "AI map" covering: (1) which of their tasks AI currently performs reliably, (2) which it performs unreliably, (3) which judgment calls they must never delegate, and (4) which skills they must maintain through deliberate practice regardless of AI capability — their personal skill anchors.

This personal map should be updated quarterly, because the jagged frontier moves. AI capabilities in specific domains improve rapidly and unevenly. The professional who mapped their frontier in early 2023 and stopped updating it found by late 2024 that large sections had shifted — sometimes dramatically. Calibration is not a one-time exercise; it is an ongoing discipline.
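
A personal AI map can be kept as a small data structure with a staleness check. The sketch below is one possible layout, not a format prescribed by the lesson: `PersonalAIMap`, its field names, and the 90-day default (an approximation of "quarterly") are all assumptions, and the example entries are invented.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class PersonalAIMap:
    """The four lists described above, plus a last-reviewed date."""
    reliable: list[str] = field(default_factory=list)
    unreliable: list[str] = field(default_factory=list)
    never_delegate: list[str] = field(default_factory=list)
    skill_anchors: list[str] = field(default_factory=list)
    last_reviewed: date = field(default_factory=date.today)

    def is_stale(self, today: date, max_age_days: int = 90) -> bool:
        """True when the map is older than roughly one quarter."""
        return today - self.last_reviewed > timedelta(days=max_age_days)


# Invented entries, for illustration only.
my_map = PersonalAIMap(
    reliable=["summarize meeting notes"],
    unreliable=["cite legal precedent"],
    never_delegate=["final hiring decisions"],
    skill_anchors=["write first drafts by hand monthly"],
    last_reviewed=date(2024, 1, 15),
)
```

Calling `my_map.is_stale(date.today())` at the start of a planning session is a lightweight reminder to revisit the map before relying on it.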

Automation Overshoot: The pattern of automating too quickly, discovering quality degradation on complex tasks, and having to rebuild human capacity at higher cost than the original workforce.

Skill Anchor: A person designated to continue performing a task that AI primarily handles, solely to preserve organizational competence for auditing and recovery. Coined in Unilever's AI governance framework.

Module Synthesis

Effective human-AI collaboration is not a technology problem — it is an institutional design problem. The organizations and individuals that perform best with AI are those that map the frontier honestly, choose collaboration archetypes deliberately, build accountability structures proactively, and update their maps continuously. The tools are means; the design is the discipline.

Lesson 4 Quiz

Building Your Collaboration Playbook · 5 questions

1. What did Klarna's 2024 experience illustrate about rapid AI deployment?

Correct. Klarna automated too fast, found AI underperforming on complex issues, and was publicly rehiring human agents by September 2024 — a textbook automation overshoot.

Not quite. Klarna demonstrated automation overshoot: rapid replacement of 700-agent-equivalent work by AI, followed by discovery of quality failures on complex tasks, followed by rehiring — at higher net cost.

2. In the three-phase organizational playbook, what happens in Phase 1?

Correct. Phase 1 is mapping — breaking roles into sub-tasks and classifying each by AI reliability and error cost before any deployment decision is made.

Not quite. Phase 1 is Map — task decomposition before deployment. You classify every sub-task by AI reliability and error cost, creating a task map that guides all subsequent decisions.

3. What is the purpose of a "skill anchor" in Unilever's AI governance framework?

Correct. Skill anchors are people designated to keep performing AI-primary tasks, ensuring the organization retains the competence to audit AI and recover if it fails.

Not quite. A skill anchor is a human who keeps performing a task that AI primarily handles — at reduced frequency — to preserve the organizational ability to audit the AI and fill the gap if it fails.

4. Why do the BCG and Brynjolfsson studies suggest that generic AI benchmarks are insufficient for deployment planning?

Correct. Both studies found that AI performance varies substantially by context — the jagged frontier is different for each organization and workflow. Generic benchmarks cannot predict local performance.

Not quite. Both studies found that the jagged frontier is context-dependent. An AI that performs brilliantly on a benchmark may perform poorly in a specific organizational deployment — local pilot data is essential.

5. Ethan Mollick recommends that individual professionals update their personal AI map:

Correct. Mollick recommends quarterly updates because the frontier shifts rapidly — AI capabilities in specific domains improve unevenly and a map from one year can be significantly wrong the next.

Not quite. Mollick recommends quarterly updates, because AI capability evolves rapidly and unevenly. A frontier map from early 2023 was significantly different by late 2024 — calibration is ongoing, not episodic.

Lab 4: Build Your Personal AI Collaboration Map

Apply the full module framework to your own role

Your Lab Task

In this final lab, you will work with the assistant to build a personal AI collaboration map for your own role. You will identify sub-tasks, classify them by archetype and reliability, flag your personal skill anchors, and choose your centaur vs. cyborg strategy for each major workflow.

Complete at least 3 exchanges to finish this lab and unlock the Module Test.

Start by describing your current role and the 3–5 tasks you spend the most time on. The assistant will guide you through the mapping process — reliability classification, archetype assignment, frontier risks, and skill anchors.
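If you prefer to keep your map in a structured form, the sketch below shows one possible way to record it in Python. It is an illustration only: the `SubTask` fields, the three-level `Level` scale, the `suggest_mode` rule of thumb, and the sample tasks are all assumptions made for this example, not part of the module's framework. It covers reliability, error cost, and skill anchors; archetype and centaur vs. cyborg choices could be added as extra fields.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Hypothetical three-level scale used for both ratings."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class SubTask:
    name: str
    ai_reliability: Level        # how reliably AI handles this sub-task in your context
    error_cost: Level            # how costly an undetected mistake would be
    skill_anchor: bool = False   # keep doing it by hand at reduced frequency


def suggest_mode(task: SubTask) -> str:
    """Illustrative rule of thumb (an assumption, not the module's rule)."""
    if task.ai_reliability is Level.LOW:
        return "human-led"
    if task.ai_reliability is Level.HIGH and task.error_cost is Level.LOW:
        return "ai-led with spot checks"
    return "augmented (human reviews AI output)"


# Sample entries; replace these with the 3-5 tasks from your own role.
role_map = [
    SubTask("Draft weekly status summary", Level.HIGH, Level.LOW),
    SubTask("Review contract clauses", Level.MEDIUM, Level.HIGH, skill_anchor=True),
    SubTask("Negotiate with a key client", Level.LOW, Level.HIGH),
]

for task in role_map:
    anchor = " [skill anchor]" if task.skill_anchor else ""
    print(f"{task.name}: {suggest_mode(task)}{anchor}")
```

Revisiting the ratings on a regular schedule, then rerunning the loop, gives a quick view of which tasks have shifted.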

AI Lab Assistant · Personal Collaboration Map

Welcome to your final lab. We're building your personal AI collaboration map — a practical tool you can use and update in your own work. Start by telling me your role and the 3–5 tasks you spend the most time on. I'll guide you through classifying each one.

Module 4 Test

Human-AI Collaboration Models · 15 questions · Pass at 80%

1. Which collaboration archetype involves AI filtering large volumes so humans can make final decisions?

Correct. The Screener archetype: AI processes volume, surfaces candidates, humans decide.

Not quite. In the Screener archetype, AI filters and the human decides. COIN at JPMorgan is the canonical example.

2. In the centaur model, when does the human take control versus the AI?

Correct. In the centaur model, control alternates by task type, with explicit, deliberate handoffs between human judgment and AI computation.

Not quite. Centaur means explicit task-based handoffs — human takes contextual/relational tasks, AI takes pattern-matching/computational tasks.

3. GitHub Copilot operating inline in VS Code is the canonical example of which model?

Correct. Copilot's inline suggestions — always present, no context switch — define the cyborg model.

Not quite. Copilot is cyborg: continuous AI presence embedded in the workflow, no discrete handoff moments.

4. The BCG 2023 consultant study found that consultants who used AI on tasks OUTSIDE its competence performed:

Correct. The AI halo effect: fluent AI outputs on out-of-range tasks caused consultants to trust wrong answers — performing worse than those with no AI access.

Not quite. On out-of-range tasks, heavy AI users performed worse than the no-AI group — the AI halo effect caused them to over-trust fluent but incorrect outputs.

5. "Diffusion of responsibility" in human-AI accountability means:

Correct. Diffusion of responsibility: many people touched the system, so no one accepts ownership of the failure — a key accountability gap in AI deployments.

Not quite. Diffusion of responsibility means that when many parties contributed to an AI system, no one feels fully accountable when it fails — a major governance risk.

6. Emory Healthcare's ICU AI monitoring system fits which archetype?

Correct. ICU monitoring is the Monitor archetype: continuous AI surveillance, human intervention when flagged — with Emory's system detecting deterioration 12 hours earlier than traditional methods.

Not quite. Emory's ICU system is a Monitor: AI watches continuously, alerts humans when thresholds are crossed, humans direct all interventions.

7. According to the MIT Work of the Future report, what primarily determines where the human-AI labor boundary is set?

Correct. MIT's key finding: the boundary is set by institutional design choices, not fixed by technology. Deliberate organizations outperform those that let the boundary drift.

Not quite. MIT's research shows the boundary is a design choice — technology sets possibilities, institutions set the actual boundary.

8. The "moral crumple zone" describes which phenomenon?

Correct. Elish's moral crumple zone: the last human in the chain — often the least powerful — absorbs accountability for system failures regardless of their actual control.

Not quite. The moral crumple zone is Elish's term for the human who becomes the accountability target for a system failure they did not cause or control.

9. What did the Brynjolfsson 2023 Stanford study find about workers who stopped engaging critically with AI outputs?

Correct. Workers who accepted AI outputs without review showed quality regression — critical engagement is the mechanism that produces both productivity and quality gains.

Not quite. Passive AI acceptance led to quality regression. Critical engagement — not just AI access — is the mechanism of augmentation.

10. What is "automation overshoot"?

Correct. Automation overshoot — documented at Klarna and across call centers and content moderation — means moving too fast to full automation and paying the cost of rebuilding.

Not quite. Automation overshoot: automate too fast, find quality failures on complex tasks, rebuild human capacity — often at 3–5x the original cost.

11. In the three-phase playbook, why does Phase 2 require deploying AI in augmentation mode before full automation?

Correct. Generic benchmarks do not predict local performance. Augmentation pilots generate the local data needed to know where automation is actually safe in your specific context.

Not quite. The jagged frontier is context-specific. You need local performance data — which augmentation pilots generate — before moving to automation.

12. Mass General Brigham radiologists using AI lung nodule detection experienced which outcome?

Correct. Mass General Brigham radiologists gained ~30% speed and improved clinically significant finding rates — a textbook augmentation outcome with the AI handling the mechanical pass and humans retaining judgment.

Not quite. Mass General Brigham radiologists working with AI lung nodule detection saw ~30% speed gains and improved significant finding rates — augmentation at work.

13. The U.S. Air Force's AI integration guidelines require quarterly "boundary reviews" to:

Correct. Quarterly boundary reviews are the Air Force's mechanism for calibrating the human-AI task boundary based on evidence — not set-and-forget automation.

Not quite. Quarterly boundary reviews assess which tasks should shift — toward more automation or more human control — based on accumulated local performance data.

14. What does NASA aviation safety research on autopilot show about understanding how an AI or automated system reasons?

Correct. Understanding reasoning — not just outcomes — produces calibrated trust: pilots catch errors without over-intervening on correct decisions.

Not quite. NASA research shows pilots who understand autopilot reasoning maintain better situation awareness and catch system errors earlier — understanding the "why" is the mechanism of calibrated trust.

15. Which of the following best defines "task decomposition" in collaboration design?

Correct. Task decomposition is the essential first step in any collaboration design: identify sub-tasks, then assign each to the agent — human or AI — best suited for it.

Not quite. Task decomposition means breaking a role or project into sub-tasks and assigning each to whoever can do it best — the foundational step in any collaboration model design.