How OpenClaw assigns, routes, and supervises multiple AI agents working in parallel on complex tasks.
In March 2024, Cognition AI publicly demonstrated Devin, which it billed as the first AI software engineer capable of completing full software tasks end-to-end. Devin's architecture relied on an orchestrator layer that decomposed a single user request into subtasks, assigned each to a specialized sub-agent (browser, code editor, terminal), monitored execution states, and re-routed failed steps. When the browser sub-agent timed out on a package documentation lookup, the orchestrator detected the stall via a heartbeat check and rerouted the request to a cached documentation retrieval agent, all without user intervention. The resulting system resolved 13.86% of the SWE-bench issues it was evaluated on without human assistance, a benchmark milestone that pushed multi-agent architectures onto the roadmaps of other AI labs.
An orchestrator is the top-level controller in a multi-agent system. It receives a high-level goal, breaks it into concrete subtasks, selects which sub-agent should handle each subtask, passes appropriate context, and aggregates results into a coherent final output. The orchestrator never executes tasks itself — it delegates, monitors, and synthesizes.
OpenClaw implements orchestration through a three-phase loop: decompose → dispatch → integrate. During decomposition, the orchestrator uses a planning model (typically a larger, slower model) to generate a dependency graph of subtasks. During dispatch, it sends each leaf node in the graph to the appropriate sub-agent with a scoped context window. During integration, it merges sub-agent outputs, resolves conflicts, and determines whether the goal has been met or whether replanning is needed.
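The decompose, dispatch, integrate loop can be sketched with Python's standard `graphlib` module. This is a minimal illustration, not OpenClaw's implementation: the `PLAN` graph, the `run_plan` function, and the stand-in agents are all hypothetical names, and a real planner would generate the graph with a model rather than hard-code it.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical plan: subtask name -> set of subtasks it depends on.
# In a real system the planning model would produce this graph.
PLAN = {
    "fetch_sources": set(),
    "fetch_pricing": set(),
    "extract_facts": {"fetch_sources"},
    "write_summary": {"extract_facts", "fetch_pricing"},
}

def run_plan(plan: dict[str, set[str]],
             agents: dict[str, Callable[[dict[str, str]], str]]) -> dict[str, str]:
    """Dispatch each subtask once its dependencies are done and collect results."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    results: dict[str, str] = {}
    while sorter.is_active():
        for task in sorter.get_ready():
            # Scoped context: only the outputs this subtask depends on.
            context = {dep: results[dep] for dep in plan[task]}
            results[task] = agents[task](context)
            sorter.done(task)
    return results  # the integration phase would merge and check these

def make_agent(name: str) -> Callable[[dict[str, str]], str]:
    """Stand-in sub-agent that just reports which context keys it received."""
    def agent(context: dict[str, str]) -> str:
        return f"{name}({', '.join(sorted(context))})"
    return agent

AGENTS = {name: make_agent(name) for name in PLAN}
```

Calling `run_plan(PLAN, AGENTS)` runs the two fetch tasks first, then `extract_facts`, then `write_summary`. Each agent sees only the results of its direct dependencies.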
A single LLM using chain-of-thought is not an orchestrator. True orchestration requires separate execution contexts for each sub-agent — meaning independent token budgets, tool access, and memory scopes — coordinated by a distinct controller layer.
The planning model and the sub-agent models do not need to be the same. OpenClaw typically uses a high-capability model (Claude Opus class) for orchestration and faster, cheaper models (Claude Haiku class) for high-volume subtasks like web scraping, format conversion, or data extraction. This hybrid approach reduces cost by 60–80% compared to running all steps through a single large model.
OpenClaw supports three primary routing strategies, each suited to different task structures: parallel, sequential, and conditional.
Real production systems combine all three. A typical OpenClaw pipeline might start with parallel data gathering, feed results through a sequential analysis chain, and use conditional routing at the final step to choose between a standard report format or an escalation workflow.
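A combined pipeline of this shape might look like the sketch below, which uses `asyncio` for the parallel stage. All function names, the scoring rule, and the threshold are assumptions made for illustration. The stand-in functions replace what would be model-backed sub-agents.

```python
import asyncio

async def gather_source(name: str) -> str:
    """Stand-in for a data-gathering sub-agent."""
    await asyncio.sleep(0)  # placeholder for real network or model latency
    return f"data:{name}"

def clean(items: list[str]) -> list[str]:
    """Sequential step 1: normalize the gathered items."""
    return sorted(item.removeprefix("data:") for item in items)

def score(items: list[str]) -> float:
    """Sequential step 2: hypothetical significance score."""
    return len(items) / 4

async def pipeline(sources: list[str], threshold: float = 0.5) -> str:
    # Parallel routing: all gathering sub-agents run concurrently.
    raw = await asyncio.gather(*(gather_source(s) for s in sources))
    # Sequential routing: each step consumes the previous step's output.
    cleaned = clean(list(raw))
    significance = score(cleaned)
    # Conditional routing: choose the final workflow from an intermediate result.
    if significance >= threshold:
        return f"escalation: {', '.join(cleaned)}"
    return f"standard report: {', '.join(cleaned)}"
```

Running `asyncio.run(pipeline(["blog", "press", "docs"]))` takes the escalation branch, while a single source stays on the standard report branch.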
Anthropic's published research on multi-agent systems (2024) notes that conditional routing is the hardest to get right because it requires the orchestrator to evaluate intermediate quality — a judgment task that itself can fail silently. Most production OpenClaw deployments add a separate quality gate sub-agent that evaluates each intermediate output before routing continues.
Because sub-agents operate asynchronously, the orchestrator must actively monitor their state rather than waiting passively for a return value. OpenClaw implements a heartbeat protocol: each sub-agent sends a status token every N seconds. If the orchestrator does not receive a heartbeat within a configurable timeout window, it marks the sub-agent as stalled and triggers a recovery action.
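The heartbeat idea can be sketched as a small monitor that records when each sub-agent last reported in. The `HeartbeatMonitor` class and its method names are hypothetical. The explicit `now` parameter is an assumption added to keep the example deterministic.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat per sub-agent and reports stalled ones."""
    timeout_s: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def beat(self, agent_id: str, now: Optional[float] = None) -> None:
        """Record a heartbeat; uses a monotonic clock unless `now` is given."""
        self.last_seen[agent_id] = time.monotonic() if now is None else now

    def stalled(self, now: Optional[float] = None) -> list[str]:
        """Return agents whose last heartbeat is older than the timeout."""
        current = time.monotonic() if now is None else now
        return sorted(
            agent for agent, seen in self.last_seen.items()
            if current - seen > self.timeout_s
        )
```

An orchestrator loop would call `stalled()` periodically and hand each returned agent to its recovery logic. A monotonic clock avoids false stalls when the system clock is adjusted.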
Recovery actions cascade through a priority order: first, retry the same sub-agent with the same input; second, retry with a simplified or truncated input; third, route to a fallback sub-agent; fourth, escalate to human review. This cascade borrows from the retry, fallback, and circuit-breaker patterns of distributed systems engineering and is one of the primary reasons OpenClaw can operate autonomously for hours without supervision on complex pipelines.
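The four-step priority order can be written as a short loop. This is a sketch under stated assumptions: the `recover` function, the `Agent` type alias, and the convention that an agent signals failure by returning `None` or raising are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical convention: a sub-agent returns None (or raises) on failure.
Agent = Callable[[str], Optional[str]]

def recover(task_input: str, primary: Agent, fallback: Agent,
            simplify: Callable[[str], str]) -> tuple[str, Optional[str]]:
    """Try each recovery step in priority order; return (step_used, result)."""
    attempts = [
        ("retry", primary, task_input),                       # 1. same agent, same input
        ("retry_simplified", primary, simplify(task_input)),  # 2. same agent, smaller input
        ("fallback", fallback, task_input),                   # 3. different agent
    ]
    for label, agent, payload in attempts:
        try:
            result = agent(payload)
        except Exception:
            result = None  # treat any exception as a failed attempt
        if result is not None:
            return label, result
    return "escalate_to_human", None                          # 4. nothing worked
```

Returning the label alongside the result lets the orchestrator log which step succeeded, which is useful when tuning timeouts and fallbacks.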
Work with the AI to architect a multi-agent orchestration plan for a real research task.
You'll design an orchestration plan for a complex research task: automatically monitoring competitor product launches, summarizing key features, and alerting the product team when something significant is detected.
How to scope, constrain, and interface individual agents so they can work reliably inside a larger system.
In November 2023, AutoGPT — the open-source multi-agent framework that briefly became the fastest-growing GitHub repository in history with 150,000 stars in under a month — published a postmortem on why most user pipelines failed. The core finding: sub-agents were being given goals instead of tasks. A sub-agent told to "research competitors" would recursively spawn more sub-agents, exhaust token budgets, and loop indefinitely. The fix required enforcing three constraints on every sub-agent: a single, atomic task description; a maximum tool call count; and an explicit output schema that the orchestrator could validate. Pipelines that enforced all three constraints had an 87% success rate versus 23% for unconstrained sub-agents.
The single most important rule in sub-agent design is atomicity: each sub-agent should have exactly one well-defined task that can succeed or fail unambiguously. "Research competitors" is not atomic. "Fetch the pricing page at competitor.com and return the price of the Pro plan as a number" is atomic.
Atomic tasks have three properties: they have a clear completion condition, they produce a predictable output type, and they do not require the sub-agent to make judgment calls about scope. When a task requires judgment about scope, that judgment belongs to the orchestrator, not the sub-agent.
If you cannot write a unit test for a sub-agent's output — because you cannot define what "correct" looks like — the task is not atomic enough. Decompose it further before assigning it to a sub-agent.
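For the atomic pricing task described earlier, the unit-test criterion might be expressed as a completion check like the one below. The function name and the rule that a valid price is a finite, non-negative number are assumptions for illustration.

```python
import math

def is_valid_pro_price(output: object) -> bool:
    """Completion check for the task 'return the Pro plan price as a number'."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(output, bool) or not isinstance(output, (int, float)):
        return False
    # Assumed rule: a price must be finite and non-negative.
    return math.isfinite(output) and output >= 0
```

A task like "research competitors" has no equivalent check, which is a practical signal that it needs further decomposition.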
OpenClaw enforces atomicity structurally: sub-agents cannot spawn other sub-agents. Only the orchestrator can create new sub-agent instances. This single architectural constraint eliminates the recursive spawning problem that caused AutoGPT pipelines to fail.
Every sub-agent in OpenClaw operates under an explicit output contract — a JSON schema that defines exactly what the sub-agent must return. The orchestrator validates every sub-agent output against its schema before integrating it. If validation fails, the orchestrator treats the result as a soft failure and triggers its recovery cascade.
Output schemas serve three purposes beyond just data formatting. First, they force sub-agent designers to think concretely about what success looks like before writing the prompt. Second, they prevent ambiguous outputs from silently corrupting downstream sub-agents. Third, they make the system auditable — every data transformation in the pipeline is documented by its schema transition.
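A stripped-down version of contract validation is sketched below using only the standard library. It is not a JSON Schema implementation: the `PRICE_SCHEMA` mapping of field names to accepted Python types and the `validate` function are hypothetical simplifications, and a production system would more likely use a full JSON Schema validator.

```python
import json

# Hypothetical contract: field name -> tuple of accepted Python types.
PRICE_SCHEMA = {
    "plan": (str,),
    "price_usd": (int, float),
    "source_url": (str,),
}

def type_ok(value: object, expected: tuple) -> bool:
    """isinstance check that does not let booleans pass as numbers."""
    if isinstance(value, bool) and bool not in expected:
        return False
    return isinstance(value, expected)

def validate(raw: str, schema: dict[str, tuple]) -> tuple[bool, list[str]]:
    """Check a sub-agent's raw output against its contract; return (ok, errors)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, ["output is not valid JSON"]
    if not isinstance(data, dict):
        return False, ["output is not a JSON object"]
    errors = []
    for key, expected in schema.items():
        if key not in data:
            errors.append(f"missing field: {key}")
        elif not type_ok(data[key], expected):
            errors.append(f"wrong type for {key}")
    for key in sorted(set(data) - set(schema)):
        errors.append(f"unexpected field: {key}")
    return not errors, errors
```

When `validate` returns `False`, the orchestrator can treat the output as a soft failure and pass the error list to its recovery logic.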
Salesforce's Einstein Copilot, which uses a multi-agent architecture announced in February 2024, requires every sub-agent to return a structured CRM action object rather than natural language. This allows the orchestrator to directly validate, reject, or execute the action without parsing free text — reducing integration errors by over 70% compared to their earlier text-based prototype.
Sub-agents in OpenClaw are granted only the tools they need for their specific task — never a full tool suite. A web-scraping sub-agent gets browser access but not file-write access. A report-formatting sub-agent gets file-write access but not browser access. This principle of least privilege is not just a security measure; it also reduces the sub-agent's action space, which measurably improves reliability by eliminating decisions the sub-agent shouldn't be making.
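Least-privilege tool access can be sketched as a wrapper that exposes only an allowlisted subset of a tool registry. The `TOOLS` registry, the tool names, and the `ScopedToolbox` class are hypothetical. The lambdas stand in for real tool implementations.

```python
from typing import Callable

# Hypothetical global registry of every tool the platform offers.
TOOLS: dict[str, Callable[[str], str]] = {
    "browser_fetch": lambda url: f"<html for {url}>",
    "file_write": lambda path: f"wrote {path}",
}

class ScopedToolbox:
    """Exposes only the tools explicitly granted to one sub-agent."""

    def __init__(self, registry: dict[str, Callable[[str], str]], allowed: set[str]):
        unknown = set(allowed) - set(registry)
        if unknown:
            raise ValueError(f"unknown tools: {sorted(unknown)}")
        # Copy only the granted tools, so the others are unreachable through this object.
        self._tools = {name: registry[name] for name in allowed}

    def call(self, name: str, arg: str) -> str:
        if name not in self._tools:
            raise PermissionError(f"tool not granted: {name}")
        return self._tools[name](arg)
```

A scraping sub-agent built with `ScopedToolbox(TOOLS, {"browser_fetch"})` can fetch pages, but any attempt to call `file_write` raises `PermissionError`.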
Anthropic's research team demonstrated this in their 2024 paper on agentic systems: sub-agents given access to N tools where only 1 tool was relevant to their task had a 31% higher rate of tool misuse compared to sub-agents given access only to that 1 relevant tool. Scoping tools is one of the highest-leverage reliability interventions available to multi-agent system designers.
Practice writing atomic task descriptions and output schemas for real sub-agents.
You'll design two sub-agent specifications for the competitor monitoring system from Lab 1. Each spec needs an atomic task description, a tool access list, and an output schema.
How OpenClaw passes, scopes, and persists information across sub-agents without overflowing context windows.
In September 2023, Microsoft Research published "LongAgent," a paper demonstrating that naive multi-agent systems stuffed their entire conversation history into every sub-agent's context — causing context window overflows on tasks exceeding 128k tokens. Their solution, context scoping, became a foundational technique: the orchestrator maintains a master state document and passes each sub-agent only the slice of state relevant to its specific task. In benchmark testing, scoped context delivery reduced context overflow failures by 94% and improved task accuracy by 18% because sub-agents were no longer distracted by irrelevant prior steps. This technique is directly implemented in OpenClaw's context routing layer.
OpenClaw manages three distinct tiers of memory, each with different scope, persistence, and access patterns:
Sub-agents never have direct access to persistent memory or the full session state. All memory access is mediated by the orchestrator. This prevents a common failure mode where sub-agents hallucinate connections between unrelated prior events because irrelevant history was in their context.
Before dispatching a sub-agent, OpenClaw's orchestrator performs context injection: it queries the session state for information relevant to that specific subtask and formats it into a concise context block. This is not simply appending prior outputs — it involves semantic retrieval to find the most relevant prior results and active summarization to compress them to fit within the sub-agent's token budget.
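The selection step of context injection can be illustrated with a deliberately simple stand-in. The sketch below swaps semantic retrieval for plain keyword overlap and approximates token counts by whitespace-separated words. The `inject_context` function and both shortcuts are assumptions, and summarization is left out entirely.

```python
def inject_context(subtask: str, session_state: dict[str, str],
                   budget_tokens: int) -> str:
    """Pick the most relevant prior results that fit within the token budget."""
    query = set(subtask.lower().split())
    scored = []
    for key, text in session_state.items():
        # Keyword overlap stands in for semantic similarity here.
        overlap = len(query & set(text.lower().split()))
        if overlap:
            scored.append((overlap, key, text))
    # Most relevant first; ties broken by key for deterministic output.
    scored.sort(key=lambda item: (-item[0], item[1]))

    lines, used = [], 0
    for _, key, text in scored:
        cost = len(text.split())  # crude token estimate: one token per word
        if used + cost > budget_tokens:
            continue  # skip entries that would overflow the budget
        lines.append(f"[{key}] {text}")
        used += cost
    return "\n".join(lines)
```

Entries with no overlap are never injected, so unrelated history stays out of the sub-agent's context even when the budget has room for it.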
Summarization in OpenClaw is itself performed by a dedicated compression sub-agent: a lightweight model optimized for fact-preserving summarization of structured data. The compression is lossy by design. The model takes multi-page outputs from earlier pipeline stages and produces 200–400 token summaries that preserve all factual claims while discarding reasoning chains and redundant phrasing.
OpenClaw allocates token budgets per sub-agent at pipeline initialization based on task complexity estimates. A scraping agent might get 4,000 tokens; an analysis agent might get 16,000. The orchestrator tracks cumulative token spend across the pipeline and adjusts subsequent allocations dynamically if early stages run over budget.
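One way to picture the dynamic adjustment is a proportional rebalance: when finished stages overspend, the remaining stages are scaled down so the pipeline total is unchanged. The `rebalance` function and the proportional rule are assumptions for illustration. The source does not specify how OpenClaw redistributes budget.

```python
def rebalance(allocations: dict[str, int], spent: dict[str, int]) -> dict[str, int]:
    """Return adjusted budgets for stages that have not run yet.

    allocations: planned token budget per stage.
    spent: actual tokens used by stages that have already finished.
    """
    total = sum(allocations.values())
    remaining_stages = [stage for stage in allocations if stage not in spent]
    planned = sum(allocations[stage] for stage in remaining_stages)
    if planned == 0:
        return {stage: 0 for stage in remaining_stages}
    remaining_budget = max(0, total - sum(spent.values()))
    if remaining_budget >= planned:
        # On or under budget so far: keep the original plan.
        return {stage: allocations[stage] for stage in remaining_stages}
    # Over budget: shrink each remaining stage proportionally (integer math).
    return {
        stage: allocations[stage] * remaining_budget // planned
        for stage in remaining_stages
    }
```

With planned budgets of 4,000, 16,000, and 4,000 tokens, a scraping stage that actually spends 6,000 leaves 18,000 for the other two, which are scaled to 14,400 and 3,600.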
Google DeepMind's Gemini 1.5 benchmarks (February 2024) demonstrated that even with a 1-million-token context window, models perform significantly better on needle-in-a-haystack retrieval tasks when relevant information is surfaced to the beginning of the context rather than buried in the middle. This finding reinforces the case for active context injection over passive context accumulation even when large windows are available.
Parallel sub-agents create a state consistency challenge: if two agents simultaneously produce results that update the same field in session state, which result wins? OpenClaw uses optimistic concurrency control adapted from database engineering: each sub-agent reads a versioned snapshot of session state at dispatch time and writes its results back with a version tag. The orchestrator's integration phase detects version conflicts and resolves them by applying a merge strategy appropriate to the data type — last-write-wins for scalar values, union for lists, and human escalation for contradictory factual claims.
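The versioned read and write-back cycle can be sketched as follows. The `SessionState` class, the per-key version tracking, and the `factual_keys` set used to mark which fields hold factual claims are all hypothetical choices made for this example.

```python
import copy
from dataclasses import dataclass, field
from typing import Any

@dataclass
class SessionState:
    """Versioned state with type-dependent merging on conflict."""
    factual_keys: set[str] = field(default_factory=set)  # fields holding factual claims
    version: int = 0
    data: dict[str, Any] = field(default_factory=dict)
    key_versions: dict[str, int] = field(default_factory=dict)
    escalations: list[str] = field(default_factory=list)

    def snapshot(self) -> tuple[int, dict[str, Any]]:
        """What a sub-agent reads at dispatch time."""
        return self.version, copy.deepcopy(self.data)

    def commit(self, base_version: int, updates: dict[str, Any]) -> None:
        """Write results back, merging keys that changed since the snapshot."""
        self.version += 1
        for key, value in updates.items():
            changed_since_read = self.key_versions.get(key, 0) > base_version
            current = self.data.get(key)
            if not changed_since_read:
                self.data[key] = value  # no conflict
            elif isinstance(current, list) and isinstance(value, list):
                # Union for lists, preserving order.
                self.data[key] = current + [v for v in value if v not in current]
            elif key in self.factual_keys and current != value:
                self.escalations.append(key)  # keep current value; a human decides
                continue
            else:
                self.data[key] = value  # last-write-wins for scalars
            self.key_versions[key] = self.version
```

If two sub-agents commit against the same snapshot, the second commit merges lists, overwrites plain scalars, and queues any contradicting factual field for human review instead of overwriting it.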
Map the three memory tiers to a real pipeline and design the context injection strategy.
You'll design the memory architecture for the competitor monitoring pipeline. This means deciding what lives in each memory tier and how the orchestrator injects context into each sub-agent.
This lesson explores failure and recovery: the key principles, real-world applications, and implications for practitioners working in this domain.
Understanding this topic requires both theoretical grounding and practical awareness of how these concepts manifest in deployed systems. The frameworks covered in earlier lessons provide the foundation; this lesson connects them to implementation reality.
The transition from theory to practice reveals challenges that pure conceptual frameworks don't capture. Real-world deployment introduces constraints, trade-offs, and edge cases that demand nuanced judgment rather than rigid rule-following.
Effective practitioners in this space develop the ability to reason across multiple frameworks simultaneously, recognizing when different perspectives apply and how to resolve conflicts between competing priorities.
As this field continues to evolve, the principles covered in this module will remain foundational even as specific technologies and implementations change. The ability to think critically about these topics — rather than simply memorizing current best practices — is what separates effective practitioners from those who merely follow checklists.
Use the AI below to explore the concepts from Lesson 4 in depth. Ask questions, challenge assumptions, and work through practical scenarios related to failure and recovery.