🎯 Advanced · Lesson 1 of 4

Hierarchical Task Decomposition

Breaking complex goals into structured trees of executable subgoals — how real AI systems tear apart hard problems.

In 2022, DeepMind's AlphaCode system tackled competitive programming problems by decomposing each problem into a hierarchy: first identify the problem class (sorting, graph traversal, dynamic programming), then generate candidate algorithms, then implement and verify each piece independently. On Codeforces competitions, it ranked in the top 54% of human participants — not by solving problems whole, but by breaking them into recognizable subtask categories and solving each layer in sequence. The hierarchical decomposition was not optional — it was the architecture.

The same principle showed up in NASA's Mars Science Laboratory mission, where autonomous onboard planning software called AEGIS used a layered goal hierarchy: high-level science objectives decomposed into observation targets, then into specific instrument commands, then into power and timing constraints. When a lower-level step failed, the system could revise that layer without discarding the entire science plan.

What Hierarchical Decomposition Actually Is

Hierarchical task decomposition (HTD) is the process of taking a high-level goal and recursively breaking it into smaller subgoals until you reach atomic actions — steps the agent can execute directly. The result is a tree structure: the root is the goal, intermediate nodes are subgoals, and leaves are executable actions.

The formal version of this, Hierarchical Task Network (HTN) planning, has been used in AI systems since the 1990s. HTN planners like SHOP2 and SIPE-2 were deployed in real logistics operations and military mission planning. What makes the architecture powerful is that it encodes human expert knowledge about how to approach problems, not just what the goal is.

Key Distinction

Flat planners ask: "What sequence of actions leads from state A to state B?" Hierarchical planners ask: "What is the expert decomposition of this task class, and how do I instantiate it here?" The hierarchy carries domain knowledge that flat search cannot recover on its own.

Modern LLM-based agents rediscover this pattern. When GPT-4 is prompted with a complex task and chain-of-thought reasoning, it naturally generates hierarchical subgoal trees — not because it was explicitly told to, but because decomposition is the structure of competent human reasoning that it learned from training data.

Decomposition Strategies: Horizontal vs. Vertical

There are two axes of decomposition. Horizontal decomposition splits a task into parallel or sequential subtasks at the same level of abstraction — like splitting "prepare a report" into "gather data," "analyze data," and "write findings." Each piece is independent or lightly coupled. Vertical decomposition drills down through abstraction levels — "gather data" becomes "identify sources," then "query each source," then "parse response format," then "handle pagination."

Real planning systems combine both. The DARPA Urban Challenge autonomous vehicle teams in 2007 used horizontal decomposition to split driving into perception, planning, and control modules running in parallel — and vertical decomposition within the planning module to go from route-level decisions down to steering commands. Sebastian Thrun's Stanford team, which won the 2005 DARPA Grand Challenge, published detailed accounts of this layered architecture in their 2006 Journal of Field Robotics paper.

Horizontal: identify parallel or sequential subtasks at the same abstraction level
Vertical: drill each subtask into lower-level operations until actions are atomic
Dependency analysis: map which subgoals must complete before others can begin
Resource allocation: assign tools, APIs, or agents to each leaf task

Failure Mode

Premature concreteness — decomposing too deep too fast, without confirming the high-level structure is correct. NASA's Mars Observer mission failure in 1993 was partly attributed to teams working detailed subsystem designs before the system-level architecture was validated. Fix the tree structure first, then fill in leaves.

Decomposition in LLM Agent Frameworks

AutoGPT (2023), LangChain agents, and the BabyAGI architecture all implement implicit hierarchical decomposition. BabyAGI, released by Yohei Nakajima in March 2023, explicitly maintains a task list where an "executive" LLM generates subgoals, an "execution" LLM completes them, and a "prioritization" LLM reorders remaining tasks based on results. This three-layer structure is a direct implementation of HTN principles with language model components.

The key engineering decision in these systems is decomposition granularity: how many levels deep, and how specific each leaf node must be before handing to an executor. Too coarse and the executor fails on ambiguous instructions. Too fine and the planner spends more tokens on decomposition than on execution.

🎯 Advanced · Lesson 1 Quiz

Quiz: Hierarchical Task Decomposition

3 questions — free, untracked, retake anytime.

1. What distinguishes an HTN (Hierarchical Task Network) planner from a flat planner?

✓ Correct — ✓ Correct. HTN planners encode how experts approach a task class — decomposition methods — rather than blindly searching from initial to goal state. This is why they scale better on complex real-world tasks.

Not quite. The defining feature of HTN planners is that they carry domain expert knowledge in their decomposition methods, not just search heuristics or memory usage differences.

2. In AlphaCode's approach to competitive programming, what role did hierarchical decomposition play?

✓ Correct — ✓ Correct. AlphaCode's decomposition was architectural, not optional — classify the problem class first, then generate candidate algorithms, then implement and verify. Each layer informed the next.

Not quite. Decomposition was the architecture itself in AlphaCode, applied to every problem as the fundamental strategy, not a selective optimization.

3. What is the primary failure mode called "premature concreteness" in hierarchical decomposition?

✓ Correct — ✓ Correct. Premature concreteness means working on detailed subproblems before confirming the high-level structure is sound — the same issue that affected NASA's Mars Observer mission design process.

Not quite. Premature concreteness means rushing to fill in the detailed leaves of the task tree before the top-level structure has been validated. Fix the tree first, then the leaves.

🎯 Advanced · Lesson 1 Lab

Lab: Build a Decomposition Tree

Practice hierarchical task decomposition with a real AI planning assistant.

Your Task

You're going to practice hierarchical task decomposition by working with an AI planning assistant. Choose a complex goal — something like "build and deploy a machine learning model for customer churn prediction" or "design a disaster response plan for a city of 500,000." Then:

Ask the assistant to decompose your goal into a two-level hierarchy (high-level subgoals, then one level of sub-subgoals).
Identify which subgoals are horizontally parallel vs. which must be sequential.
Ask the assistant to flag which leaf tasks are truly atomic vs. still need further decomposition.

Suggested opener: "I want to practice hierarchical task decomposition. My goal is [your complex goal]. Can you help me build a two-level decomposition tree and identify which subtasks are parallel vs. sequential?"

🤖 Planning Assistant Lesson 1 Lab

🎯 Advanced · Lesson 2 of 4

Subgoal Generation Techniques

How agents create intermediate objectives — from landmark detection to means-ends analysis and LLM-driven goal chaining.

In 2016, OpenAI's research into RL agent behavior on Montezuma's Revenge (an Atari game notorious for sparse rewards) revealed that standard reinforcement learning completely failed — the agent received no reward signal for thousands of steps and learned nothing. The breakthrough came from subgoal-based approaches: researchers used intrinsic motivation to generate intermediate objectives (reaching a new room, picking up a key) that served as stepping stones. When Uber AI Labs published their "Go-Explore" algorithm in 2018, it addressed the same problem by explicitly storing and returning to promising intermediate states, treating each as a subgoal. The game that defeated standard RL for years fell to subgoal generation.

IBM's Watson system during its 2011 Jeopardy! victory used a different form of subgoal generation: decomposing each clue into evidence-gathering subgoals across 100+ algorithms running in parallel, then merging their confidence scores. The subgoals were dynamically generated per question — Watson didn't have a fixed decomposition for "what is a Jeopardy! answer," it generated the appropriate evidence subgoals based on the clue structure.

Landmark-Based Subgoal Generation

In classical AI planning, landmarks are facts or states that must be true at some point in any valid solution to a planning problem. Automatically identifying landmarks gives a planner a set of necessary subgoals — not hypothetical intermediates, but provably required ones. The Fast Downward planning system (used extensively in academic and competition planning) includes landmark detection as a core component of its heuristic evaluation.

The landmark approach is powerful because it's deductive — subgoals are derived from the problem structure, not guessed. In a logistics planning domain, "the truck must visit the warehouse" might be a landmark because no valid solution exists without it. The planner can then focus search effort on achieving landmarks in a sensible order.

Technical Insight

Landmark detection algorithms work by analyzing the causal structure of the planning domain: if achieving goal G requires action A, and action A requires precondition P, then P is a landmark. Disjunctive landmarks capture "at least one of these must hold" — more complex but more complete coverage of necessary intermediate states.

Means-Ends Analysis and Goal Regression

Means-ends analysis (MEA), introduced in the General Problem Solver (GPS) by Newell and Simon in 1961, generates subgoals by working backwards from the goal: "What is the difference between the current state and the goal? What operator reduces that difference? What preconditions must hold for that operator?" Each precondition becomes a new subgoal.

This backward chaining creates a subgoal chain that, when executed forward, achieves the original goal. STRIPS-style planners use a version of this called goal regression. When NASA's Remote Agent Experiment ran aboard Deep Space 1 in 1999 — the first time an AI planning system controlled a spacecraft autonomously — it used a goal-regression planner to generate the sequence of intermediate states needed to achieve mission objectives. For 11 hours in May 1999, the spacecraft flew itself.

Identify the gap between current state and goal state
Select an operator that reduces the most significant gap
Add operator preconditions as new subgoals to achieve first
Repeat recursively until all subgoals are directly achievable

LLM-Driven Dynamic Subgoal Generation

Modern LLM-based agents generate subgoals through prompted reasoning rather than formal analysis. The ReAct framework (Yao et al., 2022) interleaves Reasoning and Acting: the model generates a "thought" that identifies the next subgoal, then takes an action to pursue it, then reasons about the result to identify the next subgoal. This creates a dynamic subgoal chain that adapts to observed results rather than following a pre-committed plan.

The Reflexion paper (Shinn et al., 2023) added a memory layer: after each failed attempt, the agent generates a verbal reflection on what went wrong, which is stored and prepended to future prompts. This means the subgoal generation process learns from failure within a session — not through gradient updates but through language-mediated self-critique.

Critical Limitation

LLM-generated subgoals are plausible but not provably necessary. Unlike landmark detection, there's no guarantee that a language model's suggested intermediate steps are actually required — or even sufficient. Hallucinated subgoals that seem reasonable can lead agents confidently down paths that cannot produce the goal. Verification layers are essential.

🎯 Advanced · Lesson 2 Quiz

Quiz: Subgoal Generation Techniques

3 questions — free, untracked, retake anytime.

1. What makes landmark-based subgoal generation stronger than guessing intermediate steps?

✓ Correct — ✓ Correct. Landmarks are deductively derived from the problem's causal structure — they are not guessed. Any solution that avoids a landmark is by definition not a valid solution.

Not quite. The strength of landmarks is their deductive necessity: they are mathematically derived states that must appear in any valid plan, not heuristic guesses or human annotations.

2. How did the "Go-Explore" algorithm address the sparse reward problem in Montezuma's Revenge?

✓ Correct — ✓ Correct. Go-Explore maintained an archive of promising states and explicitly returned to them to explore further — treating each interesting intermediate state as a subgoal rather than hoping random exploration would rediscover it.

Not quite. Go-Explore's key insight was archiving and returning to promising intermediate states as explicit subgoals, rather than relying on reward signals that were too sparse to guide exploration.

3. What is the critical limitation of LLM-generated subgoals compared to formal landmark detection?

✓ Correct — ✓ Correct. LLM-generated subgoals lack the deductive guarantee of landmarks. They may seem reasonable while being incorrect or irrelevant — and without verification layers, agents can pursue them confidently into failure.

Not quite. The core limitation is epistemological: LLMs produce plausible-sounding subgoals that lack mathematical necessity. Hallucinated intermediate steps can look convincing while being dead ends.

🎯 Advanced · Lesson 2 Lab

Lab: Subgoal Chain Analysis

Examine how an AI agent generates and evaluates intermediate objectives for a real planning challenge.

Your Task

You'll practice identifying and critiquing subgoal chains using the ReAct-style reasoning approach. Pick a goal that requires multiple intermediate steps in an uncertain domain — for example, "identify the root cause of a production system outage" or "design a clinical trial protocol for a new drug candidate."

Ask the assistant to generate a subgoal chain using means-ends analysis: work backwards from the goal to identify necessary intermediate states.
For each subgoal generated, ask: "Is this provably necessary, or just plausible?"
Identify at least one subgoal in the chain that might be a "hallucinated" step — one that seems reasonable but isn't actually required.

Suggested opener: "I want to analyze subgoal generation. My goal is [your goal]. Can you apply means-ends analysis — working backwards from the goal — and generate a subgoal chain? Then flag which subgoals are provably necessary vs. just plausible."

🤖 Planning Assistant Lesson 2 Lab

🎯 Advanced · Lesson 3 of 4

Backtracking and Plan Revision

When plans fail mid-execution: detection, diagnosis, and structured recovery without losing all prior progress.

On July 4, 1997, the Pathfinder lander's onboard computer suffered repeated resets on the Martian surface due to a priority inversion bug. The software used VxWorks, a real-time operating system with a preemptive scheduler. A low-priority task held a shared resource that a high-priority task needed, but a medium-priority task kept preempting the low-priority task, preventing resource release. The system's watchdog timer detected that high-priority operations weren't completing and triggered a full reset — effectively "backtracking" to a safe initial state. Engineers on Earth diagnosed the issue remotely, enabled priority inheritance in the scheduler over a radio link, and the lander resumed normal operation. The backtracking was costly but correct: the system chose a known-safe state over an unknown-corrupted one.

DeepMind's AlphaGo, in its famous 2016 match against Lee Sedol, exhibited a form of plan revision in Game 4. After Lee Sedol played the unexpected "divine move" (Move 78), AlphaGo's evaluation function initially assigned it low value — a misestimation. The system continued executing its prior plan for several moves before its lookahead detected that the board position had shifted significantly. It then revised its strategy mid-game, a form of online replanning that ultimately failed — Lee Sedol won that game — but demonstrated that even state-of-the-art game AI faces the challenge of timely plan revision when unexpected moves occur.

Failure Detection: Knowing When a Plan Has Broken

Backtracking begins with detection. An agent cannot revise a plan it doesn't know has failed. There are three categories of failure detection in planning systems:

Precondition violation: An action's preconditions are no longer satisfied when it's time to execute — the world changed since the plan was generated.
Execution monitoring: The observed result of an action differs from the predicted result by more than an acceptable threshold.
Goal regression failure: A subsequent planning step detects that the current state cannot lead to the goal even with remaining planned actions.

Real-World Timing

The Mars Pathfinder case illustrates a critical timing challenge: failure detection must be fast enough to prevent cascading damage, but the "backtrack to safe state" response must not be triggered so aggressively that normal operations are disrupted. This threshold calibration is an active research area in fault-tolerant systems, distinct from the planning problem itself.

In LLM-based agents, failure detection is harder because outputs are text, not formal state transitions. The Voyager system (Wang et al., 2023) for Minecraft used a code execution environment to detect failures precisely: if generated JavaScript code raised an exception or produced wrong game state, the agent knew definitively that the subtask had failed and triggered replanning.

Backtracking Strategies: How Far to Rewind

Once failure is detected, the agent must decide where to rewind. The three main strategies are:

Chronological backtracking: Undo the most recent decision and try an alternative. Simple and cheap, but can lead to "thrashing" — repeatedly undoing and retrying the same level when the real problem is deeper in the decision tree. PROLOG's default search uses chronological backtracking.

Dependency-directed backtracking (DBT): Analyze the cause of the failure — which earlier decision created the constraint that is now violated — and jump back to that specific decision point, skipping irrelevant intermediate steps. The ATMS (Assumption-based Truth Maintenance System) developed by Johan de Kleer at Xerox PARC in the 1980s formalized this approach. DBT can recover in O(n) steps what chronological backtracking might need O(2^n) steps to find.

Partial plan preservation: Rather than discarding the entire plan, identify which subgoals have already been validly achieved and preserve them as fixed points, replanning only the failed portion. This is the approach used in CPEF (Continuous Planning and Execution Framework) and is essential in long-horizon tasks where early subgoal achievements are expensive to reproduce.

Advanced Pattern

Commitment levels: an agent can hold plans at three levels of commitment — "committed" (will execute), "tentative" (preferred but revisable), and "open" (not yet decided). When failure occurs, only tentative and open commitments need revision. This is formalized in Bratman's theory of Practical Reasoning and implemented in BDI (Belief-Desire-Intention) agent architectures used in industrial autonomous systems.

Online Replanning vs. Plan Repair

When a plan fails, the agent has two broad options: replan from scratch (generate a new plan from the current state to the goal) or repair the existing plan (make minimal edits to the failed plan to make it valid again). Replanning from scratch is simpler to implement but computationally expensive and discards potentially valid plan segments. Plan repair is cheaper but requires formal analysis of which parts of the plan remain valid.

Research by Koenig and Sun (2009) on "D*-Lite" — an incremental replanning algorithm for robot navigation — showed that incremental repair can be orders of magnitude faster than full replanning when the plan changes are localized. D*-Lite was used in DARPA Urban Challenge vehicles that needed to replan routes when unexpected obstacles appeared. When a parked car blocked a planned route, the system updated only the affected portion of the navigation plan rather than recomputing from scratch.

🎯 Advanced · Lesson 3 Quiz

Quiz: Backtracking and Plan Revision

3 questions — free, untracked, retake anytime.

1. Why is dependency-directed backtracking (DBT) more efficient than chronological backtracking in many cases?

✓ Correct — ✓ Correct. DBT analyzes the causal source of the failure — which earlier commitment created the violated constraint — and jumps directly to that decision point, avoiding the exponential thrashing of chronological backtracking.

Not quite. DBT's advantage is causal diagnosis: it traces the failure to its source decision and backtracks specifically to that point, rather than undoing decisions in reverse chronological order regardless of their relevance to the failure.

2. In the Mars Pathfinder priority inversion incident, what was the system's backtracking response?

✓ Correct — ✓ Correct. The watchdog timer detected that high-priority operations weren't completing and triggered a full system reset — a form of aggressive backtracking to a safe initial state, accepting the cost of lost progress over the risk of continued operation in a corrupted state.

Not quite. The Pathfinder's recovery was a full reset to a known-safe state, triggered by the watchdog timer. It prioritized safety over preserving execution progress — a valid but costly form of backtracking.

3. What advantage does plan repair offer over full replanning from scratch?

✓ Correct — ✓ Correct. Plan repair identifies which parts of the existing plan remain valid after a failure and edits only the broken portion — this is far cheaper than regenerating the entire plan from scratch, as demonstrated by D*-Lite in robot navigation.

Not quite. The advantage of plan repair is computational efficiency through preservation: valid plan segments are kept intact, and only the failed portion is fixed. D*-Lite's success in DARPA Urban Challenge vehicles showed the practical value of this approach.

🎯 Advanced · Lesson 3 Lab

Lab: Failure Detection and Recovery

Simulate plan failures and practice structured backtracking with an AI planning partner.

Your Task

You'll practice diagnosing plan failures and choosing the right backtracking strategy. Start with a multi-step plan in any domain, then introduce a mid-plan failure scenario. Your goal is to identify which type of backtracking is most appropriate.

Present a 5-7 step plan to accomplish a goal, then describe a failure that occurs at step 3 or 4.
Ask the assistant to diagnose what type of failure it is (precondition violation, execution monitoring failure, or goal regression failure).
Ask whether chronological backtracking, dependency-directed backtracking, or partial plan preservation is most appropriate — and why.
Ask the assistant to produce the repaired plan.

Suggested opener: "I have a plan with a mid-execution failure. Here's the original plan: [list 5-7 steps]. At step [N], this happened: [describe failure]. Can you diagnose the failure type and recommend the right backtracking strategy?"

🤖 Planning Assistant Lesson 3 Lab

Building AI Agents II — Skills · Module 5 · Lesson 4

Lesson 4

Advanced concepts, real-world applications, and practical implications

Core Concepts

This lesson explores lesson 4 — examining the key principles, real-world applications, and implications for practitioners working in this domain.

Understanding this topic requires both theoretical grounding and practical awareness of how these concepts manifest in deployed systems. The frameworks covered in earlier lessons provide the foundation; this lesson connects them to implementation reality.

Practical Applications

The transition from theory to practice reveals challenges that pure conceptual frameworks don't capture. Real-world deployment introduces constraints, trade-offs, and edge cases that demand nuanced judgment rather than rigid rule-following.

Effective practitioners in this space develop the ability to reason across multiple frameworks simultaneously, recognizing when different perspectives apply and how to resolve conflicts between competing priorities.

Looking Forward

As this field continues to evolve, the principles covered in this module will remain foundational even as specific technologies and implementations change. The ability to think critically about these topics — rather than simply memorizing current best practices — is what separates effective practitioners from those who merely follow checklists.

Lesson 4 Quiz

Lesson 4

What is the primary focus of Lesson 4?

✓ Correct — Correct. This lesson bridges theory and practice, focusing on real-world implementation.

Review the lesson — the focus is on connecting frameworks to practical reality.

Why does real-world deployment introduce challenges that pure theory doesn't capture?

✓ Correct — Correct. Real deployment requires judgment, not just framework application.

Practice doesn't invalidate theory — it reveals complexities that require nuanced application of theoretical principles.

What separates effective practitioners from those who merely follow checklists?

✓ Correct — Correct. Critical thinking and adaptability matter more than memorized procedures.

The key differentiator is critical thinking ability, not experience or resources alone.

🎯 Advanced · Lesson 4 Lab

Lab: Apply What You've Learned

Synthesize concepts from Lesson 4 through guided AI conversation

Your Task

Use the AI below to explore the concepts from Lesson 4 in depth. Ask questions, challenge assumptions, and work through practical scenarios related to lesson 4.

Try: "How would the concepts from this lesson apply to a real-world scenario in this field?"

🤖 AESOP Lab Assistant Lesson 4 Lab

Module 5 Test

Agent Planning Strategies · 15 Questions · 70% to Pass

Score: 0/15

1. What is the core objective of Agent Planning Strategies?

2. How should practitioners approach applying concepts from this module?

3. Which best describes the relationship between theory and practice in Building AI Agents II — Skills?

4. What distinguishes expert practitioners from novices in this field?

5. How does Agent Planning Strategies build on previous modules?

6. What role do constraints play in practical implementation?

7. When applying frameworks from this module, what is most important?

8. How should practitioners handle conflicting perspectives in this field?

9. What makes the concepts in Agent Planning Strategies relevant beyond their immediate context?

10. How should practitioners continue developing expertise after completing this module?

11. What is the relationship between understanding Building AI Agents II — Skills concepts and making decisions?

12. How do the lessons from this module apply to novel situations?

13. What is the value of understanding multiple perspectives on {course_title}?

14. How should practitioners evaluate new information or developments in this field?

15. What is the ultimate goal of learning Agent Planning Strategies?