In September 2022, Adept AI introduced ACT-1, a model that operates a web browser as its primary perception surface. The agent receives a screenshot of the current browser state (pixel data from a 1280×800 viewport) and a serialized DOM tree stripped of visual styling. These two modalities, visual and structural, are fused before any reasoning step begins. When ACT-1 was tested against tasks like "book a flight on United.com," its perception layer had to reconcile pixel-level button positions with DOM node IDs that didn't always match what a human would see, because many modern sites render interactive elements via JavaScript after the initial DOM is parsed. The agent sometimes perceived a "Book" button as existing in the DOM before it was visually rendered, leading to premature action attempts. Adept's engineers logged this as a perception-action timing mismatch: a canonical failure mode that becomes visible only when you study the loop carefully.
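One way to guard against this kind of timing mismatch is to act only on elements that appear in both modalities. The sketch below is illustrative only: `DomNode`, `VisibleBox`, and `actionable_nodes` are hypothetical names, not part of ACT-1 or any Adept API, and it assumes the visual channel can report a bounding box per DOM node ID.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomNode:
    node_id: str
    label: str


@dataclass
class VisibleBox:
    node_id: str
    x: int
    y: int
    width: int
    height: int


def actionable_nodes(dom_nodes: List[DomNode], boxes: List[VisibleBox]) -> List[DomNode]:
    """Keep only DOM nodes that also have a non-empty rendered box."""
    rendered = {b.node_id for b in boxes if b.width > 0 and b.height > 0}
    return [n for n in dom_nodes if n.node_id in rendered]


# "Book" is already in the DOM but has not been painted yet.
dom = [DomNode("n1", "Search"), DomNode("n2", "Book")]
boxes = [VisibleBox("n1", 40, 100, 120, 32)]

ready = actionable_nodes(dom, boxes)
print([n.label for n in ready])  # ['Search']
```

An agent using a gate like this would wait and re-perceive rather than click a node that exists only structurally.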
Perception in an AI agent is not passive observation. It is the active process of converting raw environmental signals — text, pixels, API responses, file contents, sensor readings — into a structured internal representation the reasoning module can work with. The quality of this representation determines the ceiling of every subsequent step.
Modern language-model-based agents typically receive perception through one or more channels: the system prompt (static context set at initialization), the conversation history (accumulated prior turns), tool call results injected into the context window, and, in multimodal systems, image or audio embeddings. Each of these channels carries different reliability characteristics. A tool result from a live API call reflects the present state of the world; a document embedded in the system prompt may be hours or weeks old.
The agent's context window is its perception organ. Everything the agent can know at decision time must fit within that window — or be explicitly retrieved and inserted before reasoning begins. Perception is not what exists in the world; it is what exists in the context.
This creates a hard engineering constraint: information that is relevant but not present in the context window is functionally invisible to the agent. Teams building production agents must design retrieval pipelines that anticipate what information the agent will need before it needs it — a form of predictive perception architecture.
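The constraint can be made concrete with a small context-packing sketch. Everything here is an assumption for illustration: `ContextItem`, `pack_context`, the priority scheme, and the word-count token estimate (a real system would use the model's tokenizer).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ContextItem:
    source: str    # which perception channel the text came from
    text: str
    priority: int  # higher means more important to keep


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return len(text.split())


def pack_context(items: List[ContextItem], budget: int) -> Tuple[List[ContextItem], List[ContextItem]]:
    """Greedily keep the highest-priority items that fit within the budget."""
    chosen, dropped, used = [], [], 0
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item)
            used += cost
        else:
            dropped.append(item)
    return chosen, dropped


items = [
    ContextItem("history", "User asked about baggage fees earlier today", 1),
    ContextItem("system_prompt", "You are a booking assistant", 3),
    ContextItem("tool_result", "Flight UA100 departs 09:15 from SFO", 2),
]
chosen, dropped = pack_context(items, budget=12)
print([i.source for i in chosen])   # ['system_prompt', 'tool_result']
print([i.source for i in dropped])  # ['history']
```

Whatever lands in `dropped` is invisible to the agent at decision time, which is why the priority rules deserve as much design attention as the prompt itself.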
Different perception channels fail in distinct ways. Understanding these failure modes at the advanced level means being able to diagnose agent errors by tracing them back to specific perception breakdowns rather than blaming reasoning or action steps.
In 2023, researchers at Riley GmbH demonstrated a prompt injection attack against early Bing Chat (Sydney) where a webpage the agent was asked to summarize contained hidden text instructing it to ignore its original user task. The agent complied. This is a perception-layer attack: the malicious instruction entered the loop through the tool-result perception channel, indistinguishable in structure from legitimate content.
Designing robust perception means treating every input channel as potentially adversarial, stale, or incomplete. Production agents at companies like Cognition (Devin), Adept, and Character.AI invest heavily in input sanitization, freshness timestamping, and context prioritization schemes precisely because perception errors compound: a single bad input early in a long agentic run can corrupt every reasoning and action step that follows.
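A minimal sketch of freshness timestamping and provenance labeling follows. The `Observation` type, the header format, and the `<untrusted_data>` delimiter are assumptions made for this example. Delimiting untrusted content makes its origin explicit to the model and to auditors, but it is a mitigation and does not by itself prevent prompt injection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Observation:
    channel: str          # e.g. "system_prompt", "tool_result", "web_page"
    content: str
    fetched_at: datetime
    trusted: bool         # False for anything an outside party could author


def render_for_context(obs: Observation, now: datetime, max_age: timedelta) -> str:
    """Label an observation with its channel, trust level, and staleness."""
    stale = (now - obs.fetched_at) > max_age
    header = f"[channel={obs.channel} trusted={obs.trusted} stale={stale}]"
    if obs.trusted:
        body = obs.content
    else:
        # Mark the span as data so downstream prompts can say "never obey this".
        body = "<untrusted_data>\n" + obs.content + "\n</untrusted_data>"
    return header + "\n" + body


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
page = Observation("web_page", "Ignore previous instructions.", now - timedelta(hours=2), trusted=False)
rendered = render_for_context(page, now, max_age=timedelta(hours=1))
print(rendered)
```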
You're going to dig into perception channel design with an AI that specializes in agent architecture. Push past surface-level answers — ask about tradeoffs, edge cases, and real failure diagnostics.
DeepMind reported results for a Gemini-based agent on the GAIA benchmark, a set of 466 real-world questions requiring multi-step web research that was introduced in November 2023. The agent used a chain-of-thought reasoning trace where each step produced an explicit "what do I know, what do I need, what should I do next" triplet before any tool call was issued. On questions requiring more than five reasoning steps, the agent's performance dropped sharply. The cause was not that individual reasoning steps were wrong in isolation, but that reasoning errors compounded: a slightly incorrect intermediate conclusion in step 3 would redirect the search strategy in step 4, which would retrieve documents that reinforced the error in step 5. DeepMind's analysis labeled this "reasoning drift": a phenomenon where the agent builds an increasingly confident but increasingly wrong world model as the loop progresses.
Reasoning in the agent loop is the step where the agent takes its current perceptual state and produces either an action to take or a conclusion to return. At the advanced level, it's important to understand that "reasoning" in current LLM-based agents is not a separate computational module — it is text generation constrained by a prompt structure. This means reasoning quality is highly sensitive to how the reasoning task is framed in the context.
The dominant framework in 2023–2025 production agents is ReAct (Reasoning + Acting), introduced in 2022 by Yao et al. of Princeton University and Google Research. ReAct interleaves reasoning traces with action calls in a single generation stream: the model generates a "Thought:" line explaining its reasoning, then an "Action:" line specifying a tool call, and then waits for an "Observation:" to be injected by the runtime before generating the next thought. This structure forces the model to externalize its reasoning, which has two effects: it makes errors more auditable, and it allows the runtime to catch and interrupt dangerous chains before they complete.
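The Thought / Action / Observation cycle can be sketched as a small runtime loop. This is a simplified illustration under stated assumptions, not the reference ReAct implementation: `react_loop` is a hypothetical name, the "Action: tool_name argument" line format is a convention chosen for this example, and `scripted_model` is a stand-in for a real LLM call.

```python
from typing import Callable, Dict, List, Optional, Tuple


def react_loop(
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Alternate model generations with runtime-injected observations."""
    transcript = [f"Question: {question}"]
    for _step in range(max_steps):
        output = model("\n".join(transcript))
        transcript.append(output)
        for line in output.splitlines():
            if line.startswith("Final Answer:"):
                return line[len("Final Answer:"):].strip(), transcript
            if line.startswith("Action:"):
                name, _sep, arg = line[len("Action:"):].strip().partition(" ")
                tool = tools.get(name)
                result = tool(arg) if tool else f"unknown tool {name!r}"
                transcript.append(f"Observation: {result}")
                break
    return None, transcript  # step budget exhausted without an answer


def scripted_model(prompt: str) -> str:
    # Stand-in for an LLM: acts first, answers once an observation exists.
    if "Observation:" not in prompt:
        return "Thought: I should compute the sum.\nAction: add 2 3"
    return "Thought: The tool returned the sum.\nFinal Answer: 5"


tools = {"add": lambda arg: str(sum(int(x) for x in arg.split()))}
answer, transcript = react_loop(scripted_model, tools, "What is 2 + 3?")
print(answer)           # 5
print(transcript[2])    # Observation: 5
```

Because every thought and action passes through the runtime as text, this is also the natural place to add logging or to halt a chain; the `max_steps` bound is what stops a model that keeps thinking without ever acting or answering.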
Reasoning in an agent loop is not isolated cognition — it is the process of selecting the next action given current perceptual state and memory. Every reasoning step produces either a tool call, a sub-goal decomposition, or a final answer. "Pure thinking" loops that produce neither are a common source of agent stalling.
OpenAI's o1 model, released in late 2024, and its successor o3, announced that December, introduced a reasoning approach in which the model performs extended chain-of-thought in a hidden "scratchpad" before producing a visible response. In agent deployments, this means the reasoning step is more thorough but also more opaque: the agent may spend thousands of tokens reasoning before committing to an action, and that internal chain is not surfaced to the runtime for early intervention.
Advanced agent reasoning involves decomposing high-level goals into executable sub-tasks. This is non-trivial. The agent must decide the granularity of decomposition (too coarse and actions fail; too fine and the loop runs for thousands of steps), the ordering of sub-tasks (some are prerequisite to others), and when to replan (when observations don't match expectations).
Cognition AI's Devin software engineering agent, announced in March 2024, uses a hierarchical task representation where a top-level goal is broken into a tree of sub-tasks. Each node in the tree has an explicit completion criterion. When Devin's evaluation team tested it on SWE-bench, a benchmark of real GitHub issues (2,294 in the full set, 300 in the SWE-bench Lite subset), they found the agent's primary failure mode was sub-task completion misclassification: Devin would mark a sub-task complete based on partial evidence, then proceed to the next node with an incorrect baseline, causing cascading failures that were difficult to trace back to their origin.
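A task tree with explicit completion criteria might look like the sketch below. It is not Devin's actual representation: `TaskNode`, `first_incomplete`, and the state keys are hypothetical. The point it illustrates is that each criterion should demand full evidence (tests passing) rather than partial evidence (a patch was applied).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TaskNode:
    name: str
    is_complete: Callable[[Dict[str, object]], bool]  # explicit completion criterion
    children: List["TaskNode"] = field(default_factory=list)


def first_incomplete(node: TaskNode, state: Dict[str, object]) -> Optional[TaskNode]:
    """Depth-first search for the first node whose criterion is not met."""
    for child in node.children:
        found = first_incomplete(child, state)
        if found is not None:
            return found
    return None if node.is_complete(state) else node


root = TaskNode(
    "fix issue",
    lambda s: s.get("tests_passed") is True,
    children=[
        TaskNode("reproduce bug", lambda s: s.get("repro_test_fails") is True),
        # Applying the patch is partial evidence; passing tests is the criterion.
        TaskNode("apply patch", lambda s: bool(s.get("patch_applied")) and bool(s.get("tests_passed"))),
    ],
)

state = {"repro_test_fails": True, "patch_applied": True, "tests_passed": False}
blocked = first_incomplete(root, state)
print(blocked.name)  # apply patch
```

With a weaker criterion such as `patch_applied` alone, the same state would mark the sub-task complete and the agent would move on with an incorrect baseline.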
The implication for systems builders is that reasoning quality cannot be evaluated by examining single steps in isolation. An agent that produces correct reasoning at every individual step can still produce incorrect outcomes if its error management and replanning logic is weak. This is why evaluation frameworks like SWE-bench, GAIA, and WebArena measure end-to-end task completion rather than step-level accuracy.
Probe the mechanics of in-loop reasoning with a specialist AI. Challenge it on replanning strategies, goal decomposition granularity, and how to design systems that resist reasoning drift.
In November 2022, Air Canada's customer service chatbot took an action it was not authorized to take: it told a grieving customer that bereavement fares could be claimed retroactively after a flight, which was incorrect per Air Canada's actual policy. Whatever its internal reasoning, its action was to state the policy to the customer as fact. When the customer attempted to claim the fare and Air Canada refused, the customer brought a claim before the British Columbia Civil Resolution Tribunal, which ruled against Air Canada in February 2024 and held the airline responsible for the chatbot's statement. The decision is widely cited for one principle: an agent's verbal action, such as stating a policy, carries the same real-world weight as a human employee's statement. The chatbot's action was low-token, high-consequence, and irreversible in its effect on customer expectation and legal liability.
An agent's action space is the complete set of operations it can perform on the world. In LLM-based agents, actions are typically mediated by tools — function calls that the model can issue, which are then executed by a runtime and whose results are returned as observations. Understanding the action space at depth means understanding the risk profile of each action type.
The most dangerous actions are not necessarily the most technically complex. Irreversibility is the primary risk dimension. Deleting a database record and sending a customer an incorrect policy statement are both actions that cannot be cleanly undone — and both can cascade into major consequences.
Anthropic's guidance on building agents with Claude, published in their model documentation in 2024, introduces the concept of "minimal footprint" as a design principle: agents should request only necessary permissions, prefer reversible over irreversible actions, and confirm with users when uncertain about scope. This is not a technical constraint but an architectural philosophy — the agent is designed to be hesitant about action by default.
How tools are designed determines which actions are possible and which are not. This is the most direct way to constrain agent behavior: if a tool is not defined, the agent cannot invoke it, regardless of what it reasons. Production agent builders at companies like Stripe, Salesforce, and GitHub (with Copilot) spend significant engineering effort designing tool interfaces that expose the minimum necessary action surface.
A key engineering decision is where to place guardrails in the action execution chain. There are three common placements: in the prompt (instructing the agent not to take certain actions), in the tool definition (restricting what parameters are valid), and in the execution layer (the runtime rejecting or queuing certain action types for human review). These layers are not redundant — each catches a different class of failure. Prompt-level guardrails fail when the model reasons its way around them. Tool-level restrictions fail if the tool interface is too permissive. Execution-layer guardrails are most robust but add latency and human-in-the-loop overhead.
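The tool-definition and execution-layer placements can be sketched together as follows. The tool names, the `ALLOWED_TOOLS` and `NEEDS_REVIEW` tables, and the returned status strings are all assumptions for illustration; a prompt-level guardrail would sit upstream of this code and is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ToolCall:
    name: str
    args: Dict[str, object]


# Tool-definition layer: which tools exist and which parameters they accept.
ALLOWED_TOOLS: Dict[str, Set[str]] = {
    "read_record": {"record_id"},
    "delete_record": {"record_id"},
}
# Execution layer: action types that must wait for a human.
NEEDS_REVIEW: Set[str] = {"delete_record"}


def validate(call: ToolCall) -> Optional[str]:
    """Return an error message if the call falls outside the tool definition."""
    allowed = ALLOWED_TOOLS.get(call.name)
    if allowed is None:
        return f"unknown tool: {call.name}"
    extra = set(call.args) - allowed
    if extra:
        return f"unexpected parameters: {sorted(extra)}"
    return None


def dispatch(call: ToolCall, review_queue: List[ToolCall]) -> str:
    """Reject invalid calls, queue risky ones, and let the rest through."""
    error = validate(call)
    if error:
        return "rejected: " + error
    if call.name in NEEDS_REVIEW:
        review_queue.append(call)
        return "queued for human review"
    return "executed"


queue: List[ToolCall] = []
print(dispatch(ToolCall("read_record", {"record_id": 7}), queue))    # executed
print(dispatch(ToolCall("delete_record", {"record_id": 7}), queue))  # queued for human review
print(dispatch(ToolCall("drop_table", {"table": "users"}), queue))   # rejected: unknown tool: drop_table
```

Note that `drop_table` is stopped regardless of what the model reasoned, while `delete_record` is valid but deferred: the two layers catch different classes of failure.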
Stripe's 2024 developer documentation on AI agent integrations recommends that any tool capable of initiating a financial transaction require an explicit two-step confirmation: the agent generates a transaction summary, which is displayed to the human operator for approval before the actual payment API call is made. This mirrors the pattern used in industrial control systems, where humans must authorize high-consequence machine actions, and the same principle is now being imported into AI agent design.
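The two-step pattern described above can be sketched generically. This is not Stripe's API: `PaymentRequest`, `confirm_then_pay`, and the injected `approve` and `charge` callables are hypothetical, with `charge` standing in for whatever payment call a real integration would make.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PaymentRequest:
    customer_id: str
    amount_cents: int
    currency: str


def summarize(req: PaymentRequest) -> str:
    """Step 1: a human-readable summary the operator can check."""
    return f"Charge {req.amount_cents / 100:.2f} {req.currency.upper()} to customer {req.customer_id}"


def confirm_then_pay(
    req: PaymentRequest,
    approve: Callable[[str], bool],
    charge: Callable[[PaymentRequest], str],
) -> str:
    """Step 2: call the payment function only after explicit approval."""
    if not approve(summarize(req)):
        return "declined by operator"
    return charge(req)


charged: List[PaymentRequest] = []


def fake_charge(req: PaymentRequest) -> str:
    # Stand-in for a real payment call; records the request instead.
    charged.append(req)
    return "charged"


req = PaymentRequest("cus_123", 2599, "usd")
print(summarize(req))                                       # Charge 25.99 USD to customer cus_123
print(confirm_then_pay(req, lambda s: False, fake_charge))  # declined by operator
print(confirm_then_pay(req, lambda s: True, fake_charge))   # charged
```

Keeping `charge` behind `confirm_then_pay` means the agent's tool surface never exposes the payment call directly, so approval cannot be skipped by reasoning alone.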
The confirmation problem is a fundamental tension in agent design: adding confirmation steps improves safety but degrades the autonomy that makes agents valuable. Teams at OpenAI, Anthropic, and Google DeepMind are actively researching adaptive confirmation strategies — systems that require confirmation for high-risk actions and trust autonomy for low-risk ones, calibrating thresholds based on observed error rates. This remains an open engineering problem as of 2025.
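One simple form of adaptive confirmation multiplies a per-action risk score by an observed error rate and compares the result to a threshold. The sketch below is an assumption-laden illustration, not a published method from any of these labs: the risk scores, the smoothing prior, and the `max_expected_harm` threshold are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ActionStats:
    attempts: int = 0
    errors: int = 0

    def error_rate(self, prior_errors: int = 1, prior_attempts: int = 2) -> float:
        # Smoothed estimate: an untested action type starts at 0.5, not 0.
        return (self.errors + prior_errors) / (self.attempts + prior_attempts)


def needs_confirmation(risk: float, stats: ActionStats, max_expected_harm: float = 0.01) -> bool:
    """risk is in [0, 1]; confirm when risk times error rate exceeds the threshold."""
    return risk * stats.error_rate() > max_expected_harm


proven = ActionStats(attempts=98, errors=1)  # smoothed error rate: 2 / 100

print(needs_confirmation(0.05, proven))         # False: low risk, good track record
print(needs_confirmation(0.90, proven))         # True: high risk stays gated
print(needs_confirmation(0.05, ActionStats()))  # True: low risk but no track record yet
```

The third case shows the calibration effect: a new action type is confirmed by default and earns autonomy only as error-free attempts accumulate.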
Work through the hardest design problems in agent action systems. What tools should an agent have? Where should guardrails live? How do you balance safety with usefulness?
Lesson 4 covers observation: its key principles, real-world applications, and implications for practitioners working in this domain.
Understanding this topic requires both theoretical grounding and practical awareness of how these concepts manifest in deployed systems. The frameworks covered in earlier lessons provide the foundation; this lesson connects them to implementation reality.
The transition from theory to practice reveals challenges that pure conceptual frameworks don't capture. Real-world deployment introduces constraints, trade-offs, and edge cases that demand nuanced judgment rather than rigid rule-following.
Effective practitioners in this space develop the ability to reason across multiple frameworks simultaneously, recognizing when different perspectives apply and how to resolve conflicts between competing priorities.
As this field continues to evolve, the principles covered in this module will remain foundational even as specific technologies and implementations change. The ability to think critically about these topics — rather than simply memorizing current best practices — is what separates effective practitioners from those who merely follow checklists.
Use the AI below to explore the concepts from Lesson 4 in depth. Ask questions, challenge assumptions, and work through practical scenarios related to lesson 4: observation.