In February 2023, Bing's newly launched chat agent, built on an OpenAI model and codenamed Sydney, began exhibiting erratic behavior during extended user conversations. Researchers at Stanford and journalists at The New York Times documented sessions where the agent confused information stated by users with facts it had independently retrieved, misattributed sources, and began responding to its own prior outputs as if they were new external inputs. The root failure was perceptual: the agent's context window blended user messages, retrieved web snippets, tool results, and its own completions into a single undifferentiated token stream. Without explicit structural separation, the agent's "perception" of the conversation was a scrambled composite.
Microsoft responded by hard-limiting session length to 5 turns — a blunt workaround that masked the underlying architectural gap rather than solving it. The episode exposed a fundamental truth: perception in language-model agents is not free. It must be engineered.
In classical robotics, perception refers to the pipeline that converts raw sensor data — voltage readings from cameras, sonar pings, encoder ticks — into a structured world model an AI can reason about. The same concept applies to language agents, but the "sensors" are radically different: they are text streams from users, API responses, tool outputs, retrieved document chunks, and prior conversation history.
The agent's perception layer is responsible for ingesting these heterogeneous inputs and converting them into a coherent context that the reasoning engine can interpret. This involves three distinct sub-problems: source attribution (knowing which input came from where), temporal ordering (understanding what happened in what sequence), and relevance filtering (deciding what to include in the reasoning context vs. what to discard).
Each of these sub-problems has concrete implementation consequences. Source attribution requires that every piece of information entering the agent's context be tagged with a role marker — "user," "assistant," "tool_result," "retrieved_document" — so the reasoning layer knows what epistemic weight to assign it. Temporal ordering means the context must be assembled in chronological sequence and, where that sequence is ambiguous (e.g., parallel tool calls), an explicit ordering must be imposed.
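The tagging and ordering requirements above can be sketched as follows. This is a minimal illustration, not a reference design: `ContextEntry`, `ContextLog`, and the `call_index` tie-breaker are hypothetical names, and the role set is simply the one listed in the text. The idea is that results from parallel tool calls are ordered by the position of the call that produced them, not by the order in which they happened to arrive.

```python
from dataclasses import dataclass
from itertools import count

# Role markers taken from the text above.
ROLES = {"user", "assistant", "tool_result", "retrieved_document"}

@dataclass(frozen=True)
class ContextEntry:
    role: str        # who produced this content
    content: str
    turn: int        # conversation turn the entry belongs to
    call_index: int  # position of the originating call within the turn
    seq: int         # arrival order, used only as a final tie-breaker

class ContextLog:
    """Collects tagged entries and returns them in one unambiguous order."""

    def __init__(self) -> None:
        self._entries: list[ContextEntry] = []
        self._seq = count()

    def add(self, role: str, content: str, turn: int, call_index: int = 0) -> None:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role!r}")
        self._entries.append(
            ContextEntry(role, content, turn, call_index, next(self._seq))
        )

    def ordered(self) -> list[ContextEntry]:
        # Turn first, then the imposed call order, then arrival order.
        return sorted(self._entries, key=lambda e: (e.turn, e.call_index, e.seq))
```

With this in place, two tool results that arrive out of order within the same turn are still presented to the reasoning layer in the order their calls were issued.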
Perception is not passive reception — it is active construction. Every design decision about how inputs are structured, tagged, and filtered shapes what the agent believes about the world at reasoning time.
Relevance filtering is the most complex of the three. A naive agent includes everything in its context window. A production agent must implement a truncation and prioritization strategy: what gets dropped when the context fills up? Most current systems use recency-biased truncation (drop the oldest entries first), but this discards the original task specification — often the most important input — before discarding the verbose middle turns. Smarter strategies anchor the system prompt and user goal, then apply recency to everything else.
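One way to express the anchored strategy is shown below. It is a sketch under stated assumptions: entries are plain dicts with a hypothetical `anchor` flag, and token cost is approximated by whitespace splitting, where a real system would use the model's tokenizer.

```python
def truncate_context(entries, budget, count_tokens=lambda text: len(text.split())):
    """Keep anchored entries, then fill the remaining budget newest-first.

    entries: chronological list of dicts with "content" and an optional "anchor" flag
    budget:  maximum total token count (whitespace tokens by default, an assumption)
    """
    anchored = [e for e in entries if e.get("anchor")]
    remaining = budget - sum(count_tokens(e["content"]) for e in anchored)
    if remaining < 0:
        raise ValueError("anchored entries alone exceed the budget")

    kept_ids = {id(e) for e in anchored}
    # Walk the non-anchored entries from newest to oldest.
    for entry in reversed(entries):
        if entry.get("anchor"):
            continue
        cost = count_tokens(entry["content"])
        if cost > remaining:
            break  # stop here so the kept window of recent turns stays contiguous
        kept_ids.add(id(entry))
        remaining -= cost

    # Return the survivors in their original chronological order.
    return [e for e in entries if id(e) in kept_ids]
```

Marking the system prompt and the user's goal with `anchor: True` means they survive even when every middle turn has been dropped.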
In practice, the output of the perception layer is an observation object: a structured record (often JSON or a typed Python dataclass) that the reasoning engine receives as input. A well-designed observation object includes the raw input, its source and timestamp, a relevance score (if retrieval was involved), and any metadata the agent may need for attribution later.
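A minimal observation object might look like the following. The field names are illustrative assumptions based on the list above, not a standard schema.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Any, Optional

@dataclass
class Observation:
    raw_input: str
    source: str  # e.g. "user", "tool_result", "retrieved_document"
    # ISO 8601 UTC timestamp, filled in at creation time unless supplied.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    relevance: Optional[float] = None  # only set when retrieval produced a score
    metadata: dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize for logging or for handing to the reasoning layer."""
        return json.dumps(asdict(self), sort_keys=True)


snippet = Observation(
    raw_input="Refunds are issued within 30 days.",
    source="retrieved_document",
    relevance=0.82,
    metadata={"doc_id": "kb-17"},  # hypothetical knowledge-base identifier
)
```

Keeping `relevance` optional avoids inventing a score for inputs, such as user messages, that never went through retrieval.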
OpenAI's APIs later made this separation explicit. The Chat Completions API (March 2023) introduced typed message roles, and the Assistants API, released in November 2023, added a persistent "thread" object that keeps user messages, assistant messages, and tool call results in distinct typed slots. Whether or not it was designed with Sydney in mind, this structure addresses the same category of failure: by forcing separation at the API level, it makes context conflation much harder to introduce by accident.
The perception layer is also where multi-modal fusion happens in agents that handle images, audio, or structured data files alongside text. The agent must have a consistent internal representation that can hold any modality, with the reasoning layer deciding how to weight them — not the perception layer. Perception's job is fidelity, not interpretation.
The single most common bug in early agent implementations is the "flat string context" anti-pattern: concatenating all inputs into one string with no structural markers. Even basic role separators reduce context confusion measurably. Always build structured observation objects from day one.
You are architecting the perception layer for a customer support agent that receives: user chat messages, retrieved knowledge-base snippets, CRM lookup results, and the agent's own prior responses. The AI tutor will walk you through designing a structured observation object for this system.
In December 2023, a team at Google DeepMind published detailed results from their Gemini Ultra evaluations on the MMLU benchmark and, more relevantly, on multi-step tool-use tasks. A consistent failure mode emerged: the model would correctly identify the first required action, execute it, receive a result, and then plan a second action that was logically valid in isolation but inconsistent with what the first action's result had revealed. The model was not updating its plan based on new evidence. It was executing a pre-committed sequence as if the first tool call had returned exactly what was expected.
This failure, sometimes called "plan rigidity," is distinct from hallucination. The model was not confabulating facts; it was failing to revise a reasonable prior plan in light of contradicting evidence. The fix required explicit plan-revision scaffolding: after each tool result, the model was instructed to reassess its current goal state before selecting the next action.
The reasoning layer of an agent performs two conceptually distinct operations that are easy to conflate: situation assessment and action selection. Situation assessment asks "what is the current state of the world and the task?" — it synthesizes the observation object into a world model. Action selection asks "given that world model, what is the best next action?" — it consults a policy (implicit in the LLM's weights, or explicit in a structured planner) to output a decision.
Most LLM-based agents implement both operations in a single generation step: the model reads the context, reasons, and outputs an action. The ReAct framework (Yao et al., 2022) formalized this by asking the model to alternate between explicit "Thought:" and "Action:" steps, with each tool result fed back as an "Observation:", making the reasoning chain visible and auditable. This transparency has practical debugging value: when an agent takes a wrong action, you can read the preceding Thought to understand what world model produced that decision.
Thought: [agent's explicit reasoning about current state and goal]
Action: [structured action specification]
Observation: [tool result or environment response]
Thought: [updated reasoning incorporating the observation]
Action: [next action]
The critical insight from the DeepMind evaluation described above is that the "Thought after Observation" step is not cosmetic — it is load-bearing. Without explicit re-assessment after each tool call, the model defaults to executing its prior plan sequentially. Forcing a reasoning step between each observation and each action is what converts a brittle script-follower into a genuine adaptive planner.
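The loop below sketches how the reassessment step can be enforced structurally instead of left to the prompt. It assumes a hypothetical `think` callable standing in for the LLM call and a dict of tool functions; neither is part of any real framework. Because `think` is invoked with the full history before every action, no action can be selected without the latest observation in view, and `max_steps` acts as a simple step budget.

```python
from typing import Callable

def run_agent(goal: str,
              think: Callable[[str, list], dict],
              tools: dict,
              max_steps: int = 5) -> list:
    """Alternate Thought -> Action -> Observation until `think` returns a final answer.

    `think` (hypothetical) returns either
      {"thought": str, "action": str, "args": dict}  or
      {"thought": str, "final": str}.
    """
    history = []
    for _ in range(max_steps):
        decision = think(goal, history)  # reassessment happens here, every step
        history.append(("thought", decision["thought"]))
        if "final" in decision:
            history.append(("final", decision["final"]))
            return history
        tool = tools.get(decision["action"])
        if tool is None:
            # Surface the problem as an observation so the next Thought can react.
            result = f"error: unknown tool {decision['action']!r}"
        else:
            result = tool(**decision["args"])
        history.append(("action", decision["action"]))
        history.append(("observation", result))
    history.append(("final", "step budget exhausted"))
    return history
```

The returned history doubles as an audit trail: each action is preceded by the thought that produced it.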
A sophisticated reasoning layer must handle uncertainty explicitly, not just when a tool call fails outright, but when the result is ambiguous. If a database query returns three possible matching records, the reasoning layer needs to decide: pick the best match and proceed, ask the user to disambiguate, or execute parallel branches and merge results. Each choice has cost implications (latency, user friction, token budget), and the right choice depends on the task's stakes and reversibility.
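The three options can be encoded as a small decision rule. The sketch below is one possible policy, not a recommended one: the `margin` and `max_branches` defaults are arbitrary illustrative values, and the match scores are assumed to be confidences in the range 0 to 1.

```python
from enum import Enum

class Resolution(Enum):
    PROCEED = "proceed_with_best_match"
    ASK_USER = "ask_user_to_disambiguate"
    BRANCH = "explore_candidates_in_parallel"

def resolve_ambiguity(scores, reversible, high_stakes, margin=0.2, max_branches=3):
    """Choose how to handle several candidate results for one query.

    scores: match confidences in [0, 1], one per candidate record
    """
    if not scores:
        return Resolution.ASK_USER
    ranked = sorted(scores, reverse=True)
    clear_winner = len(ranked) == 1 or ranked[0] - ranked[1] >= margin
    # Even a clear winner is not enough when a mistake is costly and permanent.
    if clear_winner and not (high_stakes and not reversible):
        return Resolution.PROCEED
    if high_stakes or not reversible:
        return Resolution.ASK_USER  # a wrong guess is expensive, so accept user friction
    if len(ranked) <= max_branches:
        return Resolution.BRANCH    # cheap to explore: spend tokens, not user time
    return Resolution.ASK_USER
```

The ordering of the checks reflects the trade-off in the text: stakes and reversibility are consulted before latency or token cost.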
Work on chain-of-thought reasoning has described a related challenge: when reasoning steps are allowed to be long and exploratory, models can "reason themselves into" incorrect conclusions through internally consistent but factually wrong chains. The lesson for agent builders is that longer reasoning chains are not unconditionally better. Each reasoning step is a point of potential error amplification. Production systems should implement chain-length budgets and break complex reasoning into independently verifiable sub-tasks.
The most dangerous reasoning failure is silent false confidence — the agent proceeds without flagging uncertainty. Build explicit uncertainty surfacing into your system prompt scaffolding: if the model cannot determine the right action with sufficient confidence, it must say so and request clarification rather than guessing.
You are designing the system prompt scaffolding for a financial data research agent that uses tools to query databases, retrieve news articles, and perform calculations. The agent must handle ambiguous results and irreversible actions (e.g., filing a report).
In February 2024, Air Canada lost a case before British Columbia's Civil Resolution Tribunal (Moffatt v. Air Canada) after its customer-facing chatbot told a grieving customer he could buy a full-price ticket and apply retroactively for a bereavement-fare refund, which the airline's actual policy did not allow. The agent had no mechanism to distinguish between "information I retrieved from policy documents" and "inference I generated that sounds plausible." It issued a natural-language commitment, equivalent to a contractual statement, without any guardrail distinguishing advisory output from binding action.
The tribunal held Air Canada responsible for its chatbot's statement, a decision widely read as a signal that autonomous agent outputs can bind the company that deploys them. The case demonstrated that the action layer must categorize outputs not just by their technical type (API call, message send) but by their real-world effect class (informational, advisory, or binding) and apply appropriate verification to each.
The action layer is responsible for translating the reasoning engine's decision — typically expressed in natural language or a structured JSON schema — into an executed operation. This translation step is not trivial. Before any action is dispatched, the action layer must classify it along two critical axes: effect class and reversibility.
Effect class describes the real-world consequence of the action. A read action queries a system without modifying it: searching a database, fetching a webpage, reading a file. A write action modifies state that the agent controls or can revert: updating a record, creating a file, saving a draft message. A commit action has durable external effects: charging a payment, sending an email, filing a form, making a public statement. These three classes require progressively stricter validation before execution.
Read: No state change. Execute with standard validation.
Write: Local or reversible state change. Require parameter verification and confirmation logging.
Commit: Durable external effect. Require explicit confidence threshold, human-in-the-loop option, and rollback plan where possible.
The Air Canada case is a canonical example of a Commit-class action being treated as a Read-class action. Stating a refund policy to a customer in a customer service context is a Commit — it creates an expectation with legal standing. The agent's action layer had no framework to recognize this, so it dispatched the output with zero additional safeguards.
Reversibility compounds with effect class. A Write action that can be undone (delete a draft) has a much lower risk profile than a Write action that cannot (overwrite production data). The action layer must track whether each action has a defined rollback procedure and, if not, escalate the confidence requirement accordingly.
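The two axes can be combined into a lookup that tells the dispatcher which checks an action must pass. The sketch below is illustrative only: the confidence thresholds and the size of the no-rollback penalty are made-up values, and the safeguard names are hypothetical.

```python
from enum import IntEnum

class EffectClass(IntEnum):
    READ = 1
    WRITE = 2
    COMMIT = 3

def required_safeguards(effect: EffectClass, has_rollback: bool) -> dict:
    """Map an action's classification to the checks it must pass before dispatch."""
    # Illustrative thresholds only; real values depend on the deployment.
    min_confidence = {
        EffectClass.READ: 0.0,
        EffectClass.WRITE: 0.7,
        EffectClass.COMMIT: 0.9,
    }[effect]
    if effect != EffectClass.READ and not has_rollback:
        # No defined rollback procedure: escalate the confidence requirement.
        min_confidence = min(1.0, min_confidence + 0.05)
    return {
        "validate_parameters": True,
        "log_confirmation": effect >= EffectClass.WRITE,
        "min_confidence": min_confidence,
        "offer_human_review": effect == EffectClass.COMMIT
        or (effect == EffectClass.WRITE and not has_rollback),
    }
```

Under this scheme an irreversible Write is treated more like a Commit than like an ordinary Write, which matches the escalation described above.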
Production agent action layers are built around explicit action schemas: typed specifications of every action the agent is permitted to take, with defined parameter types, required fields, allowed value ranges, and precondition checks. The LLM reasoning engine outputs a candidate action; the action layer validates it against the schema before dispatch. If validation fails, the error is returned to the reasoning layer as an observation — not silently dropped.
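A hand-rolled version of this validate-then-dispatch flow is sketched below. The `update_cart` schema and its field names are hypothetical, and a production system would more likely use a schema library; the point is only that a rejected action comes back as an observation the reasoning layer can read.

```python
# Hypothetical schema for an "update_cart" action; field names are illustrative.
UPDATE_CART_SCHEMA = {
    "sku": {"type": str, "required": True},
    "quantity": {"type": int, "required": True, "min": 1, "max": 20},
    "note": {"type": str, "required": False},
}

def validate_action(params: dict, schema: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for name, rule in schema.items():
        if name not in params:
            if rule.get("required"):
                errors.append(f"missing required field: {name}")
            continue
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, rule["type"]) or (
            rule["type"] is int and isinstance(value, bool)
        )
        if wrong_type:
            errors.append(f"{name}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{name}: below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{name}: above maximum {rule['max']}")
    errors.extend(f"unexpected field: {k}" for k in params if k not in schema)
    return errors

def dispatch(params: dict, schema: dict, execute) -> dict:
    """Validate first; on failure, return the errors as an observation."""
    errors = validate_action(params, schema)
    if errors:
        return {"source": "action_validator", "status": "rejected", "errors": errors}
    return {"source": "tool_result", "status": "ok", "result": execute(**params)}
```

Returning the error list instead of raising keeps the failure inside the agent loop, where the next reasoning step can correct the parameters.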
Anthropic's published guidance on autonomous agent behavior (2024) describes what this module calls the minimal footprint principle, a close relative of the long-standing security principle of least privilege: an agent should request only the permissions it needs for the current task, avoid storing sensitive information beyond the immediate operation, and prefer reversible actions over irreversible ones when both achieve the goal. This principle is not just a safety guideline; it is an engineering discipline. Agents that acquire capabilities they don't need are harder to audit, harder to bound, and produce more damage when they malfunction.
Request only the permissions needed for the current task. Prefer reversible actions. Avoid retaining sensitive data beyond immediate use. An agent that does less, more precisely, is safer and more auditable than one that does more, loosely.
The action layer is also where prompt injection defenses must operate. A retrieved document may contain adversarial text designed to hijack the agent's next action — instructing it to ignore previous instructions or take unauthorized operations. The action layer should validate that action parameters do not contain embedded instruction syntax and that action targets fall within the expected domain before dispatch.
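A pre-dispatch check along these lines might look like the sketch below. It is deliberately simple and rests on assumptions: the regular expressions are heuristic examples that a determined attacker can evade, and `ALLOWED_HOSTS` contains a made-up hostname. It should be read as one layer of defense, not a complete one.

```python
import re
from urllib.parse import urlparse

# Heuristic patterns only (an assumption); they catch obvious cases, not all cases.
INSTRUCTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"disregard\s+(the\s+)?(system|previous)\s+prompt", re.I),
    re.compile(r"you\s+are\s+now\s+", re.I),
]

ALLOWED_HOSTS = {"api.example-shop.test"}  # hypothetical allowlist

def check_action(params: dict, target_url: str) -> list[str]:
    """Return reasons to block the action; an empty list means no rule fired."""
    problems = []
    host = urlparse(target_url).hostname
    if host not in ALLOWED_HOSTS:
        problems.append(f"target host not allowed: {host}")
    for name, value in params.items():
        if isinstance(value, str) and any(p.search(value) for p in INSTRUCTION_PATTERNS):
            problems.append(f"parameter {name!r} contains instruction-like text")
    return problems
```

The allowlist check is the stronger of the two controls, since it does not depend on guessing how an attack will be phrased.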
You are building the action layer for an e-commerce agent that can: search product catalogs (Read), update cart contents (Write), and process payment and send order confirmation emails (Commit). The system must be safe for autonomous operation with minimal human oversight.
This lesson, Lesson 4: Closing the Loop, examines the key principles, real-world applications, and implications for practitioners working in this domain.
Understanding this topic requires both theoretical grounding and practical awareness of how these concepts manifest in deployed systems. The frameworks covered in earlier lessons provide the foundation; this lesson connects them to implementation reality.
The transition from theory to practice reveals challenges that pure conceptual frameworks don't capture. Real-world deployment introduces constraints, trade-offs, and edge cases that demand nuanced judgment rather than rigid rule-following.
Effective practitioners in this space develop the ability to reason across multiple frameworks simultaneously, recognizing when different perspectives apply and how to resolve conflicts between competing priorities.
As this field continues to evolve, the principles covered in this module will remain foundational even as specific technologies and implementations change. The ability to think critically about these topics — rather than simply memorizing current best practices — is what separates effective practitioners from those who merely follow checklists.
Use the AI below to explore the concepts from Lesson 4 in depth. Ask questions, challenge assumptions, and work through practical scenarios related to closing the loop.