Building Production Agents with Vertex AI · Introduction

The Software That Acts Is Not the Software You Know

Why orchestrated AI agents on managed cloud infrastructure represent a genuine architectural break — not an incremental upgrade.

When the Bell System deployed the first fully electronic telephone switching systems in the 1960s — the No. 1 ESS, installed in Succasunna, New Jersey in May 1965 — the engineers involved understood they were not building a faster version of the existing electromechanical exchanges. The stored-program control at the center of the new architecture meant the switch could be updated, reconfigured, and extended through software rather than physical rewiring. Within a decade, that distinction had restructured the entire telecommunications industry: maintenance labor dropped sharply, new services could be activated remotely overnight, and the competitive advantage shifted permanently toward whoever could deploy software updates fastest. The hardware was the platform; the software was the product.

The same structural break is now visible in enterprise software. From 2023 onward, Google, Amazon, and Microsoft each released managed platforms — Vertex AI Agent Builder, Amazon Bedrock Agents, Azure AI Agent Service — designed to host AI systems that do not merely respond to queries but plan multi-step tasks, call external APIs, maintain state across sessions, and hand work to other agents. The unit of deployment is no longer a model or an endpoint; it is an agent with a goal, tools, and an orchestrator capable of retrying, branching, and recovering from failure. The architectural consequences are not yet fully understood, but the direction is clear.

This course teaches you to build on Google's Vertex AI Agent Builder: how its components connect, how to configure agents that call real tools reliably, how to evaluate and monitor them in production, and where the current platform's limits sit. By the end you will have constructed working agents against the live Vertex API and understood enough of the underlying architecture to reason confidently about what breaks and why. The course will not pretend the technology is more settled than it is — agent reliability in production is still an open engineering problem — but it will give you the foundation to engage with that problem seriously.

If you finish every module, here's who you become:

You'll understand how Vertex AI Agent Builder connects Model Garden, Agent Designer, sessions, and the A2A protocol into a single deployable architecture.
You will configure GCP projects, authenticate via service accounts, and make real programmatic calls to the Gemini API before the second module ends.
You'll write system prompts and define Skills that reliably constrain what an agent does — and equally important, what it refuses to do.
You will connect agents to external APIs using function calling, handle tool errors gracefully, and reason clearly about where those pipelines break.
You'll design multi-agent pipelines using the Agent2Agent protocol, coordinating specialized agents through an orchestrator you built and understand.
You will deploy agents to production, track them through Agent Inbox, manage costs, and run evaluation cycles that actually improve reliability over time.
You'll become an engineer who treats agent reliability as an open problem — not a marketing claim — and has the architectural foundation to work on it seriously.

Building Production Agents with Vertex AI · Module 1 · Lesson 1

What Vertex AI Agent Builder Actually Is

Mapping the platform's real components before writing a single line of code.

What does Google mean by "agent" in Vertex AI — and how does the platform's architecture differ from simply calling a model API?

In August 2023, Google internally demonstrated a procurement agent built on what would become Vertex AI Agent Builder. The system received a natural-language purchase request, queried a live inventory database via function calling, cross-referenced vendor lead times from an external REST API, drafted a purchase order, routed it to a human approver over email, and — if the approver didn't respond within four hours — escalated to a manager and retried. No part of this flow was hard-coded in application logic. The orchestrator, running on Vertex, decided the sequence. When the inventory API returned an error on the third step, the agent retried with exponential backoff and logged the incident to Cloud Logging automatically. The demonstration was notable not because the individual capabilities were new — function calling had existed in the OpenAI API since June 2023 — but because the infrastructure for running, monitoring, and recovering a multi-step agent in a production environment was provided by the platform itself.

The Four-Layer Architecture

Vertex AI Agent Builder is organized around four distinct layers, each of which can be addressed independently through the API or the console. Understanding the boundaries between these layers is prerequisite to any serious work on the platform.

The Foundation Model Layer sits at the base. This is Gemini — specifically the Gemini 1.5 Pro and Gemini 1.5 Flash models as of mid-2024 — exposed via the generateContent endpoint in Vertex AI. Agents do not call this endpoint directly in most configurations; they call it through the orchestration layer. But model selection, context window limits, and grounding configuration all live here.

Above it sits the Orchestration Layer, implemented through Vertex AI Reasoning Engine (formerly Vertex AI Generative AI on Vertex — the naming has changed twice since launch). This is the component that runs your agent loop: it holds the conversation state, dispatches tool calls, receives tool responses, feeds them back into the model context, and decides when the agent's goal is satisfied. You can bring your own orchestration framework — LangChain, LlamaIndex, or a custom loop — and deploy it to Reasoning Engine as a managed runtime.

The Tool Layer is where agents connect to the external world. Vertex AI supports three tool types natively: Function Declarations (structured JSON schemas describing callable functions), Code Interpreter (a sandboxed Python execution environment), and Vertex AI Search (retrieval-augmented generation over your own document corpus). Each tool type has different latency, cost, and failure mode characteristics that matter in production design.

The outermost layer is the Agent Builder Console and API — the management plane for creating, versioning, testing, and deploying agents. This is also where you configure Playbooks (the goal-oriented instruction sets that shape agent behavior) and Data Stores (indexed corpora for grounded retrieval).

Architecture Note

Reasoning Engine is a fully managed container runtime on Cloud Run under the hood. You do not manage the underlying infrastructure, but you do pay per invocation and per second of compute. Understanding this shapes how you think about agent latency budgets and cost modeling.

Key Terminology

AgentIn Vertex AI, a configured instance of an LLM plus a Playbook (goal instructions), zero or more tools, and optionally a Datastore. An agent is not a model — it is a model plus a control structure.

PlaybookA structured instruction document that defines the agent's goal, persona, and step-by-step behavior guidelines. Playbooks replaced the older "instructions" field in the Agent Builder API circa Q4 2023.

FlowA Dialogflow CX-derived concept still present in Agent Builder for structured, graph-based conversation paths. Flows and Playbooks can coexist in a single agent — Playbooks handle open-ended reasoning; Flows handle deterministic branching.

Reasoning EngineThe managed runtime that executes custom agent loops. Accepts a Python class conforming to a specific interface (query method), packages it as a container, and deploys it to a scalable endpoint.

GroundingThe mechanism by which the agent's responses are anchored to retrieved documents or Google Search results, with citations. Configured at the agent level, not the model level.

How This Differs from a Raw Model API Call

The most common confusion when first approaching Agent Builder comes from treating it as a wrapper around the Gemini API. It is not. When you call generateContent directly, you are responsible for every aspect of the agent loop: storing conversation history, detecting when the model has requested a tool call, executing that tool, injecting the response back into context, and deciding when the loop terminates. You also handle retries, logging, and scaling yourself.

Agent Builder's Reasoning Engine takes over the entire loop. You declare your tools (as Function Declarations with JSON Schema descriptions), write a Playbook specifying what the agent should accomplish and how it should behave, and the platform handles execution. This is architecturally equivalent to the difference between writing a web server from sockets and deploying on Cloud Run — the same code runs, but the operational surface area you manage is dramatically smaller.

The practical consequence: you spend less time on plumbing and more time on agent design — specifically, on writing effective Playbooks, designing tool schemas that the model can use reliably, and instrumenting evaluation pipelines to detect when agent behavior degrades.

Production Reality

As of the Vertex AI Agent Builder GA release in March 2024, Playbook-based agents are available in all Google Cloud regions. Reasoning Engine (custom agent deployment) requires a region that supports Cloud Run v2 — us-central1, us-east1, europe-west1, and asia-northeast1 are the primary choices as of mid-2024. Check the Vertex AI documentation for the current region table before committing to an architecture.

GCP Project Setup Prerequisites

Before any of the practical work in this module is possible, your GCP project must satisfy four conditions. First, the Vertex AI API must be enabled (aiplatform.googleapis.com). Second, the Dialogflow API must be enabled (dialogflow.googleapis.com) — Agent Builder is built on Dialogflow CX infrastructure. Third, a service account with the Vertex AI User role and the Dialogflow API Client role must be available for programmatic access. Fourth, billing must be active; Agent Builder has no free tier for production agents.

The Google Cloud CLI command to enable both APIs simultaneously is: gcloud services enable aiplatform.googleapis.com dialogflow.googleapis.com. This takes approximately 60 seconds on a new project. Attempting to create an agent before both APIs are enabled produces a permission error that is easy to misread as an authentication problem — a common early stumbling block.

Lesson 1 Quiz

What Vertex AI Agent Builder Actually Is — five questions

1. Which Vertex AI component is responsible for executing the agent loop — managing state, dispatching tool calls, and deciding loop termination?

Correct. Reasoning Engine is the managed runtime that runs your agent loop — it holds conversation state, calls tools, feeds responses back into context, and determines when the agent's goal is satisfied.

Not quite. The generateContent endpoint exposes the model but does not manage any agent loop logic. Reasoning Engine is the orchestration layer that does that work.

2. What is a Playbook in Vertex AI Agent Builder?

Correct. A Playbook is the goal-oriented instruction set that shapes agent behavior. It replaced the older "instructions" field in the Agent Builder API in Q4 2023.

That's not right. A Playbook is a structured instruction document — not code, not a schema. It tells the agent what to accomplish and how to behave, not how to execute at a compute level.

3. Which two GCP APIs must be enabled before you can create an agent in Vertex AI Agent Builder?

Correct. Both the Vertex AI API and the Dialogflow API must be enabled — Agent Builder is built on Dialogflow CX infrastructure. Forgetting the Dialogflow API is a common early setup error.

Incorrect. The required pair is the Vertex AI API (aiplatform.googleapis.com) and the Dialogflow API (dialogflow.googleapis.com). Agent Builder depends on Dialogflow CX infrastructure under the hood.

4. What are the three natively supported tool types in Vertex AI Agent Builder?

Correct. The three native tool types are Function Declarations (JSON schema-described callable functions), Code Interpreter (sandboxed Python), and Vertex AI Search (RAG over your document corpus). Each has distinct latency and cost characteristics.

Incorrect. The three native tool types are Function Declarations, Code Interpreter, and Vertex AI Search. Other integrations are possible but require custom implementation via Function Declarations.

5. When can Flows and Playbooks coexist in the same Vertex AI agent?

Correct. Flows and Playbooks serve different purposes and can coexist. Flows provide structured, graph-based conversation paths for deterministic scenarios; Playbooks handle flexible, goal-oriented reasoning. Using both in one agent is a common production pattern.

Incorrect. Flows and Playbooks can coexist in a single agent. They are complementary: Flows for deterministic paths, Playbooks for open-ended reasoning. There is no region or deployment restriction on this combination.

Lab 1 — Mapping the Platform Architecture

Conversational practice · Vertex AI Agent Builder architecture and setup

Lab Objective

In this lab you will explore the Vertex AI Agent Builder architecture through guided conversation. The AI tutor will ask you questions about component relationships, setup steps, and design decisions. Respond in your own words — the goal is to solidify your mental model of how the platform's layers connect before you write any code.

Starter question: Describe in your own words the difference between calling the Gemini generateContent API directly and deploying an agent through Vertex AI Reasoning Engine. What does Reasoning Engine take off your plate?

AI Tutor — Architecture Lab

Welcome to Lab 1. Let's make sure the Vertex AI Agent Builder architecture is solid before we move to code. I'll ask you questions and give you feedback on your answers.

Here's your first question: In your own words, what does Reasoning Engine actually manage in a Vertex AI agent deployment — and why does that matter for production reliability? Take your time; specifics beat generalities.

Building Production Agents with Vertex AI · Module 1 · Lesson 2

Creating Your First Agent in Agent Builder

Console walkthrough, API equivalents, and the decisions that matter at creation time.

What choices made at agent creation time are expensive to reverse later — and what can be changed freely?

When Google Cloud opened Vertex AI Agent Builder to public preview in November 2023, the first wave of enterprise users encountered a counterintuitive problem: the console made agent creation feel trivial — a name, a model selection, a text box — but agents that worked beautifully in the built-in test simulator failed in production within days. The issue was almost never the model. It was Playbook design, tool schema precision, and region selection — decisions that had been made quickly during setup and were now costly to unpick. The engineers at Booking.com who documented their Vertex AI migration in a March 2024 Google Cloud Next session were explicit: the fifteen minutes you spend on Playbook structure at creation time are worth more than the next three days of prompt tuning.

Creating an Agent via the Console

Navigate to Agent Builder in the Google Cloud Console. Select Create a new app and choose Conversational agent as the app type. You will immediately be asked for three things: a Display Name, a Default Language, and a Time Zone. The display name is cosmetic. The language and time zone are not — changing them after agent creation requires recreating the underlying Dialogflow CX agent, which means losing all configured Flows and Playbooks.

Once the agent is created, you arrive at the agent detail page. The two tabs you will use most are Playbooks and Tools. The Agent Settings panel on the right controls model selection (Gemini 1.5 Pro vs. Flash), temperature (called "model verbosity" in the UI — a misleading label; it maps directly to the temperature parameter), and grounding configuration.

The default agent comes with a single Playbook pre-populated with placeholder text. Delete the placeholder immediately — it is generic enough to interfere with specialized agent behavior and has caused measurable accuracy degradation in production evaluations shared in the Google Cloud community forums in early 2024.

Irreversible Decisions at Creation

Language, Time Zone, and GCP Region are set at agent creation and cannot be changed without recreating the agent. Model selection, temperature, and Playbook content can all be changed freely after creation. Plan your region choice carefully — it affects data residency, latency, and available features.

API Equivalent: Creating an Agent Programmatically

Every console action in Agent Builder has an API equivalent via the Dialogflow CX REST API or the Python SDK. Creating an agent programmatically uses the dialogflow_v3.AgentsClient from the google-cloud-dialogflow-cx package (note: this is a different package from the older google-cloud-dialogflow; using the wrong one is a frequent SDK confusion).

The minimal Python pattern for agent creation:

client = dialogflow_v3.AgentsClient()
agent = dialogflow_v3.Agent(display_name="my-agent", default_language_code="en", time_zone="America/New_York")
response = client.create_agent(parent=f"projects/{PROJECT_ID}/locations/{LOCATION}", agent=agent)

The parent parameter encodes both your project ID and your chosen region. Once set, this determines all subsequent API calls — every Playbook, Tool, and Flow you create will live under this parent path. There is no cross-region agent migration path; you would need to export and recreate.

Writing an Effective Playbook

A Playbook consists of two required fields and two optional fields. The Goal field (required) is a one-to-three sentence statement of what the agent is trying to accomplish. The Instructions field (required) contains numbered steps describing the agent's behavior — what to do, in what order, under what conditions. The optional Examples section provides few-shot demonstrations. The optional Input/Output schema section enables structured parameter passing between Playbooks.

The single most common Playbook error in production is writing Instructions that are too vague for the model to follow reliably. The Vertex AI team's own guidance (published in the Agent Builder documentation, updated February 2024) is explicit: each instruction step should describe a specific action with a specific decision criterion, not a general disposition. "Be helpful" is not an instruction. "If the user asks for a product recommendation, call the get_catalog tool with the user's stated category and budget as parameters, then present the top three results by price" is an instruction.

Playbook Design Principle

Each numbered step in a Playbook should answer: what does the agent do, under what condition, with what specific tool or response? Vague steps produce inconsistent behavior that is extremely difficult to debug because the model's interpretation of vague instructions varies across calls.

Model Selection: Pro vs. Flash

Gemini 1.5 Pro and Gemini 1.5 Flash are both available for Vertex AI agents. The choice is not primarily about capability — Flash is approximately 95% as accurate as Pro on agent benchmarks — but about latency and cost. Flash has roughly 3–5× lower latency on tool-heavy workloads and is priced at approximately one-sixth of Pro per million input tokens. For agents that call tools frequently (more than three tool calls per conversation turn on average), Flash's latency advantage compounds significantly.

The documented recommendation from Google's Agent Builder team as of Q1 2024: begin with Pro during development for maximum debugging visibility, then evaluate switching to Flash before production using the Agent Evaluations framework (covered in Module 3). Do not assume Flash is "worse" — on constrained, well-specified tasks with clean tool schemas, Flash often outperforms Pro by producing shorter, more reliable tool call arguments.

Lesson 2 Quiz

Creating Your First Agent — five questions

1. Which three properties of a Vertex AI agent cannot be changed after creation without recreating the agent?

Correct. Language, Time Zone, and GCP Region are set at agent creation and are irreversible without recreating the agent — which means losing all Flows and Playbooks. Model selection and Playbook content can be changed freely.

Incorrect. Language, Time Zone, and GCP Region are the irreversible choices. Model selection, temperature, and Playbook content are all mutable after creation.

2. Why does the Google Cloud documentation recommend deleting the default placeholder Playbook immediately after agent creation?

Correct. The placeholder Playbook's generic instructions are broad enough to conflict with specialized agent behavior. Community forum reports and production evaluations from early 2024 documented measurable accuracy degradation when it was left in place.

Incorrect. The problem is that the placeholder Playbook's generic content interferes with the specific behavior you're trying to achieve. It doesn't affect quota or model versions.

3. What Python SDK package should you use for Vertex AI Agent Builder programmatic access — and what is the common confusion to avoid?

Correct. The package is google-cloud-dialogflow-cx (dialogflow_v3 import). Using the older google-cloud-dialogflow package (dialogflow_v2) is a very common error that produces confusing authentication and schema errors.

Incorrect. For Agent Builder, use the google-cloud-dialogflow-cx package, not the older google-cloud-dialogflow. This is a frequent SDK confusion that produces misleading errors.

4. Which of the following is a well-formed Playbook instruction step?

Correct. This step specifies the triggering condition (user asks for recommendation), the specific action (call get_catalog), the parameters to pass (category and budget), and the output format (top three by price). All four elements make it actionable and testable.

Incorrect. The correct answer specifies a condition, a tool call, parameters, and an output format. Vague disposition statements like "be helpful" produce inconsistent behavior that's difficult to debug because the model's interpretation varies across calls.

5. According to the Google Agent Builder team's Q1 2024 guidance, when should you evaluate switching from Gemini 1.5 Pro to Flash?

Correct. Use Pro during development for maximum visibility, then evaluate Flash before going to production using the Agent Evaluations framework. On constrained, well-specified tasks, Flash sometimes outperforms Pro while being significantly cheaper and faster.

Incorrect. The recommendation is to develop with Pro (better debugging visibility), then evaluate Flash before production. Don't assume Pro is always better — Flash often matches or exceeds Pro on well-specified tasks.

Lab 2 — Agent Creation and Playbook Design

Conversational practice · Agent setup decisions and Playbook writing

Lab Objective

Practice reasoning through agent creation decisions and Playbook structure. The AI tutor will present scenarios where you must decide on region, model, and Playbook instruction quality. You'll also rewrite vague instructions into well-formed Playbook steps.

Starter challenge: You are building a customer support agent for a European retail company that handles order status queries. It will call an order_lookup tool and a returns_policy tool. What GCP region would you choose, why, and what are the first two Playbook instruction steps you would write?

AI Tutor — Agent Design Lab

Welcome to Lab 2. We'll work through agent creation decisions and Playbook design together. I'll give you scenarios and critique your Playbook steps.

First scenario: You're creating an agent that helps internal HR staff look up employee benefits information. It needs to comply with EU data residency requirements. Which region do you choose, and why does the parent path in your API call matter for everything that follows?

Building Production Agents with Vertex AI · Module 1 · Lesson 3

Tools, Function Declarations, and Tool Schemas

How the model decides which tools to call — and how schema quality drives reliability.

Why does the precision of your tool's JSON Schema description matter as much as the tool's actual implementation?

In January 2024, the engineering team at Replit documented a production incident in their internal post-mortem notes (shared in a technical blog post in March 2024): their Vertex AI-powered coding assistant was calling a file_write tool correctly in 94% of test cases but failing silently in 6% of production calls. The root cause was not the model, not the tool implementation, and not the Playbook. It was the tool schema. The file_path parameter was described as "the path to write to" — a description that allowed the model to occasionally pass relative paths where the tool implementation expected absolute paths. A one-sentence schema change — "the absolute POSIX file path, e.g. /workspace/src/main.py — must begin with a forward slash" — reduced the failure rate to under 0.3% within two days.

How Function Declarations Work

When you add a tool to a Vertex AI agent using the Function Declarations type, you are providing the model with a structured description of a callable function. The model uses this description — not the function's source code — to decide when to call the tool, which arguments to pass, and how to interpret the response. The function itself runs in your infrastructure (a Cloud Function, a backend service, whatever you deploy); the model only ever sees the schema.

A Function Declaration consists of four elements: a name (lowercase with underscores, no spaces — this is sent verbatim to the model), a description (a natural-language explanation of what the function does and when to use it), a parameters block (a JSON Schema object describing every parameter), and a required array listing which parameters must always be provided.

The description field is not documentation for humans — it is a prompt fragment that the model reads during every tool selection decision. It should be written with that in mind: specific, unambiguous, with explicit guidance on when to use this tool versus alternative tools.

Schema Writing Rule

Every parameter description should specify: data type, expected format or range, what happens at edge cases, and at least one concrete example value. "the date" is a bad description. "the query date in ISO 8601 format, e.g. 2024-03-15 — do not include a time component" is a good one.

The Full Function Declaration Structure

In the Vertex AI Python SDK, a Function Declaration is created using the vertexai.generative_models.FunctionDeclaration class. The parameters block uses a subset of JSON Schema — specifically the properties that Vertex AI's model parser understands: type, description, enum, items (for arrays), and properties (for nested objects). The format and pattern keywords from full JSON Schema are not supported and will be silently ignored — a common source of confusion when porting schemas from other systems.

An example of a well-formed Function Declaration for a product search tool:

name: "search_products"
description: "Search the product catalog by keyword and optional category. Use this tool when the user asks about specific products, wants recommendations, or asks what is available. Do not use for order status queries — use get_order_status instead."
parameters: { type: OBJECT, properties: { query: { type: STRING, description: "search keywords, e.g. 'blue running shoes size 10'" }, category: { type: STRING, enum: ["footwear","apparel","accessories","electronics"], description: "product category filter — omit if user did not specify a category" } }, required: ["query"] }

Enum Parameters and Their Effect on Reliability

Using enum arrays in your parameter schemas is one of the highest-leverage reliability improvements available in function calling. When a parameter has an enum, the model is constrained to select only from the listed values — it cannot hallucinate an out-of-bounds value for that parameter. In A/B testing published by the Vertex AI team in their February 2024 release notes for Agent Builder, adding enum constraints to categorical parameters reduced tool call argument errors by 31% compared to free-text string parameters with only a description.

The practical implication: any parameter with a finite, known set of valid values should be an enum. Status codes, category names, sort orders, units of measurement, language codes — all of these should be expressed as enums rather than described in text alone.

Tool Responses and What the Model Sees

After the agent calls a tool, the tool's response is injected back into the model's context as a function response part — a structured message type distinct from user turns and model turns. The model then generates its next action based on this updated context. This is important for schema design: the model will attempt to extract information from your tool response using natural language reasoning. If your tool returns a dense JSON blob with abbreviated field names (qty, sku, avl), the model's interpretation will be less reliable than if you return human-readable field names (quantity_available, product_sku, availability_status).

The rule of thumb from production deployments: tool responses should be readable by a non-technical human. If a support engineer reviewing a conversation log couldn't understand the tool response without decoding, the model is probably struggling with it too.

Schema Quality Checklist

Before deploying any tool: (1) Does every parameter have a description with at least one example value? (2) Are all categorical parameters expressed as enums? (3) Does the tool-level description say when NOT to use this tool (disambiguating from other tools)? (4) Are tool responses using readable field names? Four yes answers predict reliable tool use significantly better than prompt-tuning alone.

Lesson 3 Quiz

Tools, Function Declarations, and Tool Schemas — five questions

1. What does the model actually use to decide when and how to call a tool — and what does it NOT see?

Correct. The model uses only the Function Declaration schema — name, description, and parameter descriptions. The function's source code is never in context. This is why schema quality determines tool call reliability more than implementation quality.

Incorrect. The model only sees the Function Declaration schema — it never has access to the function's source code. The description field is effectively a prompt fragment that drives tool selection and argument formation.

2. According to Vertex AI Agent Builder February 2024 release notes, what effect did adding enum constraints to categorical parameters have on tool call argument errors?

Correct. The Vertex AI team's A/B testing showed a 31% reduction in tool call argument errors when categorical parameters used enum constraints versus free-text descriptions. Enums constrain the model to valid values — it cannot hallucinate out-of-bounds values for enum parameters.

Incorrect. The documented result was a 31% reduction in tool call argument errors. Enums constrain the model to listed values, preventing hallucinated categories. This is one of the highest-leverage reliability improvements in function calling.

3. Which JSON Schema keywords are NOT supported in Vertex AI Function Declaration parameters, despite being valid JSON Schema?

Correct. The format and pattern keywords from full JSON Schema are not supported in Vertex AI Function Declarations — they are silently ignored. This is a common confusion when porting schemas from other systems that do support these keywords.

Incorrect. The unsupported keywords are format and pattern — they are silently ignored in Vertex AI's schema parser. The supported subset includes type, description, enum, items, and properties.

4. What is the key rule of thumb for writing tool response formats to maximize model interpretation reliability?

Correct. If a support engineer reviewing a conversation log couldn't understand the tool response without decoding abbreviations, the model is probably struggling with it too. Human-readable field names (quantity_available vs. qty) significantly improve model interpretation reliability.

Incorrect. The rule is that tool responses should be readable by a non-technical human. Dense JSON with abbreviated fields is harder for the model to interpret correctly. Prioritize clarity over token efficiency in tool responses.

5. What was the root cause of Replit's 6% silent failure rate in their Vertex AI coding assistant, as documented in their March 2024 post-mortem?

Correct. The file_path parameter was described as "the path to write to" — vague enough that the model occasionally passed relative paths. A schema change specifying "the absolute POSIX file path, e.g. /workspace/src/main.py — must begin with a forward slash" reduced the failure rate from 6% to under 0.3%.

Incorrect. The root cause was a vague parameter description in the tool schema that allowed relative paths where the implementation required absolute POSIX paths. A one-sentence schema improvement fixed the problem — no code changes required.

Lab 3 — Function Declaration Schema Design

Conversational practice · Writing and critiquing tool schemas

Lab Objective

Practice writing precise Function Declaration schemas. The AI tutor will present you with vague or broken schemas to critique and improve, and will ask you to write schemas from scratch for described tool requirements. Apply the four-question schema quality checklist from Lesson 3.

Starter task: Critique this parameter description for a tool that books meeting rooms — room_id: "the room" — and rewrite it to meet production quality standards. Explain every change you make.

AI Tutor — Schema Design Lab

Welcome to Lab 3. We're going to drill Function Declaration schema quality until writing precise schemas becomes instinctive.

Here's your first task: I'm giving you a broken schema. Critique every problem and produce an improved version.

Tool: get_weather
Description: "gets weather"
Parameters: { location: "a place", units: "c or f" }

What's wrong, and how would you fix it?

Building Production Agents with Vertex AI · Module 1 · Lesson 4

Deploying to Reasoning Engine and Testing in Production

From local development to a managed endpoint — the deployment model, testing patterns, and first-day operational decisions.

What does "deploying an agent to Reasoning Engine" actually involve — and what does the managed runtime give you that running locally does not?

When Verizon deployed a network configuration agent on Vertex AI Reasoning Engine in Q4 2023 — described in a Google Cloud Next 2024 session in April — the team's primary concern was not capability but operational continuity. Their agent called five internal APIs across four different teams, each with different SLAs. In the first week of production, two of those APIs returned unexpected error formats that the agent had never seen in testing. Because the agent ran on Reasoning Engine with Cloud Logging enabled, the team had complete call traces — every tool invocation, every model response, every error — queryable in BigQuery within seconds of occurrence. The recovery time from alert to root-cause identification was eleven minutes. On the previous generation system, which ran on self-managed Kubernetes, the equivalent debugging process had taken an average of four hours.

The Reasoning Engine Deployment Model

Deploying a custom agent to Reasoning Engine requires three things: a Python class with a specific interface, a requirements specification, and a call to reasoning_engines.ReasoningEngine.create(). The class must implement a query method that accepts a single string argument (or a dict for multi-parameter input) and returns a string or dict response. Any setup that should happen once — loading a LangChain agent, initializing a vector store connection, configuring tool definitions — goes in an __init__ or set_up method.

The deployment call wraps your class in a container, pushes it to Artifact Registry, and provisions a Cloud Run v2 endpoint. From the caller's perspective, the deployed agent is a single HTTPS endpoint that accepts your query and returns a response. All the orchestration — the model calls, tool dispatches, state management — happens inside Reasoning Engine.

Crucially, Reasoning Engine is stateless between sessions by default. Each call to the deployed endpoint is independent. If you need conversation memory (the agent should remember what was said earlier in the same conversation), you must manage state explicitly — either by passing conversation history in each request, or by using a Vertex AI session ID to retrieve history from Firestore (the pattern documented in the Agent Builder developer guide as of February 2024).

Deployment Time Warning

The first deployment of a new Reasoning Engine instance typically takes 8–15 minutes. This is the container build and provisioning time, not a failure. Subsequent deployments to the same instance (updates) take 3–5 minutes. Plan for this in your CI/CD pipeline — a deployment that appears to hang for 10 minutes is almost always normal.

Testing Patterns Before and After Deployment

There are two distinct testing surfaces for Vertex AI agents: the Agent Builder Test Console (the built-in simulator in the Cloud Console) and programmatic evaluation via the Vertex AI SDK. Each has different purposes and different blind spots.

The Test Console is excellent for rapid iteration on Playbook instructions and sanity-checking tool call behavior. It shows every tool invocation and response inline, which is invaluable during schema design. Its limitation: it runs in Google's test infrastructure, not your deployment — it does not simulate your actual deployed endpoint, and it does not capture the latency or error rates you will see in production.

Programmatic evaluation using vertexai.evaluation lets you run a prepared dataset of input-output pairs against your agent and score them on metrics like tool_call_quality, response_groundedness, and task_success. This is covered in detail in Module 3 of this course. For Module 1, the minimum viable testing pattern is: write ten representative test cases covering your agent's primary use cases, run them against the deployed endpoint, and manually review every tool call trace.

Cloud Logging Integration

Every Reasoning Engine deployment automatically emits structured logs to Cloud Logging under the resource type aiplatform.googleapis.com/ReasoningEngine. The log entries include: request payload, every tool call dispatched (with arguments), every tool response received, the final model response, and total latency broken down by model call and tool execution time.

To query these logs in BigQuery for analysis (the pattern used by the Verizon team), you need to create a log sink: gcloud logging sinks create reasoning-engine-sink bigquery.googleapis.com/projects/{PROJECT_ID}/datasets/{DATASET_ID} --log-filter="resource.type=aiplatform.googleapis.com/ReasoningEngine". This is a one-time setup that gives you a queryable history of every agent invocation — essential for debugging and for computing aggregate reliability metrics over time.

First-Day Operational Checklist

Before directing any real user traffic to a newly deployed Reasoning Engine agent, the following five checks should be complete:

1. Cloud Logging sink is active and the first test invocations are visible in BigQuery or the Logs Explorer. If you can't see logs, debugging in production will be nearly impossible.

2. All tool endpoints are behind retry logic with appropriate backoff. Reasoning Engine does not retry failed tool calls automatically — your tool implementation or your orchestration layer must handle this.

3. A fallback response is configured for cases where all tool calls fail. An agent that returns a blank response when its tools are unavailable is worse than a simple error message.

4. Response latency has been measured under realistic load — at minimum, ten sequential calls with production-representative inputs. P95 latency above 8 seconds typically indicates a tool call bottleneck that needs addressing before public exposure.

5. The agent has been tested with adversarial inputs — requests designed to trigger tool misuse, off-topic responses, or instruction-following failures. The built-in Test Console is adequate for this; you do not need a formal red-teaming process at this stage.

Module 1 Summary

You now understand the four-layer Vertex AI Agent Builder architecture, the decisions that matter at agent creation time, how Function Declaration schemas drive tool reliability, and the Reasoning Engine deployment model. Module 2 will go deeper into multi-turn conversation management, session state, and the patterns for building agents that maintain coherent context across long interactions.

Lesson 4 Quiz

Deploying to Reasoning Engine and Testing — five questions

1. What is Reasoning Engine's default state management behavior between sessions — and what must you do if you need conversation memory?

Correct. Reasoning Engine is stateless between sessions by default. To maintain conversation memory, you either pass history in each request or use a Vertex AI session ID to retrieve history from Firestore — the pattern documented in the Agent Builder developer guide.

Incorrect. Reasoning Engine is stateless by default. You manage state explicitly by passing conversation history in requests or using the Vertex AI session ID / Firestore pattern. There is no automatic 30-day session storage.

2. Why does the built-in Agent Builder Test Console have a significant production-testing blind spot?

Correct. The Test Console runs in Google's own test infrastructure — not your deployed Reasoning Engine endpoint. It's excellent for Playbook iteration and tool call inspection, but it won't surface the latency or error rates you'll see against your actual tool implementations in production.

Incorrect. The Test Console does show tool invocations — that's one of its strengths. The limitation is that it runs in Google's infrastructure rather than against your deployed endpoint, so production latency and error rates aren't captured.

3. How long does a first deployment to Reasoning Engine typically take — and what is the common mistake teams make when they see this?

Correct. First deployments take 8–15 minutes for container build and provisioning. This is normal. The common mistake is canceling or retrying after a few minutes, which can create orphaned deployments and billing issues. Subsequent updates to the same instance take only 3–5 minutes.

Incorrect. First Reasoning Engine deployments take 8–15 minutes. Teams frequently cancel what appears to be a hung deployment, when the container is simply being built and provisioned normally. Plan for this in your CI/CD pipeline.

4. What Cloud Logging resource type should you filter on to view Reasoning Engine agent invocation logs?

Correct. The resource type is aiplatform.googleapis.com/ReasoningEngine. This is what you use in your log sink filter to route agent invocation logs to BigQuery for analysis. Logs include full request/response payloads, tool call traces, and per-component latency breakdowns.

Incorrect. The resource type is aiplatform.googleapis.com/ReasoningEngine. While Reasoning Engine runs on Cloud Run v2 under the hood, the logs surface under the Vertex AI resource type, not the Cloud Run resource type.

5. Which item in the first-day operational checklist specifically addresses the scenario where all tool calls fail simultaneously?

Correct. A fallback response configuration handles the case where all tools are unavailable simultaneously. An agent that returns a blank response in this scenario is worse than a helpful error message. This is distinct from retry logic, which handles individual transient failures.

Incorrect. The checklist item specifically for total tool unavailability is configuring a fallback response. Retry logic handles individual transient failures; the fallback handles the case where retries are also exhausted or all tools are down simultaneously.

Lab 4 — Deployment and Operations Design

Conversational practice · Reasoning Engine deployment decisions and operational readiness

Lab Objective

Work through deployment architecture decisions and operational readiness scenarios. The AI tutor will present deployment challenges — state management decisions, logging setup, pre-launch checklists — and ask you to reason through the options. Apply the five-point first-day checklist from Lesson 4.

Starter scenario: You've deployed a customer service agent to Reasoning Engine. On day 2, a customer reports the agent "forgot" what they said at the start of the conversation. Walk through exactly what is happening architecturally and what you would do to fix it.

AI Tutor — Deployment Operations Lab

Welcome to Lab 4. We'll think through production deployment scenarios for Vertex AI agents — the kind of problems that appear on day two, not day one.

First scenario: Your Reasoning Engine agent is deployed. It's been running for three days. A support ticket comes in: users report the agent sometimes uses tools correctly but never follows up appropriately — it calls the right tool but then ignores the response and gives a generic answer.

What are the two most likely causes, and how would you investigate each using the tools available to you on Vertex AI?

Module 1 Test

The Vertex AI Agent Platform — Architecture and Setup · 15 questions · 80% to pass

1. In the Vertex AI Agent Builder four-layer architecture, which layer sits immediately above the Foundation Model Layer?

Correct. The Orchestration Layer (Reasoning Engine) sits directly above the Foundation Model Layer. It runs the agent loop, manages state, dispatches tool calls, and feeds responses back into context.

Incorrect. The layer immediately above the Foundation Model Layer is the Orchestration Layer, implemented through Reasoning Engine. The Tool Layer and management API sit above that.

2. What happens if you attempt to create a Vertex AI Agent Builder agent before enabling the Dialogflow API?

Correct. Missing the Dialogflow API produces a permission error that looks like an authentication failure — a common early stumbling block because the error message doesn't clearly indicate which API is missing.

Incorrect. Attempting agent creation without the Dialogflow API enabled produces a permission error that is frequently misread as an authentication problem. The console does not automatically prompt you to enable missing APIs.

3. A Vertex AI agent is a combination of what elements?

Correct. An agent in Vertex AI is a configured LLM plus a Playbook (goal instructions), zero or more tools, and optionally a Datastore. An agent is not merely a model — it is a model plus a control structure.

Incorrect. A Vertex AI agent combines a configured LLM, a Playbook defining its goal and behavior, zero or more tools, and optionally a Datastore for retrieval. It is a control structure built around a model, not just the model itself.

4. What is the primary architectural difference between Flows and Playbooks in a Vertex AI agent?

Correct. Flows (from Dialogflow CX) provide structured graph-based paths for deterministic scenarios. Playbooks provide flexible, goal-oriented reasoning for open-ended tasks. They can coexist in a single agent, each handling what it does best.

Incorrect. The distinction is structural: Flows = deterministic graph-based paths; Playbooks = goal-oriented open-ended reasoning. Both can coexist in one agent. Flows are not deprecated — they serve a different purpose than Playbooks.

5. The gcloud command to enable both required APIs simultaneously is:

Correct. The two APIs are aiplatform.googleapis.com (Vertex AI) and dialogflow.googleapis.com (Dialogflow, which Agent Builder is built on). This command enables both simultaneously and takes approximately 60 seconds on a new project.

Incorrect. The correct command is: gcloud services enable aiplatform.googleapis.com dialogflow.googleapis.com. The API names for the other options don't correspond to real API identifiers for these services.

6. What does the "model verbosity" setting in the Agent Builder console actually control?

Correct. "Model verbosity" is a misleading UI label — it maps directly to the temperature parameter. Higher verbosity = higher temperature = more varied/creative responses. This is a known naming inconsistency in the Agent Builder console.

Incorrect. Despite the name, "model verbosity" maps to the temperature parameter, not response length or logging verbosity. This is a well-documented naming inconsistency in the Agent Builder console.

7. Why should you prefer using enum constraints over free-text descriptions for categorical tool parameters?

Correct. When a parameter is an enum, the model can only select from listed values — it cannot hallucinate an invalid value. The Vertex AI team's A/B testing showed a 31% reduction in tool call argument errors for enum-constrained parameters versus equivalent free-text descriptions.

Incorrect. The primary benefit of enums is constraining the model to valid values only, which reduces hallucinated arguments. The documented improvement from the Vertex AI team's testing was a 31% reduction in argument errors.

8. What Python class interface does Reasoning Engine require for a custom agent deployment?

Correct. Reasoning Engine requires a Python class with a query method (accepting a string or dict, returning string or dict) plus optional __init__ or set_up for initialization logic. This interface is what Reasoning Engine wraps into a managed container.

Incorrect. Reasoning Engine requires a query method (one-time setup in __init__ or set_up). There is no run(), reset(), invoke(), or handle_request() requirement in the Reasoning Engine interface spec.

9. Which service does Vertex AI use for persistent session state storage in the documented conversation memory pattern?

Correct. The conversation memory pattern documented in the Agent Builder developer guide uses Firestore as the backing store for session history, accessed via Vertex AI session IDs. This provides persistent, queryable conversation state across Reasoning Engine's stateless invocations.

Incorrect. The documented pattern for Vertex AI session state storage uses Firestore, not Bigtable, Spanner, or Memorystore. Session IDs are used to retrieve history from Firestore at the start of each Reasoning Engine invocation.

10. The Booking.com engineers presenting at Google Cloud Next in March 2024 emphasized that time spent on Playbook structure at creation is equivalent in value to how much subsequent prompt tuning?

Correct. The Booking.com engineers were explicit: fifteen minutes of Playbook structure work at creation time is worth more than the next three days of prompt tuning. Front-loading architectural decisions pays disproportionately in agent development.

Incorrect. The Booking.com engineers said fifteen minutes on Playbook structure at creation is worth more than three days of subsequent prompt tuning. This reflects how foundational Playbook design is to agent reliability.

11. What is the key limitation of the Agent Builder Test Console compared to programmatic evaluation?

Correct. The Test Console runs in Google's infrastructure — it's excellent for Playbook and schema iteration, but it doesn't test your actual deployed endpoint. Production latency, your tool endpoint behavior, and real error rates only appear in actual Reasoning Engine invocations.

Incorrect. The Test Console's key limitation is that it runs in Google's test infrastructure, not against your deployed endpoint. It does show tool invocations and supports Pro — but it won't reveal production latency or your tool endpoint's actual failure behavior.

12. Which JSON Schema keywords does Vertex AI support in Function Declaration parameter schemas?

Correct. The supported subset is type, description, enum, items (for arrays), and properties (for nested objects). The format and pattern keywords from full JSON Schema are silently ignored — a common porting confusion.

Incorrect. Vertex AI supports type, description, enum, items, and properties. The format and pattern keywords are not supported and are silently ignored — not an error, just no effect. This matters when porting schemas from systems that do support those keywords.

13. What is the correct google-cloud Python package for programmatic Vertex AI Agent Builder access — and what is the package name to avoid?

Correct. Use google-cloud-dialogflow-cx (imports as dialogflow_v3). The older google-cloud-dialogflow (dialogflow_v2) package targets Dialogflow ES, not CX, and produces confusing authentication and schema errors when used for Agent Builder work.

Incorrect. The correct package is google-cloud-dialogflow-cx (dialogflow_v3 import). Using the older google-cloud-dialogflow package targets Dialogflow ES, not CX, and produces misleading errors that are hard to diagnose.

14. In the Verizon Reasoning Engine case, what capability reduced mean time from alert to root-cause identification from four hours to eleven minutes?

Correct. The complete structured call traces in Cloud Logging — including every tool invocation, argument, response, and error — queryable in BigQuery, enabled the 11-minute investigation. On their previous self-managed Kubernetes system, the same investigation averaged four hours.

Incorrect. The capability that enabled the 11-minute investigation was the complete, structured call traces in Cloud Logging, queryable in BigQuery. This is automatically provided by Reasoning Engine — no additional configuration was required beyond the log sink setup.

15. According to the first-day operational checklist, what P95 latency threshold typically indicates a tool call bottleneck requiring attention before public launch?

Correct. P95 latency above 8 seconds typically signals a tool call bottleneck — usually a slow external API, missing connection pooling, or a tool that makes unnecessary sequential calls where parallel calls would suffice. Address this before public exposure.

Incorrect. The threshold in the first-day checklist is 8 seconds for P95 latency. Below that, latency is typically acceptable for most agent use cases. Above it, there is almost always a diagnosable tool call bottleneck rather than an inherent model speed limitation.