When the Bell System deployed the first fully electronic telephone switching systems in the 1960s — the No. 1 ESS, installed in Succasunna, New Jersey in May 1965 — the engineers involved understood they were not building a faster version of the existing electromechanical exchanges. The stored-program control at the center of the new architecture meant the switch could be updated, reconfigured, and extended through software rather than physical rewiring. Within a decade, that distinction had restructured the entire telecommunications industry: maintenance labor dropped sharply, new services could be activated remotely overnight, and the competitive advantage shifted permanently toward whoever could deploy software updates fastest. The hardware was the platform; the software was the product.
The same structural break is now visible in enterprise software. From 2023 onward, Google, Amazon, and Microsoft each released managed platforms — Vertex AI Agent Builder, Amazon Bedrock Agents, Azure AI Agent Service — designed to host AI systems that do not merely respond to queries but plan multi-step tasks, call external APIs, maintain state across sessions, and hand work to other agents. The unit of deployment is no longer a model or an endpoint; it is an agent with a goal, tools, and an orchestrator capable of retrying, branching, and recovering from failure. The architectural consequences are not yet fully understood, but the direction is clear.
This course teaches you to build on Google's Vertex AI Agent Builder: how its components connect, how to configure agents that call real tools reliably, how to evaluate and monitor them in production, and where the current platform's limits sit. By the end you will have constructed working agents against the live Vertex API and understood enough of the underlying architecture to reason confidently about what breaks and why. The course will not pretend the technology is more settled than it is — agent reliability in production is still an open engineering problem — but it will give you the foundation to engage with that problem seriously.
If you finish every module, here's who you become:
In August 2023, Google internally demonstrated a procurement agent built on what would become Vertex AI Agent Builder. The system received a natural-language purchase request, queried a live inventory database via function calling, cross-referenced vendor lead times from an external REST API, drafted a purchase order, routed it to a human approver over email, and — if the approver didn't respond within four hours — escalated to a manager and retried. No part of this flow was hard-coded in application logic. The orchestrator, running on Vertex, decided the sequence. When the inventory API returned an error on the third step, the agent retried with exponential backoff and logged the incident to Cloud Logging automatically. The demonstration was notable not because the individual capabilities were new — function calling had existed in the OpenAI API since June 2023 — but because the infrastructure for running, monitoring, and recovering a multi-step agent in a production environment was provided by the platform itself.
Vertex AI Agent Builder is organized around four distinct layers, each of which can be addressed independently through the API or the console. Understanding the boundaries between these layers is prerequisite to any serious work on the platform.
The Foundation Model Layer sits at the base. This is Gemini — specifically the Gemini 1.5 Pro and Gemini 1.5 Flash models as of mid-2024 — exposed via the generateContent endpoint in Vertex AI. Agents do not call this endpoint directly in most configurations; they call it through the orchestration layer. But model selection, context window limits, and grounding configuration all live here.
Above it sits the Orchestration Layer, implemented through Vertex AI Reasoning Engine (formerly Vertex AI Generative AI on Vertex — the naming has changed twice since launch). This is the component that runs your agent loop: it holds the conversation state, dispatches tool calls, receives tool responses, feeds them back into the model context, and decides when the agent's goal is satisfied. You can bring your own orchestration framework — LangChain, LlamaIndex, or a custom loop — and deploy it to Reasoning Engine as a managed runtime.
The Tool Layer is where agents connect to the external world. Vertex AI supports three tool types natively: Function Declarations (structured JSON schemas describing callable functions), Code Interpreter (a sandboxed Python execution environment), and Vertex AI Search (retrieval-augmented generation over your own document corpus). Each tool type has different latency, cost, and failure mode characteristics that matter in production design.
The outermost layer is the Agent Builder Console and API — the management plane for creating, versioning, testing, and deploying agents. This is also where you configure Playbooks (the goal-oriented instruction sets that shape agent behavior) and Data Stores (indexed corpora for grounded retrieval).
Reasoning Engine is a fully managed container runtime on Cloud Run under the hood. You do not manage the underlying infrastructure, but you do pay per invocation and per second of compute. Understanding this shapes how you think about agent latency budgets and cost modeling.
The most common confusion when first approaching Agent Builder comes from treating it as a wrapper around the Gemini API. It is not. When you call generateContent directly, you are responsible for every aspect of the agent loop: storing conversation history, detecting when the model has requested a tool call, executing that tool, injecting the response back into context, and deciding when the loop terminates. You also handle retries, logging, and scaling yourself.
Agent Builder's Reasoning Engine takes over the entire loop. You declare your tools (as Function Declarations with JSON Schema descriptions), write a Playbook specifying what the agent should accomplish and how it should behave, and the platform handles execution. This is architecturally equivalent to the difference between writing a web server from sockets and deploying on Cloud Run — the same code runs, but the operational surface area you manage is dramatically smaller.
The practical consequence: you spend less time on plumbing and more time on agent design — specifically, on writing effective Playbooks, designing tool schemas that the model can use reliably, and instrumenting evaluation pipelines to detect when agent behavior degrades.
As of the Vertex AI Agent Builder GA release in March 2024, Playbook-based agents are available in all Google Cloud regions. Reasoning Engine (custom agent deployment) requires a region that supports Cloud Run v2 — us-central1, us-east1, europe-west1, and asia-northeast1 are the primary choices as of mid-2024. Check the Vertex AI documentation for the current region table before committing to an architecture.
Before any of the practical work in this module is possible, your GCP project must satisfy four conditions. First, the Vertex AI API must be enabled (aiplatform.googleapis.com). Second, the Dialogflow API must be enabled (dialogflow.googleapis.com) — Agent Builder is built on Dialogflow CX infrastructure. Third, a service account with the Vertex AI User role and the Dialogflow API Client role must be available for programmatic access. Fourth, billing must be active; Agent Builder has no free tier for production agents.
The Google Cloud CLI command to enable both APIs simultaneously is: gcloud services enable aiplatform.googleapis.com dialogflow.googleapis.com. This takes approximately 60 seconds on a new project. Attempting to create an agent before both APIs are enabled produces a permission error that is easy to misread as an authentication problem — a common early stumbling block.
In this lab you will explore the Vertex AI Agent Builder architecture through guided conversation. The AI tutor will ask you questions about component relationships, setup steps, and design decisions. Respond in your own words — the goal is to solidify your mental model of how the platform's layers connect before you write any code.
When Google Cloud opened Vertex AI Agent Builder to public preview in November 2023, the first wave of enterprise users encountered a counterintuitive problem: the console made agent creation feel trivial — a name, a model selection, a text box — but agents that worked beautifully in the built-in test simulator failed in production within days. The issue was almost never the model. It was Playbook design, tool schema precision, and region selection — decisions that had been made quickly during setup and were now costly to unpick. The engineers at Booking.com who documented their Vertex AI migration in a March 2024 Google Cloud Next session were explicit: the fifteen minutes you spend on Playbook structure at creation time are worth more than the next three days of prompt tuning.
Navigate to Agent Builder in the Google Cloud Console. Select Create a new app and choose Conversational agent as the app type. You will immediately be asked for three things: a Display Name, a Default Language, and a Time Zone. The display name is cosmetic. The language and time zone are not — changing them after agent creation requires recreating the underlying Dialogflow CX agent, which means losing all configured Flows and Playbooks.
Once the agent is created, you arrive at the agent detail page. The two tabs you will use most are Playbooks and Tools. The Agent Settings panel on the right controls model selection (Gemini 1.5 Pro vs. Flash), temperature (called "model verbosity" in the UI — a misleading label; it maps directly to the temperature parameter), and grounding configuration.
The default agent comes with a single Playbook pre-populated with placeholder text. Delete the placeholder immediately — it is generic enough to interfere with specialized agent behavior and has caused measurable accuracy degradation in production evaluations shared in the Google Cloud community forums in early 2024.
Language, Time Zone, and GCP Region are set at agent creation and cannot be changed without recreating the agent. Model selection, temperature, and Playbook content can all be changed freely after creation. Plan your region choice carefully — it affects data residency, latency, and available features.
Every console action in Agent Builder has an API equivalent via the Dialogflow CX REST API or the Python SDK. Creating an agent programmatically uses the dialogflow_v3.AgentsClient from the google-cloud-dialogflow-cx package (note: this is a different package from the older google-cloud-dialogflow; using the wrong one is a frequent SDK confusion).
The minimal Python pattern for agent creation:
client = dialogflow_v3.AgentsClient()
agent = dialogflow_v3.Agent(display_name="my-agent", default_language_code="en", time_zone="America/New_York")
response = client.create_agent(parent=f"projects/{PROJECT_ID}/locations/{LOCATION}", agent=agent)
The parent parameter encodes both your project ID and your chosen region. Once set, this determines all subsequent API calls — every Playbook, Tool, and Flow you create will live under this parent path. There is no cross-region agent migration path; you would need to export and recreate.
A Playbook consists of two required fields and two optional fields. The Goal field (required) is a one-to-three sentence statement of what the agent is trying to accomplish. The Instructions field (required) contains numbered steps describing the agent's behavior — what to do, in what order, under what conditions. The optional Examples section provides few-shot demonstrations. The optional Input/Output schema section enables structured parameter passing between Playbooks.
The single most common Playbook error in production is writing Instructions that are too vague for the model to follow reliably. The Vertex AI team's own guidance (published in the Agent Builder documentation, updated February 2024) is explicit: each instruction step should describe a specific action with a specific decision criterion, not a general disposition. "Be helpful" is not an instruction. "If the user asks for a product recommendation, call the get_catalog tool with the user's stated category and budget as parameters, then present the top three results by price" is an instruction.
Each numbered step in a Playbook should answer: what does the agent do, under what condition, with what specific tool or response? Vague steps produce inconsistent behavior that is extremely difficult to debug because the model's interpretation of vague instructions varies across calls.
Gemini 1.5 Pro and Gemini 1.5 Flash are both available for Vertex AI agents. The choice is not primarily about capability — Flash is approximately 95% as accurate as Pro on agent benchmarks — but about latency and cost. Flash has roughly 3–5× lower latency on tool-heavy workloads and is priced at approximately one-sixth of Pro per million input tokens. For agents that call tools frequently (more than three tool calls per conversation turn on average), Flash's latency advantage compounds significantly.
The documented recommendation from Google's Agent Builder team as of Q1 2024: begin with Pro during development for maximum debugging visibility, then evaluate switching to Flash before production using the Agent Evaluations framework (covered in Module 3). Do not assume Flash is "worse" — on constrained, well-specified tasks with clean tool schemas, Flash often outperforms Pro by producing shorter, more reliable tool call arguments.
Practice reasoning through agent creation decisions and Playbook structure. The AI tutor will present scenarios where you must decide on region, model, and Playbook instruction quality. You'll also rewrite vague instructions into well-formed Playbook steps.
In January 2024, the engineering team at Replit documented a production incident in their internal post-mortem notes (shared in a technical blog post in March 2024): their Vertex AI-powered coding assistant was calling a file_write tool correctly in 94% of test cases but failing silently in 6% of production calls. The root cause was not the model, not the tool implementation, and not the Playbook. It was the tool schema. The file_path parameter was described as "the path to write to" — a description that allowed the model to occasionally pass relative paths where the tool implementation expected absolute paths. A one-sentence schema change — "the absolute POSIX file path, e.g. /workspace/src/main.py — must begin with a forward slash" — reduced the failure rate to under 0.3% within two days.
When you add a tool to a Vertex AI agent using the Function Declarations type, you are providing the model with a structured description of a callable function. The model uses this description — not the function's source code — to decide when to call the tool, which arguments to pass, and how to interpret the response. The function itself runs in your infrastructure (a Cloud Function, a backend service, whatever you deploy); the model only ever sees the schema.
A Function Declaration consists of four elements: a name (lowercase with underscores, no spaces — this is sent verbatim to the model), a description (a natural-language explanation of what the function does and when to use it), a parameters block (a JSON Schema object describing every parameter), and a required array listing which parameters must always be provided.
The description field is not documentation for humans — it is a prompt fragment that the model reads during every tool selection decision. It should be written with that in mind: specific, unambiguous, with explicit guidance on when to use this tool versus alternative tools.
Every parameter description should specify: data type, expected format or range, what happens at edge cases, and at least one concrete example value. "the date" is a bad description. "the query date in ISO 8601 format, e.g. 2024-03-15 — do not include a time component" is a good one.
In the Vertex AI Python SDK, a Function Declaration is created using the vertexai.generative_models.FunctionDeclaration class. The parameters block uses a subset of JSON Schema — specifically the properties that Vertex AI's model parser understands: type, description, enum, items (for arrays), and properties (for nested objects). The format and pattern keywords from full JSON Schema are not supported and will be silently ignored — a common source of confusion when porting schemas from other systems.
An example of a well-formed Function Declaration for a product search tool:
name: "search_products"
description: "Search the product catalog by keyword and optional category. Use this tool when the user asks about specific products, wants recommendations, or asks what is available. Do not use for order status queries — use get_order_status instead."
parameters: { type: OBJECT, properties: { query: { type: STRING, description: "search keywords, e.g. 'blue running shoes size 10'" }, category: { type: STRING, enum: ["footwear","apparel","accessories","electronics"], description: "product category filter — omit if user did not specify a category" } }, required: ["query"] }
Using enum arrays in your parameter schemas is one of the highest-leverage reliability improvements available in function calling. When a parameter has an enum, the model is constrained to select only from the listed values — it cannot hallucinate an out-of-bounds value for that parameter. In A/B testing published by the Vertex AI team in their February 2024 release notes for Agent Builder, adding enum constraints to categorical parameters reduced tool call argument errors by 31% compared to free-text string parameters with only a description.
The practical implication: any parameter with a finite, known set of valid values should be an enum. Status codes, category names, sort orders, units of measurement, language codes — all of these should be expressed as enums rather than described in text alone.
After the agent calls a tool, the tool's response is injected back into the model's context as a function response part — a structured message type distinct from user turns and model turns. The model then generates its next action based on this updated context. This is important for schema design: the model will attempt to extract information from your tool response using natural language reasoning. If your tool returns a dense JSON blob with abbreviated field names (qty, sku, avl), the model's interpretation will be less reliable than if you return human-readable field names (quantity_available, product_sku, availability_status).
The rule of thumb from production deployments: tool responses should be readable by a non-technical human. If a support engineer reviewing a conversation log couldn't understand the tool response without decoding, the model is probably struggling with it too.
Before deploying any tool: (1) Does every parameter have a description with at least one example value? (2) Are all categorical parameters expressed as enums? (3) Does the tool-level description say when NOT to use this tool (disambiguating from other tools)? (4) Are tool responses using readable field names? Four yes answers predict reliable tool use significantly better than prompt-tuning alone.
Practice writing precise Function Declaration schemas. The AI tutor will present you with vague or broken schemas to critique and improve, and will ask you to write schemas from scratch for described tool requirements. Apply the four-question schema quality checklist from Lesson 3.
When Verizon deployed a network configuration agent on Vertex AI Reasoning Engine in Q4 2023 — described in a Google Cloud Next 2024 session in April — the team's primary concern was not capability but operational continuity. Their agent called five internal APIs across four different teams, each with different SLAs. In the first week of production, two of those APIs returned unexpected error formats that the agent had never seen in testing. Because the agent ran on Reasoning Engine with Cloud Logging enabled, the team had complete call traces — every tool invocation, every model response, every error — queryable in BigQuery within seconds of occurrence. The recovery time from alert to root-cause identification was eleven minutes. On the previous generation system, which ran on self-managed Kubernetes, the equivalent debugging process had taken an average of four hours.
Deploying a custom agent to Reasoning Engine requires three things: a Python class with a specific interface, a requirements specification, and a call to reasoning_engines.ReasoningEngine.create(). The class must implement a query method that accepts a single string argument (or a dict for multi-parameter input) and returns a string or dict response. Any setup that should happen once — loading a LangChain agent, initializing a vector store connection, configuring tool definitions — goes in an __init__ or set_up method.
The deployment call wraps your class in a container, pushes it to Artifact Registry, and provisions a Cloud Run v2 endpoint. From the caller's perspective, the deployed agent is a single HTTPS endpoint that accepts your query and returns a response. All the orchestration — the model calls, tool dispatches, state management — happens inside Reasoning Engine.
Crucially, Reasoning Engine is stateless between sessions by default. Each call to the deployed endpoint is independent. If you need conversation memory (the agent should remember what was said earlier in the same conversation), you must manage state explicitly — either by passing conversation history in each request, or by using a Vertex AI session ID to retrieve history from Firestore (the pattern documented in the Agent Builder developer guide as of February 2024).
The first deployment of a new Reasoning Engine instance typically takes 8–15 minutes. This is the container build and provisioning time, not a failure. Subsequent deployments to the same instance (updates) take 3–5 minutes. Plan for this in your CI/CD pipeline — a deployment that appears to hang for 10 minutes is almost always normal.
There are two distinct testing surfaces for Vertex AI agents: the Agent Builder Test Console (the built-in simulator in the Cloud Console) and programmatic evaluation via the Vertex AI SDK. Each has different purposes and different blind spots.
The Test Console is excellent for rapid iteration on Playbook instructions and sanity-checking tool call behavior. It shows every tool invocation and response inline, which is invaluable during schema design. Its limitation: it runs in Google's test infrastructure, not your deployment — it does not simulate your actual deployed endpoint, and it does not capture the latency or error rates you will see in production.
Programmatic evaluation using vertexai.evaluation lets you run a prepared dataset of input-output pairs against your agent and score them on metrics like tool_call_quality, response_groundedness, and task_success. This is covered in detail in Module 3 of this course. For Module 1, the minimum viable testing pattern is: write ten representative test cases covering your agent's primary use cases, run them against the deployed endpoint, and manually review every tool call trace.
Every Reasoning Engine deployment automatically emits structured logs to Cloud Logging under the resource type aiplatform.googleapis.com/ReasoningEngine. The log entries include: request payload, every tool call dispatched (with arguments), every tool response received, the final model response, and total latency broken down by model call and tool execution time.
To query these logs in BigQuery for analysis (the pattern used by the Verizon team), you need to create a log sink: gcloud logging sinks create reasoning-engine-sink bigquery.googleapis.com/projects/{PROJECT_ID}/datasets/{DATASET_ID} --log-filter="resource.type=aiplatform.googleapis.com/ReasoningEngine". This is a one-time setup that gives you a queryable history of every agent invocation — essential for debugging and for computing aggregate reliability metrics over time.
Before directing any real user traffic to a newly deployed Reasoning Engine agent, the following five checks should be complete:
1. Cloud Logging sink is active and the first test invocations are visible in BigQuery or the Logs Explorer. If you can't see logs, debugging in production will be nearly impossible.
2. All tool endpoints are behind retry logic with appropriate backoff. Reasoning Engine does not retry failed tool calls automatically — your tool implementation or your orchestration layer must handle this.
3. A fallback response is configured for cases where all tool calls fail. An agent that returns a blank response when its tools are unavailable is worse than a simple error message.
4. Response latency has been measured under realistic load — at minimum, ten sequential calls with production-representative inputs. P95 latency above 8 seconds typically indicates a tool call bottleneck that needs addressing before public exposure.
5. The agent has been tested with adversarial inputs — requests designed to trigger tool misuse, off-topic responses, or instruction-following failures. The built-in Test Console is adequate for this; you do not need a formal red-teaming process at this stage.
You now understand the four-layer Vertex AI Agent Builder architecture, the decisions that matter at agent creation time, how Function Declaration schemas drive tool reliability, and the Reasoning Engine deployment model. Module 2 will go deeper into multi-turn conversation management, session state, and the patterns for building agents that maintain coherent context across long interactions.
Work through deployment architecture decisions and operational readiness scenarios. The AI tutor will present deployment challenges — state management decisions, logging setup, pre-launch checklists — and ask you to reason through the options. Apply the five-point first-day checklist from Lesson 4.