Module 4 · Lesson 1

What Function Calling Actually Does

How Vertex AI agents translate natural language into structured API calls — and why that changes everything.

When a model decides to call a function, what is it really doing — and what is it not doing?

In May 2023, Google announced function calling support in the Gemini API. The capability was not new — OpenAI had released it in June 2023 with GPT-4, and earlier research showed similar patterns — but the framing revealed something important. The model does not execute the function. It returns a structured object describing which function to call and with what arguments. Your application code does the actual work and returns the result.

This asymmetry is the entire foundation of safe, controllable tool use. The model reasons about intent; your infrastructure manages execution.

The Mechanics: What Happens at Each Step

Function calling in Vertex AI follows a deterministic loop. You provide a set of function declarations alongside your prompt. These declarations are JSON Schema objects describing each tool's name, purpose, parameters, and parameter types. The model reads them as part of its context and decides — based on the user's message — whether a function call is appropriate.

If the model determines a function is needed, it returns a function call response instead of a text answer. This response contains the function name and a JSON object of arguments. Your code then executes the actual function, collects the result, and passes it back to the model as a function response in the next turn. The model uses that result to produce a final natural language answer.

The loop: user message → model requests function → your code runs function → result returned to model → model answers user. Nothing executes without your code in the middle.

# Minimal function declaration for Vertex AI (Python SDK)
from vertexai.generative_models import FunctionDeclaration, Tool

get_product_price = FunctionDeclaration(
    name="get_product_price",
    description="Returns current price and inventory for a product SKU",
    parameters={
        "type": "object",
        "properties": {
            "sku": {
                "type": "string",
                "description": "The product SKU identifier"
            }
        },
        "required": ["sku"]
    }
)

tool = Tool(function_declarations=[get_product_price])

Why Description Quality Determines Everything

The model selects which function to call — and what arguments to supply — based almost entirely on the description fields you write. A vague description like "gets product info" will produce inconsistent routing. A precise description like "Returns current list price in USD and available inventory count for a single product, identified by its 12-character SKU" gives the model enough signal to route correctly even in ambiguous situations.

Parameter descriptions matter equally. Specifying "The ISO 4217 three-letter currency code, e.g. USD, EUR, GBP" prevents the model from passing "dollars" when your API expects "USD". Every token in your function declaration is part of the model's decision context.

Critical Distinction

The Gemini model never touches your database, API key, or external service. It produces a structured intent. Your application code is the only thing that executes real actions. This design is intentional — it gives you complete control over authorization, validation, rate limiting, and error handling before anything external is called.

Function Calling Modes in Vertex AI

Vertex AI exposes three function calling modes through the tool_config parameter. AUTO (the default) lets the model decide whether to call a function or respond with text. ANY forces the model to always call one of the provided functions — useful when you need guaranteed structured output. NONE disables function calling entirely even if tools are declared, useful for testing baseline text responses.

The ANY mode with allowed_function_names is particularly powerful: you can constrain the model to call exactly one specific function, effectively using function calling as a structured extraction mechanism rather than autonomous tool selection.

AUTO Mode

Model decides based on context. May respond with text or call a function. Appropriate for general-purpose agents where either response type is valid.

ANY Mode

Model must call one of the declared functions. Use for structured data extraction, form filling, or when you require machine-readable output every time.

Parallel and Sequential Calls

Gemini 1.5 and later models support parallel function calling — the model can return multiple function call objects in a single response when it determines that several independent calls are needed. For example, if a user asks for a weather report and stock price simultaneously, the model may request both in one turn rather than sequentially. Your application handles each call in whatever order suits your infrastructure, then returns all results before the model generates its answer.

This capability significantly reduces latency in multi-tool agents. Google's internal benchmarks for Gemini 1.5 Pro showed that parallel function calling reduced average round-trip time by roughly 40% on tasks requiring three or more independent tool calls — compared to strictly sequential architectures.

Design Principle

Each function declaration consumes tokens in the model's context window. Keep declarations concise but complete. In production agents with large tool libraries (50+ functions), Google recommends using Vertex AI Extensions or dynamic tool selection to avoid exhausting context on unused declarations.

Lesson 1 Quiz

Function calling mechanics — 3 questions

When a Gemini model in Vertex AI "calls a function," what does it actually return?

Correct. The model produces a function call intent — name plus arguments — as a structured object. Your application code is responsible for executing the actual function and returning results to the model.

Not quite. The model never executes anything directly. It returns a structured intent (function name + arguments) and your application code does the actual execution before returning results back to the model.

Which Vertex AI function calling mode forces the model to always invoke one of the declared functions, never returning plain text?

Correct. ANY mode requires the model to call one of the declared functions on every turn. It can be further constrained with allowed_function_names to target a specific function, making it useful for guaranteed structured output.

Not quite. AUTO lets the model choose between text and function calls. NONE disables function calling. ANY is the mode that forces a function call on every response.

What does the description field of a FunctionDeclaration primarily influence?

Correct. The description is the model's primary signal for routing decisions. Poor descriptions lead to incorrect function selection or malformed arguments. This makes writing precise, unambiguous descriptions one of the most impactful skills in tool-augmented agent design.

Incorrect. Descriptions are consumed as part of the model's context and directly drive its routing decisions — which function to call and how to populate its arguments. Latency, authentication, and error handling are application-layer concerns.

Lab 1: Designing Function Declarations

Practice writing function declarations that produce reliable routing decisions

Your Task

You're building an e-commerce agent that needs to call three backend services: inventory lookup, order status, and product recommendations. Your challenge is to write function declarations whose descriptions are precise enough to prevent misrouting.

Work with the AI instructor to design declarations, test edge cases, and understand what makes descriptions reliable vs. ambiguous.

Start by describing one of the three functions you need to declare. Explain what it does, what parameters it takes, and what it returns. Then we'll work on making the description production-ready.

AI Lab Instructor

Function Declaration Design

Welcome to Lab 1. We're going to practice writing FunctionDeclaration objects that produce reliable tool routing in Vertex AI agents.

Let's build an e-commerce agent together. You need three tools: inventory lookup, order status, and product recommendations. Start by describing what one of these functions should do — what it accepts, what it returns, and when an agent should call it. Don't worry about JSON syntax yet — just describe the intent clearly.

Module 4 · Lesson 2

Building Multi-Tool Agents on Vertex AI

Orchestrating multiple APIs, handling tool errors gracefully, and keeping agents from going off the rails.

When your agent has ten tools and the model picks the wrong one — or calls the right one with bad arguments — what happens next?

At Google Cloud Next 2024, several enterprise customers presented production Vertex AI agents running with 20–40 declared tools. The most cited failure pattern was not hallucination — it was argument drift: the model correctly identified the right function but passed slightly wrong parameter formats, causing API errors that cascaded into unhelpful user responses. The fix in most cases was not model-side — it was stricter JSON Schema validation in the function declarations and explicit enum lists for constrained fields.

Structuring Tool Libraries for Large Agents

When an agent needs more than a handful of tools, the organization of your Tool objects becomes architecturally significant. Vertex AI allows passing multiple Tool objects in a single request, each containing up to 64 FunctionDeclarations. Grouping related functions within the same Tool object helps the model reason about toolsets as coherent capability clusters — e.g., all financial operations in one Tool, all customer data operations in another.

Keep in mind that every declaration in every Tool consumes tokens. Google recommends auditing your tool library regularly: if a function has never been called in 10,000 production turns, its description may be too similar to another function's, or the use case may not actually arise in your user base. Remove or merge it.

Error Handling in the Function Call Loop

Your function execution code will fail. APIs time out, schemas change, permissions expire. The question is what you pass back to the model when execution fails. Returning a bare null or empty string produces vague model responses. Returning a structured error object — with an error code, a human-readable message, and ideally a suggested recovery action — gives the model enough context to respond helpfully.

# Returning structured errors to the model
def execute_tool_call(function_name, args):
    try:
        result = dispatch_function(function_name, args)
        return {"status": "success", "data": result}
    except APITimeoutError:
        return {
            "status": "error",
            "error_code": "TIMEOUT",
            "message": "Service temporarily unavailable",
            "retry_suggested": True
        }
    except ValidationError as e:
        return {
            "status": "error",
            "error_code": "INVALID_ARGS",
            "message": str(e),
            "retry_suggested": False
        }

Preventing Runaway Function Call Loops

A model in AUTO mode can enter a loop: it calls a function, gets a result, decides it needs another function call, gets another result, and so on indefinitely. Without guardrails, this consumes both tokens and money. Production Vertex AI agents should always implement a max_tool_calls counter in the orchestration loop — most teams set this between 5 and 15 depending on task complexity.

When the limit is reached, force the model to respond in text mode by setting tool_config to NONE for the final turn. This ensures users receive a response even if the agent couldn't fully complete the task.

Production Pattern

Shopify's internal Vertex AI agents (described in their 2024 engineering blog) implemented a "tool budget" per conversation turn — maximum 8 function calls before forced text response. Combined with structured error returns, this reduced infinite-loop incidents by over 90% compared to early deployments without turn limits.

Argument Validation Before Execution

Never trust model-generated arguments directly. Even with excellent function declarations, models occasionally produce arguments that pass JSON Schema validation but fail at the semantic level — e.g., a date range where end_date precedes start_date, or a quantity that's negative. Your execution layer should perform semantic validation before calling external APIs.

Use enum fields liberally in your JSON Schema. If a parameter accepts only three valid values, declare them as an enum. The model will almost always respect enums, dramatically reducing the argument drift problem documented in production deployments.

Before Execution

Validate argument types, ranges, enums, and semantic constraints. Log the raw model-generated call object. Set a per-turn tool call budget. Check authorization for the requested action.

After Execution

Return structured success or error objects — never null. Log the result. Check if the model is looping by comparing recent function call history. Enforce the tool call budget limit.

Lesson 2 Quiz

Multi-tool orchestration and error handling — 3 questions

What is "argument drift" in production Vertex AI agents, as observed at Google Cloud Next 2024?

Correct. Argument drift describes the model correctly identifying which tool to use but generating parameter values in slightly wrong formats — causing API failures. The primary fix is stricter JSON Schema validation including enum constraints in the function declarations.

Not quite. Argument drift refers specifically to correct function selection but malformed arguments — the most common production failure pattern observed in large multi-tool agent deployments. Use enums and strict schemas to mitigate it.

What should you return to the model when a function call fails due to an API timeout?

Correct. Returning a structured error object with error_code, message, and retry_suggested gives the model enough context to respond helpfully — either by retrying, informing the user gracefully, or attempting an alternative tool.

Incorrect. Returning null or empty produces vague model responses. Structured error objects — with error codes and recovery hints — allow the model to reason about failure and communicate clearly with users.

How do production teams typically prevent infinite function-call loops in Vertex AI agents?

Correct. A tool budget (e.g., max 8 calls per turn) combined with switching tool_config to NONE when the budget is exhausted ensures the agent always produces a user-facing response — even if the task couldn't be fully completed.

Incorrect. The standard pattern is a max_tool_calls counter in your orchestration loop. Once the budget is consumed, tool_config is set to NONE for the final model call, forcing a text response. Shopify and similar teams use 5–15 as their typical budget.

Lab 2: Orchestrating Multiple Tools

Design error handling and tool budgets for a multi-function agent

Your Task

You're debugging a customer service agent that has 12 declared tools and is occasionally looping — calling functions repeatedly without producing a response. You need to design the orchestration logic to prevent this.

Work through the error handling strategy, tool budget design, and structured error return format with the AI instructor.

Start by describing what your current orchestration loop looks like — how does your code currently handle the model's function call responses? Then we'll identify the gaps.

AI Lab Instructor

Multi-Tool Orchestration

Welcome to Lab 2. You're debugging a customer service agent with 12 tools that's looping without producing responses.

Let's fix this systematically. First, describe your current orchestration loop — how does your code process a model response that contains a function call? Walk me through what happens step by step, even if you think it's incomplete. That's where we'll find the problem.

Module 4 · Lesson 3

Connecting to Real External APIs

Authentication, rate limiting, data transformation, and the practical plumbing that makes tool use production-grade.

Your agent can call any external API — but how do you handle auth tokens, rate limits, and schema mismatches without breaking the model's reasoning loop?

When Duet AI (now Gemini for Google Workspace) was extended with external API connectivity in late 2023, Google's engineering team published a technical post describing the "translation layer" problem: real-world APIs return data in shapes the model hasn't been trained to reason about efficiently. A weather API might return 47 fields; the model only needs 4 of them to answer the user's question. Without a transformation step, you're wasting context tokens on noise and potentially confusing the model with irrelevant data.

The solution — truncate and transform API responses before returning them to the model — became a standard pattern in Vertex AI agent architectures.

Authentication Without Exposing Secrets

The cardinal rule: API keys and OAuth tokens must never appear in function declarations, system prompts, or any content visible to the model. The model's context window is logged and may be subject to various monitoring systems. Store credentials in Secret Manager on Google Cloud and access them in your execution layer — the code that runs between the model's function call response and the actual API request.

For Google Cloud APIs called from a Vertex AI agent, use Application Default Credentials via a service account. Assign the minimum IAM roles required. For third-party APIs (Stripe, Salesforce, Slack), store tokens in Secret Manager and retrieve them at execution time. Never interpolate tokens into the function declaration schema itself.

# Correct pattern: credentials retrieved in execution layer
from google.cloud import secretmanager
import requests

def call_stripe_api(customer_id: str) -> dict:
    # Credentials never in function declaration — fetched here
    client = secretmanager.SecretManagerServiceClient()
    name = "projects/my-project/secrets/stripe-key/versions/latest"
    secret = client.access_secret_version(request={"name": name})
    api_key = secret.payload.data.decode("UTF-8")
    
    response = requests.get(
        f"https://api.stripe.com/v1/customers/{customer_id}",
        auth=(api_key, "")
    )
    return _transform_stripe_response(response.json())

Response Transformation: Returning Only What the Model Needs

Most external APIs return far more data than the model needs. A REST endpoint for a customer record might return 80+ fields. Your transformation function should extract the 5–10 fields relevant to the agent's task and return a clean, flat JSON object. This has three benefits: it reduces token consumption, it prevents the model from latching onto irrelevant fields, and it protects sensitive data (PII, internal IDs, financial details) from entering the model's context.

Define your transformation functions as part of the same module as your function declarations — it forces you to think about input/output contracts explicitly when writing descriptions.

Handling Rate Limits and Retries

External APIs rate-limit requests. Your execution layer needs exponential backoff with jitter — not simple fixed-interval retries — to avoid thundering herd problems in multi-user agent deployments. For Vertex AI agents specifically, use the tenacity library or Cloud Tasks for retry orchestration rather than synchronous blocking retries, which degrade user experience and consume model turn budgets.

Return a structured rate_limited error with an estimated retry_after_seconds field when rate limits are hit. The model can then inform the user of the delay rather than silently failing.

Security Boundary

In January 2024, a well-documented prompt injection attack against a commercial agent demonstrated that malicious content from an external API response could instruct the agent to call additional functions with attacker-controlled arguments. Always sanitize API responses before returning them to the model — strip markup, limit string lengths, and validate that response content matches expected schema types.

Caching API Responses for Agent Efficiency

Many agent function calls within a single conversation retrieve the same data — user profile, account balance, product catalog. Implement an in-conversation cache keyed on function name plus argument hash. On Vertex AI, the recommended pattern uses a simple Python dict within the agent session scope, cleared at conversation end. For cross-session caching of slow/expensive lookups, use Cloud Memorystore (Redis).

Agentic tasks at Wayfair (documented in their 2024 ML engineering blog) found that simple in-session caching of product lookup calls reduced API spending by 34% in their customer service agent — because the same SKU was often queried 3–5 times within a single complex order-assistance conversation.

What to Cache

Reference data (product info, user profiles), slow external lookups, any data that doesn't change during the conversation scope. Cache key: function name + sorted argument hash.

What Not to Cache

Live inventory, real-time pricing, transaction results, anything that must reflect the current state of the world. Stale cache misses here cause incorrect agent responses.

Timeout Strategy

Set aggressive timeouts on external API calls — 2–5 seconds for synchronous function calls within an agent turn. Users expect conversational-speed responses; a 30-second API call destroys that experience. If a needed API is reliably slow, move its invocation to an asynchronous pattern using Vertex AI Extensions with background task support, or restructure the agent to use a "processing" acknowledgment + webhook pattern.

Lesson 3 Quiz

External API integration patterns — 3 questions

Where should API keys for external services be stored in a production Vertex AI agent architecture?

Correct. Secret Manager is the recommended storage for API credentials. The execution layer (your function dispatch code) retrieves them at runtime. Credentials must never appear in model-visible content — function declarations, prompts, or API responses returned to the model.

Incorrect. Credentials must never appear in function declarations or system prompts. Environment variables are better than inline credentials but Cloud Secret Manager provides audit logging, rotation, and access control that production systems require.

Why should you transform external API responses before returning them to the model?

Correct. Transformation serves three purposes: token efficiency (stripping the 40+ irrelevant fields a typical API returns), reasoning quality (the model won't latch onto noise), and data protection (PII and sensitive fields never enter the model's context window).

Incorrect. The benefits of response transformation are: token efficiency, improved model reasoning (by removing irrelevant signal), and data protection. Returning raw API payloads wastes tokens and can expose sensitive data in the model's context.

According to Wayfair's 2024 ML engineering findings, simple in-session caching of product lookups reduced their API spending by approximately how much?

Correct. Wayfair's customer service agent queried the same product SKU 3–5 times within complex order-assistance conversations. In-session caching keyed on function name and argument hash eliminated these redundant calls, cutting API costs by 34%.

Incorrect. Wayfair documented approximately 34% API cost reduction from in-session caching. The savings were large because agents naturally revisit the same data multiple times — product lookups in particular recur throughout complex order conversations.

Lab 3: Secure API Integration

Design a transformation layer, caching strategy, and secure credential pattern

Your Task

You're integrating a Salesforce CRM API into a Vertex AI agent. The Salesforce contact endpoint returns 60+ fields. You need to design: (1) a response transformation that returns only what the model needs, (2) a caching strategy for in-conversation re-queries, and (3) confirm your credential storage approach.

Work through the design decisions with the AI instructor. You'll be challenged on edge cases — what if the user asks for a field you're not returning? What if the cache is stale?

Start by listing the 5–8 fields from a Salesforce contact record that a customer service agent would actually need to answer typical support questions. Justify each inclusion.

AI Lab Instructor

Secure API Integration

Welcome to Lab 3. You're connecting a Vertex AI agent to Salesforce CRM — a real-world integration with real security and efficiency challenges.

A Salesforce Contact object has 60+ fields: name, email, phone, account, opportunity history, lead source, custom fields, timestamps, and more. Your transformation function must decide what to return to the model.

Start by listing 5–8 fields you'd include for a customer service agent. For each one, give me a one-line justification. Then we'll pressure-test your choices.

Module 4 · Lesson 4

Vertex AI Extensions and Built-In Tool Integrations

Google Search grounding, Code Interpreter, and the Extensions framework for managing large tool libraries at scale.

When custom function calling isn't enough — or when you need Google-managed tools your own infrastructure can't replicate — what does Vertex AI provide?

At Google I/O 2024, Google announced the general availability of Grounding with Google Search for Vertex AI. The feature routes the model's information needs to live Google Search results, then cites sources in the response. For enterprise deployments in legal, finance, and healthcare, this addressed a critical gap: agents that need current, verifiable information rather than model knowledge with an arbitrary training cutoff date.

The announcement coincided with the release of Vertex AI Extensions — a framework for registering, versioning, and serving custom tools to agents without managing function declarations manually in application code.

Grounding with Google Search

Enabling Google Search grounding in Vertex AI is a one-line change to your generation config. When grounding is active, the model can retrieve and cite live search results for queries that require current information. The response includes a grounding_metadata field containing the search queries issued and the source URLs used.

Search grounding is not free — it incurs separate per-query pricing. At scale, implement grounding selectively: use a classifier or keyword filter to identify queries that require current information (news, prices, regulatory changes) versus queries answerable from model knowledge alone.

# Enabling Google Search grounding in Vertex AI
from vertexai.generative_models import (
    GenerativeModel, Tool, grounding
)

model = GenerativeModel("gemini-1.5-pro")

# Google Search as a built-in tool
google_search_tool = Tool.from_google_search_retrieval(
    grounding.GoogleSearchRetrieval(
        dynamic_retrieval_config=grounding.DynamicRetrievalConfig(
            dynamic_threshold=0.7  # Only ground when confidence < 70%
        )
    )
)

response = model.generate_content(
    "What is the current Fed funds rate?",
    tools=[google_search_tool]
)

Code Interpreter Tool

Vertex AI's built-in Code Interpreter tool allows the model to write and execute Python code in a sandboxed environment. The model generates code, the tool executes it, and the output is returned to the model for incorporation into its response. This is particularly powerful for data analysis, mathematical computations, chart generation, and any task requiring deterministic computation rather than language model estimation.

Code Interpreter runs in an isolated execution environment — it cannot access the internet, your filesystem, or external APIs. This constraint is a security feature. If you need the model to perform computation on data retrieved from your APIs, retrieve the data with a custom function call, then pass it to Code Interpreter for analysis.

Real Use Case

Deutsche Bank's Vertex AI-powered financial analyst agent (described in a Google Cloud case study, Q1 2024) uses Code Interpreter to perform portfolio calculations after retrieving position data via custom function calls to their internal trading systems. The pattern: function call retrieves raw data → Code Interpreter performs computation → model formats and explains the result. This avoids floating-point errors that would occur if the model performed arithmetic in natural language generation.

Vertex AI Extensions Framework

The Extensions framework provides a managed way to register and version tool definitions centrally, rather than embedding FunctionDeclarations in application code. An Extension is a registered resource in your Vertex AI project — you define it once with an OpenAPI spec, and multiple agents can reference it by resource name without duplicating declaration code.

Extensions support authentication configs that handle OAuth 2.0, API key, and service account auth transparently — your agent code never touches credentials. Google manages the token refresh lifecycle. This is particularly valuable for enterprise deployments with dozens of tool integrations maintained by different teams.

When to Use Extensions

Tool library exceeds 15 declarations. Multiple agents share the same tools. Different teams own different integrations. You need centralized versioning and rollback. Auth management should be handled by infrastructure, not application code.

When to Use Inline FunctionDeclarations

Fewer than 10 tools. Single agent, single team. Rapid prototyping. Tool definitions change frequently during development. You need fine-grained control over which tools appear in which conversation contexts.

Combining Built-In and Custom Tools

A single Vertex AI agent can use Google Search grounding, Code Interpreter, and custom FunctionDeclarations simultaneously. Pass all tools in a single list to the tools parameter. The model will select among them based on the task at hand — searching for current information, computing against retrieved data, or calling your custom APIs as needed.

The practical constraint is context length: each active tool declaration consumes tokens. Measure the token overhead of your tool set with count_tokens() before production deployment. If overhead exceeds 10% of your context budget for typical queries, prune the tool list or adopt the Extensions framework with dynamic tool selection.

Architecture Checkpoint

Before finishing this module, map your planned agent's tool set across four categories: (1) built-in Google tools (Search, Code Interpreter), (2) Google Cloud service integrations, (3) third-party API function declarations, (4) internal API function declarations. Each category has different auth, versioning, and error-handling patterns. Treating them uniformly is a common source of production incidents.

Lesson 4 Quiz

Extensions, grounding, and built-in tools — 3 questions

What does the dynamic_threshold parameter in Google Search grounding configuration control?

Correct. dynamic_threshold (e.g., 0.7) means "only perform a live search if the model's confidence in its own knowledge is below 70%." This enables cost-efficient selective grounding rather than searching on every query.

Incorrect. dynamic_threshold controls when grounding is triggered — specifically, the model confidence level below which a live Google Search is performed. Setting it to 0.7 means search is only triggered when the model is less than 70% confident in its training knowledge.

What is the key security constraint of Vertex AI's built-in Code Interpreter tool?

Correct. Code Interpreter's sandbox isolation is a deliberate security feature. To analyze external data, the pattern is: retrieve data via custom function call → pass to Code Interpreter for computation. The two tools are used in sequence, not simultaneously.

Incorrect. Code Interpreter runs in a fully isolated sandbox — no internet, no filesystem, no external API access. This is intentional. If you need computation on external data, retrieve it first with a custom function call, then pass the data into the code execution context.

According to the lesson, when should you prefer Vertex AI Extensions over inline FunctionDeclarations?

Correct. Extensions shine in multi-agent, multi-team environments with shared tool libraries. For single agents with fewer than 10 tools under rapid development, inline FunctionDeclarations are simpler and provide more direct control over per-conversation context.

Incorrect. Extensions are recommended when: tool libraries exceed ~15 declarations, multiple agents share tools, different teams own different integrations, or you need centralized versioning. For small, single-agent, single-team deployments, inline FunctionDeclarations are simpler and more flexible.

Lab 4: Designing a Mixed Tool Architecture

Combine Google Search grounding, Code Interpreter, and custom APIs in a single agent

Your Task

You're designing a financial research agent for an investment firm. It needs to: (1) search for current market news, (2) retrieve portfolio positions from an internal API, (3) perform quantitative analysis on the retrieved data, and (4) summarize findings. This requires all three tool types — Search grounding, custom function calls, and Code Interpreter.

Work with the AI instructor to design the tool selection logic, token budget, and sequencing for a typical research query.

Start by describing a sample user query this agent would handle — something specific enough to trace through all four steps. Then we'll map which tool handles which part.

AI Lab Instructor

Mixed Tool Architecture

Welcome to Lab 4. You're designing a financial research agent that combines three tool types: Google Search grounding (for current news), custom function calls (for internal portfolio data), and Code Interpreter (for quantitative analysis).

Give me a specific example query this agent would handle — the kind a portfolio manager or analyst would actually ask. Something realistic enough that we can trace exactly which tool fires at each step and in what order. The more specific, the better.

Module 4 Test

15 questions across all four lessons — 80% required to pass

1. What does a Gemini model actually output when it "calls a function"?

Correct. The model outputs a structured function call intent — name and arguments. Your application code executes the actual function.

Incorrect. The model outputs a structured intent (name + arguments). Your application code does the actual execution.

2. Which function calling mode forces the model to always invoke a declared function?

Correct. ANY mode mandates a function call on every model response turn.

Incorrect. ANY mode forces function invocation. AUTO allows text or function. NONE disables tools entirely.

3. What is the primary driver of correct function selection in a multi-tool agent?

Correct. Description quality is the dominant factor in function routing accuracy. Vague descriptions produce misrouting even when function names are distinct.

Incorrect. Description quality is the primary routing signal. Well-written descriptions consistently outperform poorly described functions regardless of order or naming.

4. Gemini 1.5 and later models support parallel function calling. What is the primary benefit?

Correct. Parallel function calling batches multiple independent tool requests in one model response, reducing round-trips and cutting latency by ~40% on multi-tool tasks per Google's benchmarks.

Incorrect. The benefit is latency reduction — multiple functions requested in one turn means fewer round-trips through the model.

5. What is "argument drift" as observed in production Vertex AI deployments?

Correct. Argument drift: right function, wrong format. Use enum constraints and strict JSON Schema to mitigate.

Incorrect. Argument drift specifically means the model picks the correct function but generates parameters in slightly wrong formats — the most common production failure pattern in large tool libraries.

6. What is the recommended maximum function calls per turn that Shopify's Vertex AI agents use?

Correct. Shopify documented using a tool budget of 8 calls per conversation turn, after which tool_config is set to NONE to force a text response.

Incorrect. Shopify used a budget of 8 function calls per turn. When exhausted, tool_config switches to NONE to guarantee a user-facing response.

7. When a function call fails with an API timeout, what should your execution layer return to the model?

Correct. Structured error objects with error codes and recovery hints allow the model to reason about failure and communicate clearly with users.

Incorrect. Structured error objects — not empty strings, stale data, or tracebacks — give the model actionable context for producing a helpful response despite the failure.

8. Where must API keys for external services be stored in production Vertex AI agents?

Correct. Secret Manager provides audit logging, rotation, and access control. Credentials must never appear in model-visible context.

Incorrect. Credentials belong in Secret Manager, never in prompts, declarations, or container images. The execution layer retrieves them at runtime.

9. Why should external API responses be transformed before being returned to the model?

Correct. Transformation serves three goals: efficiency (fewer tokens), quality (less irrelevant noise), and security (sensitive fields never enter model context).

Incorrect. Transformation is about efficiency, reasoning quality, and data protection — not format conversion or API limitations.

10. What approach did Wayfair use to reduce API spending by 34% in their customer service agent?

Correct. Same product SKUs were queried 3–5 times per complex conversation. In-session caching eliminated redundant API calls for a 34% cost reduction.

Incorrect. Wayfair achieved savings through in-session result caching — the same product data was being retrieved multiple times per conversation. Caching with function name + argument hash as the key eliminated redundant calls.

11. What happened to agents attacked via prompt injection in January 2024 as described in the lesson?

Correct. This attack vector — injecting instructions into API response content — demonstrates why sanitizing external data before returning it to the model is a security requirement, not just an optimization.

Incorrect. The January 2024 attack exploited unvalidated API response content to inject function-calling instructions, causing the agent to make attacker-directed tool calls. Always sanitize API responses before returning them to the model.

12. What does dynamic_threshold in Google Search grounding configuration control?

Correct. Setting dynamic_threshold to 0.7 means search only fires when model confidence is below 70%, enabling selective grounding to control costs.

Incorrect. dynamic_threshold is the model confidence threshold — below it, live search fires. Above it, the model uses its own knowledge. This enables cost-efficient selective grounding.

13. What is the key security constraint of Vertex AI's Code Interpreter built-in tool?

Correct. Code Interpreter runs in complete isolation. To use it with external data, retrieve data via custom function call first, then pass it to Code Interpreter for computation.

Incorrect. The isolation constraint — no internet, no filesystem, no external APIs — is the primary security feature. Use custom function calls to retrieve external data, then pass it into Code Interpreter.

14. According to Deutsche Bank's Vertex AI case study, why use Code Interpreter for portfolio calculations rather than having the model compute in natural language?

Correct. LLMs performing arithmetic in natural language generation are prone to floating-point errors and numerical hallucinations. Code Interpreter provides deterministic, accurate computation.

Incorrect. The reason is arithmetic accuracy — LLMs can produce numerical errors during generation. Code Interpreter executes Python, which performs deterministic floating-point arithmetic.

15. When should you prefer Vertex AI Extensions over inline FunctionDeclarations?

Correct. Extensions solve the multi-agent, multi-team tool management problem. For small single-agent projects, inline declarations are simpler and more flexible.

Incorrect. Extensions are recommended for large, shared, multi-team tool libraries. For single agents with fewer than ~10 tools, inline FunctionDeclarations offer simpler, more direct control.