Agentic Data Workflows on Google Cloud · Introduction

The Machine That Forgets Where It Left Off

Every powerful AI agent is only as useful as the data it can reach — this course is about closing that gap on Google Cloud.

In 1945, Vannevar Bush published "As We May Think" in The Atlantic, imagining the Memex — a desk-sized device that would store every book, record, and communication a person needed, linked by associative trails rather than indexes. Bush understood something the engineers of that era did not yet: the bottleneck was never compute. It was retrieval. An instrument that could calculate but could not recall was a narrow tool. The same pattern surfaced in 1962 when Douglas Engelbart at SRI began building NLS — the oN-Line System — specifically because he believed augmenting human intellect depended on giving people frictionless access to the right information at the right moment.

Today's large language models arrived under nearly identical conditions. GPT-3 launched in June 2020 with remarkable fluency and a hard cutoff: it knew nothing after its training window closed, nothing proprietary, nothing live. Researchers at Facebook AI Research published the Retrieval-Augmented Generation paper in May 2020 not as an academic curiosity but as an engineering acknowledgment: the model alone is insufficient. By 2023, every serious enterprise deployment of a language model had discovered the same constraint Bush named in 1945. The bottleneck is retrieval.

This course is about building AI agents on Google Cloud that actually know things — your things, current things, structured and unstructured things — by connecting them properly to BigQuery, Cloud Storage, AlloyDB, Vertex AI Search, and the retrieval infrastructure that makes the difference between a demo and a production system. We will be specific about architecture, honest about limits, and grounded in what Google's documentation actually specifies as of mid-2025. You will finish knowing how data access decisions propagate through agent quality, latency, cost, and correctness.

If you finish every module, here's who you become:

You'll understand why retrieval architecture — not model capability — is the binding constraint on agent quality in production systems.
You'll be able to wire a Vertex AI agent to BigQuery, Cloud Storage, AlloyDB, and Vertex AI Search, choosing the right connector for the right data type.
You'll build a complete RAG pipeline: ingestion, chunking, embedding, retrieval, and grounded generation — end to end on Google Cloud.
You'll know how to federate cross-cloud data from S3 and Azure Data Lake into an agent workflow without migrating a single file.
You'll be able to measure what your agent actually gets right — retrieval quality, hallucination rates, and grounding coverage — and run systematic improvement loops.
You'll design event-driven agents that respond to live Pub/Sub streams, not just static snapshots.
You're becoming the engineer who closes the gap between a language model demo and a production system people can trust with real data.

Lesson 1 · Why Data Access Defines Agent Quality

What the Agent Doesn't Know Will Hurt You

The intelligence ceiling of any agent is set not by the model but by the data the model can reach.

If a language model scores 90% on a benchmark but only sees 40% of your relevant data, what is its real-world accuracy?

In late 2023, Morgan Stanley's wealth management division disclosed to journalists at Bloomberg that it had spent more than a year building an OpenAI-powered assistant for financial advisors. The system was impressive in demos. In production, advisors discovered it answered confidently without drawing on research reports that existed in the firm's proprietary database but had not been ingested into the retrieval layer. The model was not hallucinating; it was blind. The assistant knew what OpenAI had trained it on; it did not know what Morgan Stanley's analysts had written last quarter. The retrieval architecture, not the model, was the failure point. The firm responded by rebuilding the indexing pipeline before expanding the rollout.

The Morgan Stanley case is not exceptional; it is representative. In 2024, enterprise customers presenting at Google's Cloud Next conference repeatedly named data freshness and retrieval coverage as the top production blockers, ahead of model capability, ahead of cost, ahead of latency. The model is rarely the problem. The data pipeline is almost always the problem.

The Three Failure Modes of Data-Starved Agents

When an agent cannot access the data it needs, failure arrives in one of three forms. Understanding which failure is occurring determines which part of the architecture to fix.

Staleness: The agent's knowledge is frozen at a training cutoff or last-sync timestamp. It answers questions about current inventory, pricing, or policy using information that is months or years old. This is the most common failure in production systems that launched with a batch-ingestion pipeline and no refresh mechanism.

Coverage gaps: The agent has access to some of the relevant corpus but not all. It answers correctly for the data it can see, incorrectly or not at all for the data it cannot. Coverage gaps are insidious because they are invisible to users — the agent does not say "I don't have that document," it says something plausible that is wrong.

Retrieval imprecision: The data exists and is fresh, but the retrieval mechanism returns the wrong chunks. The model reasons over the wrong evidence and produces a confident, coherent, incorrect answer. This is a vector-index design problem or a chunking strategy problem, not a model problem.
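The triage described above can be sketched as a small decision helper. This is illustrative only: `RetrievalTrace`, its fields, and the `max_lag` threshold are hypothetical names, not part of any Google Cloud API, and the sketch assumes you can log, for each wrong answer, whether the needed document was indexed, how stale the indexed copy was, and whether retrieval returned it.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RetrievalTrace:
    """Hypothetical log record for one wrong or missing answer."""
    doc_in_index: bool     # was the needed document ever ingested?
    index_lag: timedelta   # age of the indexed copy relative to the source
    doc_retrieved: bool    # did the retriever return a chunk from it?


def diagnose(trace: RetrievalTrace, max_lag: timedelta) -> str:
    """Map a trace to one of the three failure modes."""
    if not trace.doc_in_index:
        return "coverage gap"            # fix the ingestion pipeline
    if trace.index_lag > max_lag:
        return "staleness"               # fix the sync or refresh mechanism
    if not trace.doc_retrieved:
        return "retrieval imprecision"   # fix chunking or index design
    return "not a data-access failure"


trace = RetrievalTrace(doc_in_index=True, index_lag=timedelta(days=2), doc_retrieved=True)
print(diagnose(trace, max_lag=timedelta(hours=1)))
```

The order of the checks mirrors the order in which the fixes apply: a document that was never ingested cannot be stale, and a stale document cannot be fixed by retrieval tuning.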

Why This Matters for Vertex AI Agents

Vertex AI Agent Builder (formerly Dialogflow CX + Vertex AI Search, unified in 2024) exposes data connectors for BigQuery, Cloud Storage, Google Drive, and web crawl. Each connector has distinct latency, freshness, and coverage characteristics. Choosing the wrong connector for a use case is the single most common architectural mistake in enterprise Vertex AI agent deployments.

The Data-Quality Cascade

Data access quality cascades through every layer of agent output. Consider the chain: a user asks a question → the retrieval layer fetches candidate documents → the model synthesizes an answer from those documents → the answer is presented with apparent confidence. Each link in this chain multiplies or divides the quality of the final output.

Google's internal research team (DeepMind and Google Brain, now merged as Google DeepMind) published findings in 2023 showing that retrieval augmentation improved factual accuracy on closed-domain enterprise tasks by 38–61% compared to base model prompting — but only when retrieval recall exceeded 70%. Below 70% recall, retrieval augmentation could decrease accuracy because it introduced misleading partial context. The lesson: a retrieval layer that is half-built is sometimes worse than no retrieval layer at all.

This means the engineering task is not simply "add a vector store." It is to achieve sufficient coverage, freshness, and precision that the retrieval layer becomes a genuine amplifier rather than a noise source.
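As a rough worked example of the cascade, assume each stage succeeds independently so that per-stage rates multiply. That independence is a simplifying assumption made here for illustration, not a finding from the research cited above. Under it, the question that opens this lesson (a model that is 90% accurate but can reach only 40% of the relevant data) has a ceiling of 0.40 × 0.90 = 0.36.

```python
from math import prod


def end_to_end_ceiling(stage_rates: dict[str, float]) -> float:
    """Upper bound on answer accuracy if every stage must succeed independently."""
    for name, rate in stage_rates.items():
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {rate}")
    return prod(stage_rates.values())


# Illustrative numbers only: 40% of relevant data is reachable, and the model
# answers correctly 90% of the time when it has the right evidence.
ceiling = end_to_end_ceiling({"coverage": 0.40, "model_accuracy": 0.90})
print(f"{ceiling:.2f}")
```

Adding further stages, such as retrieval recall, can only lower the product, which is why improving the weakest stage usually pays off more than upgrading the strongest one.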

Key Concepts

Retrieval-Augmented Generation (RAG): Architecture pattern in which a retrieval system fetches relevant documents at inference time and injects them into the model's context window, first formally described by Lewis et al. at Facebook AI Research in the May 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks."

Retrieval recall: The fraction of genuinely relevant documents that a retrieval query successfully returns. Low recall means the agent is reasoning over incomplete evidence even when the full evidence exists in the corpus.

Data freshness: The elapsed time between when source data changes and when that change is reflected in the retrieval index. Vertex AI Search data connectors as of 2025 offer scheduled sync (hourly minimum for some sources), incremental sync, and real-time triggers depending on the data source type.

Coverage gap: Portions of a relevant corpus that are not indexed or not reachable by the agent's retrieval queries. Often caused by incomplete ingestion pipelines, unsupported file formats, or access-permission mismatches.

Grounding: In Vertex AI terminology, grounding refers to connecting a model's response to verifiable source documents. Vertex AI supports grounding with Google Search and with custom data stores via the Grounding API, released to GA in Q3 2024.
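Two of the definitions above, retrieval recall and data freshness, are directly measurable. The sketch below shows one way to compute them. The function names are hypothetical, and it assumes you have a labeled set of relevant document IDs per query and timestamps for both the source change and the index update.

```python
from datetime import datetime, timezone


def retrieval_recall(retrieved_ids: set[str], relevant_ids: set[str]) -> float:
    """Fraction of the relevant documents that the retriever returned."""
    if not relevant_ids:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(retrieved_ids & relevant_ids) / len(relevant_ids)


def freshness_lag_seconds(source_updated_at: datetime, indexed_at: datetime) -> float:
    """Seconds between a source change and the index reflecting it."""
    return (indexed_at - source_updated_at).total_seconds()


# Two of four relevant documents were returned, plus one irrelevant one.
recall = retrieval_recall({"doc1", "doc2", "doc9"}, {"doc1", "doc2", "doc3", "doc4"})

# A change made at 09:00 UTC appears in the index at 10:00 UTC.
lag = freshness_lag_seconds(
    datetime(2025, 6, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2025, 6, 1, 10, 0, tzinfo=timezone.utc),
)
print(recall, lag)
```

Note that the irrelevant `doc9` does not affect recall; it would lower precision, which is a separate measurement.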

Module 1 Through-Line

Every lesson in this module examines one dimension of the relationship between data access architecture and agent output quality. Lesson 1 establishes the theoretical framework. Lessons 2–4 examine specific Google Cloud data sources — BigQuery, Cloud Storage / Vertex AI Search, and AlloyDB — and how their access patterns shape what agents can and cannot do reliably.

Google Cloud's Data Access Architecture for Agents

As of mid-2025, Google Cloud provides four primary pathways for agent data access. Each has a different latency profile, freshness guarantee, and appropriate use case.

Vertex AI Search data stores are the highest-level abstraction. You configure a data store, connect a source (Cloud Storage bucket, BigQuery table, Google Drive folder, or web crawl), and the service handles chunking, embedding, and indexing. Retrieval is via the Discovery Engine API. Freshness depends on sync configuration — scheduled or triggered.

Direct BigQuery access via the BigQuery API or Vertex AI's BigQuery tool in Agent Builder allows agents to execute SQL at query time against live warehouse data. This provides real-time freshness at the cost of higher latency (seconds per query) and potential cost at scale.

AlloyDB for PostgreSQL with pgvector extension supports hybrid search — structured SQL queries combined with vector similarity search — in a single database engine. This is appropriate when data is transactional and relational, and the agent needs both lookup and semantic retrieval.

Cloud Storage + manual RAG pipelines using Vertex AI Embeddings API and a Vector Search index give engineers the most control over chunking strategy, embedding model, and retrieval logic, at the cost of managing the pipeline themselves.
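The four pathways above can be read as a decision rule. The sketch below encodes one possible reading of that rule. `DataSource` and `suggest_pathway` are hypothetical helpers, not a Google Cloud API, and a real decision would also weigh latency, cost, and sync intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of one source an agent must read."""
    name: str
    structured: bool                      # rows and columns vs. documents
    needs_semantic_search: bool = False   # meaning-based retrieval, not just lookup
    wants_pipeline_control: bool = False  # team wants to own chunking and embedding


def suggest_pathway(src: DataSource) -> str:
    """Pick a starting pathway from the four described above."""
    if not src.structured:
        if src.wants_pipeline_control:
            return "Cloud Storage + manual RAG pipeline"
        return "Vertex AI Search data store"
    if src.needs_semantic_search:
        return "AlloyDB with pgvector (hybrid search)"
    return "Direct BigQuery access"


sources = [
    DataSource("return policy PDFs", structured=False),
    DataSource("order history", structured=True),
    DataSource("support tickets table", structured=True, needs_semantic_search=True),
]
for src in sources:
    print(f"{src.name}: {suggest_pathway(src)}")
```

Treat the output as a starting point for discussion rather than a verdict. Structured data that changes slowly and needs semantic search, for example, may be better served by a synced Vertex AI Search data store.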

Why Model Choice Is Secondary

A recurring mistake in early enterprise AI projects (2022–2023) was treating model selection as the primary architectural decision. Teams spent weeks evaluating GPT-4 versus Claude versus Gemini while deploying all of them against the same under-built retrieval layer. The differences in model output quality were measurable. The differences introduced by retrieval gaps were larger.

Google's Gemini 1.5 Pro, released in February 2024, has a 1-million-token context window. A naive interpretation is that large context makes retrieval unnecessary — simply inject the entire corpus. In practice, this does not work for three reasons: (1) most enterprise corpora exceed even 1M tokens; (2) model attention degrades for relevant information buried in large contexts, a phenomenon studied in the "Lost in the Middle" paper by Liu et al. at Stanford in July 2023; (3) cost at scale is prohibitive. Retrieval remains necessary. The quality of retrieval remains the binding constraint.
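The first of the three reasons above is simple arithmetic. The sketch below uses made-up corpus numbers: the page count and the tokens-per-page figure are assumptions for illustration, and only the 1-million-token window comes from the text.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # Gemini 1.5 Pro figure quoted above


def corpus_tokens(pages: int, tokens_per_page: int) -> int:
    """Estimate total corpus size in tokens."""
    return pages * tokens_per_page


def fits_in_context(pages: int, tokens_per_page: int,
                    window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if the whole corpus could be injected into one prompt."""
    return corpus_tokens(pages, tokens_per_page) <= window


# Hypothetical corpus: 50,000 pages at an assumed 500 tokens per page.
total = corpus_tokens(50_000, 500)
print(total, total / CONTEXT_WINDOW_TOKENS, fits_in_context(50_000, 500))
```

Under these assumed numbers the corpus is 25 times the window, so some selection step is unavoidable before the model sees anything.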

Lesson 1 Quiz

Four questions · Why data access defines agent quality

According to Google DeepMind research cited in Lesson 1, retrieval augmentation can decrease agent accuracy when retrieval recall falls below what threshold?

Correct. Below 70% recall, partial context can mislead the model into generating confident but wrong answers — retrieval becomes a net negative.

Not quite. The threshold cited is 70% recall. Below that, misleading partial context can make retrieval augmentation counterproductive.

Which of the three failure modes described in Lesson 1 is most dangerous because the agent does not signal uncertainty — it simply produces a plausible incorrect answer?

Correct. Coverage gaps are insidious because the agent has no way to signal "I don't have that document" — it reasons over what it can see and produces confident wrong answers.

Review the three failure modes section. Coverage gaps are specifically described as invisible to users because the agent does not announce its blind spots.

The "Lost in the Middle" paper (Liu et al., Stanford, July 2023) is relevant to which specific argument in Lesson 1?

Correct. Liu et al. showed that even with large contexts, models lose track of relevant information in the middle of the window — one of three reasons large context cannot replace retrieval.

Re-read the "Why Model Choice Is Secondary" section. The paper is cited specifically to argue against the naive "just inject everything" approach enabled by large context windows.

Which Google Cloud data access pathway provides the highest data freshness but at the cost of higher per-query latency and potential cost at scale?

Correct. Direct BigQuery access executes SQL against live warehouse data at inference time — always fresh, but seconds of latency per query and slot costs that accumulate at scale.

Review the "Google Cloud's Data Access Architecture for Agents" section. Direct BigQuery access is described as real-time freshness at the cost of higher latency and potential scale cost.

Lab 1 — Data Access Architecture Advisor

Chat with an AI tutor about how data access shapes agent quality on Google Cloud · Complete 3 exchanges to finish

Your Task

You are designing a Vertex AI agent for a retail company. The agent needs to answer customer service questions using product catalog data (updated daily in BigQuery), return policy documents (PDFs in Cloud Storage), and live inventory counts (updated every 5 minutes in a Cloud SQL database). Use this lab to work through which data access pattern fits each data type and why the wrong choice will degrade agent quality.

Starter prompt: "I have three data sources — BigQuery for the product catalog, Cloud Storage for return policy PDFs, and Cloud SQL for live inventory. Which Vertex AI data access pattern should I use for each, and what goes wrong if I get it backwards?"

Data Access Advisor

Vertex AI · Google Cloud

Hello. I'm your data access architecture advisor for this lab. You're designing a Vertex AI agent with three distinct data sources, each with different freshness requirements and query patterns. Tell me about your architecture challenge and I'll help you reason through the right access pattern for each source — and what breaks when you choose incorrectly.

Lesson 2 · BigQuery as Agent Memory

The Warehouse That Thinks

BigQuery is not just a data store — for agents that need structured, queryable, always-fresh knowledge, it is the closest thing to working memory at warehouse scale.

When should an agent query BigQuery directly at inference time, and when should it pre-index BigQuery data into Vertex AI Search?

In early 2024, the logistics company DHL Supply Chain published a case study with Google Cloud describing an internal operations assistant built on Vertex AI. The agent answered questions from warehouse managers about shipment status, inventory levels, and routing exceptions. The initial architecture pre-indexed daily BigQuery snapshots into Vertex AI Search — a batch sync every 24 hours. Managers quickly rejected the tool because shipment status changed hourly. A manager asking "where is shipment 4821-C right now?" received an answer that was true yesterday and wrong today. The team rebuilt the agent to query BigQuery directly at inference time for time-sensitive fields and retain Vertex AI Search only for static reference data. The difference in user adoption was immediate and dramatic.

BigQuery's Role in Agentic Architectures

BigQuery, Google's serverless data warehouse, processed over 110 exabytes of data in 2023 according to Google's infrastructure disclosures. It is the most common enterprise data store in Google Cloud deployments and therefore the most common data source for Vertex AI agents. Understanding its access patterns is not optional — it is foundational.

BigQuery can serve agents in two distinct modes. In tool mode, the agent calls a BigQuery tool at inference time, generates or receives a SQL query, executes it, and incorporates the results into its context before responding. This mode provides real-time freshness — the answer reflects data as of the moment of the query. In index mode, BigQuery data is exported or synced into a Vertex AI Search data store, embedded, and retrieved semantically. This mode is faster at inference time but introduces freshness lag equal to the sync interval.

The choice between modes is determined by three factors: how fast does the data change, what kind of query does the agent need to run (semantic search vs. exact lookup), and what latency is acceptable to the user.

Vertex AI Agent Builder — BigQuery Tool

As of 2025, Vertex AI Agent Builder includes a native BigQuery tool that can be attached to an agent. The tool accepts natural language, generates BigQuery SQL via Gemini, executes the query, and returns structured results. It requires appropriate IAM roles (at minimum roles/bigquery.dataViewer on the data, plus permission to run query jobs, for example roles/bigquery.jobUser) and respects BigQuery's column-level and row-level security policies.

Designing for Freshness vs. Speed

The core tension in BigQuery-backed agent design is between freshness and latency. A BigQuery query that scans 10 GB of data typically returns in 2–8 seconds. For a conversational agent where users expect sub-second responses, this is often unacceptable. For an agent answering operational questions where accuracy is critical and a 5-second wait is tolerable, it is the right architecture.

Google's recommended pattern, per the Agent Builder documentation (2025), is a hybrid: use BigQuery direct access for high-velocity, small-result queries (single record lookups, aggregations over recent time windows) and Vertex AI Search for large-corpus semantic search where the data changes infrequently. This hybrid approach was validated in the DHL case and in Google's own internal deployment of Duet AI for Google Cloud, which uses BigQuery direct access for billing and usage queries and pre-indexed search for documentation.

A critical implementation detail: BigQuery's BI Engine can cache frequently-accessed data in memory, reducing query latency to under 1 second for repeated or similar queries. For agent use cases where the same or similar queries recur, BI Engine reservation is a significant latency optimization that most teams overlook in initial deployments.

Key Concepts

Tool mode (BigQuery): The agent calls the BigQuery API at inference time, executes a query, and incorporates live results. Freshness is real-time. Latency is 2–8 seconds for typical scan queries, sub-second with BI Engine for cached results.

Index mode (BigQuery): BigQuery data is synced to a Vertex AI Search data store on a schedule. The agent retrieves semantically relevant chunks. Freshness is bounded by sync interval (minimum hourly for some connectors). Retrieval latency is under 500ms.

BigQuery BI Engine: An in-memory analysis service that caches BigQuery data for sub-second query response. Particularly valuable for agentic use cases with repeated or templated queries. Capacity is reserved in GB-hour units.

Row-level security (BigQuery): BigQuery row access policies restrict which rows a given user or service account can query. When an agent runs as a service account, row-level security governs which data the agent can ever return, a critical data governance consideration for multi-tenant agent deployments.

What Goes Wrong: Common BigQuery Agent Failures

Querying at the wrong granularity. An agent asked "how are sales trending?" that runs a full table scan across 3 years of transaction data will hit slot limits, run for 30+ seconds, and potentially time out. Agents must be designed with query budget constraints: materialized views, partitioned tables queried with date filters, or query cost limits enforced at the API level.

Ignoring partition pruning. BigQuery tables partitioned by date dramatically reduce scan cost and latency when queries include partition filters. An agent that generates SQL without date constraints on a partitioned table will perform full scans. This is a prompt engineering problem as much as a schema design problem — the agent's system prompt should include guidance on which tables are partitioned and how.

Schema blindness. Without schema context in the system prompt or tool description, a Gemini-generated SQL query will guess column names. Google recommends providing table schemas, sample values, and semantic descriptions of columns as part of the tool definition. This is documented in the Vertex AI Agent Builder tool configuration guide (2025).
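Two of the failures above, missing partition filters and schema blindness, can be partly addressed before any SQL reaches BigQuery. The sketch below is a minimal illustration under stated assumptions: the table name, columns, and helper functions are hypothetical, and the partition check is a naive text heuristic rather than a SQL parser.

```python
import re

# Hypothetical table metadata; the table and column names are illustrative.
TABLE_CONTEXT = {
    "table": "sales.transactions",
    "partition_column": "transaction_date",
    "columns": {
        "transaction_date": "DATE, partition column, one partition per day",
        "customer_id": "STRING, e.g. 'C-10293'",
        "amount_usd": "NUMERIC, order total in US dollars",
    },
}


def build_tool_description(ctx: dict) -> str:
    """Render schema context as text for a tool description or system prompt."""
    lines = [
        f"Table `{ctx['table']}` is partitioned by `{ctx['partition_column']}`.",
        "Always filter on the partition column.",
        "Columns:",
    ]
    lines += [f"- {name}: {desc}" for name, desc in ctx["columns"].items()]
    return "\n".join(lines)


def has_partition_filter(sql: str, partition_column: str) -> bool:
    """Naive check: does the text after WHERE mention the partition column?"""
    match = re.search(r"\bWHERE\b(.*)", sql, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return False
    pattern = rf"\b{re.escape(partition_column)}\b"
    return re.search(pattern, match.group(1), flags=re.IGNORECASE) is not None


generated_sql = "SELECT SUM(amount_usd) FROM sales.transactions"
if not has_partition_filter(generated_sql, TABLE_CONTEXT["partition_column"]):
    print("Rejected: query has no filter on the partition column")
```

BigQuery also offers server-side guardrails, such as the `require_partition_filter` table option and the `maximum_bytes_billed` query setting. A client-side check like this one complements them by rejecting a bad query before it is submitted.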

Lesson 2 Quiz

Four questions · BigQuery as agent memory

In the DHL Supply Chain case, why did the initial architecture fail despite using Vertex AI Search for the agent's data?

Correct. The 24-hour sync interval meant shipment status data was always up to a day stale — unacceptable for an operational query like "where is this shipment right now?"

Re-read the DHL opening scene. The failure was data freshness: a 24-hour batch sync for data that changed hourly.

BigQuery BI Engine is specifically valuable for agentic use cases because it:

Correct. BI Engine's in-memory caching makes repeated or templated queries sub-second — a significant improvement over the 2–8 second baseline for agents with recurring query patterns.

Review the BI Engine key term. Its value is in-memory caching for sub-second latency on frequently-accessed data.

According to Google's documented recommendations, what should be included in a BigQuery tool definition to prevent "schema blindness" in agent-generated SQL?

Correct. Without schema context, the model guesses column names. Google's 2025 tool configuration guide recommends including schemas, sample values, and semantic column descriptions.

Review the "What Goes Wrong: Schema blindness" subsection. The fix is providing rich schema context — not templates or infrastructure identifiers.

Which of the following is the most appropriate use of BigQuery index mode (sync to Vertex AI Search) rather than direct query at inference time?

Correct. Infrequently-updated, large-corpus data that needs semantic search — not exact lookup — is the ideal candidate for index mode. Product descriptions updated monthly are a textbook match.

Index mode suits large corpora with slow change rates and semantic search needs. Real-time or exact-lookup needs require direct query access.

Lab 2 — BigQuery Query Design for Agents

Design BigQuery access patterns for a Vertex AI agent · Complete 3 exchanges to finish

Your Task

You are building a Vertex AI agent that answers questions about a BigQuery dataset containing 5 years of e-commerce transactions (2 trillion rows, partitioned by transaction_date). Questions range from "what were total sales last week?" to "find all customers who bought product X and returned it." Work through query design, partitioning strategy, BI Engine use, and schema context with the AI advisor.

Starter prompt: "My BigQuery table has 2 trillion rows partitioned by transaction_date. My agent needs to answer questions like 'what were total sales last week?' How do I make sure the agent generates efficient SQL that doesn't scan everything and time out?"

BigQuery Agent Design Advisor

BigQuery · Vertex AI

Hello. I'm your BigQuery agent design advisor. You're working with a massive partitioned table and need to ensure your Vertex AI agent generates efficient, correct SQL without full-table scans or timeouts. Let's work through your architecture — what's your first question about the query design?

Lesson 3 · Vertex AI Search and Unstructured Data

Making Documents Answerable

Unstructured data — PDFs, HTML, emails, slides — is where most enterprise knowledge lives. Vertex AI Search is the managed layer that turns it into something agents can reason over.

What does Vertex AI Search actually do between ingestion and retrieval, and where does quality degrade?

In 2024, Highmark Health, one of the largest integrated health systems in the United States, publicly described a Vertex AI Search deployment at Google Cloud Next. The system indexed clinical guidelines, policy documents, and member handbooks — roughly 2.4 million pages — to help care managers answer coverage questions quickly. The initial deployment achieved high retrieval speed but low answer accuracy on complex multi-part questions. Investigation revealed two chunking problems: very long documents were split at fixed character intervals that crossed section boundaries, and short documents (single-page memos) were embedded as single chunks that were too coarse for precise retrieval. Highmark's engineering team rebuilt the ingestion pipeline with semantic chunking (splitting at paragraph and section boundaries) and added metadata fields for document type and effective date. Answer accuracy on the evaluation set improved by 29 percentage points.

The Ingestion Pipeline: What Vertex AI Search Actually Does

When you connect a Cloud Storage bucket or Google Drive folder to a Vertex AI Search data store, the service executes a pipeline with four stages: document extraction, chunking, embedding, and indexing. Each stage is a point where quality can be gained or lost.

Document extraction converts raw files — PDFs, DOCX, HTML, TXT — into plain text and structured metadata. Vertex AI Search uses Google's Document AI under the hood for PDF parsing as of 2024. Scanned PDFs without OCR text layers will produce poor extraction results. Password-protected files will fail silently in some configurations.

Chunking splits extracted text into segments that fit within the embedding model's token limit. Vertex AI Search's default chunking is fixed-size with overlap. As the Highmark case demonstrated, fixed-size chunking that ignores section boundaries can produce chunks that are semantically incoherent: mid-sentence splits, separated question-answer pairs, orphaned tables. Google introduced configurable chunking strategies in Vertex AI Search in late 2024, including layout-aware chunking that respects HTML/DOCX structure.

Embedding converts each chunk into a dense vector representation using a Google embedding model (text-embedding-004 as of mid-2025). The semantic distance between these vectors determines what the retrieval step returns. Embedding model quality sets a ceiling on semantic retrieval quality that no amount of indexing optimization can exceed.

Indexing stores the vectors in Google's proprietary approximate nearest-neighbor index (a descendant of the ScaNN algorithm published by Google Research in 2019). The index is built automatically and managed by the service; you do not configure it directly, but you can configure the number of results returned and the relevance threshold.

Layout-Aware Chunking — GA November 2024

Vertex AI Search's layout-aware chunking mode, released to GA in November 2024, uses document structure signals (HTML headers, DOCX styles, PDF bookmark trees) to split at meaningful semantic boundaries. For enterprise document corpora with consistent formatting, this significantly improves chunk coherence and retrieval precision. It requires that source documents have structural metadata — flat-text PDFs do not benefit.
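The difference between fixed-size and structure-respecting chunking is easy to see on a toy document. The sketch below is not the algorithm Vertex AI Search uses. It is a simplified stand-in that treats blank lines as the only structure signal, with hypothetical function names and character counts in place of token counts.

```python
def fixed_size_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split every `size` characters, repeating `overlap` characters between chunks."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks, start, step = [], 0, size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def paragraph_chunks(text: str, max_chars: int) -> list[str]:
    """Pack whole paragraphs into chunks, never splitting inside a paragraph.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "Section A\n\nAlpha beta.\n\nSection B\n\nGamma delta."
print(fixed_size_chunks(doc, size=25, overlap=5))
print(paragraph_chunks(doc, max_chars=25))
```

On this input the fixed-size splitter cuts through the middle of a section, while the paragraph packer keeps each heading with its body text.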

Retrieval Quality: Precision vs. Recall

Vertex AI Search retrieval operates as a two-stage pipeline: approximate nearest neighbor (ANN) search over the vector index returns a candidate set, then a re-ranking model orders the candidates by relevance. The final ranked list is what the agent receives. Both stages have tunable parameters.

The primary tension is between precision and recall. Requesting more candidates from the ANN stage improves recall (more relevant documents are in the candidate set) but increases re-ranking latency and risks diluting the top results with irrelevant documents. Google's recommended starting configuration for enterprise deployments is 10–20 candidates with re-ranking enabled, evaluated against a set of representative user queries.
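The two-stage shape described above can be sketched in a few lines. This is a toy model under loud assumptions: brute-force cosine similarity stands in for the approximate nearest-neighbor search, a term-overlap count stands in for the re-ranking model, and the two-dimensional vectors are invented for the example.

```python
from math import sqrt


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def first_stage(query_vec: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Candidate generation: top-k chunk ids by vector similarity (brute force here)."""
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: list[str], texts: dict[str, str], top_n: int) -> list[str]:
    """Second stage: reorder candidates with a different scorer (toy term overlap)."""
    terms = set(query.lower().split())

    def overlap(cid: str) -> int:
        return len(terms & set(texts[cid].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)[:top_n]


# Invented toy data: chunk ids mapped to 2-d "embeddings" and to their text.
index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
texts = {
    "a": "refund window is thirty days",
    "b": "termination clause requires notice",
    "c": "shipping rates by region",
}

candidates = first_stage([1.0, 0.0], index, k=2)
final = rerank("termination notice", candidates, texts, top_n=1)
print(candidates, final)
```

Raising `k` widens the candidate set, which helps recall, while the second stage decides which candidates the agent actually sees. That is the tuning trade-off the paragraph above describes.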

A critical and frequently overlooked feature is extractive answers: Vertex AI Search can return not just document chunks but the specific passage within a chunk most likely to answer the query. This reduces the context the agent must reason over and improves answer precision. Extractive answers are enabled via the contentSearchSpec parameter in the Search API and are available for data stores backed by unstructured documents.

Key Concepts

Chunking strategy: The method by which ingested documents are split into indexable segments. Options in Vertex AI Search include fixed-size (default), fixed-size with overlap, and layout-aware (GA November 2024). Chunking strategy is a primary determinant of retrieval precision for long-document corpora.

Extractive answers: A Vertex AI Search feature that identifies and returns the specific passage within a retrieved chunk most likely to directly answer the query. Enabled via contentSearchSpec. Reduces downstream reasoning burden on the agent model.

Re-ranking: A second-stage relevance model applied to the initial ANN candidate set to reorder results by estimated relevance to the specific query. Vertex AI Search applies re-ranking automatically when enabled, using a cross-encoder model distinct from the embedding model.

Document metadata filtering: Structured metadata fields attached to documents at ingestion time (e.g., document_type, effective_date, department) that can be used to filter retrieval results. Filtering before semantic search significantly improves precision in large heterogeneous corpora.
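Metadata filtering, as defined above, amounts to narrowing the candidate pool before any similarity scoring happens. The sketch below illustrates the idea with the field names from the definition. The `Chunk` class and `prefilter` function are hypothetical and do not reflect the Vertex AI Search filter syntax.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    """Hypothetical indexed chunk with metadata attached at ingestion time."""
    text: str
    document_type: str
    effective_date: date


def prefilter(chunks: list[Chunk], document_type: str, effective_on: date) -> list[Chunk]:
    """Keep chunks of the requested type that were in effect on the given date."""
    return [
        c for c in chunks
        if c.document_type == document_type and c.effective_date <= effective_on
    ]


corpus = [
    Chunk("Coverage rules for 2024", "policy", date(2024, 1, 1)),
    Chunk("Coverage rules for 2026", "policy", date(2026, 1, 1)),
    Chunk("Member handbook intro", "handbook", date(2023, 6, 1)),
]
eligible = prefilter(corpus, "policy", effective_on=date(2025, 6, 1))
print([c.text for c in eligible])
```

Only the eligible chunks would then be passed to semantic search, so a not-yet-effective policy cannot outrank the current one on similarity alone.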

When Vertex AI Search Is the Wrong Tool

Vertex AI Search is optimized for semantic retrieval over large unstructured corpora. It is the wrong tool when the agent needs exact record lookup (use BigQuery or a SQL database), when the corpus changes faster than the minimum sync interval (use direct API access), or when the retrieval logic requires complex multi-hop joins across entities (use a knowledge graph or structured database with explicit traversal logic).

A common mistake is deploying Vertex AI Search for data that is fundamentally tabular — product SKUs, customer IDs, order numbers — because semantic similarity is a poor substitute for exact match on structured identifiers. Searching a Vertex AI Search index for "order #4821-C" will return semantically similar documents, not the specific order record. For exact-match use cases, BigQuery direct access or Firestore lookups are appropriate.
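The exact-match problem can be demonstrated without any embedding model. In the sketch below, character-level string similarity from Python's `difflib` stands in for semantic similarity, which is an assumption made purely for illustration, and the order records are invented.

```python
from difflib import SequenceMatcher
from typing import Optional

# Invented order records keyed by identifier.
orders = {"4821-C": "delivered", "4821-G": "in transit", "4812-C": "cancelled"}


def exact_lookup(order_id: str) -> Optional[str]:
    """Return the status for this exact id, or None if it does not exist."""
    return orders.get(order_id)


def most_similar(order_id: str) -> str:
    """Return the stored id that looks most like the query (always returns something)."""
    return max(orders, key=lambda oid: SequenceMatcher(None, order_id, oid).ratio())


query = "4821-D"  # this order does not exist
print(exact_lookup(query))   # None: the lookup reports the miss
print(most_similar(query))   # a different, real order id
```

The exact lookup reports that the record is missing, while the similarity search confidently returns a neighboring order. An agent reasoning over that result would describe the wrong shipment.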

Lesson 3 Quiz

Four questions · Vertex AI Search and unstructured data

In the Highmark Health case, what two chunking problems caused low answer accuracy on complex questions?

Correct. Fixed-interval splits broke semantic coherence in long documents, and single-chunk embedding of short documents was too coarse for precise retrieval. Semantic chunking fixed both.

Re-read the Highmark case. The two specific problems were fixed-interval splits crossing section boundaries and overly-coarse single chunks for short documents.

Vertex AI Search's "extractive answers" feature is best described as:

Correct. Extractive answers identify the most relevant passage within a chunk — reducing the context the agent must reason over and improving precision. Enabled via contentSearchSpec.

Extractive answers are sub-chunk precision: they identify the specific passage most likely to answer the query within an already-retrieved chunk.

For which of the following data types is Vertex AI Search a poor choice as an agent data source?

Correct. Semantic similarity search is a poor substitute for exact-match on structured identifiers like SKUs and order IDs. BigQuery or Firestore is appropriate for that use case.

Vertex AI Search is wrong for exact-match lookup on structured identifiers. Semantic search on "order #4821-C" will return similar-sounding things, not the specific record.

Layout-aware chunking in Vertex AI Search (GA November 2024) provides improved chunk coherence specifically because:

Correct. Layout-aware chunking uses structural metadata to split at section and paragraph boundaries rather than arbitrary character intervals — producing semantically coherent chunks.

Layout-aware chunking works by reading document structure signals. Flat-text PDFs without structural metadata do not benefit from it.

Lab 3 — Vertex AI Search Configuration Advisor

Design an optimal Vertex AI Search data store for a document-heavy enterprise use case · Complete 3 exchanges to finish

Your Task

You are building a Vertex AI agent for a legal team. The agent must answer questions over 80,000 contract PDFs stored in Cloud Storage, ranging from 1-page NDAs to 400-page master agreements, updated monthly. Work through chunking strategy, metadata schema, extractive answer configuration, and retrieval tuning with the AI advisor.

Starter prompt: "I have 80,000 contract PDFs ranging from 1 page to 400 pages. How should I configure the Vertex AI Search data store — especially chunking — so that my agent can answer questions like 'what are the termination clauses in our contracts with Vendor X?' precisely?"


Lesson 4 · AlloyDB, Hybrid Search, and Real-Time Agent Data

When the Agent Needs to Know Right Now

AlloyDB for PostgreSQL with pgvector enables hybrid search — combining exact SQL queries with semantic vector search — for agents that cannot wait for a batch sync.

What does hybrid search mean in practice, and when does AlloyDB outperform both BigQuery and Vertex AI Search for agent data access?

At Google Cloud Next 2024, Wayfair's engineering team presented an architecture in which a Vertex AI agent helped catalog specialists enrich product listings. The agent needed to answer two types of questions simultaneously: exact lookups ("does SKU WF-44821 already have a 'material' attribute?") and semantic search ("find five similar products with detailed dimension specifications we can use as templates"). A pure Vertex AI Search deployment answered the semantic queries well but could not perform the exact SKU lookups reliably. A pure BigQuery deployment answered exact queries quickly but semantic similarity search over 30 million product embeddings required custom infrastructure. The solution was AlloyDB for PostgreSQL with the pgvector extension — a single database that handled exact-match SQL and approximate nearest-neighbor vector search in the same query, with sub-100ms response times on both query types after index warming.

AlloyDB and the pgvector Extension

AlloyDB for PostgreSQL is Google Cloud's fully managed, PostgreSQL-compatible database engine, first released to GA in November 2022. It uses a disaggregated compute-and-storage architecture (log-structured storage with intelligent caching). Google's published benchmarks claim more than 4x faster transactional throughput and up to 100x faster analytical queries than standard PostgreSQL.

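The pgvector extension, originally developed by Andrew Kane and open-sourced in 2021, adds a vector data type to PostgreSQL and enables approximate nearest-neighbor (ANN) search using IVFFlat or HNSW indexes. AlloyDB added native support for pgvector in 2023 and optimized its execution in the managed cloud and AlloyDB Omni (on-premises) variants through 2024. Dimension limits depend on the column type: upstream pgvector documents HNSW indexing for vector columns of up to 2,000 dimensions, with higher limits for the halfvec (4,000) and bit (64,000) types. Query latency at the 10-million-vector scale depends on index parameters, available memory, and the recall target; sub-50ms latency is reachable with appropriate index configuration, but it should be benchmarked on your own data rather than assumed.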
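To make the index discussion concrete, the sketch below builds the DDL statement for an HNSW index in pgvector's syntax. The table and column names passed in are hypothetical, and the helper itself is invented for illustration; the `USING hnsw`, `vector_cosine_ops`, `m`, and `ef_construction` keywords are pgvector's own, and the defaults shown (16 and 64) match pgvector's documented defaults.

```python
import re

# Only plain lowercase identifiers are accepted, because the names are
# interpolated into the statement rather than bound as parameters.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def hnsw_index_ddl(table: str, column: str, m: int = 16, ef_construction: int = 64) -> str:
    """Build a pgvector HNSW index statement that uses cosine distance."""
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if m <= 0 or ef_construction <= 0:
        raise ValueError("m and ef_construction must be positive integers")
    return (
        f"CREATE INDEX {table}_{column}_hnsw ON {table} "
        f"USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )
```

Raising `m` or `ef_construction` generally improves recall at the cost of build time and memory, so treat the defaults as a starting point to benchmark against.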

The key capability that neither BigQuery nor Vertex AI Search alone provides is hybrid queries: combining WHERE clause filters (exact SQL) with an ORDER BY on a vector distance expression (semantic similarity; in pgvector this is an operator such as <=> for cosine distance) in a single query execution plan. For agents that need to say "find the five most semantically similar products to this description, but only among products in category X with inventory > 0," AlloyDB executes this as a single index-accelerated query rather than two separate API calls with application-layer joining.
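The shape of such a hybrid query can be sketched as follows. The SQL string uses pgvector's <=> cosine-distance operator and psycopg-style named parameters; the products table and its columns are hypothetical. The Python functions are an in-memory reference for what the query computes (exact filter first, then ranking by distance), not a replacement for the database's approximate index.

```python
import math

# Parameterized hybrid query in pgvector syntax. Table and column names
# are hypothetical; %(name)s placeholders follow the psycopg convention.
HYBRID_SQL = """
SELECT sku, description
FROM products
WHERE category = %(category)s AND inventory > 0
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s;
"""


def cosine_distance(a, b):
    """Return 1 - cosine similarity, the quantity pgvector's <=> computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def hybrid_search(rows, query_vec, category, k=5):
    """Reference semantics: apply the exact filter, then rank by distance."""
    eligible = [r for r in rows if r["category"] == category and r["inventory"] > 0]
    eligible.sort(key=lambda r: cosine_distance(r["embedding"], query_vec))
    return [r["sku"] for r in eligible[:k]]
```

The reference implementation is exact, whereas an HNSW or IVFFlat index returns approximate neighbors. When a filter is highly selective, it is worth checking that the indexed query still returns the expected number of rows.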

AlloyDB AI Integration — Vertex AI Embeddings

AlloyDB's Google ML integration lets a SQL statement generate embeddings by calling the Vertex AI Embeddings API directly. This course refers to the capability as alloydb.create_embedding() (GA 2024); the AlloyDB documentation exposes it as the embedding() function of the google_ml_integration extension, so confirm the current function name and signature before copying it into code. An agent can insert new text, trigger embedding generation at the database level, and have the new vector immediately available for similarity search, with no separate embedding pipeline required. This is documented in the AlloyDB AI documentation under "Work with embeddings."
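The insert-time pattern can be sketched as a single parameterized INSERT whose vector column is computed inside the database. This is a sketch under stated assumptions: the products table is hypothetical, the SQL function name is passed in as a parameter because it should be checked against the current AlloyDB documentation, and the default of embedding (taking a model ID and the text, with the result cast to vector) is an assumption based on the google_ml_integration extension.

```python
import re

# Accepts a plain or schema-qualified SQL function name, since the name is
# interpolated into the statement rather than bound as a parameter.
_SQL_NAME = re.compile(r"^[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?$")


def insert_with_embedding_sql(embedding_fn: str = "embedding") -> str:
    """Build an INSERT that computes the embedding inside the database.

    ASSUMPTION: embedding_fn names the SQL function provided by AlloyDB's
    ML integration and accepts (model_id, text); verify before use.
    """
    if not _SQL_NAME.match(embedding_fn):
        raise ValueError(f"unsafe function name: {embedding_fn!r}")
    return (
        "INSERT INTO products (sku, description, embedding) "
        "VALUES (%(sku)s, %(description)s, "
        f"{embedding_fn}(%(model_id)s, %(description)s)::vector);"
    )


def insert_params(sku: str, description: str, model_id: str) -> dict:
    """Bind values for the statement; model_id is whichever model you deploy."""
    return {"sku": sku, "description": description, "model_id": model_id}
```

Because the description is bound once and used twice, the stored text and the text that was embedded cannot drift apart, which is the main correctness benefit of generating the vector in the same statement.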

When AlloyDB Beats the Alternatives

The decision framework is straightforward once the data characteristics are understood. Use AlloyDB when: the agent's queries combine structured filters with semantic search; data is updated frequently (transactional workload); the corpus is under ~100M vectors (beyond which managed vector databases or Vertex AI Vector Search become more cost-effective); and the team already has PostgreSQL expertise.

Use Vertex AI Search when: the corpus is primarily unstructured documents; semantic search is the primary retrieval mode; the team prefers a fully managed, no-schema abstraction; and freshness latency of minutes-to-hours is acceptable.

Use BigQuery direct access when: the data lives in a warehouse and the agent must see its current state rather than a synced copy; the query is an aggregation (sums, counts, averages) rather than a retrieval; and per-query latency of 2–8 seconds is acceptable.

These are not mutually exclusive. The production architectures described at Google Cloud Next 2024 by Wayfair, DHL, and Highmark all used two or three of these systems together, routing agent queries to the appropriate backend based on query type classification in the agent's routing logic.
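The decision framework above can be written down as a small function, which is a useful exercise because it forces the criteria to be explicit. The sketch below is one possible encoding, not an official rule set: the Workload fields, the backend labels, and the ordering of the checks are all choices made for this example, and the 100-million-vector ceiling is the approximate threshold stated in this lesson.

```python
from dataclasses import dataclass

# Approximate threshold from this lesson, beyond which a dedicated vector
# service is described as more cost-effective than AlloyDB.
VECTOR_CEILING = 100_000_000


@dataclass(frozen=True)
class Workload:
    """Hypothetical summary of what an agent's queries look like."""
    needs_structured_filters: bool
    needs_semantic_search: bool
    is_aggregation: bool
    vector_count: int = 0


def choose_backend(w: Workload) -> str:
    """Map workload characteristics to a backend label."""
    if w.is_aggregation:
        return "bigquery"                      # sums, counts, averages
    if w.needs_semantic_search and w.needs_structured_filters:
        if w.vector_count <= VECTOR_CEILING:
            return "alloydb"                   # hybrid SQL + vector search
        return "vertex_ai_vector_search"       # past the cost-effective range
    if w.needs_semantic_search:
        return "vertex_ai_search"              # unstructured, semantic-first
    return "sql_lookup"                        # exact-match record lookup
```

In a multi-backend architecture the same checks would typically run per query inside the agent's routing logic rather than once at design time.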

Key Concepts

Hybrid search A query that combines exact SQL filtering (WHERE clauses, JOINs) with vector similarity search (ORDER BY on a vector distance expression) in a single execution plan. AlloyDB with pgvector plans and executes the filter and the similarity ranking together in one statement, avoiding the latency and correctness risks of application-layer joining.

HNSW (Hierarchical Navigable Small World) An approximate nearest-neighbor index algorithm that uses a multilayer graph structure. HNSW generally gives a better speed-recall trade-off at query time than IVFFlat, at the cost of slower index builds and higher memory use. AlloyDB supports HNSW indexes via pgvector as of mid-2025.

alloydb.create_embedding() An AlloyDB SQL function (GA 2024) that calls Vertex AI Embeddings API directly from within a SQL statement, enabling embedding generation at insert time without an external pipeline. Requires AlloyDB's Google ML integration to be enabled on the cluster.

Routing logic The agent component (often a classifier or rule set in the system prompt) that determines which backend data source to query for a given user request. In multi-source agent architectures, routing logic quality directly determines whether the agent reaches the right data store for each query type.

Module 1 Summary: The Decision Framework

This module established that data access is the primary determinant of agent quality — not model capability, not prompting technique, not inference infrastructure. The specific architecture choices covered were: Vertex AI Search for large unstructured corpora with latency tolerance; BigQuery direct access for real-time warehouse queries; AlloyDB for hybrid transactional-plus-semantic workloads; and the principle that production systems require routing across multiple backends.

The failures documented — Morgan Stanley's retrieval blindness, DHL's staleness, Highmark's chunking imprecision — are not edge cases. They are the default outcomes when data access architecture receives insufficient attention in the design phase. The engineers who avoided these failures did so by asking, before building, what the data looks like, how fast it changes, and what kind of query the agent actually needs to run. That sequence of questions is the through-line of this entire course.

Lesson 4 Quiz

Four questions · AlloyDB, hybrid search, and real-time agent data

In the Wayfair case at Google Cloud Next 2024, why was a pure Vertex AI Search deployment insufficient for the catalog enrichment agent?

Correct. Vertex AI Search is optimized for semantic similarity — exact-match lookup on structured identifiers like SKUs is unreliable in a semantic search index.

Re-read the Wayfair opening scene. The failure was that semantic search cannot reliably perform exact-match lookups on product SKUs.

The alloydb.create_embedding() function is valuable for agentic architectures specifically because:

Correct. By calling Vertex AI Embeddings from within SQL at insert time, new data is immediately searchable via vector similarity without an external pipeline stage.

The value is pipeline elimination: embeddings are generated at the database level during insert, making new records immediately vector-searchable.

According to the decision framework in Lesson 4, which scenario is the best fit for AlloyDB rather than Vertex AI Search or BigQuery?

Correct. This is the hybrid search pattern: semantic similarity (vector search) combined with structured filters (in-stock, category). AlloyDB executes this as a single query; alternatives require two separate calls.

The AlloyDB sweet spot is hybrid: combining semantic vector search with exact SQL filters in a single query. Identify which scenario requires both types of query simultaneously.

The Module 1 Summary states that the failures at Morgan Stanley, DHL, and Highmark all share a common root cause. What is it?

Correct. All three failures were preventable by asking upfront: what does the data look like, how fast does it change, and what kind of query does the agent need to run?

Re-read the Module 1 Summary. The through-line is explicit: these failures result from data access architecture receiving insufficient attention before building begins.

Lab 4 — AlloyDB Hybrid Search Design

Design a hybrid SQL + vector search architecture in AlloyDB for a Vertex AI agent · Complete 3 exchanges to finish

Your Task

You are building a Vertex AI agent for a pharmaceutical company. The agent helps researchers find clinical trial records. It needs to: (1) perform exact lookup by trial ID (NCT number), (2) find semantically similar trials by description, and (3) filter results to only trials in a specific therapeutic area with active enrollment status. Work through the AlloyDB schema design, pgvector index configuration, and query structure with the AI advisor.

Starter prompt: "I need a Vertex AI agent that can look up clinical trials by NCT ID, find semantically similar trials by description, and filter to active trials in oncology only. I'm considering AlloyDB with pgvector. How should I structure the table schema and indexes to support all three query types efficiently?"


Module 1 Test

15 questions · Pass at 80% (12/15) to complete the module

1. The RAG architecture was first formally described in a paper by which organization in May 2020?

Correct. Lewis et al. at Facebook AI Research published "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" in May 2020.

The RAG paper was published by Lewis et al. at Facebook AI Research (FAIR) in May 2020.

2. Which failure mode causes an agent to answer with information that was accurate in the past but is now outdated?

Correct. Staleness occurs when the agent's knowledge is frozen at a training cutoff or last-sync timestamp.

Staleness is the failure mode for outdated information — the agent's knowledge is frozen at a past point in time.

3. Vannevar Bush's 1945 "As We May Think" essay is cited in the course introduction as a historical parallel because it identified what as the primary bottleneck in knowledge work?

Correct. Bush argued that an instrument that could calculate but not recall was a narrow tool — the bottleneck is retrieval, not compute.

Bush's insight was about retrieval — finding the right information when needed — not about compute, storage, or attention.

4. What is the minimum IAM role required for a Vertex AI agent's service account to query BigQuery data using the BigQuery tool in Agent Builder?

Correct. bigquery.dataViewer is the minimum role for reading the data. The service account also needs bigquery.jobUser to run query jobs, but dataViewer is the minimum data-access role the lesson cites.

The lesson cites bigquery.dataViewer as the minimum required role for the agent's service account to access BigQuery data.

5. The "Lost in the Middle" paper (Stanford, July 2023) demonstrated that large language models:

Correct. Liu et al. showed that model accuracy drops when the relevant information sits in the middle of a long context, which is one reason large context windows do not eliminate the need for retrieval.

The "Lost in the Middle" paper showed that models use relevant information less reliably when it appears in the middle of a large context window than when it appears near the beginning or end.

6. Which stage of the Vertex AI Search ingestion pipeline converts PDF and DOCX files into plain text and structured metadata?

Correct. Document extraction is the first stage, and Vertex AI Search uses Google's Document AI for PDF parsing as of 2024.

Document extraction is the first pipeline stage, converting raw files to text using Document AI. Chunking, embedding, and indexing follow.

7. AlloyDB for PostgreSQL was first released to General Availability on Google Cloud in:

Correct. AlloyDB for PostgreSQL reached GA in November 2022.

AlloyDB for PostgreSQL was released to GA in November 2022.

8. In Vertex AI Search, which API parameter enables extractive answers — the identification of the specific passage most likely to answer the query within a retrieved chunk?

Correct. Extractive answers are enabled via the contentSearchSpec parameter in the Search API.

The lesson specifies contentSearchSpec as the parameter for enabling extractive answers in the Vertex AI Search API.

9. The pgvector extension for PostgreSQL was originally developed by:

Correct. pgvector was developed by Andrew Kane and open-sourced in 2021.

pgvector was developed by Andrew Kane and open-sourced in 2021 — not a Google or PostgreSQL core team project.

10. According to the decision framework in Lesson 4, at approximately what corpus size does AlloyDB become less cost-effective than managed vector databases or Vertex AI Vector Search?

Correct. The lesson states that AlloyDB is appropriate when the corpus is under ~100M vectors; beyond that, managed vector databases or Vertex AI Vector Search become more cost-effective.

The lesson specifies ~100 million vectors as the threshold beyond which AlloyDB becomes less cost-effective compared to managed vector services.

11. BigQuery BI Engine stores data in memory to reduce query latency. How is its capacity specified?

Correct. BI Engine capacity is reserved in GB-hour units as described in the Lesson 2 key terms.

The lesson specifies that BI Engine capacity is reserved in GB-hour units — not slots, result counts, or scan volume.

12. The ScaNN algorithm, whose descendants power Vertex AI Search's approximate nearest-neighbor index, was published by:

Correct. ScaNN was published by Google Research in 2019 and its descendants power the ANN index in Vertex AI Search.

ScaNN was published by Google Research in 2019.

13. Which of the following best describes "routing logic" in a multi-source agent architecture?

Correct. Routing logic classifies each incoming query and directs it to the appropriate data backend — BigQuery, AlloyDB, Vertex AI Search, or another source.

Routing logic is the agent-level decision component that routes each query to the appropriate data source based on query type.

14. When an agent generates SQL without date constraints on a BigQuery table partitioned by date, the primary consequence is:

Correct. Without a partition filter, BigQuery scans all partitions — the full table — dramatically increasing both cost and query latency.

Without partition filters, BigQuery performs a full table scan. On large tables this is very expensive and slow.

15. Gemini 1.5 Pro was released in February 2024 with a context window of:

Correct. Gemini 1.5 Pro launched with a 1-million-token context window — the longest commercially available at the time of its February 2024 release.

Gemini 1.5 Pro launched in February 2024 with a 1-million-token context window.