When Google announced Duet AI for BigQuery at Cloud Next 2023, the headline feature was natural-language-to-SQL generation inside the console. Behind that feature sat a critical architectural decision: the LLM would never touch production data directly. Every data interaction passed through a typed, auditable tool interface — an early public demonstration of the agent-tool pattern at enterprise scale.
BigQuery processes over 110 petabytes of data per day across Google's internal and customer workloads. For agents built on Vertex AI, it is the natural first-choice analytical backend for three structural reasons: serverless execution (no connection pools to manage), columnar storage (fast aggregations over huge tables), and IAM-native access control (service accounts grant granular, auditable permissions without storing credentials in agent code).
Unlike a traditional database where a client holds a persistent connection, BigQuery operates via REST and gRPC APIs. Each query is a stateless job submission — which maps cleanly onto the stateless tool-call model agents use. An agent fires a tool call, BigQuery runs the job asynchronously, the agent polls or awaits the result, and the conversation continues.
In LangChain's BigQuery integration and in Vertex AI Agent Builder's native connectors, a BigQuery tool is registered with a JSON schema describing its inputs (SQL string, project ID, dataset, optional timeout) and outputs (rows as JSON, schema metadata, job statistics). The LLM never writes to this schema — it only reads the spec, produces a valid call, and interprets the structured response.
Three tools are typically exposed. ListBigQueryTablesTool lets the agent discover what tables exist. GetBigQueryTableSchemaTool returns column names and types before the agent commits to a query. QueryBigQueryTool actually executes SQL and returns rows. This three-step discovery pattern — list, inspect, query — prevents the agent from hallucinating column names, a common failure mode when schema context is absent.
Agents running on Vertex AI infrastructure authenticate to BigQuery via Application Default Credentials (ADC). When deployed to Cloud Run, Vertex AI Workbench, or a Vertex AI Agent, the runtime environment automatically provides a service account identity. That service account must be granted roles/bigquery.dataViewer on the dataset and roles/bigquery.jobUser on the project — and nothing more. The principle of least privilege is enforced at the IAM layer, not inside agent code.
In Google's own internal deployments of Duet AI for BigQuery (documented in the 2023 Cloud Next technical session "Duet AI: Under the Hood"), every agent-initiated query runs under a dedicated service account with row-level access controls enforced via BigQuery's column-level security and row access policies. The LLM receives only the rows it is authorized to see — the access boundary is set at the data layer, not the prompt layer.
Lesson 2 dives into SQL generation strategies — how Gemini constructs queries, what prompting patterns reduce errors, and how the agent handles multi-step queries that require intermediate results.
You are designing a Vertex AI agent that will query a BigQuery dataset containing customer transaction records. Work through the architecture decisions: tool registration, IAM setup, and schema discovery flow.
At Google I/O 2024, the Gemini for Google Cloud team demonstrated a live natural-language analytics session in BigQuery Studio. An analyst typed: "Show me the top 10 products by revenue last quarter, broken down by region." The agent issued three sequential tool calls before writing a single byte of SQL: it listed available tables, fetched schemas for two candidate tables, then synthesized a JOIN query. The SQL was syntactically valid on the first attempt — a result the team attributed to schema-in-context, not model size alone.
The single biggest predictor of SQL correctness in LLM-generated queries is whether the full relevant schema is present in the context window at generation time. A study published by Google Research in 2023 ("Can LLMs Generate Correct SQL?") found that Gemini Pro's SQL accuracy on BigQuery benchmarks improved from 61% to 89% when schema context was injected — a 28-point gain without any fine-tuning.
In practice, agents inject schema by first calling GetBigQueryTableSchemaTool for each candidate table, then including the JSON schema response verbatim in the system prompt for the SQL-generation step. The token cost is modest — a typical BigQuery table schema is 200–800 tokens — and the accuracy gain is substantial.
Complex analytical questions rarely map to a single SQL statement. Agents that attempt one-shot SQL for multi-step logic (e.g., "find customers who bought product A but not B in the last 90 days, then rank them by lifetime value") frequently produce syntactically valid but logically wrong queries. The more robust pattern is query decomposition: the agent breaks the question into sub-queries, executes each as a separate tool call, and passes intermediate results back into the context before constructing the final query.
Standard SQL generation prompts trained on generic databases produce subtly wrong BigQuery SQL. The most common failures are use of backtick-free table references (valid in some engines, rejected by BigQuery's parser), missing project qualifiers, and using CURRENT_DATE() syntax from MySQL rather than BigQuery's CURRENT_DATE (no parentheses).
| Pattern | Generic SQL (Wrong in BQ) | BigQuery Correct |
|---|---|---|
| Table reference | FROM orders o | FROM `proj.dataset.orders` o |
| Current date | WHERE date = CURRENT_DATE() | WHERE date = CURRENT_DATE |
| Approx distinct | COUNT(DISTINCT user_id) | APPROX_COUNT_DISTINCT(user_id) |
| Array unnest | UNNEST(items) AS item | UNNEST(items) AS item WITH OFFSET pos |
| Date truncation | DATE_TRUNC('month', dt) | DATE_TRUNC(dt, MONTH) |
Include three to five BigQuery-specific syntax reminders in the system prompt whenever the agent may generate SQL. This costs roughly 100 tokens and eliminates the most common syntax errors observed in production Vertex AI agent deployments.
You are building an agent that answers natural-language questions about an e-commerce BigQuery dataset. Work through schema injection setup and practice decomposing a complex analytical question into sub-queries.
In late 2022, a major US retailer (documented in a BigQuery community forum post by a Google Cloud engineer) suffered a $47,000 single-day BigQuery bill when an internal automation script submitted a full-table scan on a 12TB events table inside a feedback loop. The script had no byte-limit guard and no dry-run step. The incident directly influenced Google's decision to make dry-run validation a default step in the Duet AI for BigQuery architecture — before any agent-generated query executes, BigQuery's dry-run API estimates bytes scanned and the agent checks against a configured threshold.
Every BigQuery query job can be submitted in dry-run mode — the query is parsed and optimized but not executed, and the response includes an estimate of bytes that would be processed. This costs nothing. For agents, dry-run is the mandatory first step before submitting any LLM-generated SQL to production tables.
Production deployments at Google Cloud enterprise customers typically implement five distinct guardrail layers, each catching a different failure class:
BigQuery allows tables to be marked with requirePartitionFilter — any query that does not include a filter on the partition column is rejected at parse time, before any bytes are scanned. For agent-generated SQL over date-partitioned event tables (a common pattern in analytics), this is the single most effective cost control: an agent that forgets a WHERE clause on the date column gets a hard error rather than a $10,000 bill.
Google's own documentation recommends this setting for any partitioned table larger than 100GB that an agent may query. It is set at table creation or via ALTER TABLE DDL and is enforced by BigQuery's query planner, not by application code — meaning it cannot be bypassed even if the agent's guardrail logic has a bug.
BigQuery charges $5 per TB scanned (on-demand pricing as of 2024). A single full-table scan on a 12TB events table costs $60 — and agents in feedback loops can submit dozens of queries per minute. Dry-run + partition filters + project-level max bytes billed together cap worst-case agent costs to a predictable ceiling regardless of what SQL the LLM generates.
Even a syntactically correct, cost-efficient query can overwhelm an agent's context window if it returns millions of rows. The standard pattern is to enforce a maximum result row count in the QueryBigQueryTool wrapper — returning at most 1,000 rows by default — and to have the agent summarize or aggregate before returning large result sets to the user. LangChain's built-in BigQueryTool enforces a 1,000-row default and surfaces the total row count so the agent can request aggregated queries if needed.
Your agent will query a 15TB event tracking table partitioned by day. Design the complete guardrail stack: dry-run threshold, partition filter requirement, max bytes billed, and result row cap. Discuss trade-offs with your assistant.
Wayfair, in a 2023 Google Cloud blog post co-authored with the Vertex AI team, described deploying a natural-language analytics agent over their BigQuery product catalog and sales data. The team's critical insight was that users stopped trusting the agent when it paraphrased query results — they wanted to see the exact rows that drove the answer. The solution was a grounding layer: every agent response that referenced data included a formatted table of the supporting rows, with the BigQuery job ID appended so analysts could re-run and audit the query directly in the console.
For small result sets (under ~200 rows), the standard pattern is to convert BigQuery rows directly to JSON or a Markdown table and inject them into the LLM's context as a tool response. The LLM then synthesizes its answer from this grounded data rather than from its parametric knowledge — a technique called Retrieval-Augmented Generation (RAG) over structured data.
For larger result sets, the agent uses BigQuery's Storage Read API to stream results in Arrow format, applying aggregations or sampling before injecting into context. The BigQuery Storage Read API (launched 2019, now generally available) delivers data at 10x the throughput of the REST export API — enabling real-time agent workflows over large tables without staging to Cloud Storage first.
Without explicit grounding instructions, LLMs frequently extrapolate beyond the returned data — inferring trends from small samples, adding context from training data, or smoothing over gaps. The documented production pattern is to add a strict grounding instruction to the synthesis step:
Answer ONLY using the data provided in the [DATA] section below. If the data does not contain enough information to answer the question, say so explicitly. Do not use your training knowledge to fill gaps. Cite the number of rows your answer is based on. If asked for a trend, only assert a trend if at least three data points support it.
Every BigQuery query job returns a unique job ID (format: project:region.job_XXXXXXXX). Including this ID in the agent's response gives analysts a direct audit trail — they can paste the job ID into the BigQuery console and see the exact SQL, execution time, bytes processed, and result set. This is the structured-data equivalent of citation links in a RAG document answer.
The Wayfair implementation referenced above exposed job IDs in a collapsible "Data Sources" section in their internal analytics chat UI, allowing data engineers to immediately audit any agent-generated insight. According to their blog post, this single feature reduced analyst trust questions from approximately 40% of sessions to under 5% — users with audit access simply checked the job rather than questioning the output.
Agents must be explicitly instructed on two failure modes. Empty result sets — where the query executes successfully but returns zero rows — should prompt the agent to acknowledge the absence of data and suggest why (wrong date range, mis-spelled filter value, data not yet loaded). Partial result sets — where the row limit was hit — should prompt the agent to disclose that the answer is based on a sample, not the full data, and offer to run an aggregated query instead.
You are building the synthesis layer of a BigQuery analytics agent. Design the grounding instruction, result formatting function, and provenance output format. Consider how the agent should handle empty and truncated result sets.