Module 7 · Lesson 1

Pub/Sub and Eventarc Foundations

How Google's event messaging backbone enables agents that react in milliseconds rather than minutes.

What architectural guarantees make Pub/Sub safe enough for financial transactions — and fast enough for fraud detection?

In March 2021, Google's Site Reliability Engineering team published a retrospective on how Google Pay's fraud pipeline had moved from batch overnight scoring to sub-second event streaming. The key change was replacing a polling architecture with Cloud Pub/Sub push subscriptions feeding a Dataflow pipeline. Payment events published in one region were globally replicated and fan-out delivered within 100 milliseconds at the 99th percentile, even under Black Friday traffic spikes of 40× baseline.

What Is Pub/Sub?

Cloud Pub/Sub is Google's fully managed, globally distributed message bus. Publishers send messages to a topic; subscribers receive them via a subscription — pull or push. The service guarantees at-least-once delivery, automatic message retention for up to 7 days, and ordering within a region when message ordering keys are set.

For agentic workflows, Pub/Sub is the nervous system. An agent doesn't poll a database every five seconds hoping something changed — it receives an event the instant something changes, triggering reasoning and action in the same second.

Core Concept

Topics & Subscriptions

A topic is a named resource. Subscriptions attach to topics and can fan out to multiple independent consumers. Each subscription maintains its own backlog cursor.

Delivery Modes

Pull vs. Push

Pull subscribers call the service to fetch messages (batchable, back-pressured). Push subscriptions send HTTP POST to a configured endpoint — ideal for triggering Cloud Run agents without polling.

Reliability

At-Least-Once Delivery

Messages are delivered at least once. Agents must handle idempotency. Exactly-once processing is achievable with Dataflow's Pub/Sub I/O connector, which uses acknowledgment IDs and deduplication windows.

Scale

Global Replication

Messages are synchronously replicated across Google's global network before acknowledgment. Publishers in Tokyo can trigger agents in Iowa with sub-200 ms end-to-end latency.

Eventarc: Structured Event Routing

While Pub/Sub handles raw byte streams, Eventarc adds a structured routing layer based on the CloudEvents 1.0 specification. Rather than publishing an unstructured payload, services emit typed events (e.g., google.cloud.bigquery.job.v1.completed) that Eventarc routes to Cloud Run, Cloud Functions, GKE, or Workflows.

This matters for agents because structured events carry semantic context. An agent receiving a BigQuery job completion event automatically knows the dataset, table, job ID, and outcome — without parsing raw bytes. It can immediately reason about what changed and what action to take.

GCS Object Created

→

Eventarc Trigger

→

Cloud Run Agent

→

Vertex AI Gemini

→

Pub/Sub Response Topic

Production Pattern

Spotify's 2023 data platform migration (documented at Google Cloud Next 2023) used Eventarc to trigger validation agents whenever a new Dataform release completed. The agent compared row counts, schema drift, and null rates against SLOs, then either auto-approved or opened a PagerDuty incident — replacing 4 hours of manual review per release.

Key Pub/Sub Concepts for Agent Design

Dead-Letter TopicMessages that fail delivery after max_delivery_attempts are routed here. Agents should monitor DLT backlogs as a health signal — a growing DLT means something in the processing chain is broken.

Message OrderingEnable ordering keys when event sequence matters (e.g., user session events). Pub/Sub guarantees ordered delivery within a key when ordering is enabled, at a slight throughput cost.

SeekAllows rewinding a subscription to a past timestamp or snapshot. Critical for agent replay: if a processing bug is discovered, you can seek back and reprocess without re-ingesting source data.

Acknowledge DeadlineThe window (10s–600s) in which a subscriber must ack before Pub/Sub redelivers. Agents calling LLMs should extend deadlines with modifyAckDeadline to prevent duplicate processing.

Design Principle

Always separate your ingestion topic (raw events) from your processing topic (enriched/validated events). Agents consume from the processing topic, giving you a clean retry boundary and the ability to replay enrichment without re-ingesting source data.

Lesson 1 Quiz

Pub/Sub and Eventarc Foundations — 4 questions

1. What delivery guarantee does Cloud Pub/Sub provide by default?

Correct. Pub/Sub guarantees at-least-once delivery per subscription. Messages may be delivered more than once, so agents must implement idempotency. Exactly-once processing requires Dataflow's dedicated connector.

Incorrect. Pub/Sub provides at-least-once delivery. Exactly-once requires Dataflow's Pub/Sub connector with deduplication.

2. What distinguishes Eventarc from raw Pub/Sub for agent triggering?

Correct. Eventarc uses the CloudEvents 1.0 spec to carry typed, structured events (e.g., BigQuery job completion with job ID and outcome) — so agents receive semantic context, not raw bytes.

Incorrect. The key distinction is structured CloudEvents routing with semantic type metadata, not speed or retention differences.

3. An agent calling a Gemini model to process each message takes 45 seconds. The default Pub/Sub acknowledge deadline is 10 seconds. What will happen?

Correct. After the acknowledge deadline expires, Pub/Sub assumes the subscriber failed and redelivers the message. The agent should call modifyAckDeadline periodically during long LLM calls to prevent duplicates.

Incorrect. Pub/Sub does not auto-extend deadlines. Without calling modifyAckDeadline, the message will be redelivered, causing duplicate processing.

4. What is the purpose of a Dead-Letter Topic in a Pub/Sub agent pipeline?

Correct. The Dead-Letter Topic receives messages that couldn't be delivered after max_delivery_attempts. Monitoring DLT backlog size is a key health signal — growth indicates a systemic processing failure in the agent chain.

Incorrect. The Dead-Letter Topic captures failed messages after max retry exhaustion, serving as a poison-pill quarantine and health indicator.

Lab 1: Designing a Pub/Sub Agent Pipeline

Practice with your AI lab assistant — complete 3 exchanges to finish

Scenario

You're building a fraud detection agent for a payments platform. Events arrive via Cloud Pub/Sub at up to 50,000 messages per second. Your agent calls Vertex AI Gemini to score each transaction and publishes a decision to a response topic within 500 ms.

Work through the architecture with your assistant: How do you handle ack deadlines with LLM latency? What ordering strategy do you use? How do you configure the Dead-Letter Topic? Ask any question about the design.

AI Lab Assistant

Pub/Sub Architecture

Module 7 · Lesson 2

Dataflow Streaming and Windowing Strategies

Transforming infinite event streams into bounded, actionable data windows that agents can reason over.

Why does the choice between tumbling, sliding, and session windows determine whether your agent sees fraud or misses it entirely?

When Twitter's data infrastructure team rebuilt their real-time analytics in 2022 (documented in their engineering blog), they found that their existing 1-minute tumbling windows were missing coordinated bot attacks that spanned 90 seconds — just enough to straddle two windows and appear as low-volume noise in each. Switching to 2-minute sliding windows with 30-second slides increased bot cluster detection by 34% without increasing false positives, because each suspicious pattern now appeared in at least one complete window.

Apache Beam and Cloud Dataflow

Cloud Dataflow is Google's fully managed execution service for Apache Beam pipelines. For streaming, Dataflow provides autoscaling worker pools, exactly-once processing semantics via Pub/Sub, and native integration with BigQuery, Bigtable, and Cloud Storage.

For event-driven agents, Dataflow plays the enrichment and aggregation layer — it's not the agent itself but the infrastructure that turns raw event streams into the contextual windows an agent needs to make decisions. An agent reasoning about "is this user's behavior anomalous?" needs a window of that user's recent activity, not a single isolated event.

Window Types and When to Use Them

Window Type

Tumbling (Fixed)

Non-overlapping windows of fixed size (e.g., every 5 minutes). Each event belongs to exactly one window. Best for: periodic reporting, rate calculations, batch-style aggregations.

Window Type

Sliding

Overlapping windows defined by size and slide interval. A 10-min window sliding every 1 min means each event belongs to 10 windows. Best for: moving averages, trend detection, bot detection spanning multiple periods.

Window Type

Session

Dynamic windows closed by a gap in activity (e.g., 30 min of inactivity). Each user's continuous activity forms one session window of variable length. Best for: user behavior analysis, session-level fraud scoring.

Window Type

Global

A single window for all events. Requires explicit triggering logic (count-based, time-based, or custom). Best for: accumulation patterns where you want a running total fired periodically.

Watermarks and Late Data

A watermark is Dataflow's estimate of how far behind real-time the slowest event is expected to be. When the watermark passes the end of a window, Dataflow fires that window's results. The challenge: late-arriving data.

A mobile payment event might be generated at 14:00:00 but arrive at the pipeline at 14:02:30 due to network latency. If the window closed at 14:01:00, the event is "late." Dataflow's allowed lateness setting specifies how long to keep a window open for late data — after which late events are either dropped or sent to a side output for separate handling.

Production Case — Yelp 2023

Yelp's review spam detection pipeline (presented at Beam Summit 2023) uses session windows with a 4-hour gap threshold to group review bursts. An agent receives each closed session and uses Vertex AI to score the burst pattern. Sessions are enriched with reviewer history from Bigtable lookups in the Dataflow pipeline before the agent ever sees them — reducing LLM input tokens by 60% while increasing precision.

Triggers: Controlling When Windows Fire

Triggers control when a window emits results — independent of when the window closes. This is powerful for agent use cases that need early, speculative results before a window is complete.

AfterWatermarkFire once when the watermark passes the window end. The default trigger — clean, one result per window. Use when completeness matters more than latency.

AfterProcessingTimeFire after a fixed wall-clock delay since the first element arrived. Use for "I need a result every 30 seconds regardless of data completeness."

AfterCountFire after N elements accumulate. Use for micro-batch patterns where you want an agent to act every 1,000 events rather than waiting for a time window.

Composite TriggersCombine triggers with AfterEach, AfterFirst, or Repeatedly. A common pattern: fire early every 30 seconds (speculative), then fire once when the window closes (final).

~50ms

Dataflow P99 latency (auto-scaling)

10M+

Events/sec per pipeline

7 days

Max allowed lateness

Exactly-once

With Pub/Sub connector

Agent Design Insight

Send windowed aggregates to agents, not raw events. A Dataflow pipeline that counts, groups, and enriches events before forwarding to a Vertex AI agent reduces LLM calls by orders of magnitude and makes agent reasoning far more reliable — one structured context document is better than 10,000 raw events.

Lesson 2 Quiz

Dataflow Streaming and Windowing — 4 questions

1. An agent needs to detect coordinated bot behavior that spans up to 90 seconds. Which window type is most appropriate?

Correct. Sliding windows with overlap ensure no 90-second pattern straddles two consecutive windows and disappears as noise in each. Session windows would also capture activity but depend on gaps rather than ensuring coverage of a fixed-duration pattern.

Incorrect. Tumbling windows would miss patterns spanning the boundary. Sliding windows with sufficient overlap ensure the full pattern appears in at least one complete window.

2. What does the watermark represent in a Cloud Dataflow streaming pipeline?

Correct. The watermark is Dataflow's heuristic estimate of how far behind real-time the slowest events are expected to be. When it passes a window's end time, the system fires that window — assuming all relevant data has arrived.

Incorrect. The watermark is the system's estimate of event-time lag, used to determine when to close and fire windows. It's not a throughput limit or a checkpoint file.

3. You need an agent to receive a speculative early result every 30 seconds AND a final complete result when the window closes. Which trigger pattern achieves this?

Correct. A composite trigger combining Repeatedly(AfterProcessingTime(30s)) as the early trigger with AfterWatermark() as the on-time trigger gives speculative early results plus a definitive final result when the window is complete.

Incorrect. You need a composite trigger: AfterWatermark() alone gives only the final result. The early speculative result requires a Repeatedly(AfterProcessingTime) early trigger combined with the watermark trigger.

4. Why does enriching events in Dataflow before sending to a Vertex AI agent improve cost and quality?

Correct. Sending a structured window summary (counts, patterns, enriched user context) as one prompt is far more token-efficient than sending thousands of raw events individually. It also gives the agent a complete picture rather than isolated data points.

Incorrect. The benefit is semantic: a windowed aggregate gives the agent richer context in fewer tokens, improving both cost efficiency and reasoning quality.

Lab 2: Windowing Strategy for Bot Detection

Practice with your AI lab assistant — complete 3 exchanges to finish

Scenario

You're designing a Dataflow pipeline to detect coordinated inauthentic behavior on a social platform. Events include likes, shares, follows, and comments. Coordinated attacks typically span 60–120 seconds across 10–200 accounts acting in near-unison.

Discuss the windowing strategy with your assistant: Which window type and parameters? How do you handle late mobile events? What features should the Dataflow pipeline extract before the agent call? How do you balance latency vs. accuracy?

AI Lab Assistant

Dataflow Windowing

Module 7 · Lesson 3

Vertex AI Agents Triggered by Live Events

Connecting Vertex AI Agent Builder to the event stream — from trigger to tool call to response in under a second.

How does a Vertex AI agent chained to Eventarc differ from a traditional rule-based alerting system — and when does the difference actually matter?

Wayfair's engineering team published a case study in Q3 2023 describing how they replaced a rules engine with a Vertex AI agent for real-time inventory anomaly response. The previous system had 847 hand-coded rules requiring monthly maintenance. The new agent received Eventarc notifications whenever inventory fell below dynamic threshold — then used tools to query warehouse APIs, check supplier lead times, and draft a purchase order recommendation. Time-to-recommendation dropped from 4 hours to 38 seconds. The agent handled novel situations (supplier strikes, weather delays) that no static rule anticipated.

Vertex AI Agent Builder Architecture

Vertex AI Agent Builder (previously Vertex AI Conversation / Dialogflow CX) provides a managed environment for building ReAct-style agents that use tools, maintain state, and can be triggered by both human input and automated events. For event-driven use cases, the agent is deployed as a Cloud Run endpoint that Eventarc or Pub/Sub push subscriptions call directly.

Eventarc Trigger

→

Cloud Run (Agent Endpoint)

→

Vertex AI Agent Builder

→

Tool Calls (APIs, BQ, Bigtable)

→

Action / Response

Event-to-Agent Integration Patterns

Pattern 1

Stateless Single-Event Agent

Each event triggers a fresh agent invocation with no memory of previous events. Simple, horizontally scalable. Ideal for: document classification, single-transaction fraud scoring, alert triage.

Pattern 2

Stateful Session Agent

Agent maintains conversation state across multiple related events using Firestore or Memorystore. Events within a session are correlated. Ideal for: multi-step checkout fraud, user journey analysis, incident management.

Pattern 3

Fan-Out Agent Orchestration

One trigger event spawns multiple parallel sub-agents via Workflows. Each sub-agent handles one aspect (inventory, pricing, compliance). Results are aggregated before action. Ideal for: complex business decisions requiring multiple domain checks.

Pattern 4

Streaming Accumulator Agent

Agent receives windowed Dataflow output (aggregated context) rather than individual events. Most token-efficient. Ideal for: anomaly detection over populations, trend-based alerting, market surveillance.

Tool Design for Real-Time Agents

Event-driven agents operate under strict latency budgets. Every tool call adds latency. The design principle: tools must be fast, agents must be decisive. Aim for tool P99 latencies under 200 ms; use Bigtable for key-value lookups, BigQuery with BI Engine for cached metrics, and Memorystore (Redis) for ephemeral state.

Grounding ToolFetches current state from authoritative sources (Bigtable for user profiles, BigQuery for aggregated metrics). Called at the start of reasoning to establish context. Must be idempotent and cacheable.

Action ToolMutates state: updates a database, calls an external API, publishes to a Pub/Sub topic. Should be called at most once per agent invocation. Use Workflows for guaranteed exactly-once execution of actions.

Escalation ToolPublishes to a human-review queue (Cloud Tasks + internal ticketing API). Used when agent confidence is low or the decision has irreversible high-value consequences. Always include the agent's reasoning trace.

Circuit BreakerIf an external tool fails N consecutive times, the agent should stop calling it and escalate rather than retrying indefinitely. Implement via a shared Memorystore counter per tool endpoint.

Real-World Implementation — DoorDash 2023

DoorDash's Dasher fraud team (Google Cloud Next 2023 session) used a stateful Vertex AI agent triggered by Eventarc on each delivery completion event. The agent checked against Bigtable for the Dasher's 30-day delivery pattern, called a geospatial tool to verify route plausibility, and either approved payment or escalated to human review within 800 ms — handling 2 million deliveries per day with a false-positive rate under 0.3%.

Latency Budget Framework

Total budget: 500 ms. Allocate: Event routing (Eventarc/Pub/Sub): ~20 ms. Agent startup (Cloud Run, warm): ~30 ms. Grounding tool calls (2 × Bigtable): ~80 ms. Gemini reasoning: ~200 ms. Action tool call: ~100 ms. Response publication: ~20 ms. Buffer: 50 ms. Each layer must be profiled with Cloud Trace to identify bottlenecks.

Lesson 3 Quiz

Vertex AI Agents Triggered by Live Events — 4 questions

1. A Vertex AI agent handles fraud scoring for individual payment events. It needs no memory of previous events. Which pattern is most appropriate?

Correct. When each event is independent and requires no memory of previous events, stateless single-event agents are ideal. They scale horizontally with Pub/Sub throughput and have no state management overhead.

Incorrect. For independent, stateless decisions per event, the stateless single-event pattern is most appropriate. Session state adds overhead without benefit when events are independent.

2. What is the primary role of a "Grounding Tool" in a real-time event-driven agent?

Correct. A grounding tool establishes the factual context the agent needs to reason: user profiles, recent behavior, cached metrics. It must be fast (sub-100 ms) and idempotent since it's called at the start of every invocation.

Incorrect. The grounding tool fetches authoritative current state to establish context. Publishing decisions is an action tool; escalation is an escalation tool.

3. A tool that calls an external supplier API is failing 30% of the time due to the supplier's instability. What is the correct agent design response?

Correct. A circuit breaker tracks consecutive failures in Memorystore. After N failures, the agent stops retrying and escalates rather than blocking on a broken dependency — preserving throughput and ensuring critical decisions still get human oversight.

Incorrect. Retrying indefinitely blocks the agent and wastes the latency budget. A circuit breaker stops retrying after N failures and escalates, maintaining throughput and ensuring human oversight.

4. In a 500 ms latency budget for an event-driven agent, Gemini reasoning is allocated 200 ms. The total tool call time is consistently 350 ms. What is the correct remediation?

Correct. Tool calls at 350 ms are the bottleneck. Cloud Trace identifies which tool is slow. Migrating key-value lookups from BigQuery (hundreds of ms) to Bigtable or Memorystore (single-digit ms) is the right fix — not reducing the model size or throughput.

Incorrect. If tool calls consume 350 ms of a 500 ms budget, the fix is optimizing tool latency — migrate slow BigQuery lookups to Bigtable or Memorystore. Reducing model capability or throughput doesn't address the bottleneck.

Lab 3: Designing an Inventory Anomaly Agent

Practice with your AI lab assistant — complete 3 exchanges to finish

Scenario

You're building a Vertex AI agent that responds to Eventarc notifications when inventory levels breach dynamic thresholds. The agent must query warehouse systems, check supplier lead times, and either auto-generate a purchase order or escalate to a buyer — all within 500 ms.

Design the agent's tool set with your assistant: What grounding tools does it need? When should it escalate vs. act autonomously? How do you handle the circuit breaker for unreliable supplier APIs? What does the agent's reasoning trace look like?

AI Lab Assistant

Event-Driven Agent Design

Module 7 · Lesson 4

Observability, Idempotency, and Production Hardening

The operational discipline that separates a working prototype from a production event-driven agent that actually stays working.

When a Pub/Sub message is delivered twice and your agent acts twice — blocking a customer account twice — how does idempotency design prevent that from being your crisis?

In March 2020, Robinhood experienced a sequence of outages during peak trading volatility. Post-incident analysis (published in their engineering blog) revealed that a retry storm in their event processing pipeline had caused duplicate order submissions — because their event consumers lacked idempotency keys. A single trade event, redelivered three times by the message bus, triggered three order submissions before the duplicate was caught by a downstream circuit breaker. The fix required not just idempotency keys but exactly-once semantics at the persistence layer, enforced by a deduplication window in Cloud Spanner.

Idempotency in Event-Driven Agent Pipelines

Because Pub/Sub provides at-least-once delivery, every agent must be designed to produce the same outcome whether it receives a message once or ten times. This is idempotency. For read-only agents (classification, scoring), idempotency is natural. For agents that take actions (write to a database, call an API, send an email), idempotency requires explicit design.

The standard pattern: derive a deterministic idempotency key from the event (e.g., SHA-256 of the message ID + action type), check Memorystore or Spanner before executing the action, and write the key atomically with the action. If the key already exists, skip and ack.

Technique

Message Deduplication

Use Pub/Sub's message ID as a deduplication key stored in Memorystore (TTL = 2× your max redelivery window). Check before processing; write atomically after. Handles ~99% of duplicate cases.

Technique

Conditional Writes

For Firestore/Spanner: use conditional mutations (if-not-exists or compare-and-swap). The action only commits if the idempotency key doesn't exist. Guarantees exactly-once side effects at the persistence layer.

Technique

Outbox Pattern

Write the intended action to an outbox table in the same transaction as the state change. A separate worker reads the outbox and executes actions, marking them done. Decouples event processing from action execution.

Technique

Seek-and-Replay Safety

When replaying events using Pub/Sub Seek, your idempotency layer automatically prevents duplicate actions. This makes replay safe to use for bug fixes — you can reprocess the last 7 days without fear of duplicate side effects.

Observability: The Three Signals

Google's SRE book defines three primary observability signals for production systems. For event-driven agent pipelines, each requires specific instrumentation.

Metrics (Cloud Monitoring)Track: subscription/oldest_unacked_message_age (DLT backlog age), agent invocation latency (P50/P99/P999), tool call success rate by tool name, agent decision distribution (action vs. escalate vs. skip). Alert on: oldest_unacked_message_age > 60s (pipeline stall), P99 latency > budget × 1.5.

Traces (Cloud Trace)Instrument every agent invocation as a root trace span. Add child spans for each tool call, LLM invocation, and action. Use trace sampling at 5% in steady state, 100% for DLT messages. Correlate trace IDs with Pub/Sub message IDs for end-to-end debugging.

Logs (Cloud Logging)Structured JSON logs only. Mandatory fields: message_id, event_type, agent_decision, reasoning_summary (truncated), latency_ms, tools_called[]. Never log PII. Export decision logs to BigQuery for audit and model improvement.

Production Pattern — Stripe 2022

Stripe's fraud engineering team (documented in their engineering blog, September 2022) built a dedicated "agent observability dashboard" in Looker Studio fed by BigQuery export of structured agent logs. Every agent decision was logged with its reasoning chain hash, tools called, latency breakdown, and outcome. When a model update degraded precision by 2%, the dashboard surfaced it within 10 minutes — before it caused significant financial impact. Human-review queue volume was the canary metric.

Production Hardening Checklist

Before promoting an event-driven agent to production, validate each of these properties:

Idempotency: Replay 1 hour of production events in staging. Verify zero duplicate actions. Measure deduplication cache hit rate.

Latency under load: Run 150% of peak expected Pub/Sub throughput for 15 minutes. Measure P99 and P999 agent latency. Verify Ack deadline extensions are firing correctly for slow LLM calls.

DLT monitoring: Deliberately poison 0.1% of test messages. Verify DLT backlog alert fires within 2 minutes. Verify DLT replay procedure works end-to-end.

Graceful degradation: Kill one of three tool endpoints. Verify circuit breaker activates, agent escalates rather than times out, and throughput is maintained.

Cost ceiling: Set Cloud Billing budget alerts at 110% and 130% of expected daily cost. Pub/Sub and Vertex AI costs both scale with volume — an upstream data spike should not cause unbounded spend.

7 days

Pub/Sub replay window

<2 min

Target DLT alert latency

100%

DLT trace sampling rate

0 PII

In structured logs

The Hardening Mindset

Event-driven agents fail in two modes: silent failures (messages processed incorrectly with no error logged) and noisy failures (DLT backlog grows, alerts fire, humans intervene). Design for noisy failures — they are recoverable. Silent failures erode trust invisibly until a business impact surfaces weeks later. Comprehensive structured logging and aggressive alerting are your insurance policy.

Lesson 4 Quiz

Observability, Idempotency, and Production Hardening — 4 questions

1. An agent sends a promotional email on each Pub/Sub message. The same message is redelivered 3 times due to a network hiccup. Without idempotency, how many emails does the customer receive?

Correct. Without idempotency, each of the 3 deliveries triggers a separate email send. Pub/Sub guarantees at-least-once, not at-most-once. The agent must implement its own deduplication using the message ID as an idempotency key.

Incorrect. Pub/Sub provides at-least-once delivery with no built-in deduplication at the application level. Without idempotency in the agent, all 3 deliveries trigger email sends.

2. What is the correct approach to idempotency for an agent that writes a purchase order to Cloud Spanner?

Correct. A conditional mutation in Spanner (insert-or-ignore / compare-and-swap) with an idempotency key derived from the message ID guarantees exactly-once writes at the persistence layer, even if the agent processes the same event multiple times.

Incorrect. The correct pattern is a conditional mutation (if-not-exists) in Spanner using an idempotency key. Checking logs is unreliable and slow; message retention doesn't prevent redelivery within the retention window.

3. The subscription/oldest_unacked_message_age metric is growing at 5 seconds per minute. What does this indicate?

Correct. A growing oldest_unacked_message_age means the backlog is accumulating — the agent is falling behind the arrival rate. Left uncorrected, this becomes a full pipeline stall. Autoscaling Cloud Run or adding workers is the immediate response.

Incorrect. oldest_unacked_message_age growth signals a growing backlog — messages are arriving faster than the agent processes them. This is a pipeline stall in development.

4. Why should Dead-Letter Topic messages be traced at 100% sampling rate rather than the standard 5%?

Correct. DLT messages are exceptions — they failed delivery or processing. Every failure is diagnostic data. At 5% sampling you'd miss 95% of failure root causes. 100% DLT tracing gives you complete visibility into why messages are failing.

Incorrect. 100% DLT sampling is warranted because every DLT message represents a failure. Sampling failures at 5% means missing 95% of root-cause information — exactly when you need complete data the most.

Lab 4: Hardening a Production Agent Pipeline

Practice with your AI lab assistant — complete 3 exchanges to finish

Scenario

Your event-driven Vertex AI agent has been in production for 2 weeks. You've discovered a bug: 0.3% of processed events are generating duplicate database writes. The DLT backlog is also growing at low but non-zero rate. You need to harden the pipeline before the upcoming holiday traffic spike.

Work through the production hardening plan with your assistant: How do you implement idempotency to stop duplicate writes? What observability gaps does your DLT growth reveal? How do you load-test the pipeline safely before the traffic spike? What's your rollback plan if the fix introduces new issues?

AI Lab Assistant

Production Hardening

Module 7 Test

Real-Time Data and Event-Driven Agents — 15 questions · 80% to pass

1. Which Cloud Pub/Sub feature allows rewinding a subscription to reprocess events from 3 days ago?

Correct. Pub/Sub Seek allows winding back a subscription to any timestamp within the retention window (up to 7 days), enabling safe event replay.

Incorrect. The Seek feature allows rewinding to a past timestamp or snapshot for replay.

2. Eventarc is based on which specification for structured event interchange?

Correct. Eventarc implements the CloudEvents 1.0 CNCF specification, providing typed, structured event routing with standardized metadata fields.

Incorrect. Eventarc uses the CloudEvents 1.0 specification from the CNCF.

3. A Dataflow session window has a 30-minute gap threshold. A user is active at 10:00, 10:15, and then again at 11:00. How many session windows are created?

Correct. The 45-minute gap between 10:15 and 11:00 exceeds the 30-minute threshold, so the first session closes at 10:15+30min and a new session opens at 11:00.

Incorrect. The gap between 10:15 and 11:00 is 45 minutes — exceeding the 30-minute threshold — so two separate sessions are created.

4. What is the key operational difference between push and pull Pub/Sub subscriptions for agent endpoints?

Correct. Push subscriptions HTTP-POST to a configured endpoint (ideal for Cloud Run agents that don't poll). Pull subscriptions require the consumer to call Pub/Sub.pull() to receive messages.

Incorrect. The key difference is delivery direction: push sends to endpoint, pull requires the consumer to fetch. Both provide at-least-once delivery.

5. An agent using sliding windows (10-min window, 2-min slide) receives an event at 14:05:30. To how many windows does this event belong?

Correct. With a 10-minute window and 2-minute slide, the window-size/slide-interval ratio is 10/2 = 5. Each event belongs to 5 overlapping windows.

Incorrect. For a sliding window, events belong to window-size ÷ slide-interval windows. 10 min ÷ 2 min = 5 windows per event.

6. Google Pay's fraud pipeline achieved sub-200 ms global event delivery by using which Pub/Sub delivery mode?

Correct. Google Pay replaced polling with Pub/Sub push subscriptions feeding Dataflow, achieving sub-200 ms P99 global delivery even under 40× traffic spikes.

Incorrect. Google Pay used push subscriptions feeding Dataflow pipelines to achieve sub-200 ms global event delivery.

7. What does the outbox pattern achieve in an event-driven agent pipeline?

Correct. The outbox pattern writes intended actions to a transactional table in the same transaction as state changes. A separate worker reads and executes actions, marking them done — guaranteeing exactly-once side effects even under failure.

Incorrect. The outbox pattern writes intended actions to a transactional table, decoupling event processing from action execution to guarantee exactly-once side effects.

8. When should an event-driven agent use a session window rather than a tumbling window?

Correct. Session windows capture a user's continuous behavioral activity (all events until a gap exceeds the threshold) as one coherent unit — ideal for user journey analysis, session-level fraud scoring, and activity pattern detection.

Incorrect. Session windows are chosen when continuous user activity (closed by inactivity gaps) is the meaningful unit of analysis, not for throughput or delivery guarantee reasons.

9. Spotify's validation agent (Google Cloud Next 2023) replaced manual review of Dataform releases. What was the trigger event?

Correct. Spotify's agent was triggered by an Eventarc notification on Dataform release completion. The agent then compared row counts, schema drift, and null rates before auto-approving or escalating.

Incorrect. Spotify's validation agent was triggered by Eventarc on Dataform release completion events, replacing 4 hours of manual review per release.

10. What is the correct latency target for a Bigtable grounding tool call in a 500 ms agent budget?

Correct. In the 500 ms framework: two Bigtable lookups budgeted at ~40 ms each (total ~80 ms), leaving ~200 ms for Gemini reasoning and ~100 ms for action tools. Bigtable P99 latencies are typically 5–15 ms in practice.

Incorrect. Grounding tool calls should be under ~100 ms total to leave adequate headroom for LLM reasoning (~200 ms) and action tools (~100 ms) within the 500 ms budget.

11. What does a circuit breaker for an agent tool endpoint track in Memorystore?

Correct. The circuit breaker pattern uses a Memorystore counter of consecutive failures per tool endpoint. When the count exceeds a threshold, the circuit "opens" — the agent stops calling the tool and escalates instead of retrying indefinitely.

Incorrect. Circuit breakers track consecutive failure counts per tool endpoint. Exceeding the threshold triggers circuit-open state: escalate rather than retry.

12. Yelp's review spam detection pipeline uses what window type and what Vertex AI component for scoring?

Correct. Yelp's pipeline (Beam Summit 2023) used session windows with a 4-hour gap threshold to group review bursts, then sent enriched session summaries to a Vertex AI agent for pattern scoring.

Incorrect. Yelp used session windows with a 4-hour gap threshold, with enriched session summaries sent to a Vertex AI agent.

13. Which metric in Cloud Monitoring best indicates a developing pipeline stall in an event-driven agent system?

Correct. subscription/oldest_unacked_message_age measures how old the oldest unprocessed message is. A growing value indicates the agent is falling behind the arrival rate — a pipeline stall in development.

Incorrect. oldest_unacked_message_age is the key metric: a growing backlog age reveals that processing is slower than arrival rate.

14. DoorDash's Dasher fraud agent (Google Cloud Next 2023) achieved what end-to-end decision latency handling 2 million deliveries per day?

Correct. DoorDash's stateful Vertex AI agent processed delivery completion events, checked Bigtable history, ran geospatial verification, and made approve/escalate decisions within 800 ms at 2 million deliveries per day.

Incorrect. DoorDash's agent achieved under 800 ms end-to-end latency at 2 million deliveries per day with a false-positive rate under 0.3%.

15. What must be included in structured agent decision logs for later model improvement analysis in BigQuery?

Correct. Structured decision logs should include message_id, event_type, agent_decision, reasoning_summary (truncated), latency_ms, and tools_called[] — providing complete audit and improvement data while explicitly excluding PII.

Incorrect. Logs need message ID, event type, decision, truncated reasoning, latency breakdown, and tools called — never PII. Logging only the final decision is insufficient for model improvement.