Multi-cloud architectures almost never result from a clean architectural decision. They accumulate. An acquisition brings an Azure-native analytics stack. A machine learning team standardizes on SageMaker before the platform team commits to Vertex AI. A regulatory requirement mandates that certain data never leave a specific geography where only one provider has compliant regions.
Understanding this history matters for agentic pipelines because the data sources your agents must reach were designed for different principals, different auth systems, and different network models. The agent can't assume any shared control plane.
A Vertex AI agent reading from BigQuery can rely on a single IAM model, unified logging, and shared VPC networking. The moment that agent also needs to read from Amazon S3, it encounters a parallel universe: IAM roles are AWS-native, requests are signed with SigV4, and access policies live in a completely separate console.
The agent itself doesn't care — it calls a tool, the tool returns data. But the engineering work that makes that tool safe, monitored, and auditable must bridge two fundamentally different identity and access frameworks. This lesson establishes the conceptual model; subsequent lessons address each connection pattern in depth.
When connecting an agentic workflow on Google Cloud to external cloud data sources, three architectural patterns emerge. They are not mutually exclusive — most production deployments combine all three depending on data volume, latency requirements, and cost constraints.
When Twitter (now X) migrated parts of its infrastructure in 2022, engineering teams publicly documented the challenge of maintaining data pipelines that spanned both AWS and GCP simultaneously during the transition. Kafka clusters on AWS fed Dataflow jobs on GCP — the event bridge pattern at production scale.
Network connectivity between clouds is solved: VPN, Interconnect, or simply HTTPS over the public internet. The genuinely hard problem is identity. How does a Vertex AI agent, running as a GCP service account, prove to AWS IAM or Azure AD that it has permission to read a specific S3 bucket or Azure Blob container?
The naive answer — store AWS access keys or an Azure client secret in Secret Manager — works but creates a credential management burden and eliminates the auditability benefits of federated identity. The preferred answer is Workload Identity Federation, which we explore in depth in Lesson 2.
Multi-cloud data access for agents is fundamentally an identity problem dressed as a networking problem. Once you have credential federation working, the actual data retrieval is standard API calls. Most of the engineering work lives in the auth layer.
Your organization runs analytics on Google Cloud (BigQuery + Vertex AI) but acquired a company whose entire data estate lives in AWS S3, plus a legacy HR system that writes to Azure SQL Database. You need to design an agentic pipeline that can pull from all three sources to generate a weekly workforce analytics report.
Workload Identity Federation works through a three-party exchange. Understanding each step clarifies where to configure permissions and where failures surface when debugging.
On the AWS side, you create an IAM Identity Provider that trusts GCP's OIDC endpoint, then an IAM Role with a trust policy allowing the specific GCP service account to assume it. The critical constraint: the subject claim of the GCP service account's token (its unique numeric ID, not its email) must match the condition in the AWS trust policy, otherwise AssumeRoleWithWebIdentity returns an access denied error.
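On the GCP side, you generate a credential configuration JSON using gcloud iam workload-identity-pools create-cred-config and store it alongside your agent code (not in Secret Manager: it contains no secrets, only configuration). The Google Cloud client libraries automatically handle the token exchange when the GOOGLE_APPLICATION_CREDENTIALS environment variable points to this file, which makes it the Application Default Credentials source. The file describes how an external credential is exchanged for Google credentials, so it applies to workloads that call Google APIs from AWS or Azure; a GCP-hosted agent calling AWS instead presents its Google-signed ID token to AssumeRoleWithWebIdentity, as described above.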
Azure federation requires registering a Workload Identity Pool provider in GCP that points to Azure AD's OIDC discovery endpoint (https://login.microsoftonline.com/TENANT_ID/v2.0). The Azure-side configuration creates an app registration with a federated credential that specifies the GCP issuer, pool, and subject.
A key difference from AWS: Azure issues access tokens scoped to specific resources (e.g., Azure SQL, Azure Blob Storage), not to a generic role. Your agent must request tokens for the specific Azure resource it intends to access, which means the credential configuration is resource-type-specific rather than account-level.
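A minimal sketch of resource-specific scoping, assuming the agent picks one scope per Azure resource type before requesting a token. The two `.default` scope strings are the resource identifiers Azure uses for Blob Storage and Azure SQL; the function name and the dictionary keys are hypothetical names chosen for this illustration.

```python
# Hypothetical lookup table: the keys are illustrative labels, not Azure identifiers.
AZURE_SCOPES = {
    "blob_storage": "https://storage.azure.com/.default",
    "sql_database": "https://database.windows.net/.default",
}


def scope_for(resource_type: str) -> str:
    """Return the token scope to request for a single Azure resource type."""
    try:
        return AZURE_SCOPES[resource_type]
    except KeyError:
        # Fail loudly instead of silently reusing a token meant for another resource.
        raise ValueError(f"no scope configured for {resource_type!r}") from None
```

An agent that reads both Blob Storage and Azure SQL would call `scope_for` twice and hold two separate tokens, one per resource.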
The GCP service account numeric ID (not email) is the identifier used in AWS trust policies. Service account emails can be reused: if an account is deleted and a new one is created with the same email, the new account has a different numeric ID. If you grant trust to the email without pinning to the numeric ID, that recreated account could assume your AWS role. Always use the numeric ID in the sub condition.
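The numeric-ID pinning described above can be sketched as a small policy builder. This assumes the role trusts Google through AWS's built-in `accounts.google.com` web identity principal and its `accounts.google.com:sub` condition key; if you registered a custom IAM OIDC provider instead, the principal ARN and key prefix will differ. The function name is hypothetical.

```python
import json


def build_trust_policy(sa_unique_id: str) -> dict:
    """Build an AWS role trust policy pinned to one GCP service account ID."""
    # Reject emails and other non-numeric values so the policy cannot be
    # pinned to a reusable identifier by mistake.
    if not sa_unique_id.isdigit():
        raise ValueError("expected the service account's numeric unique ID")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": "accounts.google.com"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {"accounts.google.com:sub": sa_unique_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # The ID below is a made-up placeholder, not a real service account.
    print(json.dumps(build_trust_policy("123456789012345678901"), indent=2))
```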
The three most common federation failures in production and how to diagnose them:
1. "Token audience mismatch" — The OIDC token was issued for a different audience than what the AWS or Azure provider expects. Check that the credential configuration file's audience field matches the OIDC provider configuration in the target cloud.
2. "No matching statement" (AWS) or "AADSTS70021" (Azure) — The service account subject claim doesn't match the trust policy condition. Verify the numeric ID (not email) is in the AWS condition, or that the federated credential subject on the Azure app registration matches the pool subject pattern.
3. "Token expired" — Federation tokens have short lifetimes. Ensure your agent code uses the Google Cloud client libraries rather than manually caching credentials. The libraries handle automatic refresh.
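The three checks above can be run locally against a captured token. The sketch below decodes the JWT payload without verifying the signature, which is acceptable for debugging only, and compares the `aud`, `sub`, and `exp` claims with what the target cloud expects. The function names are hypothetical.

```python
import base64
import json
import time


def decode_claims(jwt: str) -> dict:
    """Decode the payload segment of a JWT. Does NOT verify the signature."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def diagnose(jwt: str, expected_aud: str, expected_sub: str, now=None) -> list:
    """Return a list of likely federation problems, empty if none are found."""
    claims = decode_claims(jwt)
    now = time.time() if now is None else now
    problems = []
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a list
    if expected_aud not in audiences:
        problems.append("audience mismatch")
    if claims.get("sub") != expected_sub:
        problems.append("subject mismatch")
    if claims.get("exp", 0) <= now:
        problems.append("token expired")
    return problems
```

An empty result means the token itself looks right, which points the investigation at the trust policy or federated credential on the target side.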
Workload Identity Federation eliminates secret rotation but introduces a new audit surface: Workload Identity Pool activity logs in Cloud Audit Logs and AWS CloudTrail AssumeRoleWithWebIdentity events in parallel. Monitor both. A compromised agent typically shows up as an unusual frequency of AssumeRoleWithWebIdentity calls or as calls from an unexpected source IP.
Your Vertex AI agent is returning an error when attempting to access an S3 bucket via Workload Identity Federation. The error message is: "An error occurred (AccessDenied) when calling the AssumeRoleWithWebIdentity operation: Not authorized to perform sts:AssumeRoleWithWebIdentity". Your credential configuration file looks syntactically correct. Walk through the debugging process with the assistant.
BigQuery Omni is Google's managed compute deployed in specific AWS and Azure regions. When you create an Omni dataset, you choose a region that maps to an AWS or Azure region (e.g., aws-us-east-1 maps to AWS us-east-1). BigQuery tables in that dataset are backed by data in S3 or ADLS Gen2 — specifically, by external tables or BigLake tables pointing to files in those storage services.
When you run a query against an Omni dataset, BigQuery routes the query to the Omni compute cluster in the corresponding cloud region. That compute reads from S3 or ADLS Gen2 over local networking (no cross-cloud egress), executes the query, and returns results to your BigQuery project. The result data does cross cloud boundaries — but typically query results are much smaller than the underlying data.
The first step is creating a BigQuery connection to S3 or ADLS Gen2. This connection uses a dedicated Google-managed service account that must be granted read access to the source storage.
From the agent's perspective, an Omni external table is indistinguishable from any other BigQuery table. The agent's BigQuery tool calls standard BigQuery APIs — it doesn't need to know that the underlying data lives in S3. This is the key architectural benefit: cross-cloud data complexity is absorbed by the BigQuery layer, not the agent layer.
In practice, agents should be aware of Omni query latency. A query that runs in 2 seconds on BigQuery-native data might take 6-8 seconds on Omni, because the compute cluster in AWS has a cold-start overhead and the query is routed cross-cloud. For synchronous agent flows, this matters; for batch or background analysis, it typically doesn't.
When you JOIN an Omni table (data in S3) with a BigQuery-native table (data in GCP), BigQuery must physically broadcast one side of the join across clouds. For large tables, this can incur significant data transfer costs and latency. Structure your queries to filter Omni data heavily before joining — push predicates into the Omni scan, not the join.
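A minimal sketch of the filter-before-join shape, written as a query builder. The table names, column names (`order_date`, `customer_id`, `order_total`, `segment`), and the function name are all hypothetical. The point is only that the date predicate sits inside the scan of the Omni table, so less data has to cross clouds for the join.

```python
from datetime import date


def build_filtered_join(omni_table: str, native_table: str, day: str) -> str:
    """Return SQL that filters the Omni table before joining native data."""
    date.fromisoformat(day)  # raises ValueError on anything but YYYY-MM-DD
    return f"""
WITH filtered_orders AS (
  -- Predicate applied during the Omni scan, before any cross-cloud transfer
  SELECT customer_id, order_total
  FROM `{omni_table}`
  WHERE order_date = DATE '{day}'
)
SELECT c.segment, SUM(o.order_total) AS revenue
FROM filtered_orders AS o
JOIN `{native_table}` AS c USING (customer_id)
GROUP BY c.segment
""".strip()
```

In production code, query parameters are a safer way to pass the date than string formatting; the validation call above is only a guard for this sketch.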
For Omni workloads, Google recommends BigLake tables over standard external tables. BigLake adds row- and column-level security policies that are enforced even when the data is accessed via third-party tools (Spark, Presto) outside of BigQuery. For agentic pipelines where the agent's output might be consumed by multiple downstream tools, BigLake's centralized access control is significantly easier to manage than per-tool S3 bucket policies.
Omni is not appropriate when: (1) your S3 data is in a region without Omni coverage, (2) query latency under 3 seconds is required, (3) your queries require complex cross-cloud JOINs on large tables, or (4) data must be transformed before analysis (use Dataflow replication instead). Use Omni when data gravity and egress costs favor keeping data in the source cloud.
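The four exclusion criteria above can be written down as an explicit check, which is useful when the decision is revisited per dataset. This is a sketch: the `Workload` fields and function name are hypothetical, and the 3-second threshold is taken directly from the text.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    region_has_omni: bool           # is the source region covered by Omni?
    latency_budget_s: float         # required query latency, in seconds
    large_cross_cloud_joins: bool   # complex JOINs across clouds on large tables?
    needs_pre_transformation: bool  # must data be transformed before analysis?


def omni_blockers(w: Workload) -> list:
    """Return the reasons Omni is a poor fit; an empty list means it qualifies."""
    blockers = []
    if not w.region_has_omni:
        blockers.append("source region has no Omni coverage")
    if w.latency_budget_s < 3:
        blockers.append("latency budget under 3 seconds")
    if w.large_cross_cloud_joins:
        blockers.append("large cross-cloud JOINs required")
    if w.needs_pre_transformation:
        blockers.append("data must be transformed first (consider Dataflow replication)")
    return blockers
```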
You have 5 TB of daily Parquet files in S3 (us-east-1) representing e-commerce order events. Your Vertex AI agent needs to produce a daily summary joining this S3 data with a 10M-row customer dimension table in BigQuery. Current query runtime is 45 seconds and costs $2.50 per execution. Your target is under 15 seconds and under $0.75.
All cross-cloud streaming approaches share a fundamental latency constraint: data must physically traverse the internet between cloud providers. The minimum round-trip time between major cloud regions in the same geography is approximately 10-30ms for the network hop alone. When you add serialization, protocol overhead, and buffering, the practical minimum end-to-end latency for cross-cloud streaming is 200-500ms under good conditions and 800-1500ms when including buffering for reliability.
For most agentic workflows, such as fraud analysis on 10-second windows, demand forecasting, and log anomaly detection, this latency is acceptable. For high-frequency trading (HFT) or real-time gaming, it is not. Know your latency budget before choosing cross-cloud streaming.
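A latency budget check is simple arithmetic: sum the per-stage latencies and compare the total with what the use case tolerates. The stage names and millisecond values below are illustrative assumptions, not measurements.

```python
def end_to_end_ms(stages: dict) -> float:
    """Sum per-stage latencies (milliseconds) into one end-to-end figure."""
    return sum(stages.values())


def fits_budget(stages: dict, budget_ms: float) -> bool:
    """True when the summed pipeline latency stays within the budget."""
    return end_to_end_ms(stages) <= budget_ms


if __name__ == "__main__":
    # Illustrative numbers only; measure your own pipeline before deciding.
    pipeline = {
        "network_rtt": 20,
        "serialization": 30,
        "bridge_buffering": 600,
        "agent_processing": 250,
    }
    print(end_to_end_ms(pipeline), fits_budget(pipeline, budget_ms=2000))
```

Running the same check with the worst-case figure for each stage, rather than the typical one, shows how much headroom the design really has.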
The most direct path from AWS Kinesis to Google Cloud Pub/Sub uses the Dataflow Kinesis to Pub/Sub template, which Google provides as a managed Dataflow job. The Dataflow job runs in GCP, polls Kinesis shards using the Kinesis Client Library (KCL), and publishes each record as a Pub/Sub message.
Authentication for the Dataflow job to read from Kinesis uses Workload Identity Federation (covered in Lesson 2). The Dataflow worker service account federates into an AWS IAM role with kinesis:GetRecords and kinesis:GetShardIterator permissions, plus a shard-discovery permission such as kinesis:ListShards or kinesis:DescribeStream.
The Dataflow job checkpoints shard progress in Kinesis (using the sequence number) and in Cloud Spanner or GCS, providing at-least-once delivery semantics. Messages may be duplicated across the bridge — downstream Dataflow jobs reading from Pub/Sub should implement deduplication logic using the Kinesis sequence number preserved in the Pub/Sub message attributes.
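A minimal in-memory sketch of sequence-number deduplication. It assumes each Pub/Sub message carries the shard ID and sequence number in attributes named `kinesis_shard_id` and `kinesis_sequence_number`; both attribute names are hypothetical, as is the class. The cache is bounded, so duplicates that arrive after their entry has been evicted will pass through.

```python
from collections import OrderedDict


class SequenceDeduplicator:
    """Remember recently seen (shard, sequence number) pairs, oldest evicted first."""

    def __init__(self, max_entries: int = 100_000):
        self._seen = OrderedDict()
        self._max_entries = max_entries

    def is_new(self, shard_id: str, sequence_number: str) -> bool:
        key = (shard_id, sequence_number)
        if key in self._seen:
            self._seen.move_to_end(key)  # keep recently repeated keys in the cache
            return False
        self._seen[key] = None
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # evict the oldest entry
        return True


def drop_duplicates(messages, dedup: SequenceDeduplicator) -> list:
    """Return only the messages whose (shard, sequence) pair was not seen before."""
    fresh = []
    for msg in messages:
        attrs = msg["attributes"]  # hypothetical attribute names, see lead-in
        if dedup.is_new(attrs["kinesis_shard_id"], attrs["kinesis_sequence_number"]):
            fresh.append(msg)
    return fresh
```

A single-process cache like this only works when one consumer sees all messages for a shard; a distributed pipeline would need shared or keyed state instead.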
Azure Event Hubs exposes a Kafka-compatible endpoint alongside its native AMQP 1.0 interface. The two are separate protocols: AMQP is not wire-compatible with Kafka, but the Kafka endpoint (port 9093, available on the Standard tier and above) lets a Kafka consumer running in GCP read from Event Hubs using the standard Kafka Java client, configuring the Event Hubs namespace as if it were a Kafka broker. The Dataflow KafkaIO source supports this directly.
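A sketch of the consumer settings for pointing a Kafka client at Event Hubs. The endpoint format, SASL_SSL with the PLAIN mechanism, and the literal `$ConnectionString` username follow Azure's documented Kafka configuration. The key names below use librdkafka-style naming (as in the confluent-kafka Python client), which is an assumption of this sketch; the Java client expresses the same username and password through `sasl.jaas.config`. The function name is hypothetical.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str, group_id: str) -> dict:
    """Build Kafka consumer settings for an Event Hubs namespace."""
    return {
        # Event Hubs serves the Kafka protocol on port 9093 of the namespace host.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The username is the literal string "$ConnectionString";
        # the password is the Event Hubs connection string itself.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
        "group.id": group_id,
    }
```

The connection string is a secret, so in a real deployment it would be read from a secret store at startup rather than passed around in code.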
For simpler scenarios — particularly when the source system supports webhooks or HTTP push — you can configure AWS Lambda or Azure Functions to publish events directly to a Pub/Sub REST endpoint. The source function calls the Pub/Sub publish API using a service account token obtained via Workload Identity Federation.
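A sketch of what the source function sends. It builds the URL and JSON body for the Pub/Sub REST `publish` method, where each message's `data` field must be base64-encoded. The function name is hypothetical, and the HTTP call and bearer token are deliberately left out so the block makes no network request.

```python
import base64
import json


def build_publish_request(project: str, topic: str, payload: bytes, attributes: dict):
    """Return (url, json_body) for a single-message Pub/Sub REST publish call."""
    url = f"https://pubsub.googleapis.com/v1/projects/{project}/topics/{topic}:publish"
    body = {
        "messages": [
            {
                # The REST API requires message data to be base64-encoded.
                "data": base64.b64encode(payload).decode("ascii"),
                "attributes": attributes,
            }
        ]
    }
    return url, json.dumps(body)
```

The caller would POST the body to the URL with an `Authorization: Bearer` header carrying the federated access token. Batching several messages into one request reduces per-request overhead.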
This approach adds no persistent infrastructure (no Dataflow job to manage) but limits throughput. The Pub/Sub HTTP API handles millions of messages per second globally, so the bottleneck is typically the Lambda/Function concurrency limit at the source. For event rates above ~10,000/second from a single source, prefer the Dataflow bridge approach.
Once events reach Pub/Sub, the agent pipeline is identical to a native GCP streaming scenario. A Vertex AI agent tool can subscribe to a Pub/Sub subscription, read messages, parse the payload, and take action — completely unaware that the events originated in Kinesis or Event Hubs.
The design consideration for agents in this context is windowing. Because cross-cloud bridges introduce variable latency, events may arrive out of order relative to their source timestamps. Agents performing time-windowed analysis should use Dataflow's event-time windowing (which handles late arrivals via watermarks) rather than processing-time windows. This is particularly important for fraud detection and anomaly detection use cases where event ordering matters.
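To make the event-time idea concrete, here is a plain-Python illustration of tumbling windows with a watermark. It is not Dataflow's or Apache Beam's API: the class, its heuristic watermark (newest event time seen minus an allowed lateness), and all names are assumptions made for this sketch. Events are assigned to windows by their source timestamp, late events are still accepted while their window is open, and events arriving after the window closed are counted as dropped.

```python
from collections import defaultdict


class TumblingWindows:
    """Event-time tumbling windows with a simple watermark (illustration only)."""

    def __init__(self, size_s: int, allowed_lateness_s: int):
        self.size_s = size_s
        self.allowed_lateness_s = allowed_lateness_s
        self.watermark = float("-inf")
        self.open_windows = defaultdict(list)  # window start -> values
        self.dropped = 0

    def add(self, event_time_s: int, value) -> list:
        """Add one event; return (window_start, values) for windows that closed."""
        start = event_time_s - (event_time_s % self.size_s)
        # Heuristic watermark: newest event time seen, minus the lateness allowance.
        self.watermark = max(self.watermark, event_time_s - self.allowed_lateness_s)
        if start + self.size_s <= self.watermark:
            self.dropped += 1  # the window was already emitted; event is too late
        else:
            self.open_windows[start].append(value)
        closed = sorted(s for s in self.open_windows if s + self.size_s <= self.watermark)
        return [(s, self.open_windows.pop(s)) for s in closed]
```

With processing-time windows, an event delayed by the bridge would land in whichever window happened to be open on arrival; here it joins the window its timestamp belongs to, as long as it arrives within the lateness allowance.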
Pub/Sub message ingestion is priced per TiB of message payload. For high-volume cross-cloud streams, compress messages before publishing. Avro with snappy compression typically reduces Pub/Sub costs by 60-75% compared to JSON, at the cost of requiring schema management via Schema Registry or Pub/Sub Schemas.
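The cost effect of compression is straightforward to estimate. The sketch below assumes a 30-day month and takes the per-TiB price as a parameter, since list prices change; the function name and example numbers are hypothetical, and it models payload volume only.

```python
TIB = 2 ** 40  # bytes per tebibyte
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def monthly_ingest_cost(events_per_s: float, bytes_per_event: float,
                        price_per_tib: float, compression_saving: float = 0.0) -> float:
    """Estimate monthly ingestion cost for a steady event stream.

    compression_saving is the fraction of bytes removed, e.g. 0.6 for 60%.
    """
    if not 0.0 <= compression_saving < 1.0:
        raise ValueError("compression_saving must be in [0, 1)")
    total_bytes = events_per_s * bytes_per_event * SECONDS_PER_MONTH
    total_bytes *= (1.0 - compression_saving)
    return total_bytes / TIB * price_per_tib
```

Comparing `monthly_ingest_cost(..., compression_saving=0.0)` with the same call at `0.6` or `0.75` shows whether the saving justifies the added schema management for your volume.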
Cross-cloud streaming is a solved problem — the bridge patterns are well-documented and managed templates exist. The non-trivial design decisions are: (1) latency budget analysis against your use case, (2) deduplication strategy for at-least-once delivery, (3) out-of-order event handling via event-time windowing, and (4) compression strategy to control Pub/Sub ingestion costs at volume.
A payment processor runs their transaction event stream on AWS Kinesis (us-east-1, 50,000 events/sec peak). A Vertex AI agent on Google Cloud performs fraud scoring using a custom model. The agent needs transaction events within 2 seconds of occurrence at the source. Design the complete pipeline from Kinesis to the Vertex AI agent, including bridge architecture, deduplication, windowing, and failure handling.