Module 3 · Lesson 1

Cloud Storage as an Agent's Document Backbone

How GCS buckets, object events, and IAM become the foundation every document agent relies on.

What must an agent know about storage before it can act on a single document?

In 2022, Google Cloud published a case study on how Lowe's Companies restructured its supplier invoice pipeline. Each month, more than 4 million supplier documents arrived in varied formats — PDF invoices, scanned receipts, Excel remittances. The team's first architectural decision was not an AI model choice: it was agreeing on a single, auditable landing zone in Cloud Storage before any agent logic ran. Every downstream automation depended on that foundation being correct.

Why Storage Architecture Comes First

Document agents are stateless by design: they wake on a trigger, perform a task, and exit. The document itself, however, must persist reliably before, during, and after agent execution. Google Cloud Storage (GCS) is designed for eleven nines (99.999999999%) of annual durability, offers strong read-after-write consistency, and uses a flat namespace that maps cleanly to agent task queues.

Three properties of GCS are especially important for agent workflows: object versioning (agents can audit every state a document passed through), object metadata (agents attach and read custom key-value pairs without touching document content), and Pub/Sub notifications (a bucket can push an event to any subscriber the instant a new object lands).

Storage Class

Standard vs. Nearline vs. Archive

Active pipeline documents use Standard. Processed archives use Nearline (30-day minimum). Long-term audit copies use Archive. Lifecycle rules demote automatically.

Object Naming

Prefix-Based Sharding

GCS has no true folders. Prefixes like incoming/2024-11/ or processed/invoices/ create virtual directory trees agents navigate with list operations.

Event Trigger

Pub/Sub Notifications

A bucket-level notification config pushes OBJECT_FINALIZE events to a Pub/Sub topic. Any agent subscribed to that topic acts within seconds of upload.

Access Control

Service Account IAM

Agents run as service accounts. Least-privilege roles — roles/storage.objectViewer for read, roles/storage.objectCreator for write — limit blast radius.
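
The automatic demotion mentioned under Storage Class can be expressed as a lifecycle configuration file. The sketch below builds one in Python; the 30-day and 365-day ages and the function name are illustrative assumptions, not values taken from any real pipeline. The resulting JSON is the kind of file that can be passed to gcloud storage buckets update with the --lifecycle-file flag.

```python
import json


def build_lifecycle_config(nearline_after_days: int, archive_after_days: int) -> dict:
    """Build a Cloud Storage lifecycle config that demotes objects by age.

    The day counts are caller-supplied assumptions; tune them to your
    own retention and access patterns.
    """
    if not 0 < nearline_after_days < archive_after_days:
        raise ValueError("expected 0 < nearline_after_days < archive_after_days")
    return {
        "rule": [
            {
                # Move objects out of Standard once they are no longer active.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {
                    "age": nearline_after_days,
                    "matchesStorageClass": ["STANDARD"],
                },
            },
            {
                # Move long-lived audit copies to Archive.
                "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
                "condition": {
                    "age": archive_after_days,
                    "matchesStorageClass": ["NEARLINE"],
                },
            },
        ]
    }


config = build_lifecycle_config(nearline_after_days=30, archive_after_days=365)
print(json.dumps(config, indent=2))
```

Restricting each rule with matchesStorageClass keeps a rule from re-applying to objects that have already been demoted.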

The Bucket Layout Pattern

Production document pipelines almost universally adopt a three-bucket pattern rather than a single bucket with complex prefix logic. Separation makes IAM boundaries clean and prevents an agent bug from overwriting source documents.

📥 landing-bucket
raw uploads, WORM lock optional

→ agent reads →

⚙️ processing-bucket
in-flight, temp objects

→ agent writes →

✅ output-bucket
structured results, BigQuery load targets

Pub/Sub Trigger Configuration

The gcloud storage buckets notifications create command binds a bucket to a Pub/Sub topic. The agent consumes messages from a subscription attached to that topic. The critical fields in each notification are bucket and name (the object's name within the bucket); together they form the gs://bucket/name path the agent uses to fetch and process the document.

# Create a Pub/Sub topic for document arrival events
gcloud pubsub topics create doc-landing-events

# Bind bucket to topic: fire on every new object finalized
gcloud storage buckets notifications create \
  gs://my-landing-bucket \
  --topic=doc-landing-events \
  --event-types=OBJECT_FINALIZE \
  --payload-format=json

# Create a pull subscription for the agent to consume
gcloud pubsub subscriptions create doc-agent-sub \
  --topic=doc-landing-events \
  --ack-deadline=60
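
On the consuming side, the agent has to turn each notification into an object reference. The following is a minimal sketch, assuming the JSON payload format, in which the message data is the object's metadata resource and the eventType attribute carries the event name. The ObjectEvent class, the parse_notification helper, and the sample values are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectEvent:
    """The parts of a GCS notification a document agent usually needs."""
    event_type: str
    bucket: str
    name: str
    generation: str

    @property
    def gcs_uri(self) -> str:
        # Bucket plus object name gives the full path to fetch.
        return f"gs://{self.bucket}/{self.name}"


def parse_notification(attributes: dict, data: bytes) -> ObjectEvent:
    """Parse Pub/Sub message attributes and JSON payload into an ObjectEvent."""
    payload = json.loads(data.decode("utf-8"))
    return ObjectEvent(
        event_type=attributes["eventType"],
        bucket=payload["bucket"],
        name=payload["name"],
        generation=str(payload["generation"]),
    )


# Hypothetical sample message, shaped like a JSON-format notification.
sample_attributes = {"eventType": "OBJECT_FINALIZE"}
sample_data = json.dumps(
    {
        "bucket": "my-landing-bucket",
        "name": "incoming/2024-11/invoice-001.pdf",
        "generation": "1712345678901234",
    }
).encode("utf-8")

event = parse_notification(sample_attributes, sample_data)
print(event.gcs_uri)
```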

Agent Design Rule

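Always set an ack deadline longer than your agent's expected Document AI processing time (Pub/Sub allows up to 600 seconds). Otherwise Pub/Sub redelivers the message while the first instance is still working, and the document is processed twice. If the agent crashes mid-run before acknowledging, Pub/Sub redelivers the message once the deadline expires and another agent instance retries, which gives at-least-once processing without extra retry logic.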
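
One way to apply this rule is to acknowledge only after processing has succeeded. The sketch below illustrates the idea with a stand-in message class so it runs without any cloud dependency; FakeMessage and handle_message are hypothetical names, and the only assumption about the real client is that its messages expose ack() and nack() methods.

```python
class FakeMessage:
    """Stand-in for a Pub/Sub message; records whether it was acked or nacked."""

    def __init__(self, data: bytes):
        self.data = data
        self.acked = False
        self.nacked = False

    def ack(self) -> None:
        self.acked = True

    def nack(self) -> None:
        self.nacked = True


def handle_message(message, process) -> bool:
    """Run process(message.data) and ack only if it completes.

    On failure the message is nacked so Pub/Sub can redeliver it.
    If the process crashes outright, neither call happens and the
    message is redelivered after the ack deadline expires.
    """
    try:
        process(message.data)
    except Exception:
        message.nack()
        return False
    message.ack()
    return True


ok_message = FakeMessage(b"gs://my-landing-bucket/incoming/a.pdf")
handled = handle_message(ok_message, lambda data: len(data))
print(handled, ok_message.acked)
```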

Key Terms

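OBJECT_FINALIZE: The GCS Pub/Sub event type emitted when an object upload completes successfully, including when an existing object is overwritten. Agents should trigger on this event. Cloud Storage has no OBJECT_CREATE event type (the others are OBJECT_METADATA_UPDATE, OBJECT_DELETE, and OBJECT_ARCHIVE), and no notification is sent for a resumable upload that is still in progress.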

Object Versioning: GCS feature that retains prior versions of an object when it is overwritten. Enables agents to inspect document history and roll back corrupted outputs.

WORM Lock: Write-Once-Read-Many via GCS Object Retention or Bucket Lock. Prevents any principal, including service accounts, from deleting or overwriting protected objects during the retention period.

Uniform Bucket-Level Access: Disables per-object ACLs in favor of IAM-only access control. Recommended for all agent pipelines because it makes permission auditing deterministic.

Real-World Note

In the Lowe's pipeline, the team enforced uniform bucket-level access on the landing bucket and granted the Document AI service account only roles/storage.objectViewer on that bucket and roles/storage.objectCreator on the output bucket. This contained a misconfiguration incident in 2023 where a test agent nearly overwrote live invoice data — the IAM boundary stopped it cold.

Module 3 · Lesson 1

Quiz: Cloud Storage Foundations

Four questions — choose the best answer for each.

Which GCS Pub/Sub event type should a document agent trigger on to ensure it only processes fully uploaded objects?

Correct. OBJECT_FINALIZE fires only when an upload fully completes, including resumable uploads. Cloud Storage does not define an OBJECT_CREATE event type, and in-progress uploads produce no notification.

Not quite. OBJECT_CREATE is not a Cloud Storage notification event type. OBJECT_FINALIZE is the safe trigger because it fires only on successful completion.

A document agent needs to read from a landing bucket and write results to an output bucket. Applying least privilege, which two IAM roles are appropriate?

Correct. The agent only needs to read from landing (objectViewer) and create new objects in output (objectCreator). Admin or Owner roles grant far more than necessary.

Not quite. The agent reads from the landing bucket (needs Viewer) and writes to the output bucket (needs Creator). The roles are reversed in option C, and admin/owner in options A/D is excessive.

Why is the three-bucket pattern (landing → processing → output) recommended over a single bucket with prefix logic?

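Correct. IAM policies are simplest to reason about at the bucket level; prefix-level access is possible only through IAM Conditions or managed folders, which are easier to misconfigure. Separate buckets mean a bug in the processing agent cannot reach the original landing documents.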

Not quite. The primary reason is IAM isolation. Bucket-level IAM is cleaner and safer than relying on prefix logic within a single bucket to separate access concerns.

What is the primary purpose of setting a Pub/Sub subscription ack-deadline longer than the Document AI processing time?

Correct. A deadline longer than processing time keeps Pub/Sub from redelivering a message that is still being worked on. If the agent crashes before acknowledging, Pub/Sub redelivers after the deadline expires, giving another agent instance a chance to process the document without manual intervention.

Not quite. The ack deadline is about reliability, not cost or message size. A deadline longer than processing time avoids duplicate processing of in-flight documents while still allowing automatic redelivery after a crash.

Module 3 · Lab 1

Lab: Designing a GCS-Backed Document Agent

Practice with an AI tutor. Complete 3 exchanges to finish the lab.

Scenario

You are architecting a GCS-based document pipeline for a regional logistics company. They receive 50,000 PDF shipment manifests per day from 200 carrier partners. Your task is to design the bucket layout, IAM roles, and Pub/Sub trigger configuration for the agent that will hand documents off to Document AI.

Ask the tutor about bucket naming conventions, IAM role assignments, notification config flags, or object lifecycle rules. Explore at least one design tradeoff.

Module 3 · Lesson 2

Document AI: Processors, Parsers, and the API

Understanding the processor model, synchronous vs. batch processing, and structured output that agents consume.

How does Document AI convert a raw PDF into structured data an agent can reason over?

In Q3 2023, Wayfair presented at Google Cloud Next on their use of Document AI to process supplier co-op invoices — marketing reimbursement claims that arrive as PDFs containing tables, handwritten annotations, and multi-page line-item grids. Before Document AI, OCR accuracy on these documents was around 71%. After migrating to the Form Parser processor and tuning entity confidence thresholds in their agent, accuracy reached 96.4%, reducing manual review queues by roughly 3,400 documents per week.

The Processor Model

Document AI organizes its capabilities into processors — specialized models trained on specific document types. Each processor has a processor ID and a version. An agent calls a processor by ID, passes a document (inline bytes or a GCS URI), and receives back a Document proto containing extracted text, layout, entities, and form fields.

Key processor types include the General Form Parser (extracts key-value pairs from any form), Invoice Parser (understands invoice-specific fields like line items, tax, totals), Custom Document Extractor (trained on your specific schema via labeled examples), and Splitter/Classifier (routes multi-document bundles to the right downstream processor).

Processor Type

Invoice Parser

Pre-trained on millions of invoices. Extracts supplier name, invoice number, line items, totals, tax, and payment terms as typed entities with confidence scores.

Processor Type

Form Parser

Extracts key-value pairs from any structured form. Returns field-value pairs with bounding box coordinates and confidence scores for downstream validation.

Processor Type

Custom Extractor

Fine-tuned on your labeled examples via the Document AI Workbench. Use when pre-built processors miss domain-specific fields or layouts.

Processor Type

Splitter / Classifier

Identifies document boundaries and types within a multi-document PDF. Returns one entity per detected sub-document, with a type label, a confidence score, and the page range it covers, which enables agent routing logic.

Synchronous vs. Batch Processing

Document AI exposes two processing modes. Synchronous (online) processing via process_document() is appropriate for documents up to 15 pages and returns results inline within the API response. Latency is typically 1–5 seconds. This is the right choice when an agent needs to make an immediate routing or validation decision.

Batch processing via batch_process_documents() submits a list of GCS URIs and returns immediately with a long-running operation (LRO). Results are written back to a specified GCS output prefix. Batch mode handles up to 1,000 documents per request and supports files up to 2,000 pages, though limits vary by processor and change over time, so check the current Document AI quotas page. Agents poll the LRO, or subscribe to a Cloud Storage notification on the output prefix to learn when results land.

Decision Rule

Up to 15 pages and needs immediate response? Use synchronous. Over 15 pages, bulk jobs, or latency-tolerant workflows? Use batch. Mixing modes in one agent is common: synchronous for real-time invoice validation, batch for nightly statement reconciliation.
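
The decision rule can be captured in a few lines. This is a minimal sketch: the 15-page constant is the synchronous limit quoted in this lesson, and the function name and return labels are illustrative assumptions.

```python
SYNC_PAGE_LIMIT = 15  # synchronous page limit quoted in this lesson


def choose_processing_mode(page_count: int, needs_immediate_response: bool) -> str:
    """Return "sync" or "batch" following the lesson's decision rule."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count <= SYNC_PAGE_LIMIT and needs_immediate_response:
        return "sync"
    # Large documents and latency-tolerant jobs both go to batch.
    return "batch"


print(choose_processing_mode(4, needs_immediate_response=True))
print(choose_processing_mode(300, needs_immediate_response=True))
```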

The Document Proto and Entity Extraction

Every Document AI response is a google.cloud.documentai.v1.Document proto. The agent's job is to traverse this structure and extract meaning. The three most important fields are entities (typed extractions with confidence), pages (layout, form fields, tables per page), and text (full OCR text of the document).

# Synchronous process call — Python SDK
from google.cloud import documentai_v1 as documentai

client = documentai.DocumentProcessorServiceClient()
processor_name = client.processor_path(
    project="my-project",
    location="us",
    processor="abc123def456"
)

with open("invoice.pdf", "rb") as f:
    raw_doc = documentai.RawDocument(
        content=f.read(),
        mime_type="application/pdf"
    )

request = documentai.ProcessRequest(
    name=processor_name,
    raw_document=raw_doc
)
result = client.process_document(request=request)
doc = result.document

# Iterate extracted entities
for entity in doc.entities:
    print(f"{entity.type_}: {entity.mention_text} "
          f"(confidence: {entity.confidence:.2f})")

Confidence Thresholds and Agent Routing

Every entity carries a confidence score between 0 and 1. A well-designed agent uses confidence thresholds to route documents: high confidence entities go straight to automated processing, medium confidence triggers a human review flag, and low confidence entities can trigger a re-extraction attempt with a different processor version.
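
The three-way routing described above might look like the following sketch. The threshold values (0.85 and 0.50) and the route labels are assumptions chosen for illustration; real thresholds should be calibrated per processor and per field.

```python
def route_entity(
    confidence: float,
    auto_threshold: float = 0.85,
    review_threshold: float = 0.50,
) -> str:
    """Map an entity confidence score to a routing decision.

    Both thresholds are hypothetical defaults, not recommended values.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= auto_threshold:
        return "auto"          # high confidence: automated processing
    if confidence >= review_threshold:
        return "human_review"  # medium confidence: flag for a person
    return "re_extract"        # low confidence: try another processor version


for score in (0.97, 0.62, 0.43):
    print(score, route_entity(score))
```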

LRO (Long-Running Operation): A batch process request returns an LRO name. The agent polls it via GetOperation, or watches the GCS output prefix with a Cloud Storage Pub/Sub notification, to know when results are ready.

Processor Version: Each processor has stable, rc (release candidate), and pretrained versions. Agents should pin to a specific version ID for production pipelines rather than using the "default" alias, which may change.

Human Review: Document AI integrates with Human Review (Specialized Processors) to route low-confidence extractions to a labeling UI. Agents trigger this by writing to a HITL (Human-in-the-Loop) queue.

Module 3 · Lesson 2

Quiz: Document AI Processors and API

Four questions on Document AI concepts.

An agent receives a 300-page multi-document PDF bundle from GCS. Which Document AI processing approach is most appropriate?

Correct. Batch processing handles up to 2,000 pages per document and is designed for large or multi-document jobs. The agent submits and then monitors the LRO.

Not quite. Synchronous processing has a 15-page limit. For a 300-page bundle, batch_process_documents() with GCS URI input is the appropriate path.

Which Document AI processor would an agent use to identify and separate different document types within a mixed PDF before routing each to a specialized extractor?

Correct. The Splitter/Classifier identifies document boundaries and types within a bundle, returning a type label, confidence score, and page range for each sub-document that the agent uses for downstream routing.

Not quite. The Splitter/Classifier is the processor designed to segment and classify mixed-type document bundles. The others extract fields but don't perform document-type routing.

A Document AI entity is returned with a confidence score of 0.43. According to standard agent routing practice, what should the agent do?

Correct. Low confidence scores (typically below 0.7–0.8 depending on the use case) should trigger human review or a re-extraction attempt, not silent acceptance or discard.

Not quite. A confidence of 0.43 is too low to trust for automated processing. Standard practice routes low-confidence extractions to a human review queue or attempts re-extraction.

Why should production agents pin to a specific processor version ID rather than using the "default" alias?

Correct. When Google releases a new processor version, the default alias updates. If an agent relies on "default," its extraction schema and confidence calibration may silently change, breaking downstream pipelines.

Not quite. The core risk is behavioral: Google may update the model version that "default" points to, changing extraction results in ways that break the agent's downstream logic without any code change.

Module 3 · Lab 2

Lab: Selecting and Calling Document AI Processors

Practice with an AI tutor. Complete 3 exchanges to finish the lab.

Scenario

You are building an agent for a healthcare insurer that processes three document types: prior authorization forms, Explanation of Benefits PDFs, and multi-page medical records bundles (up to 500 pages). You must select the right processor for each, configure the API calls correctly, and define confidence thresholds for routing.

Ask about processor selection, API call structure, confidence thresholds, batch vs. sync decisions, or HITL configuration. Work through the healthcare document types specifically.

Module 3 · Lesson 3

Wiring the Agent: Cloud Functions, Workflows, and Vertex AI

How orchestration frameworks connect GCS triggers to Document AI calls and downstream sinks.

What happens between a file landing in a bucket and structured data appearing in BigQuery?

In 2022, Deutsche Bank's technology group published details of their contract intelligence platform built on Google Cloud. The system ingests legal contracts — NDAs, credit agreements, ISDA schedules — from a GCS landing zone, routes each through a Document AI Custom Extractor trained on financial contract schemas, and uses Cloud Workflows to orchestrate the multi-step pipeline. The Workflows YAML definition acts as the agent's brain: it calls Document AI, evaluates confidence, either writes to BigQuery or routes to a review queue, and sends a Pub/Sub completion event. The entire pipeline processes a 40-page credit agreement in under 8 seconds end-to-end.

Three Orchestration Patterns

There is no single "right" way to wire a document agent. The choice depends on complexity, latency requirements, and how much orchestration state needs to be tracked. Three patterns dominate production deployments.

Pattern 1

Cloud Functions (Simple)

A single Python Cloud Function triggers on a Pub/Sub message, calls Document AI synchronously, and writes results. Best for simple, low-volume, single-step pipelines. Hard to manage complex branching logic.

Pattern 2

Cloud Workflows (Orchestrated)

A YAML/JSON workflow definition orchestrates multiple steps with built-in retry, branching, and parallel execution. Calls Cloud Functions, HTTP endpoints, or GCP service connectors. State is managed by the Workflows runtime.

Pattern 3

Vertex AI Agent Builder

Uses a ReAct-style LLM agent that calls Document AI, GCS read/write, and BigQuery as tools. Best when the agent needs to reason about extraction results and decide non-deterministically between multiple next steps.

Cloud Workflows: Step-by-Step Pipeline

Cloud Workflows is a serverless orchestrator that executes steps defined in YAML. Each step can call HTTP endpoints, GCP service connectors, or other workflows. The state of variables persists across steps, enabling agents to carry Document AI results forward into branching logic.

# Cloud Workflows YAML — document agent pipeline
main:
  params: [args]
  steps:
    - init:
        assign:
          - project: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          - gcs_uri: ${args.gcs_uri}

    - call_document_ai:
        call: http.post
        args:
          url: ${"https://us-documentai.googleapis.com/v1/projects/"
                + project + "/locations/us/processors/abc123:process"}
          auth:
            type: OAuth2
          body:
            gcsDocument:
              gcsUri: ${gcs_uri}
              mimeType: "application/pdf"
        result: doc_result

    - check_confidence:
        switch:
          - condition: ${doc_result.body.document.entities[0].confidence > 0.85}
            next: write_to_bigquery
          - condition: true
            next: send_to_review

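    - write_to_bigquery:
        call: http.post
        args:
          url: "https://bigquery.googleapis.com/bigquery/v2/..."
          # Google APIs need an authenticated call from the workflow's service account
          auth:
            type: OAuth2
          # Reshape the entities into the row format your BigQuery endpoint expects
          body: ${doc_result.body.document.entities}
        next: end

    - send_to_review:
        call: http.post
        args:
          url: "https://pubsub.googleapis.com/v1/..."
          auth:
            type: OAuth2
          body:
            # The Pub/Sub publish API takes a "messages" array with base64 data
            messages:
              - data: ${base64.encode(json.encode(doc_result))}
        next: end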
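
The check_confidence step above looks only at entities[0], so one confident first entity can send a document with weak later fields straight to BigQuery. A stricter alternative is to branch on the lowest confidence in the document. The sketch below shows that logic in Python, for example inside a Cloud Function the workflow calls; the function name is hypothetical, and the 0.85 threshold and step labels are reused from the workflow for illustration.

```python
def document_route(entities: list, threshold: float = 0.85) -> str:
    """Pick the next step from the weakest entity rather than the first one.

    `entities` is a list of dicts shaped like the JSON entities in a
    Document AI response, each with an optional "confidence" key.
    """
    if not entities:
        # Nothing extracted: treat as a review case rather than a success.
        return "send_to_review"
    lowest = min(entity.get("confidence", 0.0) for entity in entities)
    return "write_to_bigquery" if lowest > threshold else "send_to_review"


sample_entities = [
    {"type": "supplier_name", "confidence": 0.98},
    {"type": "total_amount", "confidence": 0.61},
]
print(document_route(sample_entities))
```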

Vertex AI Agent Builder for Document Workflows

For document workflows that require judgment — "does this contract clause conflict with our standard terms?" or "should this invoice be escalated to AP management?" — a Vertex AI agent using Gemini with tool use is more appropriate than a deterministic workflow. The agent receives the Document AI extraction as context and uses tools to query BigQuery for vendor history, write back to GCS, or trigger downstream Pub/Sub events.

The key architecture decision: use Cloud Workflows for deterministic multi-step orchestration; use Vertex AI agents for reasoning-required decision points. These can be composed — a Workflow step can invoke a Vertex AI agent as an HTTP call, combining both strengths.

Deutsche Bank Pattern

Deutsche Bank's contract intelligence system used exactly this composition: Cloud Workflows handled the deterministic steps (fetch document, call Document AI, check confidence, route), while a Vertex AI agent handled the reasoning step — determining whether an extracted clause was "standard" or "non-standard" by comparing it against a BigQuery table of 50,000 historical contract clauses.

Cloud Workflows Connector: Pre-built steps for calling GCP services (BigQuery, Pub/Sub, Cloud Storage, Document AI) without writing raw HTTP. Handles auth, request formatting, and response parsing automatically.

Eventarc: Google Cloud's event routing service. Can trigger a Cloud Workflow or Cloud Run function directly from a GCS OBJECT_FINALIZE event, replacing the manual Pub/Sub subscription wiring.

ReAct Loop: Reason + Act. The Vertex AI agent iteratively reasons about the document extraction, selects a tool (BigQuery lookup, GCS write, etc.), observes the result, and reasons again until the task is complete.

Module 3 · Lesson 3

Quiz: Orchestration and Agent Wiring

Four questions on orchestration patterns.

A document pipeline needs to call Document AI, then conditionally write to BigQuery or route to a human review Pub/Sub topic based on confidence score. Which orchestration tool is most appropriate for this deterministic branching logic?

Correct. Cloud Workflows is designed for deterministic, multi-step orchestration with branching. The switch/condition step maps directly to the confidence threshold routing requirement.

Not quite. For deterministic branching based on a numeric confidence threshold, Cloud Workflows is the right tool. Vertex AI agents are better for reasoning-based decisions; Cloud Functions lack native orchestration state.

Which Google Cloud service can trigger a Cloud Workflow directly from a GCS OBJECT_FINALIZE event, without manually configuring a Pub/Sub subscription?

Correct. Eventarc routes GCS events (including OBJECT_FINALIZE) directly to Cloud Workflows, Cloud Run, or Cloud Functions without requiring manual Pub/Sub subscription setup.

Not quite. Eventarc is Google Cloud's event routing service that can trigger workflows directly from GCS events. Cloud Scheduler is time-based; Cloud Tasks is queue-based; Cloud Run Jobs require explicit invocation.

When should a Vertex AI agent using Gemini be used instead of (or in addition to) Cloud Workflows in a document pipeline?

Correct. Vertex AI agents excel at reasoning-required decisions — comparing clauses to historical norms, classifying ambiguous intent, or selecting among non-deterministic next steps. Deterministic routing should stay in Workflows.

Not quite. Vertex AI agents add the most value when a step requires reasoning or judgment. Deterministic steps (confidence threshold checks, field routing) are better handled by Cloud Workflows for reliability and cost.

In Cloud Workflows, what is the advantage of using a built-in connector for BigQuery over writing a raw HTTP step?

Correct. Connectors abstract away OAuth2 auth, correct API URL construction, and response schema handling. This reduces boilerplate and common errors in raw HTTP steps.

Not quite. Connectors don't bypass IAM — they still use the workflow's service account permissions. Their value is abstracting away auth mechanics, request formatting, and response parsing.

Module 3 · Lab 3

Lab: Designing a Cloud Workflows Document Pipeline

Practice with an AI tutor. Complete 3 exchanges to finish the lab.

Scenario

You are wiring a Cloud Workflows pipeline for a mortgage lender. When a loan application PDF lands in GCS, the workflow must: (1) call Document AI to extract applicant data, (2) check confidence, (3) either write to BigQuery or route to a review queue, and (4) trigger a downstream notification. You need to decide on Eventarc vs. Pub/Sub trigger, error handling, and where to add a Vertex AI reasoning step.

Ask about Workflows YAML structure, Eventarc vs Pub/Sub trigger choice, error handling steps, retry configuration, or where to insert a Vertex AI reasoning call in the pipeline.


Module 3 · Lesson 4

Reliability, Observability, and Production Hardening

Error handling, dead-letter queues, Cloud Logging integration, and the operational patterns that keep document agents running at scale.

How do you know when your document agent fails — and how do you make it recover automatically?

In March 2023, a major European bank (disclosed in a Google Cloud reference architecture blog post, anonymized) ran a document agent processing trade confirmations from 180 broker counterparties. A single broker began sending malformed PDFs with corrupted byte-order marks. The Document AI processor returned HTTP 400 errors. Because the agent had no dead-letter queue and Pub/Sub ack-deadlines were set to 10 seconds, the corrupted documents triggered continuous redelivery — approximately 4,200 redelivery attempts per hour. The spike in Document AI API calls consumed the project quota within 90 minutes, blocking all other document processing. The fix: a dead-letter topic that captured documents after 5 failed attempts, plus a Cloud Monitoring alert on dead-letter queue depth.

Dead-Letter Queues

A dead-letter topic (DLT) is a Pub/Sub topic that receives messages after a configured maximum delivery attempt count. When a document causes repeated processing failures, Pub/Sub moves it to the DLT instead of retrying indefinitely. The agent team is then alerted, can inspect the failed document, fix the root cause, and replay from the DLT.

# Create a dead-letter topic
gcloud pubsub topics create doc-dead-letter

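# Update subscription to use DLT after 5 failed attempts
gcloud pubsub subscriptions update doc-agent-sub \
  --dead-letter-topic=doc-dead-letter \
  --max-delivery-attempts=5

# Grant Pub/Sub service account permission to publish to DLT
gcloud pubsub topics add-iam-policy-binding doc-dead-letter \
  --member="serviceAccount:service-PROJECT_NUM@gcp-sa-pubsub.iam.gserviceaccount.com" \
  --role="roles/pubsub.publisher"

# Grant the same service account permission to ack forwarded messages on the source subscription
gcloud pubsub subscriptions add-iam-policy-binding doc-agent-sub \
  --member="serviceAccount:service-PROJECT_NUM@gcp-sa-pubsub.iam.gserviceaccount.com" \
  --role="roles/pubsub.subscriber"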

Error Handling in Cloud Workflows

Cloud Workflows provides a try/except construct. Each step that calls Document AI or an external service should be wrapped in error handling that catches HTTP 4xx (client errors, usually a bad document, so do not retry; 429 Too Many Requests is the exception and is worth retrying) separately from HTTP 5xx (server errors, retry with backoff). Retries should also be capped. This distinction prevents the quota exhaustion scenario described above.

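# Cloud Workflows try/retry/except with HTTP error discrimination
- call_doc_ai_safe:
    try:
      call: http.post
      args:
        url: ${processor_url}
        body: ${request_body}
        auth:
          type: OAuth2
      result: doc_result
    # Built-in retry: the default predicate covers 429, 502, 503, 504 and
    # connection errors. max_retries caps the loop so it cannot spin forever.
    retry:
      predicate: ${http.default_retry_predicate}
      max_retries: 5
      backoff:
        initial_delay: 2
        max_delay: 60
        multiplier: 2
    except:
      as: e
      steps:
        - check_error_type:
            switch:
              # Bad document: do not retry. send_to_dead_letter is a step defined elsewhere in the workflow.
              - condition: ${e.code == 400 or e.code == 422}
                next: send_to_dead_letter
        # Anything else (including exhausted retries) fails the execution loudly
        - unhandled_error:
            raise: ${e}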
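
The same 4xx versus 5xx split applies when the Document AI call is made from Python rather than from a workflow. The sketch below is one possible shape for it, with a capped exponential backoff; the function names, the base delay of 2 seconds, and the 60-second cap are illustrative assumptions.

```python
def classify_http_error(status: int) -> str:
    """Decide what to do with a failed call based on its HTTP status."""
    if status == 429 or 500 <= status < 600:
        return "retry"        # throttling or server-side failure
    if 400 <= status < 500:
        return "dead_letter"  # request or document is bad; retrying will not help
    raise ValueError(f"not an error status: {status}")


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), capped."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(cap, base * (2 ** attempt))


for status in (400, 429, 503):
    print(status, classify_http_error(status))
print([backoff_delay(n) for n in range(6)])
```

In a real agent the delay would normally also include random jitter so that many instances do not retry in lockstep.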

Observability: Cloud Logging and Monitoring

Every component in the document agent pipeline — Cloud Functions, Cloud Workflows, Cloud Run, Document AI — writes structured logs to Cloud Logging. The critical practice is writing structured JSON logs with consistent fields: document_id, processor_id, confidence_score, processing_time_ms, and routing_decision. This enables Cloud Logging log-based metrics and BigQuery log sink analytics.
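
A minimal sketch of such a log line follows. It assumes the agent runs somewhere that forwards stdout to Cloud Logging (Cloud Run and Cloud Functions do this and parse JSON lines into structured entries); the function name and sample values are hypothetical, while the field names are the ones listed above.

```python
import json
import sys


def log_document_event(
    document_id: str,
    processor_id: str,
    confidence_score: float,
    processing_time_ms: int,
    routing_decision: str,
    severity: str = "INFO",
    stream=sys.stdout,
) -> str:
    """Write one structured JSON log line and return it."""
    entry = {
        "severity": severity,  # recognized by Cloud Logging as the log level
        "document_id": document_id,
        "processor_id": processor_id,
        "confidence_score": confidence_score,
        "processing_time_ms": processing_time_ms,
        "routing_decision": routing_decision,
    }
    line = json.dumps(entry, sort_keys=True)
    print(line, file=stream)
    return line


line = log_document_event(
    document_id="my-landing-bucket/incoming/2024-11/invoice-001.pdf",
    processor_id="abc123def456",
    confidence_score=0.91,
    processing_time_ms=1840,
    routing_decision="auto",
)
```

Keeping the field names identical across every component is what makes log-based metrics and BigQuery sink queries straightforward later.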

Metric

Dead-Letter Queue Depth

Alert when the pubsub.googleapis.com/subscription/num_undelivered_messages metric for a subscription attached to the dead-letter topic exceeds a threshold. This is the primary signal that documents are failing in the pipeline. Paging on this metric prevented the bank incident from recurring.

Metric

Document AI Quota Utilization

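Monitor Document AI online request quota usage, for example through the serviceruntime.googleapis.com/quota/rate/net_usage metric filtered to the documentai.googleapis.com service. Alert at 70% of the limit to enable proactive quota increase requests before hitting limits.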

Metric

p99 Processing Latency

A log-based metric on processing_time_ms. P99 spikes indicate Document AI regional issues or abnormally large documents that should be routed to batch processing.

Metric

Low-Confidence Rate

Track the percentage of documents routed to human review. A sudden increase signals document format drift — a carrier changed their PDF template and the processor needs retraining.

Idempotency and Deduplication

Because Pub/Sub provides at-least-once delivery, a document agent may process the same document twice. The agent must be idempotent: processing the same document twice should produce the same output without creating duplicate records. The standard pattern is to use the object's bucket, name, and generation number together as a deduplication key. The generation is an integer that changes every time the object is overwritten (even with identical bytes) but not on metadata-only updates, so the combination identifies one exact object version. The agent checks this key in BigQuery or Firestore before writing results.
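
The deduplication check can be sketched as follows. The in-memory ResultStore stands in for BigQuery or Firestore so the example runs on its own; the class, the key format, and the sample values are hypothetical.

```python
def dedup_key(bucket: str, name: str, generation: str) -> str:
    """Build a key that identifies one exact version of one object."""
    return f"{bucket}/{name}#{generation}"


class ResultStore:
    """In-memory stand-in for the table that holds extraction results."""

    def __init__(self) -> None:
        self._rows = {}

    def write_once(self, key: str, result: dict) -> bool:
        """Store the result unless this key was already written.

        Returns True if the row was written, False if it was a duplicate.
        """
        if key in self._rows:
            return False
        self._rows[key] = result
        return True

    def __len__(self) -> int:
        return len(self._rows)


store = ResultStore()
key = dedup_key("my-landing-bucket", "incoming/2024-11/invoice-001.pdf", "1712345678901234")

first = store.write_once(key, {"total": "118.40"})
second = store.write_once(key, {"total": "118.40"})  # simulated redelivery
print(first, second, len(store))
```

In a real store the check and the write need to be atomic (for example a transactional create or a MERGE on the key), otherwise two concurrent deliveries can both pass the check.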

Production Checklist

1. Dead-letter topic with max 5 delivery attempts.
2. Separate retry logic for 4xx vs 5xx errors.
3. Structured JSON logging with document_id on every log line.
4. Monitoring alerts on DLT depth, quota utilization, and low-confidence rate.
5. Idempotency via GCS generation number deduplication.
6. Ack deadline longer than max expected processing time.

Dead-Letter Topic: A Pub/Sub topic that receives messages after a configured max-delivery-attempts count. Prevents infinite retry loops from consuming API quota on persistently failing documents.

GCS Generation Number: An integer assigned to each version of a GCS object. Combined with the bucket and object name, it identifies that version uniquely and can serve as a deterministic deduplication key for at-least-once delivery scenarios in agent pipelines.

Log-Based Metric: A Cloud Monitoring metric derived from Cloud Logging log entries. Enables alerting on custom application-level signals like confidence scores or routing decisions without custom metric writes.

Module 3 · Lesson 4

Quiz: Reliability and Observability

Four questions on production hardening.

A document agent is receiving repeated HTTP 400 errors from Document AI on one specific document. Without a dead-letter queue, what happens?

Correct. Without a DLT, Pub/Sub retries indefinitely within the message retention period. A persistently failing document can trigger thousands of failed API calls, exhausting quota as described in the bank incident.

Not quite. Without a dead-letter topic, Pub/Sub will keep redelivering the failed message — potentially thousands of times — consuming API quota and blocking other documents from processing.

In Cloud Workflows error handling, why should HTTP 400 errors route to the dead-letter path while HTTP 500 errors route to a retry-with-backoff path?

Correct. HTTP semantics: 4xx are client errors (the request itself is wrong), 5xx are server errors (the server failed temporarily). Retrying a bad document wastes quota; retrying a transient server error is exactly right.

Not quite. Standard HTTP semantics: 400-series errors mean the document or request is malformed — retrying won't help. 500-series errors are server-side transient failures that may resolve with a retry after a delay.

A monitoring alert fires: the low-confidence routing rate for the Invoice Parser has jumped from 4% to 31% over 48 hours. What is the most likely root cause?

Correct. A sudden increase in low-confidence routing is a classic signal of document format drift — a supplier redesigned their invoice template and the processor's training data doesn't match the new layout.

Not quite. A sudden jump in low-confidence rate almost always means the incoming documents have changed in some way — typically a supplier updating their invoice template — rather than an infrastructure issue.
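A rough sketch of the signal behind that alert, under stated assumptions: the 0.8 confidence threshold and the 3x ratio are illustrative values, not recommendations from the lesson, and both function names are hypothetical.

```python
def low_confidence_rate(confidences, threshold=0.8):
    """Fraction of documents whose confidence falls below the threshold."""
    if not confidences:
        return 0.0
    low = sum(1 for c in confidences if c < threshold)
    return low / len(confidences)


def drift_suspected(baseline_rate, current_rate, max_ratio=3.0):
    """Flag when the current rate exceeds the baseline by a chosen multiple."""
    return current_rate > baseline_rate * max_ratio


# The jump from the question: 4% baseline, 31% over the last 48 hours.
print(drift_suspected(0.04, 0.31))
```

Comparing against a baseline rather than a fixed number keeps the alert meaningful for processors whose normal low-confidence rate differs.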

Why should a document agent use the GCS object's generation number as a deduplication key when writing to BigQuery?

Correct. Pub/Sub guarantees at-least-once delivery, not exactly-once. The GCS generation number identifies one specific version of an object and stays the same across redeliveries, so together with the bucket and object name it lets the agent check whether it already processed this exact object version before writing to BigQuery.

Not quite. The reason is idempotency under at-least-once delivery. Since Pub/Sub may redeliver messages, the same document can be processed twice. The generation number is a stable unique key to detect and reject duplicate writes to BigQuery.
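A minimal sketch of the idempotency check, assuming the key combines bucket, object name, and generation (a generation number is only unique within one object name). `ToyResultsTable` is a hypothetical in-memory stand-in for the results table; in BigQuery the same check might be expressed as a lookup or a MERGE on this key.

```python
def dedup_key(bucket: str, name: str, generation: int) -> str:
    """Build a key that identifies exactly one version of one object."""
    return f"gs://{bucket}/{name}#{generation}"


class ToyResultsTable:
    """In-memory stand-in for a table keyed by the deduplication key."""

    def __init__(self):
        self._rows = {}

    def insert_once(self, key: str, row: dict) -> bool:
        """Insert the row unless this key was already written."""
        if key in self._rows:
            return False  # duplicate delivery, skip the write
        self._rows[key] = row
        return True

    def __len__(self):
        return len(self._rows)


table = ToyResultsTable()
key = dedup_key("landing", "invoices/a.pdf", 1700000000000000)
first = table.insert_once(key, {"total": "120.00"})
second = table.insert_once(key, {"total": "120.00"})  # redelivered message
print(first, second, len(table))
```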

Module 3 · Lab 4

Lab: Production Hardening a Document Agent

Practice with an AI tutor. Complete 3 exchanges to finish the lab.

Scenario

You are reviewing an existing Cloud Workflows document agent for a retail bank. The agent currently has no dead-letter queue, no structured logging, no idempotency check, and uses the Document AI "default" processor alias. It has been running in production for 3 months and has experienced two quota exhaustion incidents. Your task is to design the full reliability and observability upgrade.

Ask about DLT configuration commands, structured logging fields, BigQuery deduplication query patterns, Cloud Monitoring alert thresholds, or how to safely migrate from the "default" alias to a pinned processor version.

Pipeline Reliability Advisor

DLT + Monitoring

Let's harden your retail bank document agent. You have four known gaps: no dead-letter queue, no structured logging, no idempotency check, and reliance on the "default" processor alias. Two quota incidents in three months suggest the retry loop is uncontrolled. Which gap would you like to address first?

Module 3

Module Test: Document Agents with Cloud Storage and Document AI

15 questions · Score 80% or higher to pass · All four lessons covered

1. What GCS feature retains prior versions of an object when it is overwritten, enabling agents to audit document history?

Object Versioning retains all prior versions of an object when overwritten or deleted, creating an audit trail agents can inspect.

Object Versioning is the correct feature. Bucket Lock enforces retention policies; UBLA disables per-object ACLs so that IAM alone governs access; Lifecycle Management transitions objects between storage classes or deletes them by rule.

2. An agent needs to read from a landing bucket but must never write to it. Which IAM role on the landing bucket satisfies least privilege?

Correct. roles/storage.objectViewer grants only get and list permissions on objects — no create or delete.

objectViewer is the least-privilege read-only role. Creator and admin grant write access; legacyBucketWriter grants broader permissions than needed.

3. What is the primary advantage of Uniform Bucket-Level Access for document agent pipelines?

Correct. UBLA eliminates the complexity of per-object ACLs, ensuring all access is governed by IAM policies alone — making security auditing reliable and predictable.

UBLA's value is security: it removes per-object ACLs so all access is controlled exclusively through IAM, making permission auditing deterministic.

4. A bucket notification is configured with event type OBJECT_FINALIZE. When exactly does this event fire?

Correct. OBJECT_FINALIZE fires only on successful completion of the upload, making it safe for agents to read the object immediately upon receiving the notification.

OBJECT_FINALIZE fires on successful upload completion. Metadata updates fire OBJECT_METADATA_UPDATE; deletions fire OBJECT_DELETE.
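An agent typically filters on the event type before doing any work. The sketch below assumes the notification's message attributes arrive as a plain dict of strings with the keys `eventType`, `bucketId`, `objectId`, and `objectGeneration`; the function name and sample values are hypothetical.

```python
def object_ref_if_finalized(attributes: dict):
    """Return (bucket, name, generation) for OBJECT_FINALIZE events, else None."""
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None  # ignore metadata updates, deletions, and so on
    return (
        attributes["bucketId"],
        attributes["objectId"],
        int(attributes["objectGeneration"]),  # attribute values are strings
    )


sample = {
    "eventType": "OBJECT_FINALIZE",
    "bucketId": "landing",
    "objectId": "incoming/2024-11/invoice-001.pdf",
    "objectGeneration": "1700000000000000",
}
print(object_ref_if_finalized(sample))
print(object_ref_if_finalized({"eventType": "OBJECT_DELETE"}))
```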

5. Which Document AI processor is specifically designed to identify and separate different document types within a mixed-content PDF bundle?

Correct. The Splitter/Classifier identifies document boundaries and classifies document types within a bundle, enabling agent routing to appropriate downstream processors.

The Splitter/Classifier is built for this purpose — it finds document boundaries and identifies types within a multi-document bundle. Other processors extract fields from documents of known type.

6. A Document AI synchronous call returns a 200 response. Where in the response proto does the agent find extracted entities with confidence scores?

Correct. The entities field contains typed extractions each with a type_, mention_text, and confidence score. The text field has raw OCR; pages contains layout and form field data.

Entities with confidence scores are in response.document.entities. The text field has raw OCR content; pages has layout data; uri references the source document.
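To show how an agent might act on those fields, here is a sketch that uses a small dataclass as a stand-in for the entries in `response.document.entities`. The `Entity` class, the sample values, and the 0.8 threshold are assumptions for illustration; only the three field names come from the explanation above.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Stand-in with the three fields an agent usually reads."""
    type_: str
    mention_text: str
    confidence: float


def split_by_confidence(entities, threshold=0.8):
    """Separate entities to accept from those that need human review."""
    accepted = [e for e in entities if e.confidence >= threshold]
    review = [e for e in entities if e.confidence < threshold]
    return accepted, review


entities = [
    Entity("invoice_id", "INV-1042", 0.97),
    Entity("total_amount", "120.00", 0.91),
    Entity("supplier_name", "Acme Tooling", 0.55),
]
accepted, review = split_by_confidence(entities)
print([e.type_ for e in accepted], [e.type_ for e in review])
```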

7. Why does batch_process_documents() return immediately rather than waiting for extraction results?

Correct. Batch processing can handle thousands of documents and take significant time. The LRO pattern allows the caller to proceed and check back later rather than blocking.

The LRO pattern is necessary because batch jobs can take minutes to hours. An HTTP connection can't remain open that long, so the API returns a handle the agent polls asynchronously.
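The polling side of that pattern can be sketched with a fake operation object. `FakeOperation` and `wait_for` are hypothetical; the fake exposes only a `done()` method, and the interface of the real operation handle should be checked against the client library documentation.

```python
import time


class FakeOperation:
    """Stand-in for a long-running operation that finishes after N polls."""

    def __init__(self, polls_until_done: int):
        self._remaining = polls_until_done

    def done(self) -> bool:
        self._remaining -= 1
        return self._remaining <= 0


def wait_for(operation, poll_interval=0.01, timeout=1.0):
    """Poll until the operation completes; return the number of polls used."""
    deadline = time.monotonic() + timeout
    polls = 0
    while time.monotonic() < deadline:
        polls += 1
        if operation.done():
            return polls
        time.sleep(poll_interval)  # yield instead of holding a connection open
    raise TimeoutError("operation did not finish before the deadline")


print(wait_for(FakeOperation(polls_until_done=3)))
```

The explicit deadline matters: without it, a stuck batch job would keep the agent polling indefinitely.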

8. What is the recommended approach for a document agent that needs deterministic multi-step orchestration WITH a reasoning step for ambiguous document classification?

Correct. Cloud Workflows handles deterministic steps reliably and cost-effectively; a Vertex AI agent call can be embedded as one HTTP step for the reasoning-required decision point.

The composition pattern is best: Cloud Workflows for deterministic routing, with Vertex AI invoked as one step where reasoning is needed. This leverages the strengths of both without over-engineering.

9. Eventarc can trigger Cloud Workflows when a GCS object is finalized. What does this replace compared to the manual Pub/Sub approach?

Correct. Eventarc abstracts away the manual Pub/Sub topic creation, bucket notification configuration, and subscription wiring — a Workflow trigger can be created in a single Eventarc rule.

Eventarc replaces the manual Pub/Sub plumbing: creating a topic, configuring the bucket notification, creating a subscription, and wiring it to the workflow trigger — all handled by one Eventarc configuration.

10. A Pub/Sub subscription is configured with max-delivery-attempts=5 and a dead-letter topic. A document causes a Document AI HTTP 400 error on attempt 1. What happens after attempt 5?

Correct. After max-delivery-attempts, Pub/Sub forwards the message to the configured dead-letter topic. There the message can be inspected and, once the root cause is fixed, replayed from the DLT subscription.

After max-delivery-attempts, Pub/Sub forwards the message to the dead-letter topic. GCS doesn't quarantine objects automatically; Pub/Sub doesn't delete messages or retry indefinitely when a DLT is configured.

11. Which structured log field is most critical for correlating a Cloud Workflows step log entry with the specific document that triggered it?

Correct. document_id is the correlation key that links log entries across all pipeline steps for a single document. Without it, tracing a failure to its source document is impossible at scale.

document_id is the critical correlation key. All other fields (latency, confidence, routing) are valuable, but without a document_id on every log entry you cannot trace a failure back to its source document.
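One way to make sure the correlation key is never forgotten is to require it in the logging helper itself. This sketch assumes the agent writes one JSON object per line to standard output; the helper name and every field other than `document_id` are illustrative.

```python
import json


def log_line(document_id: str, step: str, severity: str = "INFO", **fields) -> str:
    """Build one JSON log line that always carries the document_id."""
    entry = {
        "severity": severity,
        "document_id": document_id,  # required positional argument by design
        "step": step,
        **fields,
    }
    return json.dumps(entry, sort_keys=True)


line = log_line(
    "landing/invoices/a.pdf#1700000000000000",
    "extract",
    confidence=0.62,
    routing="human_review",
)
print(line)
```

Because `document_id` is a required argument, a step that tries to log without it fails immediately during development instead of producing untraceable entries in production.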

12. The three-bucket pattern (landing → processing → output) provides which primary benefit over a single bucket with prefix-based separation?

Correct. IAM roles are normally granted at the bucket level rather than per prefix. With three buckets, a processing agent that holds no grant on the landing bucket cannot reach the original landing documents, even with a code bug.

The key benefit is IAM isolation. With separate buckets, each agent's service account can be granted exactly the permissions needed for its specific bucket — no bug in a processing agent can reach the landing data.
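The isolation argument can be checked mechanically. The sketch below models grants as a plain dictionary; the agent names and bucket labels are hypothetical, and the mapping is a simplified illustration rather than an actual IAM policy document.

```python
# Hypothetical grants: agent -> bucket -> role on that bucket.
GRANTS = {
    "intake-agent": {
        "landing": "roles/storage.objectViewer",
        "processing": "roles/storage.objectCreator",
    },
    "extract-agent": {
        "processing": "roles/storage.objectViewer",
        "output": "roles/storage.objectCreator",
    },
}

# Roles treated as write-capable in this sketch.
WRITE_ROLES = {
    "roles/storage.objectCreator",
    "roles/storage.objectAdmin",
    "roles/storage.admin",
}


def can_write(agent: str, bucket: str) -> bool:
    """True if the agent holds a write-capable role on the bucket."""
    return GRANTS.get(agent, {}).get(bucket) in WRITE_ROLES


def can_access(agent: str, bucket: str) -> bool:
    """True if the agent holds any role on the bucket."""
    return bucket in GRANTS.get(agent, {})


# No agent may write to the landing bucket.
print(any(can_write(agent, "landing") for agent in GRANTS))
# The extraction agent cannot reach landing documents at all.
print(can_access("extract-agent", "landing"))
```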

13. An agent produces duplicate BigQuery records after a network timeout caused Pub/Sub to redeliver a message. Which deduplication strategy uses a stable, version-specific GCS identifier?

Correct. The GCS generation number is a unique integer assigned each time an object is written or overwritten; metadata-only updates change the metageneration instead. It persists across Pub/Sub redeliveries and is independent of agent runtime state.

The GCS generation number is the stable version-specific key. A Pub/Sub message ID identifies a message rather than a document, so a duplicate notification for the same object arrives with a different ID; UUIDs differ between agent runs; processor timestamps are not deterministic.

14. A sudden increase in the low-confidence routing rate from 3% to 28% on the Invoice Parser should trigger which investigation first?

Correct. A sudden spike in low-confidence rate is the canonical signal of document format drift — a supplier updated their template. The fix is retraining the processor on the new format.

A sudden confidence drop almost always means document format drift — a supplier changed their invoice layout. Reviewing recent incoming documents to identify format changes is the right first step.

15. Which Cloud Monitoring metric alert should be configured as the primary operational signal that documents are actively failing in a Pub/Sub-triggered document agent pipeline?

Correct. A growing DLT queue depth is the clearest signal that documents are failing in the pipeline. CPU, storage, and BigQuery metrics don't directly indicate document processing failures.

DLT queue depth is the primary document failure signal. Documents move to the DLT only when processing fails repeatedly — a non-zero and growing DLT count means something is broken in the pipeline.