Building Production Agents with Vertex AI

1. The Booking.com engineers presenting at Google Cloud Next in March 2024 emphasized that time spent on Playbook structure at creation is equivalent in value to how much subsequent prompt tuning?

Correct. The Booking.com engineers were explicit: fifteen minutes of Playbook structure work at creation time is worth more than the next three days of prompt tuning. Front-loading architectural decisions pays disproportionately in agent development.

Incorrect. The Booking.com engineers said fifteen minutes on Playbook structure at creation is worth more than three days of subsequent prompt tuning. This reflects how foundational Playbook design is to agent reliability.

2. Which type of feedback data is generated when a human reviewer modifies (rather than simply approves or rejects) an agent's proposed action?

Correct. A modification creates a counterfactual label — the pairing of the agent's rejected proposal with the human's correct alternative. This is the highest-value fine-tuning signal.

When a reviewer modifies an action, they create a counterfactual label: (rejected action, correct action). This is more valuable than a simple approval or rejection.

3. How does A2A handle real-time streaming of task progress for long-running operations?

Correct. tasks/sendSubscribe opens an SSE stream, and the remote agent pushes TaskStatusUpdateEvent and TaskArtifactUpdateEvent messages as they occur — enabling real-time progress visibility without polling.

A2A supports streaming via tasks/sendSubscribe, which opens a Server-Sent Events stream. The remote agent pushes update events as task status changes and as artifact chunks become available.

4. For GDPR compliance, what is the recommended BigQuery logging architecture to support data subject deletion requests?

Correct. Pseudonymization with a joinable identity table satisfies both audit (immutable logs) and deletion (remove identity mapping) requirements simultaneously.

Pseudonymous user_id + separate identity table is the recommended pattern. Nulling the identity row satisfies deletion rights without rewriting immutable conversation logs.

5. In the A2A protocol, what is the difference between a "client agent" and a "remote agent"?

Correct. The client/remote distinction is about the direction of task delegation, not location or capability tier. In a different context, the same agent can be a client agent (when delegating) or a remote agent (when receiving delegations).

The distinction is about task delegation direction: the client agent initiates, the remote agent receives and executes. An agent can play either role depending on the interaction — it's positional, not a fixed characteristic.

6. What is the correct term for the privileged text block passed to the Gemini API before any user turn that establishes permanent behavioral constraints?

Correct. In the Gemini API, the system prompt is passed as the system_instruction field of the GenerateContentRequest .

Incorrect. The correct Gemini API field name is system_instruction . Review Lesson 1.

7. What are the four terminal and non-terminal states in the A2A task lifecycle?

Correct. The key non-terminal states are submitted (received), working (executing), and input-required (paused for more info). Terminal states are completed, failed, and canceled.

A2A tasks flow through: submitted → working → input-required (back to working) or directly to completed, failed, or canceled. The input-required state is particularly distinctive — it enables genuine multi-turn agent dialogue.

8. What is the primary risk of setting HITL triggers that are too sensitive (triggering too frequently)?

Correct. Over-triggering HITL creates user frustration through waits and interruptions, and negates the efficiency gains of deploying an AI agent in the first place.

Too-frequent HITL triggers degrade user experience and eliminate automation value. The goal is precise triggering — not minimal or maximal.

9. What is the "Opaque Execution Principle" and why does it enable cross-framework interoperability?

Correct. The interface contract — Agent Card plus artifacts — is all that matters. Internal implementation is invisible. This is the same principle that lets Python code call a Java service over REST without caring that it's Java.

Opaque execution means the interface contract (Agent Card + artifacts) is all that matters — not the implementation. This is what lets agents built with completely different frameworks interoperate over A2A.

10. Vertex AI Online Evaluation samples production conversations and scores them using a judge model. Where does it write results by default?

Correct. Online Evaluation writes evaluation results to BigQuery, enabling SQL analysis, Looker Studio dashboards, and scheduled alerting via custom Cloud Monitoring metrics.

Online Evaluation results go to BigQuery by default, allowing rich analytical queries and integration with BI tools like Looker Studio.

11. What is the primary driver of correct function selection in a multi-tool agent?

Correct. Description quality is the dominant factor in function routing accuracy. Vague descriptions produce misrouting even when function names are distinct.

Incorrect. Description quality is the primary routing signal. Well-written descriptions consistently outperform poorly described functions regardless of order or naming.

12. Which HITL trigger category cannot be overridden by any other system condition?

Correct. User-initiated escalation must be honored unconditionally — it overrides all other automation gates and cannot be filtered or delayed.

User-initiated escalation is the one trigger that must be honored unconditionally, regardless of other system conditions or efficiency considerations.

13. What is a "canary deployment" in the context of promoting a fine-tuned agent model?

Correct. A canary deployment routes a small slice of real traffic to the new model version, allowing measurement of real-world metrics before committing to a full rollout.

Canary deployment means routing a small percentage of live production traffic to the new model — getting real-world signal while limiting blast radius if the model underperforms.

14. Which of the four functional layers of a system prompt defines who the agent is and what voice register it uses?

Correct. Identity & Persona covers agent name, role, and tone register. Review Lesson 1.

Incorrect. Identity & Persona is the layer for agent name, voice, and role. Review Lesson 1.

15. Which gcloud command creates credentials that the Vertex AI SDK can use via ADC?

Correct. Only gcloud auth application-default login creates credentials in the well-known file location that SDKs use. Regular gcloud auth login only authenticates the CLI.

gcloud auth application-default login is the command. Regular login only works for the CLI, not SDK code.

16. Which storage backend underlies the Vertex AI Agent Builder Session API for strong consistency?

Correct. The Session API is backed by Spanner, providing strong consistency and global availability for session data.

The Session API uses Spanner under the hood, not Firestore or BigQuery, giving it strong consistency properties.

17. Waymo's Remote Assistance System (2023) automatically escalated to a second operator if no response arrived within how many seconds?

Correct. Waymo's RAS escalated to a second operator if the primary did not respond within 90 seconds — a real-world implementation of the timeout-plus-escalation pattern.

Waymo's RAS used a 90-second escalation timeout before routing to a secondary operator.

18. In hierarchical multi-agent orchestration, what role does the orchestrator agent typically play on Vertex AI?

Correct. The orchestrator needs strong reasoning to decompose complex tasks appropriately, so frontier models are typical choices. Workers can be smaller, cheaper, specialized models since they handle well-defined narrow tasks.

The orchestrator does the heavy reasoning — breaking down complex tasks and routing appropriately — so capable frontier models like Gemini 1.5 Pro are the recommended choice. Workers can be smaller, cheaper, specialized models.

19. When a function call fails with an API timeout, what should your execution layer return to the model?

Correct. Structured error objects with error codes and recovery hints allow the model to reason about failure and communicate clearly with users.

Incorrect. Structured error objects — not empty strings, stale data, or tracebacks — give the model actionable context for producing a helpful response despite the failure.

20. What is the maximum request timeout for Cloud Run, and why does this affect long-running agent architecture?

Correct. Cloud Run's 60-minute hard timeout requires that any agent workflow taking longer must be decomposed into discrete steps orchestrated by Cloud Tasks with checkpointing between steps.

Cloud Run's maximum request timeout is 60 minutes. Workflows exceeding this must be broken into steps via Cloud Tasks, with session checkpoints preserving state between steps.

Final Exam