Module 8 · Lesson 1

What a Risk Audit Actually Is

Framing the purpose, scope, and mindset of a structured agent investigation
What separates a genuine risk audit from a checkbox exercise, and how do you tell the difference before you start?

In November 2021, Zillow announced it was shutting down Zillow Offers, its algorithmic home-buying unit, and writing down $569 million in losses. The company's pricing agent had been systematically overbidding on houses, sometimes by 20% above market value, because it trained on listing prices rather than realized sale prices, and because human reviewers had been progressively removed from the loop as the model hit its volume targets.

Zillow's CEO Rich Barton acknowledged in the earnings call that "the unpredictability in forecasting home prices far exceeds what we anticipated." But internal reporting later showed that field managers had flagged the model's overconfidence months earlier. Those flags had not been escalated. There was no formal mechanism to receive them. There had been no risk audit, only periodic business reviews focused on volume and margin.

Why "Review" and "Audit" Are Not the Same Thing

Most organizations that deploy AI agents conduct some form of ongoing review: performance dashboards, weekly standups, quarterly business reviews. These are useful. They are not risk audits. A business review asks, "Is the agent hitting its targets?" A risk audit asks, "What could this agent do that we haven't anticipated, and who would know?"

The distinction matters because agent failures often occur in the gap between those two questions. Zillow's agent was hitting volume targets. Barton's team reviewed those targets every week. Nobody asked what the model would do in a cooling market, what it would do if it became a significant price-setter in local markets, or what signals from the field would indicate overbidding was systemic rather than incidental.

A risk audit is a structured, adversarial investigation of an agent system: its decision logic, its data inputs, its human oversight mechanisms, its failure modes, and its organizational accountability structures. It produces findings, not just metrics. It involves people outside the team that built and runs the agent. And it asks uncomfortable questions on purpose.

The Four Properties of a Valid Risk Audit

Across documented post-mortems, from Zillow to Amazon's discontinued recruiting AI to the Dutch childcare benefits algorithm, effective risk audits share four properties that distinguish them from performance reviews.

Risk Audit Properties
1. Adversarial Posture
The audit team's job is to find problems, not to confirm that the system works. This requires independence from the team that built and operates the agent.
2. Scope Beyond the Model
The agent is the tip of the iceberg. The audit must cover data pipelines, human oversight processes, escalation paths, and organizational incentive structures, not just model accuracy.
3. Stakeholder Breadth
Risk audits must include the people the agent affects (customers, frontline staff, compliance teams), not only the engineers who built it. Zillow's field managers knew. Nobody asked them formally.
4. Written Findings with Owners
An audit that produces no written findings, no risk ratings, and no named responsible parties is theater. Accountability requires documentation.
The Audit Mindset: Starting with Threat Models

Security professionals have long distinguished between vulnerability scanning (automated, surface-level) and threat modeling (structured reasoning about who might cause harm, how, and with what consequences). A risk audit of an AI agent is closer to threat modeling than to automated scanning.

Before collecting any data, the auditor should articulate a set of threat scenarios: specific, plausible ways the agent could produce harm. For a customer-service agent, this might include: handling a customer in financial distress in a way that increases debt; providing medically relevant information without appropriate caveats; escalating a dispute in a manner that violates consumer protection regulations. For a procurement agent, it might include: concentrating vendor relationships in ways that create single points of failure; approving purchases outside policy without triggering review; generating false invoices at scale.

The Dutch SyRI case, in which an algorithmic welfare fraud detection system was struck down by a Dutch court in 2020, illustrates what happens when threat modeling is absent. The system combined data from seventeen government databases to generate fraud risk scores. No threat model had asked: what if legitimate citizens are systematically misclassified? What if the data combines in ways that discriminate by postal code? What if there is no human reviewer capable of explaining a score to a challenged citizen? All of these scenarios materialized. None had been formally anticipated.

Core Principle

A risk audit begins not with data collection but with imagination: who could be harmed, in what way, through what mechanism? Only once those scenarios are written down can you assess whether the agent's current design and oversight prevent them.

Scoping Your Audit: Three Hard Decisions

Before any audit begins, three scoping decisions must be made explicitly, because making them implicitly means someone else makes them for you, usually in ways that narrow the audit's usefulness.

Decision 1: What is the agent's boundary? Define what is "in" the agent system for audit purposes. Does this include only the model? The full pipeline including data sources? The human review process? The downstream systems the agent feeds? Narrowing scope prematurely is how audits miss the actual risk surface.
Decision 2: Who counts as a stakeholder? Identify every party affected by agent decisions: direct users, people whose data is used for training, people affected by agent outputs who never interact with it (e.g., job applicants screened by a hiring agent), regulators, and the organization itself. Document who is excluded and why.
Decision 3: What counts as a finding? Agree in advance on what severity thresholds require escalation versus monitoring. Without this, audit findings are filtered by whoever writes the report, a common failure mode documented in the NHS AI procurement reviews of 2022–2023.
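
One way to keep the three scoping decisions explicit is to write them down as a single record and check it for gaps before the audit starts. The sketch below is illustrative only: the `AuditScope` class, its field names, and the severity threshold of 4 on a 1-5 scale are all assumptions, not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class AuditScope:
    """Hypothetical record of the three scoping decisions for one agent."""
    agent_name: str
    in_scope_components: list[str]   # Decision 1: the agent's boundary
    stakeholders: list[str]          # Decision 2: parties included in the audit
    # Decision 2 (continued): excluded party -> documented reason
    excluded_stakeholders: dict[str, str] = field(default_factory=dict)
    # Decision 3: severity (1-5) at or above which a finding is escalated.
    # The default of 4 is an illustrative assumption.
    escalation_severity: int = 4

    def unresolved(self) -> list[str]:
        """Return the scoping decisions that are still implicit."""
        issues = []
        if not self.in_scope_components:
            issues.append("Decision 1: no components named as in scope")
        if not self.stakeholders:
            issues.append("Decision 2: no stakeholders identified")
        for party, reason in self.excluded_stakeholders.items():
            if not reason.strip():
                issues.append(
                    f"Decision 2: '{party}' excluded without a documented reason"
                )
        return issues

    def disposition(self, severity: int) -> str:
        """Apply the pre-agreed threshold to a finding's severity."""
        return "escalate" if severity >= self.escalation_severity else "monitor"


scope = AuditScope(
    agent_name="hiring screener",
    in_scope_components=["model", "training data pipeline", "human review process"],
    stakeholders=["recruiters", "job applicants"],
    excluded_stakeholders={"regulators": ""},  # exclusion with no reason given
)
print(scope.unresolved())
print(scope.disposition(5), scope.disposition(2))
```

Fixing the threshold in the scope record, before any findings exist, is what prevents the report author from deciding after the fact which findings are serious enough to escalate.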
Module 8 Thread

Each lesson in this module adds one layer to your audit: L1 establishes the audit concept and scope, L2 covers the risk identification methodology, L3 examines oversight gap analysis, and L4 builds the findings report and remediation plan. By the end, you will have a full audit framework you can apply to a real agent in your organization.

Lesson 1 Quiz

What a Risk Audit Actually Is: 5 questions
1. What is the primary distinction between a business performance review and a risk audit of an AI agent?
Correct. The Zillow case illustrates this exactly: weekly business reviews showed volume targets being met, while the systemic overbidding risk, which no one was formally asking about, accumulated for months.
Not quite. The core difference is in the question each process is designed to answer: target performance vs. unanticipated harm pathways.
2. Which property distinguishes a valid risk audit from a surface-level compliance checkbox exercise?
Correct. Adversarial posture, the explicit goal of finding failure modes rather than confirming success, is the defining property that separates genuine audits from theater.
Adversarial posture and independence from the building team are the key differentiators. The team that built the agent has inherent blind spots about its failure modes.
3. In the Zillow iBuying case, what was the primary organizational failure that allowed losses to reach $569 million?
Correct. This is a canonical example of the oversight gap: the information existed in the organization (field managers knew), but there was no structured channel to escalate it to decision-makers.
The failure was organizational, not technical. Real signals existed but had no escalation path, a classic risk audit finding.
4. What does the Dutch SyRI case (2020 court ruling) illustrate about AI risk audits?
Correct. The SyRI system combined seventeen databases without ever formally asking "what if legitimate citizens are misclassified?" That threat model failure meant the harms that materialized had never been formally anticipated.
The lesson is about threat modeling absence, not about the technology category itself. No one formally asked what could go wrong before deployment.
5. Which of the three scoping decisions described in Lesson 1 is most commonly made implicitly rather than explicitly β€” and why is that dangerous?
Correct. When scope is not explicitly negotiated, the building team defines it by default, and they tend to define it narrowly around what they control, excluding data pipelines, human review processes, and downstream systems that carry significant risk.
Scope of the agent's boundary is the most common implicit decision. Without explicit agreement, the builders define scope, and they define it to exclude the parts they didn't build, which is often where the real risk lives.

Lab 1: Define Your Audit Scope

Structured conversation · Minimum 3 exchanges to complete

Your Task

Think of an AI agent currently in use at your organization, or one you are planning to deploy. This can be a customer service chatbot, a procurement automation tool, a hiring screener, a content recommendation system, or any other agent that takes actions or makes decisions. Work with the audit coach below to define a defensible audit scope for that agent.

Describe the agent you want to audit: what it does, who deployed it, and what decisions it makes or actions it takes. Then we'll work through the three scoping decisions together.
Audit Scope Coach
Lab 1
Welcome to Lab 1. I'm your audit scope coach for this session. To begin, tell me about the AI agent you'd like to audit: what does it do, where is it deployed, and what kinds of decisions or actions does it take or influence? Don't worry about having all the details; we'll figure out what's known and what needs to be discovered together.
Module 8 · Lesson 2

Risk Identification: Building Your Threat Inventory

Systematic methods for surfacing what an agent could do that you haven't anticipated
How do you find risks you don't know to look for, and how do you decide which ones matter most?

Between 2014 and 2018, Amazon developed a machine learning tool to screen engineering job applications. The system was trained on ten years of submitted resumes. Because Amazon's engineering workforce was predominantly male, the model learned to penalize resumes that included the word "women's" (as in "women's chess club") and to downgrade graduates of all-women's colleges. The bias was not intentional. It was not in the requirements document. It was not visible in the model's aggregate accuracy metrics. It emerged from the interaction between training data composition and objective function, a category of risk that only systematic threat modeling would have surfaced.

Amazon's team discovered the problem in 2015, attempted to correct it through 2017, concluded it could not reliably prevent the model from finding proxy variables for gender, and quietly disbanded the team in 2018. The tool had operated for at least a year after the initial discovery before being shut down. Reuters reported the story in October 2018. There had been no external audit, no structured stakeholder review, and no formal threat inventory that would have flagged training data composition as a first-order risk.

The Four Risk Domains Every Agent Audit Must Cover

The Amazon case illustrates why risk identification cannot rely on intuition or on reviewing what the team was asked to build. Risks emerge from system interactions: between training data and objectives, between model outputs and downstream processes, between agent automation and human judgment. A structured threat inventory must cover four domains explicitly.

Domain 1: Data Risks

Training data composition, data drift, proxy variable encoding, distribution shift between training and deployment, missing data handling, and the values embedded in labeling decisions. Amazon's recruiting failure was entirely a data domain risk.

Domain 2: Model Risks

Objective function misalignment, overconfidence in low-data regions, distributional sensitivity to input format, emergent behaviors at scale, and model behavior under adversarial inputs. Zillow's overbidding was partly a model risk: overconfidence in a rising market condition.

Domain 3: Integration Risks

Feedback loops between agent outputs and future training data, automation of downstream processes that amplify errors, agent-to-agent interactions in multi-agent pipelines, and latency between detection and correction. The 2010 Flash Crash, partly attributable to interacting algorithmic trading agents, is a canonical integration risk event.

Domain 4: Governance Risks

Absence of accountability for agent decisions, unclear escalation paths, misalignment between agent authority and human capacity to review, lack of audit trails, and organizational incentives that suppress risk reporting. This was Zillow's primary failure mode.

The STRIDE-Adapted Method for Agent Risk Inventory

Microsoft's STRIDE framework, originally developed for software security, has been adapted by several AI governance teams as a structured threat enumeration method. The original acronym covers Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. For AI agents, each category maps to distinct risk vectors.

Spoofing in agent contexts includes prompt injection (a user or upstream system feeding the agent instructions that override its guidelines), identity fraud in multi-agent pipelines, and falsified data provenance. Tampering includes training data poisoning, model weight manipulation in shared model registries, and adversarial perturbation of inputs. Repudiation covers agents that take consequential actions with no audit trail, a risk particularly acute in agentic systems that execute code or make financial transactions.

Information Disclosure includes model inversion attacks (inferring training data from model outputs), membership inference (determining whether specific individuals were in the training set), and data leakage through agent responses. Denial of Service in agentic systems includes resource exhaustion through adversarial input design and agent action loops. Elevation of Privilege, increasingly critical in 2024–2025 deployments, covers agents that acquire permissions beyond their initial grant, either by misinterpreting instructions or by being manipulated by external content.
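
The category-to-vector mapping described above can be kept as a simple checklist and used to spot STRIDE categories that a draft threat inventory has not touched yet. This is a minimal sketch: the dictionary restates the vectors named in this lesson, and the function name `uncovered_categories` and the `(category, description)` tuple format are assumptions made for illustration.

```python
# STRIDE categories mapped to the agent-specific vectors named in this lesson.
STRIDE_AGENT_VECTORS = {
    "Spoofing": [
        "prompt injection",
        "identity fraud in multi-agent pipelines",
        "falsified data provenance",
    ],
    "Tampering": [
        "training data poisoning",
        "model weight manipulation in shared registries",
        "adversarial perturbation of inputs",
    ],
    "Repudiation": ["consequential actions with no audit trail"],
    "Information Disclosure": [
        "model inversion",
        "membership inference",
        "data leakage through agent responses",
    ],
    "Denial of Service": [
        "resource exhaustion through adversarial inputs",
        "agent action loops",
    ],
    "Elevation of Privilege": ["permissions acquired beyond the initial grant"],
}


def uncovered_categories(threats):
    """Return STRIDE categories with no entry in the draft inventory.

    `threats` is an iterable of (category, description) pairs.
    """
    seen = {category for category, _ in threats}
    unknown = seen - set(STRIDE_AGENT_VECTORS)
    if unknown:
        raise ValueError(f"not a STRIDE category: {sorted(unknown)}")
    # Preserve the S-T-R-I-D-E order so the output reads like the checklist.
    return [c for c in STRIDE_AGENT_VECTORS if c not in seen]


draft_inventory = [
    ("Spoofing", "support tickets can carry injected instructions"),
    ("Tampering", "shared registry allows unsigned weight uploads"),
]
print(uncovered_categories(draft_inventory))
```

An empty category in the output is a prompt for the audit team to either record a threat or write down why none applies.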

Practical Note

The value of STRIDE is not that it covers every possible risk; it doesn't. The value is that it provides a systematic checklist that forces the audit team to consider risk categories they wouldn't generate through free-form brainstorming. Complement it with domain-specific checklists for your sector (financial services, healthcare, HR) and with stakeholder interviews that surface operational risks engineers don't see.

Risk Prioritization: Probability × Impact × Reversibility

A complete threat inventory will contain more risks than any organization can address simultaneously. Prioritization requires a scoring framework. The standard approach, probability × impact, is necessary but insufficient for AI agents, because some agent failures are extremely difficult to reverse once they occur.

The Dutch SyRI case illustrates this asymmetry: the probability that any individual would be wrongly flagged was relatively low. But once flagged, individuals faced benefit suspension, debt collection, and reputational harm, outcomes that were difficult to reverse even when the error was acknowledged. A standard probability × impact matrix would have underweighted this risk. Adding a reversibility dimension changes the calculus: even low-probability harms that are hard to reverse warrant higher priority treatment.

For your risk inventory, use a three-factor rating: Probability (how likely is this scenario?), Impact (who is affected and how severely?), and Reversibility (if this occurs, can the harm be corrected?). Rate each on a 1–5 scale and compute a weighted score, with reversibility weighted at 1.5× because agent systems operate at speed and scale that outpaces human correction capacity.
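
The three-factor rating can be sketched in a few lines. Two choices here are assumptions rather than part of the lesson: the factors are combined as a weighted sum (in a pure product, a constant 1.5 weight would scale every score equally and leave the ranking unchanged), and the third factor is scored so that 5 means hardest to reverse. The class and method names are hypothetical.

```python
from dataclasses import dataclass

REVERSIBILITY_WEIGHT = 1.5  # the weighting stated in the lesson


@dataclass(frozen=True)
class RiskRating:
    probability: int      # 1 (rare) to 5 (near certain)
    impact: int           # 1 (minor) to 5 (severe)
    irreversibility: int  # 1 (easily corrected) to 5 (effectively permanent); direction is an assumption

    def __post_init__(self):
        for name in ("probability", "impact", "irreversibility"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def weighted_score(self) -> float:
        """Weighted sum of the three factors (combination rule is an assumption)."""
        return (
            self.probability
            + self.impact
            + REVERSIBILITY_WEIGHT * self.irreversibility
        )

    def probability_times_impact(self) -> int:
        """The standard two-factor score, for comparison."""
        return self.probability * self.impact


# Unlikely, severe, and very hard to undo (for example, a wrongful fraud flag).
wrongful_flag = RiskRating(probability=2, impact=4, irreversibility=5)
# Likely, severe, but quickly corrected (for example, a wrong answer that gets retracted).
retractable_error = RiskRating(probability=4, impact=4, irreversibility=1)

print(wrongful_flag.weighted_score(), retractable_error.weighted_score())
print(wrongful_flag.probability_times_impact(), retractable_error.probability_times_impact())
```

Under probability × impact alone, the retractable error scores 16 against 8 and ranks first. With the reversibility term, the wrongful flag scores 13.5 against 9.5 and moves ahead, which is the reordering the SyRI example argues for.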

Audit Output: Risk Register

The product of your risk identification work is a risk register: a table listing each identified risk, its domain (data/model/integration/governance), its probability/impact/reversibility scores, its composite priority, and the agent component it is associated with. A risk register with no entries is not a clean bill of health; it is evidence the audit was not performed seriously.
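
A risk register with the columns listed above can be represented as a list of records plus a structural check. The sketch below assumes hypothetical names (`RiskEntry`, `register_problems`, `by_priority`) and sample entries invented for illustration; the composite priority is taken as an input so that any scoring rule can be used.

```python
from dataclasses import dataclass

DOMAINS = ("data", "model", "integration", "governance")


@dataclass
class RiskEntry:
    risk_id: str
    description: str
    domain: str                # one of DOMAINS
    probability: int           # 1-5
    impact: int                # 1-5
    reversibility: int         # 1-5
    composite_priority: float  # computed with whatever scoring rule the audit agreed on
    component: str             # agent component the risk is associated with


def register_problems(entries):
    """Return structural problems with the register; an empty list means none found."""
    problems = []
    if not entries:
        problems.append("register is empty: evidence the audit was not performed seriously")
    for entry in entries:
        if entry.domain not in DOMAINS:
            problems.append(f"{entry.risk_id}: unknown domain '{entry.domain}'")
    covered = {entry.domain for entry in entries}
    for domain in DOMAINS:
        if domain not in covered:
            problems.append(f"no risks recorded for the {domain} domain")
    return problems


def by_priority(entries):
    """Order the register from highest to lowest composite priority."""
    return sorted(entries, key=lambda entry: entry.composite_priority, reverse=True)


# Illustrative entries only; the scores are made up for the example.
register = [
    RiskEntry("R1", "training data encodes gender proxies", "data", 4, 4, 3, 12.5, "training pipeline"),
    RiskEntry("R2", "no named owner for halt decisions", "governance", 3, 5, 4, 14.0, "escalation process"),
]
print([entry.risk_id for entry in by_priority(register)])
print(register_problems(register))
```

Checking domain coverage does not prove the register is complete, but a missing domain is a cheap signal that part of the threat inventory was skipped.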

Stakeholder Interviews: The Risk Surface You Can't Document Alone

Documentation and technical review will not surface all material risks. Amazon's recruiting system bias was not in any requirements document. Zillow's overbidding was known to field managers but never formally captured. A systematic interview process with a defined stakeholder set is a required component of any rigorous risk identification.

Interview categories that consistently surface material risks: frontline staff who interact with the agent's outputs daily; affected populations who experience agent decisions without direct interaction; adjacent system owners whose systems consume agent outputs; compliance and legal teams who understand regulatory exposure; and customer-facing staff who hear complaints that don't reach engineering teams. In each interview, ask: "What would you change about this system if you had the authority?" and "When have you seen this system behave in a way that surprised or concerned you?"

Lesson 2 Quiz

Risk Identification: 5 questions
1. Amazon's recruiting AI penalized resumes containing "women's" because of which risk domain?
Correct. The bias originated in data domain risk (specifically, training data composition that reflected historical hiring patterns), not in the model's explicit objective or its integration with downstream systems.
This is a data risk, not a model or governance risk. The training data composition was the primary failure: ten years of resumes from a predominantly male workforce encoded gender correlations that the model learned as signal.
2. In the STRIDE-adapted framework for agent risk, "Elevation of Privilege" is especially critical in 2024–2025 deployments because of which agent behavior pattern?
Correct. As agents are deployed in agentic settings with tool-use capabilities, privilege escalation, either through instruction misinterpretation or prompt injection from external content, becomes a primary security concern.
That describes Information Disclosure. Elevation of Privilege specifically covers agents acquiring unauthorized permissions beyond their initial grant, a critical concern as agentic systems gain more autonomous tool-use capabilities.
3. Why is adding a "reversibility" dimension to risk prioritization especially important for AI agents specifically?
Correct. The Dutch SyRI case illustrates this: even low-probability harms that are difficult to reverse warrant elevated priority precisely because agents act faster than human review can catch and correct errors.
Speed and scale are the key factors. Agents can make thousands of consequential decisions before anyone detects a pattern, and some of those decisions, like flagging someone as a benefits fraudster, are very hard to undo even after the error is confirmed.
4. Which stakeholder category most consistently surfaces operational risks that engineering teams miss during risk identification interviews?
Correct. Zillow's field managers knew about the overbidding. Amazon's HR staff had observed the gender bias in output. In both cases, the information existed at the operational level and never reached the teams with authority to act on it.
Frontline and customer-facing staff consistently hold the most operationally grounded risk knowledge. The Zillow and Amazon cases both show information existing at the ground level that never reached the teams who could act on it.
5. What is the primary product of the risk identification phase of an agent audit?
Correct. The risk register is the foundational document of any risk audit; without it, the audit produces no durable output that can be acted upon, assigned to owners, or tracked over time.
The risk register is the required output: a structured document listing each risk, its domain, scoring, and associated component. Without this, the audit has no durable product that can be acted upon.

Lab 2: Build a Threat Inventory

Structured conversation · Minimum 3 exchanges to complete

Your Task

Using the agent you defined in Lab 1 (or a new one), work with the risk identification coach to build a threat inventory covering all four risk domains: data, model, integration, and governance. You'll also practice scoring risks using the probability × impact × reversibility framework.

Start by naming the agent and its primary function, then we'll systematically work through each risk domain together to build a threat inventory you can use in a real audit.
Threat Inventory Coach
Lab 2
Welcome to Lab 2. I'm here to help you build a structured threat inventory for your AI agent. Tell me the agent's name or type and its primary function, for example, "a customer service chatbot for a financial services firm" or "a hiring screening tool for a tech company." Once I understand what the agent does, we'll work through data risks, model risks, integration risks, and governance risks systematically.
Module 8 · Lesson 3

Oversight Gap Analysis

Mapping the distance between what humans think they're overseeing and what they actually are
If you needed to stop your agent right now because it was causing harm, how long would it take, and who would make the call?

On June 1, 2009, Air France Flight 447 disappeared over the Atlantic Ocean with 228 people aboard. The Bureau d'Enquêtes et d'Analyses investigation, completed in 2012, identified a failure pattern that AI governance researchers have since adopted as a reference case: the pilots had operated the Airbus A330's fly-by-wire automation system for years without developing the manual flying skills to intervene effectively when automation failed. When the pitot tubes iced over and the autopilot disconnected, the crew had 4 minutes and 24 seconds to respond. They did not recognize the stall. They had never practiced recovering from it manually. The aircraft struck the ocean while descending at 10,912 feet per minute.

This is the phenomenon Lisanne Bainbridge described as the "Ironies of Automation" in her 1983 paper: the more reliable and capable the automated system, the less opportunity human operators have to maintain the skills and situational awareness needed to oversee it. The gap between nominal oversight and real oversight grows with automation quality.

The Five Oversight Gaps Found in Agent Audits

Oversight gap analysis maps the distance between what an organization believes its oversight mechanisms achieve and what they actually achieve under realistic operating conditions. Across documented AI agent incidents, five gap categories appear repeatedly.

Oversight Gap Taxonomy
1. The Deskilling Gap
Human reviewers lose the expertise needed to meaningfully evaluate agent outputs. Radiologists who rely on AI-assisted screening for extended periods show degraded independent detection rates. Hiring managers who defer to algorithmic screening lose the judgment to override it. The deskilling gap is invisible until automation fails.
2. The Volume Gap
The agent produces outputs faster than human reviewers can meaningfully assess them. When Amazon's recruiting tool was processing thousands of resumes per week, no human reviewer was examining a representative sample. Volume gaps create the illusion of oversight while actual review rates approach zero.
3. The Escalation Gap
Concerns exist in the organization but have no formal path to reach decision-makers. Zillow's field managers, Amazon's HR staff, and the NHS radiology teams that identified AI screening anomalies in 2023 all experienced escalation gaps: they knew, and had no mechanism to formally register what they knew.
4. The Comprehension Gap
Reviewers can see agent outputs but cannot understand why the agent produced them. This makes review nominal rather than substantive: reviewers can flag unusual outputs but cannot evaluate whether the reasoning that produced them is sound. Black-box models in high-stakes contexts create systematic comprehension gaps.
5. The Authority Gap
The person who identifies a problem doesn't have the authority to stop the agent. In automated financial trading, in clinical decision support, and in content moderation at scale, the gap between detecting a problem and having the authority to halt the system can span departments and approval hierarchies, adding critical hours or days of continued harm.
The Red Team Simulation Method

The most reliable method for measuring oversight gaps is red team simulation: a structured exercise in which a designated team attempts to produce harmful or unintended agent outputs, while a separate team measures whether the oversight mechanisms detect and respond appropriately. Red teaming was developed in military and intelligence contexts, adopted by cybersecurity, and is increasingly required by AI governance frameworks including the EU AI Act (Article 9 on risk management systems) and NIST AI RMF Govern 1.1.

A red team simulation for an AI agent oversight audit has three components. First, scenario design: the red team specifies a set of harm scenarios derived from the risk register built in L2, and designs agent inputs or operating conditions intended to produce those scenarios. Second, observation: the red team executes the scenarios while an independent observer tracks whether oversight mechanisms (alerts, dashboards, human review, escalation paths) detect the problem and in what timeframe. Third, gap documentation: the observer records each scenario where detection failed or was delayed beyond the acceptable response time, and categorizes the failure by oversight gap type.
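
The gap documentation step can be sketched as a small record per scenario plus a tally by gap type. Everything named here (`ScenarioResult`, `document_gaps`, the field names, and the sample timings) is hypothetical and chosen only to mirror the three components described above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

GAP_TYPES = ("deskilling", "volume", "escalation", "comprehension", "authority")


@dataclass
class ScenarioResult:
    scenario_id: str
    risk_id: str                              # risk register entry the scenario derives from
    acceptable_minutes: float                 # agreed response time for this scenario
    detected_after_minutes: Optional[float]   # None means oversight never detected it
    gap_type: Optional[str] = None            # assigned by the observer when detection fails

    def is_gap(self) -> bool:
        """A gap is a missed detection or one slower than the acceptable time."""
        if self.detected_after_minutes is None:
            return True
        return self.detected_after_minutes > self.acceptable_minutes


def document_gaps(results):
    """Count failed or late detections by oversight gap type."""
    gaps = [result for result in results if result.is_gap()]
    for result in gaps:
        if result.gap_type not in GAP_TYPES:
            raise ValueError(f"{result.scenario_id}: gap needs a type from {GAP_TYPES}")
    return Counter(result.gap_type for result in gaps)


# Illustrative observations from one simulated exercise.
results = [
    ScenarioResult("S1", "R1", acceptable_minutes=30, detected_after_minutes=12),
    ScenarioResult("S2", "R2", acceptable_minutes=30, detected_after_minutes=None, gap_type="volume"),
    ScenarioResult("S3", "R2", acceptable_minutes=60, detected_after_minutes=240, gap_type="authority"),
]
print(document_gaps(results))
```

Requiring a gap type on every failed scenario forces the observer to say why oversight failed, not only that it did.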

In 2023, the UK's AI Safety Institute conducted structured evaluations of frontier models that effectively functioned as red team exercises, testing whether models would provide dangerous information under various framing conditions and measuring whether model-level safeguards detected and blocked the attempts. The exercises revealed systematic gaps in refusal mechanisms that were not visible in standard benchmark evaluation.

Practical Constraint

Full red team simulations require dedicated time and personnel. For organizations without dedicated AI safety teams, a tabletop exercise, where stakeholders walk through harm scenarios and verbally trace what the detection and response process would be, can approximate red team findings at lower cost. It is less rigorous but far better than no simulation at all.

Measuring Real Oversight Rates

Oversight gap analysis requires measuring actual oversight activity, not documented oversight policy. These are typically very different numbers. Document the following for your agent system:

  • What percentage of agent outputs are reviewed by a human before action is taken? (Not what policy says, but what actually happens.)
  • Of outputs that are reviewed, what percentage result in human override? If zero, are reviewers actually evaluating or rubber-stamping?
  • What is the median time between a harmful output and its detection under current monitoring? Test this with a real or simulated anomaly injection.
  • Who has the authority to halt the agent, and how many approval steps does that require? Time the process from first flag to confirmed halt.
  • When was the last time a human reviewer flagged a concern that resulted in a system change? If the answer is never, escalation channels are not functioning.
  • Do reviewers understand why the agent produced a given output, or only what it produced? Test comprehension directly.
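
The first three measurements in the list above can be computed from a log of agent outputs. This is a minimal sketch that assumes each log record is a dict with the hypothetical keys `reviewed`, `overridden`, and `minutes_to_detection`; real logs will need mapping into that shape.

```python
from statistics import median


def oversight_rates(records):
    """Compute actual review rate, override rate, and median detection time.

    Each record is a dict with:
      reviewed (bool): a human looked at the output before action was taken
      overridden (bool): the reviewer changed or rejected the output
      minutes_to_detection (float or None): set only for outputs later found harmful
    """
    if not records:
        raise ValueError("no agent outputs to measure")
    reviewed = [r for r in records if r["reviewed"]]
    overridden = [r for r in reviewed if r["overridden"]]
    detection_times = [
        r["minutes_to_detection"]
        for r in records
        if r.get("minutes_to_detection") is not None
    ]
    return {
        "review_rate": len(reviewed) / len(records),
        # None rather than 0.0 when nothing was reviewed: the rate is undefined.
        "override_rate": len(overridden) / len(reviewed) if reviewed else None,
        "median_minutes_to_detection": median(detection_times) if detection_times else None,
    }


# Illustrative log: four outputs, two reviewed, two later found harmful.
records = [
    {"reviewed": True, "overridden": False, "minutes_to_detection": None},
    {"reviewed": True, "overridden": True, "minutes_to_detection": None},
    {"reviewed": False, "overridden": False, "minutes_to_detection": 30},
    {"reviewed": False, "overridden": False, "minutes_to_detection": 90},
]
rates = oversight_rates(records)
print(rates)
```

Comparing `review_rate` with the rate the oversight policy claims is the quickest way to expose a volume gap, and an `override_rate` of exactly zero over a long period is the rubber-stamping signal described in the second bullet.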
Audit Output: Oversight Gap Map

The oversight gap map is a visual representation of your agent's decision pipeline, annotated with the location and severity of each identified gap. Each gap should be labeled by type (deskilling, volume, escalation, comprehension, authority), rated by severity, and linked to the specific risk register entries it leaves unmitigated.

Lesson 3 Quiz

Oversight Gap Analysis: 5 questions
1. Lisanne Bainbridge's "Ironies of Automation" principle, as applied to AI agent oversight, describes which core dynamic?
Correct. The AF447 crash is the reference case: the autopilot was so reliable that pilots had no opportunity to practice manual recovery until the moment they needed it, when it was too late to perform it.
The Ironies of Automation describes the deskilling dynamic: reliable automation prevents humans from developing the skill to oversee it, so the better the system, the weaker the real (as opposed to nominal) oversight becomes.
2. Which oversight gap type is specifically illustrated by Zillow's field managers knowing about overbidding but having no formal mechanism to register that knowledge?
Correct. The escalation gap is specifically about information that exists in the organization failing to reach decision-makers: not about lack of expertise or authority, but about absence of a formal channel.
The escalation gap matches: field managers knew, and there was no formal channel to escalate that knowledge to people with authority to act. The authority gap would apply if they knew and had a channel, but still couldn't act.
3. When measuring real oversight rates, a finding of "zero human overrides of agent outputs in the past six months" most likely indicates which condition?
Correct. Perfect agreement between agent and reviewer is a red flag, not a green one. It typically indicates reviewers have been trained (explicitly or implicitly) to defer to the agent, creating the illusion of oversight without the substance.
Zero overrides is a warning sign that review is nominal. Real substantive human review nearly always produces some disagreements with agent outputs; zero disagreement suggests deference rather than evaluation.
4. What distinguishes a red team simulation from a tabletop oversight exercise, and when is each appropriate?
Correct. Red team simulation produces real measurement of detection and response times under actual operating conditions; tabletop is a lower-cost approximation. Both are valuable, and tabletop is far better than no simulation.
The key distinction is execution vs. verbal walkthrough. Red team actually produces the harmful condition and measures the real response; tabletop asks "what would happen" without creating the condition. Red team is more rigorous; tabletop is more accessible.
5. The oversight gap map produced by the gap analysis phase should contain which elements?
Correct. The oversight gap map connects visual pipeline representation to specific gap types and risk register entries, making it a usable tool for remediation planning, not just a description of current state.
The oversight gap map is a pipeline visualization annotated with gap type, location, severity, and connection to specific risks. It's not a technical architecture document; it maps where human oversight breaks down relative to the risks that need mitigating.

Lab 3: Map Your Oversight Gaps

Structured conversation · Minimum 3 exchanges to complete

Your Task

In this lab, you'll conduct a simulated oversight gap analysis for your agent. Work with the coach to identify which of the five gap types (deskilling, volume, escalation, comprehension, authority) are present in your agent's current oversight structure, and how severe each is.

Start by describing how human oversight currently works for your agent: who reviews outputs, how frequently, what triggers a review, and who has the authority to stop the agent. We'll then identify which gaps are present and how to measure them.
Oversight Gap Coach
Lab 3
Welcome to Lab 3. I'm your oversight gap analysis coach. Let's map the real oversight structure around your AI agent: not what the policy documents say, but what actually happens day to day. Start by telling me: who reviews this agent's outputs, how often, what triggers a review, and who can stop the agent if something goes wrong? Include any informal processes you know about, not just official ones.
Module 8 · Lesson 4

Writing the Findings Report and Remediation Plan

Turning audit evidence into organizational action: the document that gets things fixed
What separates a findings report that changes how an agent is overseen from one that gets filed and forgotten?

At 9:30 AM on August 1, 2012, Knight Capital Group's automated trading system began executing a series of erroneous equity orders, buying high and selling low at enormous speed. Within 45 minutes, the system had lost $440 million. The firm's pre-market technical team had identified an anomaly in the deployment (a legacy code component called "Power Peg" had been reactivated by accident), but they had no authority to halt trading without executive approval. The approval chain required four escalation steps, and not all of them were completed in time. By the end of the day, Knight Capital was on the brink of insolvency; it survived only by securing emergency outside investment within days.

The Knight Capital post-mortem, later reviewed by the SEC in its market structure analysis, identified a critical finding that is relevant to every agent audit: the organization had a risk management committee that had reviewed agent deployment procedures, but the review had not specified a halt authority chain with defined time limits. The committee's finding was "deployment procedures require review." The remediation plan was "revise procedures." No one was named. No deadline was set. No one verified implementation before the next deployment. The report had been written. Nothing had changed.

The Anatomy of an Effective Findings Report

Knight Capital illustrates the most common failure mode in audit reporting: findings are documented at a level of generality that produces no specific action, with no named owner and no verification mechanism. An effective findings report has seven components that prevent this failure.

Findings Report Structure
1. Executive Summary (1 page maximum)
A plain-language description of the agent, the scope of the audit, the three most severe findings, and the single most urgent remediation required. Written for a senior leader who will not read the rest of the document.
2. Audit Methodology and Scope
What was examined, what was not examined, who was interviewed, what simulation exercises were conducted, and what the explicit scope boundaries were. This section establishes the audit's credibility and identifies its limitations.
3. Risk Register (Full)
The complete risk register developed during the identification phase, with final ratings. Each entry must include: risk description, domain, probability/impact/reversibility scores, composite priority, supporting evidence, and status (new, previously known, mitigated).
4. Oversight Gap Map with Narrative
The visual gap map from the gap analysis phase, accompanied by narrative descriptions of how each gap was measured, what evidence supports the severity rating, and which risk register entries each gap leaves unmitigated.
5. Specific Findings (Numbered)
Each finding must state: the specific condition observed, the evidence for it, the risk it represents (by risk register reference), and a severity rating. Findings must be falsifiable: stated specifically enough that a follow-up auditor can verify whether they have been remediated.
6. Remediation Plan with Named Owners
For each finding, a specific remediation action, a named responsible party, a deadline, and a verification criterion that defines what "done" looks like. Knight Capital's committee failed because none of these four elements was present.
7. Re-audit Trigger Criteria
Specific conditions that require a new audit before the next scheduled review: significant model retraining, expansion of agent scope or authority, change in deployment context, or detection of a harm scenario from the risk register.
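
The four elements required of each remediation entry (action, named owner, deadline, verification criterion) can be checked mechanically before a report is issued. The sketch below is a minimal illustration in Python, not part of any audit standard: the `RemediationEntry` class, its field names, and the `missing_elements` helper are all hypothetical. It only detects absent elements; judging whether an action is specific enough still takes an auditor.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RemediationEntry:
    """One row of a remediation plan (field names are illustrative assumptions)."""
    finding_id: str
    action: str
    owner: Optional[str] = None          # a named person or role, not "the team"
    deadline: Optional[date] = None
    verification: Optional[str] = None   # what "done" looks like


def missing_elements(entry: RemediationEntry) -> List[str]:
    """Return the names of the required elements that are absent or blank."""
    required = {
        "action": entry.action,
        "owner": entry.owner,
        "deadline": entry.deadline,
        "verification": entry.verification,
    }
    return [
        name for name, value in required.items()
        if value is None or (isinstance(value, str) and not value.strip())
    ]


# An entry shaped like the Knight Capital committee's "revise procedures"
vague = RemediationEntry(finding_id="F-1", action="Revise procedures")
print(missing_elements(vague))  # ['owner', 'deadline', 'verification']
```

An entry that returns an empty list has all four elements present, which is a necessary condition for an actionable finding, not a sufficient one.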
Writing Findings That Produce Action

The difference between findings that are acted upon and findings that are filed comes down to specificity and falsifiability. Compare these two findings from a hypothetical audit of a customer service agent:

Ineffective Finding (Knight Capital Pattern)

Finding 7: The human review process for agent escalations should be strengthened to ensure appropriate oversight of high-risk interactions. Recommended action: review and update the escalation policy.

Effective Finding (Falsifiable, Owned)

Finding 7 [HIGH: Escalation Gap]: Of 4,200 customer interactions coded as "financial hardship" by the agent in Q3, zero were reviewed by a human supervisor before the agent's recommended action was executed. This leaves risk register item R-12 (agent recommending debt collection contact to customers in distress) unmitigated. Remediation: All interactions tagged financial-hardship must trigger mandatory human review before agent action. Owner: VP Customer Experience. Deadline: 30 days. Verification: Auditor inspection of review logs showing a 100% human review rate for tagged interactions over a 30-day period following implementation.

The second finding is falsifiable: a follow-up auditor can verify whether it has been remediated by checking whether the review logs show 100% human review of tagged interactions. The first finding cannot be verified: "review and update the policy" has no defined success condition.
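
To make the verification criterion concrete, the check a follow-up auditor would run against the review logs can be sketched in a few lines of Python. This assumes a hypothetical log format in which each record carries a `tags` list and a `reviewed_before_action` flag; real logging schemas will differ, and the function names are illustrative.

```python
from typing import Iterable, Mapping


def human_review_rate(interactions: Iterable[Mapping],
                      tag: str = "financial-hardship") -> float:
    """Share of tagged interactions reviewed by a human before the agent acted."""
    tagged = [i for i in interactions if tag in i["tags"]]
    if not tagged:
        # An empty sample must not count as a pass.
        raise ValueError(f"no interactions tagged {tag!r}; criterion cannot be evaluated")
    reviewed = sum(1 for i in tagged if i["reviewed_before_action"])
    return reviewed / len(tagged)


def finding_remediated(interactions: Iterable[Mapping],
                       tag: str = "financial-hardship",
                       required_rate: float = 1.0) -> bool:
    """True only if the observed review rate meets the stated criterion."""
    return human_review_rate(interactions, tag) >= required_rate


# Hypothetical 30-day log sample (assumed schema)
logs = [
    {"tags": ["financial-hardship"], "reviewed_before_action": True},
    {"tags": ["billing"], "reviewed_before_action": False},
    {"tags": ["financial-hardship", "billing"], "reviewed_before_action": False},
]
print(human_review_rate(logs))   # 0.5
print(finding_remediated(logs))  # False
```

Raising an error on an empty sample is a deliberate choice in this sketch: if no tagged interactions were logged during the verification window, the auditor has no evidence either way.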

The Remediation Priority Framework

Not all findings can be remediated simultaneously. The remediation plan must prioritize, and that prioritization must be transparent and defensible, not based on organizational convenience or political dynamics. Use a two-dimensional prioritization: finding severity (from the risk register composite score) and remediation tractability (how quickly and reliably the gap can be closed).

Critical findings with high tractability (for example, a missing halt authority chain, which can be documented and assigned in days) must be addressed first regardless of other priorities. High-severity findings with low tractability, such as deskilling gaps that require months of training program development, require an interim mitigation: a temporary increase in human review rates or a scope limitation on the agent while the long-term remediation is designed. The report must document both the interim and the long-term remediation for every high-severity finding.
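
One way to keep the ordering transparent is to write it down as an explicit rule. The Python sketch below is an assumption-laden illustration, not a prescribed method: the severity and tractability scales, the `Finding` fields, and the tie-breaking order (severity first, then tractability) are all choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed ordinal scales; a real audit would derive severity
# from the risk register composite score.
SEVERITY_RANK = {"critical": 3, "high": 2, "medium": 1, "low": 0}
TRACTABILITY_RANK = {"high": 1, "low": 0}


@dataclass
class Finding:
    finding_id: str
    severity: str
    tractability: str
    long_term_remediation: str
    interim_mitigation: Optional[str] = None


def prioritize(findings: List[Finding]) -> List[Finding]:
    """Most severe first; among equally severe findings, most tractable first."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_RANK[f.severity], TRACTABILITY_RANK[f.tractability]),
        reverse=True,
    )


def missing_interim(findings: List[Finding]) -> List[str]:
    """IDs of high-severity (or worse) findings with no documented interim mitigation."""
    return [
        f.finding_id for f in findings
        if SEVERITY_RANK[f.severity] >= SEVERITY_RANK["high"] and not f.interim_mitigation
    ]


findings = [
    Finding("F-3", "high", "low", "Rebuild reviewer training program"),
    Finding("F-1", "critical", "high", "Document halt authority chain",
            interim_mitigation="Freeze new deployments"),
    Finding("F-2", "medium", "high", "Update runbook"),
]
print([f.finding_id for f in prioritize(findings)])  # ['F-1', 'F-3', 'F-2']
print(missing_interim(findings))                     # ['F-3']
```

Here F-3 is flagged because it is high severity, slow to fix, and has no interim measure recorded, which is exactly the exposure the report is supposed to close.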

The UK's Algorithmic Transparency Recording Standard, published in 2021 and updated in 2023, requires public sector bodies using algorithmic tools to document exactly this structure: the finding, the interim measure, the long-term remediation, the named owner, and the review date. It is a useful template even for private sector organizations not legally subject to it.

Final Audit Output

A complete agent risk audit produces four documents: (1) the scoped audit mandate with stakeholder list, (2) the risk register, (3) the oversight gap map with narrative, and (4) the findings report with remediation plan. Together, these constitute the full audit record. They should be version-controlled, stored in a location accessible to compliance and legal teams, and referenced at every subsequent review of the agent system.

Communicating Findings Upward: The Political Reality

Audit findings that threaten existing investments, challenge team performance records, or recommend halting high-profile systems encounter organizational resistance. This is predictable and must be planned for. Three principles from documented successful audit communications apply here.

Lead with risk, not failure. Frame findings as forward-looking risk management, not backward-looking blame assignment. "This gap means that if X occurs, we will not detect it in time to prevent harm" is more actionable than "this team failed to build adequate oversight." The goal is remediation, not accountability theater.

Quantify where possible. Knight Capital lost $440 million in 45 minutes. Zillow wrote down $569 million. The Dutch government paid approximately €40 million in compensation related to the SyRI system. Risk quantification, even with rough estimates, makes organizational investment in remediation legible as prudent financial management rather than unnecessary caution.

Propose, don't just flag. A findings report that identifies fifteen gaps and recommends "further review" for each will be shelved. A report that identifies fifteen gaps, ranks the top three for immediate action, provides specific remediation plans with resource estimates, and offers a verification mechanism gives decision-makers something to say yes or no to.

Lesson 4 Quiz

Findings Report and Remediation Plan · 5 questions
1. The Knight Capital Group case illustrates which specific failure mode in audit findings reporting?
Correct. Knight Capital had a risk committee, a review process, and a findings document. The finding said "revise procedures": no owner, no deadline, no verification. The next deployment happened before anything changed.
The committee had reviewed deployment procedures and produced findings, but the findings were too vague to act on. No owner, no deadline, no definition of "done." That's the failure mode Lesson 4 focuses on.
2. What makes a finding "falsifiable" in the context of an agent audit findings report?
Correct. Falsifiability in audit findings means defining what "done" looks like in verifiable terms: "100% human review rate for tagged interactions over 30 days following implementation" is verifiable; "strengthen the review process" is not.
Falsifiability means the success condition is defined precisely enough that a follow-up auditor can check whether it has been achieved, without having to make a judgment call about whether the remediation was "enough."
3. For a high-severity finding with low remediation tractability, such as a deskilling gap requiring months of training program development, what does the remediation plan require?
Correct. High-severity findings cannot simply wait for long-term solutions. An interim mitigation, something that reduces the risk exposure in the near term while the long-term fix is designed, is required for all high-severity findings.
High-severity findings need both: an interim measure that reduces risk exposure now, and a long-term remediation that closes the gap permanently. Waiting only for the long-term solution leaves the organization exposed during the design period.
4. Which of the three communication principles for presenting findings to senior leadership is illustrated by referencing the $440M Knight Capital and $569M Zillow losses?
Correct. Quantification, meaning the translation of risk into financial terms using documented real-world cases, makes the business case for remediation investment legible to financial decision-makers who evaluate risk in economic terms.
Those financial figures exemplify the "quantify where possible" principle: expressing risk in dollar terms that make remediation investment legible as financial prudence, not unnecessary caution.
5. What is a "re-audit trigger criterion" and why must it be specified in the findings report?
Correct. Audits conducted on fixed annual schedules can become stale when agent systems change significantly between reviews. Re-audit triggers ensure that material changes to the agent's risk profile prompt re-evaluation rather than waiting for the calendar to permit it.
Re-audit triggers are event-based criteria that require a new audit when something material changes, preventing the scenario where an agent is significantly retrained or expanded without the risk profile being re-evaluated.

Lab 4: Draft a Findings Report Entry

Structured conversation · Minimum 3 exchanges to complete

Your Task

Using the risk register and oversight gap map from Labs 2 and 3, draft a findings report entry for your most significant audit finding. The coach will help you make it specific, falsifiable, and actionable, with a named owner, deadline, and verification criterion. You'll also draft a remediation plan entry covering both interim and long-term mitigation.

Share the most significant finding from your audit work so far. Describe the risk, the evidence you'd cite, and the oversight gap it relates to. We'll work together to turn it into a findings report entry that could actually change how the agent is managed.
Findings Report Coach
Lab 4
Welcome to Lab 4, the final lab in this module. You've built an audit scope, a threat inventory, and an oversight gap map. Now let's translate the most important thing you found into a finding that could actually change how your organization manages this agent. Tell me: what's the single most significant risk or oversight gap you identified in your work so far? Describe it in plain language and tell me what evidence you would use to support it.

Module 8 Test

AI Agent Risk Audit · 15 questions · Pass at 80%
1. What is the fundamental question that distinguishes a risk audit from a business performance review?
Correct. Risk audits are defined by adversarial investigation of unanticipated failure modes, not performance confirmation.
That question characterizes a performance review. Risk audits ask what could go wrong that hasn't been anticipated, and how it would be detected.
2. In the Zillow iBuying case, field managers knew about overbidding months before the $569M write-down. Which oversight gap type does this represent?
Correct. Information existed in the organization but had no formal path to reach decision-makers, which is the defining characteristic of the escalation gap.
The escalation gap: information existed at the operational level but had no formal channel to reach people with authority to act.
3. Amazon's recruiting AI penalized resumes containing "women's" primarily because of which risk domain?
Correct. The bias was a data risk: ten years of training data from a predominantly male engineering workforce taught the model to treat gender-correlated terms as negative predictors.
This is a data risk: training data composition from a male-dominated historical workforce encoded gender correlations the model learned as signal.
4. Which of the four valid risk audit properties is violated when the building team controls the scope of their own audit?
Correct. Adversarial posture requires independence. When the building team controls audit scope, they define it to exclude the components they didn't build, which is typically where the highest risks reside.
Adversarial posture, one of the four required audit properties, requires independence from the building team. Self-audits structurally cannot achieve adversarial posture.
5. What does Lisanne Bainbridge's "Ironies of Automation" predict about AI agent oversight quality over time?
Correct. The AF447 crash is the reference case: reliable automation prevented pilots from developing manual flying skills until the moment they needed them.
Bainbridge's principle predicts the opposite: reliable automation degrades human oversight capability by removing opportunities to practice the judgment needed to catch failures.
6. In the STRIDE-adapted framework for AI agents, "Repudiation" specifically covers which risk?
Correct. Repudiation in agentic contexts covers consequential actions that cannot be attributed, reconstructed, or reviewed, which is particularly critical as agents gain autonomous tool-use capabilities.
Repudiation specifically covers the absence of audit trails for consequential actions. The last option describes Spoofing via prompt injection.
7. Why is reversibility added as a third dimension to the probability × impact risk prioritization framework for AI agents?
Correct. The Dutch SyRI case demonstrates this: low-probability individual misclassification was weighted as low-priority, but those misclassifications were extremely difficult to reverse and caused serious individual harm.
Speed and scale are the key factors. By the time a harmful agent pattern is detected, thousands of hard-to-reverse decisions may already have been executed, and standard probability × impact scoring underweights this.
8. What does a finding of "zero human overrides in six months" most likely indicate during a real oversight rate measurement?
Correct. Real substantive review nearly always produces some disagreement with agent outputs. Zero overrides indicates deference: the review exists on paper but not in practice.
Zero overrides is a red flag, not a green one. Real evaluation almost always produces some disagreement. Zero typically means reviewers are rubber-stamping, not evaluating.
9. What was the specific organizational failure in the Knight Capital Group case that caused $440M in losses despite an existing risk review committee?
Correct. The committee existed. The review happened. The finding was documented. But "revise procedures" with no owner, no deadline, and no verification criterion produced zero change before the deployment that cost $440M.
The committee identified the deployment risk and produced a finding, but the finding was too vague to act on. With no owner, no deadline, and no definition of "done," nothing changed.
10. The Dutch SyRI case was resolved by a court ruling in 2020. What was the primary finding that led to the system being struck down?
Correct. The Dutch court found SyRI violated citizens' right to understand decisions affecting them and to have meaningful recourse: a governance and accountability failure, not a technical accuracy failure.
The court's ruling centered on accountability and explainability: citizens had no mechanism to understand, challenge, or hold anyone responsible for the fraud scores assigned to them.
11. Which component of the findings report is specifically designed for a senior leader who will not read the full document?
Correct. The executive summary is explicitly designed for the reader who will make resource allocation decisions but will not engage with the full technical report.
The executive summary serves this function: one page, plain language, top three findings, and the most urgent remediation, all designed for the decision-maker who won't read further.
12. What is the primary audit output from the risk identification phase of an agent audit?
Correct. The risk register is the foundational document: the durable output of risk identification that feeds into gap analysis and the findings report.
The risk register is the required output of risk identification. It's the foundational document that every subsequent audit phase builds from.
13. The UK's Algorithmic Transparency Recording Standard requires which combination of elements for each documented algorithmic tool finding?
Correct. The ATRS structure maps precisely onto the remediation plan requirements in Lesson 4. It is a useful template for any organization, not only UK public sector bodies legally subject to it.
The ATRS requires: finding, interim measure, long-term remediation, named owner, review date. This is a complete remediation plan structure that private sector organizations can adopt as a template.
14. Which stakeholder interview question is described as most reliably surfacing material risks that documentation reviews miss?
Correct. These questions invite narrative accounts of anomalies and concerns, the kind of operational knowledge that field managers at Zillow and HR staff at Amazon held but were never formally asked about.
Open-ended questions about surprising behavior and desired changes reliably surface the operational knowledge that Zillow's field managers and Amazon's HR staff held and were never formally asked about.
15. Which of the following best describes a complete, sufficient AI agent risk audit output set?
Correct. All four documents are required to constitute a complete audit: they cover scope, risk identification, gap analysis, and actionable findings, and their version control and accessibility ensure they can be acted upon and referenced in future reviews.
A complete audit requires four documents: (1) scoped mandate, (2) risk register, (3) oversight gap map, (4) findings report with remediation plan. All four should be version-controlled and stored accessibly for compliance and legal reference.