Module 6 · Lesson 1

How AI Auditing Works

Technical audits, process audits, and outcome audits — and why each alone is insufficient

The regulator asked for an audit. The company provided documentation of their testing methodology, demographic performance metrics, and governance processes. The regulator reviewed it and found everything in order.

Six months later, a journalist found that the AI was systematically disadvantaging one demographic group in production. The audit had missed it. The question was why — and what a better audit would have looked like.

AI auditing encompasses a range of activities that assess whether AI systems are performing as intended, complying with applicable requirements, and producing outcomes that meet defined standards. The term covers substantially different practices — from automated technical testing to human review of decision processes to third-party institutional evaluation.

Technical auditing involves evaluating the AI system itself — testing accuracy, bias, robustness, and behavior across demographic subgroups and edge cases. Technical audits may include adversarial testing (attempting to cause failures), differential analysis (comparing outcomes across demographic groups), and red-teaming (simulating misuse scenarios).
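As a rough illustration of differential analysis, the sketch below compares favorable-outcome rates across two groups and flags a large gap. The sample data, the group labels, and the 0.8 cutoff are assumptions made for this example (the cutoff is loosely modeled on the four-fifths rule of thumb from US employment selection guidance); a real audit would choose metrics and thresholds suited to the system and its legal context.

```python
from collections import defaultdict

def selection_rates(records):
    """Return the favorable-outcome rate for each group.

    `records` is an iterable of (group, favorable) pairs, where `favorable`
    is True when the system's decision benefited the person.
    """
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, is_favorable in records:
        totals[group] += 1
        if is_favorable:
            favorable[group] += 1
    return {g: favorable[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Lowest group rate divided by highest group rate (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favorable outcome
    return min(rates.values()) / highest

# Hypothetical audit sample: 100 decisions per group.
sample = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 42 + [("B", False)] * 58
)

REVIEW_THRESHOLD = 0.8  # assumed cutoff, chosen for illustration only

rates = selection_rates(sample)
ratio = disparate_impact_ratio(rates)
needs_review = ratio < REVIEW_THRESHOLD
```

A single ratio is only a screening signal. It says nothing about why the gap exists, which is one reason technical testing alone is not enough.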

Process auditing evaluates the procedures surrounding AI development and deployment — documentation practices, training data management, testing protocols, incident response, and governance structures. Process audits ask whether the organization is doing the right things, even if they do not directly evaluate whether the AI is producing good outcomes.

Outcome auditing evaluates the real-world effects of AI deployments — are the decisions the AI makes or influences producing the outcomes the organization intends? Are there disparate impacts? Are affected individuals receiving accurate information? Outcome auditing requires data beyond the AI system itself, including post-deployment monitoring of actual consequences.

Why All Three Matter

Technical audits can miss harms that only manifest in deployment. Process audits can miss systems that follow good processes but still produce harmful outcomes. Outcome audits are the most relevant but also the most resource-intensive and the hardest to connect causally to AI system behavior. Meaningful AI auditing typically requires all three.
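The first blind spot can be made concrete with a small sketch: the same fairness check is run once on a pre-deployment test set and once on production logs. All data, group labels, and the `MAX_GAP` tolerance are hypothetical, and the function assumes every group has at least one decision.

```python
def favorable_rate_gap(decisions):
    """Largest difference in favorable-outcome rate between any two groups.

    `decisions` maps a group label to a non-empty list of booleans
    (True = favorable outcome).
    """
    rates = [sum(outcomes) / len(outcomes) for outcomes in decisions.values()]
    return max(rates) - min(rates)

MAX_GAP = 0.05  # assumed tolerance for this illustration

# Hypothetical pre-deployment test set: the groups look nearly identical.
test_set = {
    "A": [True] * 70 + [False] * 30,
    "B": [True] * 68 + [False] * 32,
}

# Hypothetical production logs: the population differs from the test set,
# and a gap appears that the test set never showed.
production = {
    "A": [True] * 700 + [False] * 300,
    "B": [True] * 520 + [False] * 480,
}

technical_audit_passes = favorable_rate_gap(test_set) <= MAX_GAP
outcome_audit_passes = favorable_rate_gap(production) <= MAX_GAP
```

In this made-up case the technical check passes and the outcome check fails, which mirrors the scenario that opens the lesson.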

Lesson 1 Quiz

How AI auditing works

Adversarial testing in AI auditing involves:

✓ Correct. Adversarial testing, meaning deliberately probing for failure modes, is a core technical audit technique for identifying robustness and security weaknesses.

Adversarial testing deliberately attempts to cause AI failures — simulating misuse, edge cases, and attack scenarios to identify vulnerabilities.

Process auditing evaluates:

✓ Correct. Process audits evaluate whether the organization is following good practices (documentation, data management, testing, governance), not whether the AI is directly producing good outcomes.

Process auditing evaluates the procedures and practices around AI development and deployment — not the AI outputs themselves or their real-world consequences.

Why is outcome auditing hardest to conduct?

✓ Correct. Outcome auditing requires data from deployment contexts (not just testing), long time horizons, and difficult causal analysis connecting AI behavior to outcomes.

Outcome auditing is hardest because it requires post-deployment monitoring of real consequences — data beyond the AI system itself — and difficult causal attribution.

Why do meaningful AI audits require all three types (technical, process, outcome)?

✓ Correct. Each audit type has characteristic blind spots; combining all three catches failure modes that any single approach misses.

Each audit type has blind spots: technical audits miss deployment harms, process audits miss outcome harms, outcome audits miss internal governance. All three together provide better coverage.

Lab 1 — Audit Design

Design a meaningful AI audit for a specific system

Your Task

Choose an AI system: a credit scoring model at a bank, a facial recognition system in a retail store, or a content moderation AI at a social media company.

Design a meaningful audit for that system. Specify the technical, process, and outcome audit components. For each component: what would you test, how would you test it, and what would constitute a passing vs. failing result?

Name your AI system and give me your initial audit design. I will probe whether your design would actually catch the most likely failure modes.

AI Lab Assistant · AI Audit Designer

Name your system and give me your audit design. I will push you to test whether each component would actually catch the failure modes most likely for your specific system.

Module 6 · Lesson 2

The Third-Party Audit Ecosystem

Who conducts AI audits, what they can and cannot assess, and the limits of audit-based governance

The company announced it had received an independent third-party audit of its AI system and passed with flying colors. The audit firm was paid by the company being audited. The auditors had API access but not model weights.

This is the current state of much third-party AI auditing. The limitation is not incompetence — it is structure. The auditing ecosystem has not yet developed the independence, access, and standards that financial auditing built over decades.

The Third-Party AI Audit Ecosystem

A market for third-party AI auditing has emerged — companies offering to assess AI systems independently of the organizations that built or deploy them. Understanding this ecosystem requires understanding both its promise and its substantial limitations.

Types of Third-Party Auditors

Technical AI audit firms: Companies that conduct technical assessments of AI systems, such as bias testing, performance evaluation, and security assessment. Companies working in or near this space include Credo AI, Parity AI, and Arthur AI. These firms can often run technical tests more rigorously than most internal teams, using repeatable methodologies and independent judgment.

Consulting firm AI audit practices: Big Four accounting firms (Deloitte, PwC, KPMG, EY) and large consulting firms have developed AI audit practices, often extending existing risk assurance services. These firms bring existing client relationships and credibility but often have less technical AI depth than specialized firms.

Academic and civil society audits: Researchers and advocacy organizations sometimes conduct independent audits, particularly for AI systems with significant public interest implications. These audits can access information that paid auditors cannot (for example, by testing with affected community members) and have independence from commercial relationships. But they lack formal authority and enforcement backing.

The Auditor Limitations Problem

Third-party AI auditing faces several structural challenges that limit its effectiveness:

Model access: Auditors typically get limited access to the systems they audit: API access or testing environments, not full model weights or training data. This makes comprehensive technical assessment difficult.

Conflict of interest: Third-party auditors are paid by the organizations they audit, creating the same structural capture problem as financial auditing.

Standards gap: Unlike financial auditing, AI auditing lacks agreed-upon standards for what constitutes a compliant system. Auditor judgment substitutes for objective criteria, with significant variation.

Sandbagging risk: Organizations may optimize their systems for test conditions that auditors evaluate, without improving their general-use behavior.
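One way an auditor might probe the sandbagging risk described above is to run the same checks in the audit environment and on sampled production traffic, then ask whether the two pass rates differ by more than chance would explain. The sketch below uses a pooled two-proportion z statistic. The counts and the 1.96 cutoff are assumptions for illustration, and the approach presumes the auditor can obtain a production sample at all, which the access limits above may prevent.

```python
import math

def two_proportion_z(pass_a, n_a, pass_b, n_b):
    """Pooled z statistic for the difference between two pass rates.

    Assumes the pooled pass rate is strictly between 0 and 1; otherwise
    the standard error is zero and the statistic is undefined.
    """
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / standard_error

# Hypothetical counts: checks passed in the audit environment versus
# the same checks passed on sampled production traffic.
z = two_proportion_z(pass_a=970, n_a=1000, pass_b=1780, n_b=2000)

Z_CUTOFF = 1.96  # assumed cutoff for flagging a gap worth investigating
suspicious = z > Z_CUTOFF
```

A large positive value does not prove deliberate tuning. Production inputs may simply be harder than audit inputs, so a flag here is a prompt for investigation, not a finding.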

Analogies to Financial Auditing

Financial auditing faced similar limitations before standardization: auditors paid by the audited, inconsistent standards, inadequate access. The resolution involved mandatory accounting standards (GAAP, IFRS), auditor independence requirements, and regulator-backed oversight. Whether AI auditing can follow a similar path is an open question. The complexity and pace of AI development make standardization significantly harder than for financial statements.

Lesson 2 Quiz

Third-party AI audit ecosystem

The conflict-of-interest problem in third-party AI auditing is that:

✓ Correct. Auditors paid by the audited face structural capture, the same problem financial auditing faced before independence requirements were mandated.

Structural capture: third-party AI auditors are paid by the organizations being audited, creating incentives that may compromise audit independence and rigor.

The model access limitation in third-party AI auditing means:

✓ Correct. Limited model access (API or test environment rather than full model weights and training data) constrains what auditors can technically assess.

Third-party auditors typically get limited API or test environment access, not full access to model weights or training data — this limits the depth of technical assessment possible.

The sandbagging risk in AI auditing refers to:

✓ Correct. Sandbagging, as the term is used in this lesson, involves optimizing for the specific tests auditors run rather than improving actual system behavior.

Sandbagging means organizations tune systems for auditor test conditions without improving general behavior — audits assess the tuned version, not the deployed version.

Academic and civil society AI audits have an advantage over commercial audits in that:

✓ Correct. Academic and civil society auditors have independence from commercial relationships and sometimes unique access to affected communities, but they lack formal authority and consistent resources.

Civil society auditors have independence from commercial relationships and can engage affected communities in ways paid auditors typically cannot — but they lack regulatory authority.

Lab 2 — Audit Ecosystem Critique

Evaluate a real third-party AI audit

Your Task

Find or research a real third-party AI audit. Several are publicly available: the Algorithmic Justice League has conducted some, academic researchers have audited facial recognition systems, and some companies have published audit results.

Critique the audit: (1) What type of audit was it (technical, process, outcome)? (2) What was the auditor's access? (3) What structural limitations affected its validity? (4) What would a more rigorous audit have done differently?

Describe the audit you found and give me your initial critique. I will push you on the structural limitations and what they mean for what the audit can and cannot tell us.

AI Lab Assistant · AI Audit Critic

Describe the audit you found and your initial critique. I will push you to assess what the structural limitations mean for what the audit actually tells us.

Module 6 · Lesson 3

Regulatory Enforcement

How AI governance rules actually get enforced — and where enforcement fails

The company had violated the rule. The question was whether anyone would find out, whether there was a regulator with clear authority to act, whether there was evidence sufficient for enforcement, and whether the penalty would be worth the enforcement effort.

This calculation plays out continuously in AI governance. Enforcement is not automatic. It is resource-constrained, jurisdictionally bounded, and largely reactive.

Regulatory Enforcement: Cases and Mechanisms

How do AI governance rules get enforced in practice? Several notable cases illustrate different enforcement mechanisms and their effectiveness.

FTC Enforcement Actions

The FTC has pursued AI-related enforcement actions in several categories: algorithmic bias in pricing, including a settlement involving a rental housing algorithm that charged different prices based on protected characteristics; deceptive AI claims, including action against companies that claimed their AI could accurately detect cancer, lying, or other conditions it could not; and data collection practices underlying AI systems, extending existing privacy and consumer protection authority into AI-specific contexts.

FTC enforcement is reactive — it responds to harms after they occur, often based on complaints or journalism. It lacks pre-deployment authority for most AI categories, meaning harms must occur before the enforcement mechanism activates.

CFPB and Financial Regulator Enforcement

Financial regulators have more developed AI enforcement mechanisms than most sectors, reflecting existing model risk management requirements. Bank regulators (OCC, Federal Reserve, FDIC) have issued guidance requiring banks to document, validate, and monitor AI models used in credit and risk decisions. CFPB has pursued enforcement against discriminatory credit algorithms. These agencies have examination authority — they can require documentation and review systems as part of regular bank examinations, not only in response to complaints.

EU AI Office Enforcement

The EU AI Office, established within the European Commission in 2024 to implement the EU AI Act, represents a different enforcement model: a dedicated authority with primary responsibility for oversight of general-purpose AI (GPAI) models, investigation authority, and direct fine-levying capability. Early cases will shape precedent for how the Act is interpreted. As of 2024–2025, the Office was still in early operational stages, with enforcement cases limited but expected to increase.

Enforcement Gaps

Most AI enforcement operates reactively and sectorally. Consumer AI applications that cause diffuse harms, AI systems used in small business operations, and AI operating across jurisdictions often lack effective enforcement coverage. The enforcement systems that exist were designed for other contexts and adapted to AI — creating gaps that only dedicated AI regulation begins to fill.

Lesson 3 Quiz

Regulatory enforcement of AI governance

FTC AI enforcement is primarily:

✓ Correct. FTC enforcement is reactive: it cannot require pre-deployment review of most AI systems and activates after harms occur.

FTC enforcement is reactive — it responds to AI harms after they occur, typically based on consumer complaints or investigative journalism.

Financial regulators (OCC, Federal Reserve, FDIC) have more developed AI enforcement than most sectors because:

✓ Correct. Bank examination authority allows financial regulators to review AI systems as part of regular examinations, which is proactive oversight rather than only reactive enforcement.

Financial regulators have examination authority — they can review AI documentation and systems during regular bank examinations, enabling proactive oversight rather than only complaint-based response.

The EU AI Office represents a different enforcement model because:

✓ Correct. The EU AI Office is a dedicated AI regulator, not an extension of existing sector regulation, with focused jurisdiction over GPAI models.

The EU AI Office is a dedicated AI authority specifically established for GPAI model oversight, with investigation authority and direct fine-levying capability.

Which of the following is a significant AI enforcement gap?

✓ Correct. Consumer AI applications, small business AI, and cross-jurisdictional AI systems often lack effective enforcement coverage.

Consumer AI applications, AI causing diffuse harms, and cross-jurisdictional systems are significant enforcement gaps — existing enforcement was designed for other contexts and adapted to AI.

Lab 3 — Enforcement Gap Analysis

Identify a specific AI system that falls through enforcement gaps and design a remedy

Your Task

Choose an AI application with significant potential harms that you believe operates in an enforcement gap — where no existing regulator has clear authority and effective enforcement mechanisms.

(1) Map which regulators have potential jurisdiction and why each has limitations. (2) Describe the harm that the gap enables. (3) Propose a specific enforcement mechanism that would address the gap — who would have authority, what would trigger enforcement, what remedies would be available.

Name your AI application and start with the jurisdiction mapping. I will probe your claims about enforcement gaps and push you on your proposed remedy.

AI Lab Assistant · AI Enforcement Gap Analyst

Name your AI application and give me your jurisdiction map. I will challenge your gap identification and proposed remedy.

Module 6 · Lesson 4

Building a Compliance Program

How organizations build AI governance compliance programs that actually work

The new Chief AI Officer asked for a list of all AI systems in production. No one could produce one. Different systems had been built by different teams, acquired from different vendors, and deployed across different business units.

That was the first compliance gap — not a policy gap, not a technical gap. A visibility gap. And it is where most AI compliance programs should start.

What Is an AI Compliance Program?

A compliance program for AI governance is not simply a collection of policies — it is a system of processes, controls, documentation, and accountability that enables an organization to consistently meet its governance obligations and demonstrate that it has done so. Drawing on analogies from established compliance fields (privacy, financial regulation, environmental compliance), AI compliance programs share common structural elements.

Core Elements of an AI Compliance Program

AI Inventory: A systematic registry of AI systems in use within the organization — what they do, where they are deployed, what data they use, and what risk classification they carry. Without knowing what AI is deployed, governance is impossible.

Risk Classification: A process for assessing each AI system against risk criteria — probability of harm, severity of potential harm, affected population, reversibility of decisions. Risk classification determines what governance requirements apply to each system.

Pre-deployment Review: A structured process for evaluating new AI systems before deployment — covering technical testing, data governance review, documentation review, ethics assessment, and governance sign-off. The review threshold and rigor should scale with risk classification.

Ongoing Monitoring: Post-deployment performance monitoring with defined metrics, alert thresholds, and escalation processes. Compliance is not a one-time certification — AI systems drift, contexts change, and new failure modes emerge.

Incident Management: A defined process for handling AI system failures — who is responsible for identification, escalation, investigation, remediation, and reporting (internal and, where required, regulatory).

Documentation and Records: Systematic maintenance of governance documentation — design decisions, testing results, risk assessments, incident records, and governance approvals — sufficient to demonstrate compliance to regulators or internal auditors.
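The monitoring, incident management, and record-keeping elements above can be sketched together: a metric that falls below its alert threshold opens an incident, and each lifecycle step records who handled it. Every name here (`THRESHOLDS`, `Incident`, the system and team names, the threshold values) is hypothetical. Real programs would define their own metrics, owners, and reporting duties.

```python
from dataclasses import dataclass, field
from enum import Enum

class Stage(Enum):
    """Incident lifecycle stages, in the order listed in the lesson."""
    IDENTIFIED = 1
    ESCALATED = 2
    INVESTIGATED = 3
    REMEDIATED = 4
    REPORTED = 5

@dataclass
class Incident:
    system: str
    metric: str
    observed: float
    stage: Stage = Stage.IDENTIFIED
    history: list = field(default_factory=list)  # audit trail of (stage, owner)

    def advance(self, owner):
        """Record who completed the current stage, then move to the next one."""
        if self.stage is Stage.REPORTED:
            raise ValueError("incident has already reached the final stage")
        self.history.append((self.stage.name, owner))
        self.stage = Stage(self.stage.value + 1)

# Assumed alert thresholds: metric name -> minimum acceptable value.
THRESHOLDS = {"accuracy": 0.90, "disparate_impact_ratio": 0.80}

def check_metrics(system, observed):
    """Open an incident for every monitored metric below its threshold."""
    return [
        Incident(system, name, value)
        for name, value in observed.items()
        if name in THRESHOLDS and value < THRESHOLDS[name]
    ]

# Hypothetical monitoring run for one deployed system.
incidents = check_metrics(
    "credit-scoring-v3",
    {"accuracy": 0.93, "disparate_impact_ratio": 0.71},
)
incidents[0].advance(owner="model-risk-team")
```

Keeping the history on the incident itself means the same object doubles as a governance record that can later be shown to a regulator or internal auditor.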

Starting Small

Organizations without mature AI compliance programs often struggle to know where to begin. The most effective starting point is typically the AI inventory — you cannot govern what you cannot see. A complete inventory, even without sophisticated governance attached, reveals where the highest-risk systems are and where to invest first.
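A minimal sketch of what a first inventory might look like, with a simple risk tier attached so the highest-risk systems surface first. The fields, the scoring rule, the tier cutoffs, and the review steps per tier are all assumptions for illustration. The sketch also leaves out criteria such as the size of the affected population, which a real classification scheme would likely include.

```python
from dataclasses import dataclass

@dataclass
class AISystem:
    name: str
    owner: str
    purpose: str
    severity: int      # 1 (minor) to 3 (severe) potential harm
    likelihood: int    # 1 (unlikely) to 3 (likely) that harm occurs
    reversible: bool   # can an affected person get the decision undone?

TIERS = ["low", "medium", "high"]

def risk_tier(system):
    """Assign a tier from an assumed severity x likelihood score."""
    score = system.severity * system.likelihood
    if not system.reversible:
        score += 2  # irreversible decisions are weighted more heavily
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"

# Assumed mapping: review rigor scales with the tier.
REVIEW_STEPS = {
    "low": ["documentation review"],
    "medium": ["documentation review", "technical testing"],
    "high": [
        "documentation review",
        "technical testing",
        "data governance review",
        "ethics assessment",
        "governance sign-off",
    ],
}

# Hypothetical inventory entries.
inventory = [
    AISystem("spam-filter", "IT", "filter inbound email", 1, 2, True),
    AISystem("churn-model", "Marketing", "predict customer churn", 2, 2, True),
    AISystem("resume-screener", "HR", "rank job applicants", 3, 2, False),
]

# Highest-risk systems first, so governance effort goes there first.
ranked = sorted(inventory, key=lambda s: TIERS.index(risk_tier(s)), reverse=True)
for system in ranked:
    print(system.name, risk_tier(system), REVIEW_STEPS[risk_tier(system)])
```

Even a list this crude answers the question the Chief AI Officer could not: what is in production, who owns it, and which systems deserve scrutiny first.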

Lesson 4 Quiz

Building an AI compliance program

The AI inventory is the starting point for compliance programs because:

✓ Correct. Without knowing what AI is deployed, governance is impossible. The inventory is the foundation on which all other compliance elements depend.

The AI inventory is foundational — without visibility into what AI systems exist and what they do, governance programs cannot be targeted or effective.

Pre-deployment review in an AI compliance program should:

✓ Correct. Review rigor should scale with risk: a spam filter and a hiring algorithm should not face identical pre-deployment review requirements.

Pre-deployment review should scale with risk classification — applying the same rigor to all AI systems wastes resources on low-risk systems and underprotects against high-risk ones.

Ongoing monitoring matters for AI compliance because:

✓ Correct. AI systems change over time through drift, context shifts, and new failure modes. Ongoing monitoring is not optional if compliance is to be meaningful.

AI systems drift, deployment contexts change, and new failure modes emerge post-deployment. Ongoing monitoring is essential because compliance is not a one-time certification.

Incident management in an AI compliance program includes:

✓ Correct. Incident management covers the full response lifecycle, from identification through remediation, with both internal and regulatory reporting where required.

AI incident management covers the full response lifecycle: identification, escalation, investigation, remediation, and reporting (both internal and regulatory where required).

Lab 4 — Compliance Program Design

Build an AI compliance program for a specific organization

Your Task

Choose an organization type: a mid-sized bank, a healthcare system, a large social media company, or a government agency using AI in benefits administration.

Design the core elements of an AI compliance program for that organization: inventory approach, risk classification criteria, pre-deployment review process, monitoring approach, and incident management. Identify the three biggest implementation challenges for your design.

Name your organization type and give me your compliance program design. I will probe each element and push you on implementation challenges.

AI Lab Assistant · AI Compliance Program Designer

Name your organization and give me your compliance program design. I will push you on whether each element is appropriately scaled for your organization type and on realistic implementation challenges.

Module Test

15 questions · 80% to pass

Technical AI auditing involves:

✓ Correct.

Technical auditing evaluates the AI system itself — accuracy, bias across demographic groups, robustness, security — not policies or outcomes.

Process auditing in AI governance evaluates:

✓ Correct.

Process auditing evaluates procedures and practices around AI development and deployment — not the AI outputs themselves.

Outcome auditing is hardest to conduct because:

✓ Correct.

Outcome auditing requires post-deployment monitoring of real consequences and difficult causal analysis — data and methods beyond what technical or process audits require.

The conflict-of-interest problem in third-party AI auditing is:

✓ Correct.

Structural capture: third-party AI auditors are paid by the audited organization, creating incentives that may compromise independence and rigor.

Sandbagging in AI auditing means:

✓ Correct.

Sandbagging: organizations tune systems for specific auditor tests without improving actual deployed behavior — audits assess the tuned version.

FTC AI enforcement is primarily characterized as:

✓ Correct.

FTC AI enforcement is reactive — it cannot require pre-deployment review and activates after harms occur.

Bank regulators have more developed AI enforcement than most sectors because of:

✓ Correct.

Bank examination authority allows proactive AI oversight — reviewing documentation during regular examinations, not only responding to complaints.

The EU AI Office represents a different enforcement model because:

✓ Correct.

The EU AI Office is a dedicated AI regulator — not adapted sector regulation — with GPAI-specific jurisdiction and direct enforcement authority.

An AI inventory is the starting point for compliance programs because:

✓ Correct.

The AI inventory is foundational — without visibility into what AI systems exist, governance programs cannot be targeted or effective.

Pre-deployment review in AI compliance should:

✓ Correct.

Pre-deployment review should scale with risk classification — applying identical review to all AI wastes resources on low-risk systems and underprotects against high-risk ones.

Ongoing monitoring matters for AI compliance because:

✓ Correct.

AI systems drift and contexts change post-deployment. Ongoing monitoring is essential because compliance cannot be a one-time certification.

Academic and civil society AI auditors have an advantage in:

✓ Correct.

Civil society auditors have independence from commercial relationships and can engage affected communities — but lack formal authority and consistent resources.

AI incident management covers:

✓ Correct.

AI incident management covers the full response lifecycle — identification through remediation — with both internal and regulatory reporting where required.

A significant AI enforcement gap exists for:

✓ Correct.

Consumer AI applications, AI causing diffuse harms, and cross-jurisdictional systems are major enforcement gaps — existing enforcement was designed for other contexts.

Why do meaningful AI audits require technical, process, AND outcome evaluation?

✓ Correct.

Each audit type misses different failure modes. Technical audits miss deployment harms. Process audits miss outcome harms. Outcome audits miss internal governance failures. All three provide better coverage.