AI Risk for Business Leaders · Module 7 · Lesson 1

Why Frameworks Exist: From Ad-Hoc to Structured AI Governance

Organizations that treat AI risk as a checklist discover, usually at cost, that risk is a system — not a list.

Three Samsung semiconductor engineers, working independently, each pasted proprietary source code and internal meeting notes into ChatGPT to get help with debugging and summarizing. The data left Samsung's environment immediately and sat on OpenAI's servers, where, under the consumer terms in force at the time, it could be retained and used to train future models. Samsung had no AI-use policy in place. No framework. No governance layer. The incidents were discovered internally and reported. Samsung responded by banning ChatGPT on corporate devices within weeks, a blunt instrument applied after the fact. The damage, in terms of IP exposure, was already done.

The Samsung case is not an indictment of the engineers. It is an indictment of an organization that deployed productivity AI to thousands of workers without first answering the question: what can go wrong, and who owns the answer?

The Gap Between Deployment and Governance

The pace of AI adoption inside enterprises consistently outstrips the pace of governance development. A 2023 IBM Institute for Business Value survey found that 42% of enterprise-scale companies had deployed AI in production, while only 20% reported having a formal AI risk management process. The gap between those two numbers is where Samsung-style incidents live.

Ad-hoc risk management — blocking a tool after an incident, issuing a memo after a bias complaint, adding a disclaimer after a regulatory inquiry — is reactive by definition. It addresses specific symptoms rather than the underlying absence of a risk-aware deployment culture. A framework, by contrast, is a prospective instrument: it asks what could go wrong before deployment, not after.

The case for formal AI risk frameworks accelerated sharply after 2022. The EU AI Act (formally adopted in 2024) requires organizations deploying high-risk AI systems to maintain documented risk management systems throughout the system lifecycle. NIST's AI Risk Management Framework (AI RMF 1.0, published January 2023) gave U.S. organizations a voluntary but widely adopted structure. The SEC began scrutinizing how public companies describe material AI-related risks in their disclosures. Boards started asking questions that their chief data officers could not answer off the cuff.

What a Framework Actually Is

A risk framework is not a policy document and it is not a compliance checklist. It is a structured, repeatable process for identifying, assessing, prioritizing, mitigating, and monitoring risks — applied consistently across AI initiatives regardless of business unit, vendor, or model type.

The NIST AI RMF organizes this around four core functions: Govern, Map, Measure, and Manage. Govern establishes the organizational culture, policies, and accountability structures. Map identifies the AI system context and the harms it could cause. Measure analyzes and quantifies those risks. Manage applies treatments and monitors outcomes. These are not sequential steps — they are iterative, and they apply at every stage of the AI lifecycle.

ISO/IEC 42001, published in December 2023, goes further by creating a certifiable management system standard for AI: the AI equivalent of ISO/IEC 27001 for information security. Organizations pursuing this certification must demonstrate that AI risk management is embedded in their broader management system, not siloed in the data science team.

The Cost of the Absence

The Samsung leak is a low-visibility example. Higher-visibility cases illustrate larger costs. In November 2022, Air Canada's AI-powered chatbot incorrectly told a bereaved passenger that bereavement fares could be claimed retroactively, which contradicted the airline's actual policy. When Air Canada argued that the chatbot was in effect a separate legal entity responsible for its own statements, British Columbia's Civil Resolution Tribunal rejected the argument in a February 2024 decision (Moffatt v. Air Canada). Air Canada paid. The damages themselves were small, but the absence of a framework for testing, auditing, and constraining chatbot outputs left the company with a public legal precedent and international coverage that a framework would likely have prevented.

In 2020, the Dutch government's SyRI (System Risk Indication) welfare fraud detection algorithm was struck down by the District Court of The Hague for violating the European Convention on Human Rights, the first ruling of its kind in Europe. The government had deployed the system without a documented risk assessment for discriminatory impact. Framework absence cost the government the system entirely.

KEY PRINCIPLE

A framework does not prevent all AI failures. It ensures that when failures occur, the organization can demonstrate it exercised reasonable care — and it creates the institutional memory needed to prevent recurrence. Regulators, courts, and boards increasingly treat the presence or absence of a documented framework as a proxy for organizational intent.

Framework Anatomy: The Six Structural Elements

Regardless of which reference standard an organization adopts, mature AI risk frameworks share six structural elements:

1. Scope definition: Which AI systems, use cases, and data flows are covered. Scope creep and scope gaps are equally dangerous.

2. Risk taxonomy: A shared vocabulary for categorizing AI risks — safety, fairness, privacy, security, reliability, explainability, regulatory compliance — so that teams across the organization assess risks consistently.

3. Risk assessment process: A defined methodology for rating likelihood and impact, ideally calibrated against historical incidents in similar systems.

4. Ownership and accountability: Named roles responsible for each category of risk, with escalation paths and board-level visibility for high-severity items.

5. Controls library: The catalogue of available mitigations — technical, procedural, contractual — that can be applied to reduce identified risks.

6. Monitoring and review cadence: Scheduled reassessments tied to model updates, data drift alerts, regulatory changes, and incident triggers.

LEADERSHIP IMPLICATION

Business leaders do not need to build frameworks themselves. They need to ask whether one exists, whether it covers the AI systems their organization is deploying, whether it is maintained actively, and whether accountability is real rather than nominal. Those four questions, asked in a board room or an executive review, tend to accelerate framework development more effectively than any consultancy mandate.

Lesson 1 Quiz

What was the primary governance failure illustrated by the 2023 Samsung ChatGPT incident?

✓ Correct. The core failure was organizational: Samsung had no AI governance framework in place before deployment, leaving workers without guidance on what data could be shared with external AI systems.

✗ The incident was a governance failure, not an intentional breach or vendor violation. Samsung had no AI-use policy when employees pasted proprietary content into ChatGPT — a framework gap, not individual misconduct.

The NIST AI Risk Management Framework (AI RMF 1.0) organizes its core functions as Govern, Map, Measure, and Manage. Which of these functions specifically focuses on establishing organizational culture, policies, and accountability?

✓ Correct. The Govern function in the NIST AI RMF establishes the organizational culture, policies, processes, and accountability structures within which the other three functions operate.

✗ The Govern function handles culture, policies, and accountability. Map identifies context and potential harms; Measure analyzes and quantifies risks; Manage applies treatments and monitors outcomes.

The Dutch court ruling against the SyRI welfare fraud detection system in 2020 is significant for AI risk frameworks because:

✓ Correct. The SyRI ruling was a landmark because it showed that a court can find an algorithmic system incompatible with human rights law when it is deployed without adequate transparency and a documented assessment of discriminatory impact, and that the cost of framework absence can be the loss of the system entirely.

✗ The SyRI ruling struck down a government AI system that had been deployed without a documented risk assessment for discriminatory impact, demonstrating that framework absence can result in complete system forfeiture, not just fines.

Lab 1: Framework Foundations

Explore the rationale for structured AI governance — ask anything about frameworks, governance gaps, and real-world cases.

Building the Case for a Framework in Your Organization

In this lab, you'll work through the foundational logic of AI risk frameworks — what they are, why they exist, and how to make the internal case for one. Use the AI assistant to stress-test your understanding, explore specific cases, or draft talking points for a leadership conversation.

The assistant understands NIST AI RMF, ISO/IEC 42001, the EU AI Act governance requirements, and documented real-world incidents (Samsung, SyRI, Air Canada, and others). Ask specific questions about framework structure or get help applying concepts to your industry context.

Try asking: "What's the most compelling argument I can make to a skeptical CFO that we need a formal AI risk framework, not just an acceptable-use policy?" or "How does the NIST AI RMF Govern function differ from having a Chief AI Officer?"

AI Risk for Business Leaders · Module 7 · Lesson 2

Risk Identification and Taxonomy: Mapping What Can Go Wrong

You cannot manage risks you have not named. The taxonomy you choose determines the risks you see.

Amazon's machine learning team spent years building a recruiting AI intended to automate résumé screening. The system was trained on a decade of résumés submitted to Amazon, a dataset that overwhelmingly reflected historical hiring patterns in which men dominated technical roles. By 2015, the company had found that the system was systematically downgrading résumés from candidates who attended all-women's colleges and penalizing CVs that included the word "women's." Amazon ultimately abandoned the project, as Reuters reported in 2018. The tool had never been deployed externally, but years of engineering effort and significant reputational exposure were the price of failing to identify fairness risk at the taxonomy stage.

The failure was not algorithmic — it was definitional. No one had formally named bias risk as a category to assess before training began. It emerged from the data, and was only discovered when auditors knew to look for it.

Why Taxonomy Precedes Assessment

Risk identification is bounded by the vocabulary available to perform it. An organization that uses a risk taxonomy limited to "security" and "compliance" will systematically miss fairness, environmental, and transparency risks — not because those risks don't exist, but because the taxonomy provides no slot to record them. The Amazon case is a canonical example: the organization was sophisticated enough to build the system and audit it, but the audit framework did not include a bias-risk category until the system had already been found to exhibit bias.

Modern AI risk taxonomies are considerably richer. NIST's AI RMF describes seven characteristics of trustworthy AI: valid and reliable; safe; secure and resilient; accountable and transparent; explainable and interpretable; privacy-enhanced; and fair, with harmful bias managed. Each names a dimension along which risk can arise. The EU AI Act adds a category that functions as a threshold classifier: prohibited practices, whose risks are treated as so severe that no mitigation makes deployment acceptable. ISO/IEC 42001 maps risks against stakeholder groups, requiring organizations to identify which harms accrue to which parties.

The Eight Risk Categories Business Leaders Must Know

Drawing on NIST AI RMF, the EU AI Act, and the OECD AI Principles, a complete working taxonomy for enterprise AI risk should cover eight categories:

1. Performance and Reliability Risk: The AI system produces incorrect outputs at a rate that causes harm. Includes model drift, distribution shift, and edge-case failures. Reference: NHTSA's investigations of Tesla Autopilot (2016–2023), in which the agency reviewed 956 crashes involving Tesla's driver assistance systems.

2. Bias and Fairness Risk: The system produces outputs that systematically disadvantage protected groups. Documented across hiring (Amazon), housing advertising (HUD's 2019 charge against Facebook over its ad targeting system), and criminal justice (COMPAS recidivism tool, ProPublica analysis, 2016).

3. Privacy and Data Risk: The system processes personal data in ways that violate individual rights or regulatory requirements. Includes both training-data privacy and inference-time privacy (the ability to reconstruct personal data from model outputs).

4. Security and Adversarial Risk: The system can be manipulated by adversarial inputs, data poisoning, model extraction, or prompt injection. Increasingly relevant as AI systems are connected to enterprise data and action capabilities.

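5. Transparency and Explainability Risk: The system produces decisions that affected parties cannot meaningfully understand or contest. The EU's General Data Protection Regulation restricts solely automated decisions with legal or similarly significant effects (Article 22) and, read together with its transparency provisions, is widely interpreted as giving individuals a right to meaningful information about the logic involved. That makes this a compliance risk, not just an ethical one.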

6. Operational and Dependency Risk: Over-reliance on AI outputs without adequate human oversight creates single points of failure. Also includes third-party model risk: when an organization deploys a foundation model it did not train, it inherits risks it cannot fully inspect.

7. Regulatory and Legal Risk: The deployment violates existing law or fails to anticipate regulatory requirements that have not yet been finalized. The EU AI Act creates liability exposure for prohibited and high-risk use cases, with fines of up to 7% of global annual turnover for prohibited practices and lower ceilings for other violations.

8. Reputational and Trust Risk: Public perception of AI system behavior causes brand damage disproportionate to direct operational harm. Air Canada's chatbot incident generated international coverage and a legal precedent; the direct financial cost was modest but the reputational signal was significant.
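
One way to make the eight-category list operational is to encode it and check each assessment against it. The following is a minimal sketch, not part of any cited standard: the `RiskCategory` enum and the `missing_categories` helper are hypothetical names introduced here for illustration, and the sample proposal is invented.

```python
from enum import Enum


class RiskCategory(Enum):
    """The eight working categories listed above (names are illustrative)."""
    PERFORMANCE_RELIABILITY = "Performance and Reliability"
    BIAS_FAIRNESS = "Bias and Fairness"
    PRIVACY_DATA = "Privacy and Data"
    SECURITY_ADVERSARIAL = "Security and Adversarial"
    TRANSPARENCY_EXPLAINABILITY = "Transparency and Explainability"
    OPERATIONAL_DEPENDENCY = "Operational and Dependency"
    REGULATORY_LEGAL = "Regulatory and Legal"
    REPUTATIONAL_TRUST = "Reputational and Trust"


def missing_categories(assessed):
    """Return the categories an assessment did not cover, in taxonomy order."""
    assessed_set = set(assessed)
    return [c for c in RiskCategory if c not in assessed_set]


# Hypothetical proposal that only considered security and compliance.
proposal = [RiskCategory.SECURITY_ADVERSARIAL, RiskCategory.REGULATORY_LEGAL]
gaps = missing_categories(proposal)

print(f"{len(gaps)} of {len(RiskCategory)} categories not assessed:")
for category in gaps:
    print(" -", category.value)
```

A check like this does not judge the quality of an assessment; it only makes an empty slot visible so that someone has to explain it.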

DOCUMENTED CASE: COMPAS, 2016

ProPublica's 2016 investigation of Northpointe's COMPAS recidivism prediction tool found that Black defendants were nearly twice as likely as white defendants to be incorrectly flagged as high risk for future offending. Northpointe disputed the methodology, but the case established that criminal justice AI systems could be simultaneously accurate in aggregate and discriminatory in effect — a finding that reshaped how fairness risk is defined across the field.
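
The distinction between aggregate accuracy and group-level error rates can be shown with a few lines of arithmetic. The sketch below uses invented confusion-matrix counts for two hypothetical groups; they are not ProPublica's or Northpointe's figures and are chosen only so that overall accuracy is identical while the false positive rate differs.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for one group (all numbers below are made up)."""
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int

    @property
    def accuracy(self) -> float:
        total = self.true_pos + self.false_pos + self.true_neg + self.false_neg
        return (self.true_pos + self.true_neg) / total

    @property
    def false_positive_rate(self) -> float:
        # Share of people who did NOT reoffend but were flagged high risk.
        return self.false_pos / (self.false_pos + self.true_neg)


group_a = ConfusionCounts(true_pos=40, false_pos=20, true_neg=30, false_neg=10)
group_b = ConfusionCounts(true_pos=22, false_pos=12, true_neg=48, false_neg=18)

for name, g in (("Group A", group_a), ("Group B", group_b)):
    print(f"{name}: accuracy={g.accuracy:.2f}, "
          f"false positive rate={g.false_positive_rate:.2f}")
```

In this toy data both groups see 70% accuracy, yet a member of Group A who would not reoffend is twice as likely to be flagged as one in Group B. An assessment that reports only the aggregate figure would not surface that gap.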

Risk Identification Methods

Taxonomy provides the categories; structured methods populate them. Organizations use four primary identification techniques:

Red-teaming: Deliberately adversarial probing of the AI system to identify failure modes. Microsoft and OpenAI both maintain dedicated red-team functions. In 2023, a White House-backed public AI red-team exercise at DEF CON drew over 2,200 participants, who probed major AI systems for vulnerabilities.

Algorithmic impact assessments (AIAs): Structured pre-deployment reviews modeled on privacy impact assessments. Canada's federal government mandated AIAs for all government AI deployments in 2019 via its Directive on Automated Decision-Making. The AIA requires organizations to score each deployment across harm dimensions and apply mandatory safeguards above certain thresholds.

Stakeholder harm mapping: Identifying every category of person who could be affected by the system and mapping specific harms for each group. This prevents the common error of assessing risks only from the operator's perspective.

Failure mode and effects analysis (FMEA): Adapted from engineering and quality management, FMEA asks "what can fail, how likely is it, and what is the effect?" for each component and decision point in the AI pipeline.
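
As an illustration of the FMEA method, the sketch below ranks failure modes by the risk priority number (RPN) conventionally used in FMEA: severity times occurrence times detection, each on a 1 to 10 scale. The lesson itself does not prescribe RPN, and the pipeline components and ratings here are invented for a hypothetical customer-service chatbot.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    failure: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost always caught) to 10 (almost never caught)

    @property
    def rpn(self) -> int:
        """Conventional FMEA risk priority number."""
        return self.severity * self.occurrence * self.detection


# Hypothetical ratings for illustration only.
modes = [
    FailureMode("retrieval", "stale policy document returned", 7, 5, 6),
    FailureMode("generation", "invents a refund policy", 8, 4, 7),
    FailureMode("logging", "conversation transcript not stored", 4, 2, 3),
]

ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"RPN {m.rpn:>3}  {m.component}: {m.failure}")
```

Because RPN multiplies three scales as if they were interchangeable, a severe but rare failure can rank low. A common safeguard is to review every failure mode with a top severity rating regardless of its RPN.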

LEADERSHIP IMPLICATION

When reviewing an AI deployment proposal, ask to see the risk taxonomy used and the identification methods applied. If the answer covers only security and compliance, the assessment is incomplete by definition. Explicitly ask: "Has bias risk been assessed? Has explainability been assessed against our regulatory obligations? Who performed the stakeholder harm mapping?" These questions are not technical — they are managerial. They signal that your organization expects a complete taxonomy before approval.

Lesson 2 Quiz

Amazon's AI recruiting tool (2014–2018) downgraded résumés from women's colleges and penalized the word "women's." What type of AI risk does this primarily represent?

✓ Correct. The system exhibited bias and fairness risk: it learned discriminatory patterns from historical hiring data and reproduced them systematically, disadvantaging candidates based on gender-related signals in their applications.

✗ This is a classic bias and fairness failure. The system trained on historically male-dominated hiring data and reproduced those patterns, systematically penalizing women — not because it was attacked or unreliable in aggregate, but because it encoded historical discrimination.

Canada's Directive on Automated Decision-Making (2019) mandated which risk identification tool for all federal government AI deployments?

✓ Correct. Canada's 2019 Directive on Automated Decision-Making mandated Algorithmic Impact Assessments for federal government AI deployments, requiring harm scoring across multiple dimensions with mandatory safeguards above certain thresholds.

✗ Canada's Directive mandated Algorithmic Impact Assessments (AIAs) — structured pre-deployment reviews that score each deployment across harm dimensions and require safeguards above certain thresholds. This was one of the first government mandates of its kind globally.

ProPublica's 2016 analysis of the COMPAS recidivism tool established which critical finding about AI fairness?

✓ Correct. This was the investigation's most important finding: the tool was broadly accurate by aggregate metrics, but Black defendants were nearly twice as likely to be incorrectly flagged as high-risk compared to white defendants. It reshaped how fairness risk is defined in the field.

✗ ProPublica found that COMPAS could be accurate overall while still producing systematically different error rates by race. This showed that aggregate performance metrics are insufficient for fairness assessment — a finding that transformed how the field defines bias risk.

Lab 2: Risk Taxonomy in Practice

Apply the eight-category risk taxonomy to real AI use cases — and practice identifying risks before they become incidents.

Mapping Risks Before Deployment

In this lab, you'll work with the AI assistant to apply the eight-category risk taxonomy to specific AI deployment scenarios. Describe a real or hypothetical AI use case from your industry, and the assistant will help you systematically identify which risk categories apply, which are highest priority, and what identification methods would be appropriate.

The assistant understands the NIST AI RMF risk categories, the EU AI Act prohibited and high-risk classifications, Canada's Algorithmic Impact Assessment methodology, and documented cases including Amazon hiring, COMPAS, and Facebook ad targeting.

Try asking: "I work in financial services and we're considering deploying an AI model to score small business loan applications. Walk me through which of the eight risk categories I should prioritize and why." or "What's the difference between a red-team exercise and a stakeholder harm mapping, and when should I use each?"

AI Risk for Business Leaders · Module 7 · Lesson 3

Risk Assessment, Prioritization, and the Controls Library

Identifying every possible risk is not the goal. Knowing which risks matter most — and what to do about them — is.

On March 18, 2018, an Uber Advanced Technologies Group self-driving vehicle struck and killed Elaine Herzberg in Tempe, Arizona, the first pedestrian fatality caused by an autonomous vehicle. The NTSB's post-incident investigation found that the vehicle's system had detected Herzberg roughly 6 seconds before impact and classified her as an unknown object, then as a vehicle, then as a bicycle. Emergency braking never engaged, because it had been deliberately disabled during autonomous operation to prevent false-positive braking events. A human safety driver was present but distracted.

Uber had identified braking-suppression-related risks internally. The decision to disable emergency braking was a risk prioritization failure: the team had weighted false-positive braking events (a reliability nuisance) more heavily than the low-probability, catastrophic-consequence scenario of a disabled safety system failing to stop the vehicle in time. The controls that existed were not applied because the risk had been under-ranked.

From Identification to Prioritization

Every mature risk framework requires a method for moving from a list of identified risks to a prioritized set that determines resource allocation and control application. The standard instrument is a risk matrix — a two-dimensional grid plotting likelihood (probability of occurrence) against impact (severity of consequence). Risks in the high-likelihood, high-impact quadrant receive immediate attention; low-likelihood, low-impact risks may be accepted without mitigation.

The Uber case illustrates the critical limitation of standard risk matrices for AI systems: they systematically under-prioritize rare, catastrophic events. A risk that has a 0.01% probability of occurrence but a consequence of death scores low on probability and high on impact — but in a matrix where probability is weighted equally with impact, it may be ranked below a risk that has a 40% probability of causing a minor operational delay.

AI-specific risk assessment frameworks address this through consequence-weighted scoring: multiplying impact by a severity modifier that elevates irreversible, life-altering, or legally consequential harms regardless of their estimated probability. NIST's AI RMF specifically calls out the need to treat catastrophic and irreversible harms with heightened scrutiny even when probability is uncertain or low.
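
The ranking problem described above can be made concrete with a small calculation. This is a sketch under stated assumptions: the 1 to 5 scales, the `IRREVERSIBLE_MODIFIER` value of 3, and the two example risks are illustrative choices, not values taken from NIST or any other framework.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int     # 1 (very unlikely) to 5 (very likely)
    impact: int         # 1 (minor) to 5 (catastrophic)
    irreversible: bool  # True if the harm cannot be undone


def matrix_score(risk: Risk) -> int:
    """Standard risk matrix: likelihood and impact weighted equally."""
    return risk.likelihood * risk.impact


# Assumed modifier; a real framework would set and justify its own value.
IRREVERSIBLE_MODIFIER = 3


def consequence_weighted_score(risk: Risk) -> int:
    """Multiply impact by a severity modifier for irreversible harms."""
    modifier = IRREVERSIBLE_MODIFIER if risk.irreversible else 1
    return risk.likelihood * risk.impact * modifier


fatality = Risk("disabled safety system fails to stop vehicle", 1, 5, True)
delay = Risk("minor operational delay", 4, 2, False)

for r in (fatality, delay):
    print(f"{r.name}: matrix={matrix_score(r)}, "
          f"weighted={consequence_weighted_score(r)}")
```

Under the plain matrix the delay (8) outranks the fatality scenario (5). With the modifier applied, the fatality scenario scores 15 and moves to the top, which is the reordering that consequence-weighted scoring is meant to produce.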

The Risk Register: AI's Version of the Audit Trail

A risk register is the operational document that records, for each identified AI risk: the risk category, a description of the specific risk, the assessment of likelihood and impact, the control(s) applied, the residual risk after control application, the risk owner, and the next review date.
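
The fields listed above translate directly into a record structure. The sketch below is one possible shape, assuming simple 1 to 5 ratings and a free-text residual risk level; the class name, field names, and sample entry are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class RiskRegisterEntry:
    category: str
    description: str
    likelihood: int        # 1 (very unlikely) to 5 (very likely)
    impact: int            # 1 (minor) to 5 (catastrophic)
    controls: List[str]
    residual_risk: str     # e.g. "low", "medium", "high"
    owner: str
    next_review: date

    def is_overdue(self, today: date) -> bool:
        """True once the scheduled review date has passed."""
        return today > self.next_review


# Invented example entry for a customer-service chatbot.
entry = RiskRegisterEntry(
    category="Performance and Reliability",
    description="Chatbot states a refund policy that does not exist",
    likelihood=3,
    impact=4,
    controls=["answers restricted to approved policy text", "weekly transcript audit"],
    residual_risk="medium",
    owner="Head of Customer Operations",
    next_review=date(2025, 6, 30),
)

print(entry.is_overdue(date(2025, 7, 1)))
```

Keeping the owner and the next review date on every entry is what turns the register from a snapshot into something that can be checked on a schedule.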

In 2023, Goldman Sachs — under pressure from regulators regarding its use of AI in consumer financial products — disclosed in SEC filings that it maintained AI-specific risk registers reviewed quarterly by its Operational Risk Committee. This level of documentation, once optional, is rapidly becoming a regulatory expectation. The EU AI Act requires high-risk AI systems to maintain technical documentation and logs that are essentially formalized risk registers.

Risk registers serve a function beyond compliance: they create institutional memory. When a model is retrained, updated, or replaced, the risk register captures the risk history — preventing the scenario where a new team inherits a system and is unaware of the specific risks that previous mitigations were designed to address.

RISK ASSESSMENT PITFALL

Organizations frequently assess risks at deployment and treat the register as complete. AI systems degrade over time through data drift — the real-world distribution of inputs shifts away from the training distribution. A model that was low-risk at deployment may become high-risk six months later as market conditions, user behavior, or regulatory context changes. Risk registers must be scheduled for reassessment on a defined cadence, not treated as one-time documents.

The Controls Library: What You Can Actually Do About Risk

A controls library is the catalogue of available risk treatments an organization can apply to reduce identified risks to acceptable residual levels. AI risk controls fall into four categories:

Technical controls: Changes to the AI system itself — differential privacy in training, output filtering, confidence thresholds below which the system escalates to human review, adversarial robustness training, model cards and datasheets, explainability layers (SHAP, LIME), and monitoring dashboards for performance drift.

Process controls: Changes to how the system is used — mandatory human review for high-stakes outputs, defined escalation procedures, required documentation before deployment, incident response protocols, and AI use policies (the control that Samsung lacked).

Contractual controls: Legal instruments that manage third-party AI risk — data processing agreements with AI vendors, indemnification clauses covering AI-generated outputs, audit rights over third-party models, and model access restrictions in enterprise license agreements. After the Air Canada chatbot incident, insurers began explicitly excluding AI-generated content from standard product liability policies in some markets — making contractual risk transfer an active concern.

Governance controls: Structural oversight mechanisms — AI ethics boards (Salesforce established its Office of Ethical and Humane Use in 2019), model review committees, mandatory impact assessments before deployment, and board-level AI risk reporting.
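
Of the technical controls listed above, the confidence threshold is the simplest to illustrate. The sketch below routes any output whose confidence falls below a cutoff to human review; the function name, the 0.85 default, and the returned dictionary shape are assumptions made for this example.

```python
def route_output(prediction: str, confidence: float, threshold: float = 0.85) -> dict:
    """Send low-confidence outputs to a human reviewer instead of acting on them."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    route = "human_review" if confidence < threshold else "automated"
    return {"route": route, "prediction": prediction, "confidence": confidence}


# Hypothetical loan-screening outputs.
print(route_output("approve", 0.95))
print(route_output("deny", 0.60))
```

The threshold is itself a risk decision: setting it higher sends more cases to people and costs more, while setting it lower lets more uncertain outputs through unreviewed. It also assumes the model's confidence scores are reasonably calibrated, which should be verified rather than taken for granted.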

Control Selection: Matching Severity to Response

Control selection should be proportionate to residual risk after a first-pass assessment. The EU AI Act provides a useful external calibration: it mandates specific controls for high-risk systems (human oversight, accuracy and robustness requirements, transparency, and logging) and prohibits certain practices entirely (social scoring, and real-time remote biometric identification in publicly accessible spaces by law enforcement, with narrow exceptions).

For business leaders, the practical test is: if this control fails, what is the worst realistic outcome? If the answer involves irreversible harm to individuals — job loss from biased hiring AI, denial of credit, incorrect medical recommendation, safety system failure — the control tier must be elevated regardless of estimated probability. If the answer involves reversible operational disruption, a lighter control tier is proportionate.

Microsoft's Responsible AI Standard (published in its current form in 2022) provides one of the most detailed public examples of a controls library applied to specific risk categories. For each of its six responsible AI principles, Microsoft specifies the technical, process, and governance controls required at different severity levels — a model organizations can adapt rather than build from scratch.

LEADERSHIP IMPLICATION

Ask your AI teams to show you the risk register for any significant AI deployment and the controls applied against the top five risks. Then ask one question about each control: "If this control failed tomorrow, how would we know?" Controls that cannot be monitored for failure are not controls — they are assumptions. The ability to answer that question for each critical control is a meaningful signal of framework maturity.
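
The "how would we know?" test can be applied mechanically to a controls list. In this sketch each control records the signal that would reveal its failure, and any control without one is flagged; the `Control` class, the `failure_signal` field, and the sample controls are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Control:
    name: str
    failure_signal: Optional[str]  # how the organization would learn it failed


def unmonitored(controls: List[Control]) -> List[str]:
    """Names of controls with no way to detect their own failure."""
    return [c.name for c in controls if not c.failure_signal]


# Invented examples for illustration.
controls = [
    Control("output filter for personal data", "daily sample audit of filtered responses"),
    Control("human review of denied applications", "alert if review queue is bypassed"),
    Control("staff AI-use policy", None),
]

for name in unmonitored(controls):
    print("No failure signal recorded for:", name)
```

A control that appears in this list is not necessarily ineffective, but nobody can currently show that it is working.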

Lesson 3 Quiz

The NTSB investigation of the 2018 Uber ATG fatality revealed that emergency braking had been deliberately disabled. What risk assessment failure does this illustrate?

✓ Correct. The Uber team had identified braking-related risks but prioritized false-positive braking (a frequent, low-severity nuisance) more heavily than the rare, catastrophic scenario of a disabled safety system failing at a critical moment, a prioritization failure typical of standard risk matrix limitations.

✗ The core failure was prioritization: the team had weighted false-positive braking events (common, low-severity) more heavily than the low-probability catastrophic scenario of a disabled safety system. This is the classic limitation of equal-weighted risk matrices when applied to rare, irreversible harms.

A risk register serves two primary functions in AI governance. Which of the following correctly states both?

✓ Correct. A risk register serves as both an audit trail (demonstrating to regulators and courts that risks were identified and addressed) and institutional memory (ensuring that when models are updated or teams change, the risk history of the system is preserved).

✗ Risk registers serve two key functions: they provide the audit trail regulators increasingly require, and they create institutional memory — ensuring that when models are retrained or teams change, the specific risks that previous controls were designed to address are not forgotten.

Which of the following is the most accurate test for whether an AI risk control is genuinely effective, as opposed to merely documented?

✓ Correct. Controls that cannot be monitored for failure are assumptions, not controls. The ability to answer "how would we know if this failed?" is a key signal of framework maturity. If the answer is "we wouldn't know until an incident occurred," the control is inadequate.

✗ Documentation, external endorsement, and absence of incidents are insufficient tests. The critical question is: "If this control failed tomorrow, how would we know?" Controls without monitoring mechanisms are assumptions. This test distinguishes mature frameworks from paper compliance.

Lab 3: Risk Assessment and Controls

Practice building a risk register and selecting controls proportionate to AI risk severity.

From Risk Register to Controls Selection

In this lab, you'll work through the practical mechanics of risk assessment and controls selection with the AI assistant. You can describe a specific AI deployment scenario and work through a simplified risk register together, or explore the controls library in depth — asking about specific technical, process, contractual, or governance controls and when each is appropriate.

The assistant can reference the EU AI Act's mandatory controls for high-risk systems, Microsoft's Responsible AI Standard controls library, NIST AI RMF measurement and management guidance, and documented cases including the Uber ATG incident and Goldman Sachs AI risk register practices.

Try asking: "Help me build a simplified risk register for an AI chatbot we're deploying for customer service — what are the top five risks, how should I rate them, and what controls apply?" or "What's the difference between a technical control and a process control for explainability risk? Give me a concrete example of each."

AI Risk for Business Leaders · Module 7 · Lesson 4

Governance, Accountability, and Operationalizing the Framework

A framework that lives in a document is not a framework. It is a liability — evidence that you knew what you should have done.

On February 6, 2023, Google published a promotional video for its new AI chatbot Bard in which the system answered a question about the James Webb Space Telescope by incorrectly stating that the telescope had taken "the very first pictures of a planet outside of our own solar system." NASA astronomers pointed out publicly that this was false — the first exoplanet images were taken in 2004 and 2008. The correction spread rapidly. Google's share price fell approximately 8% in the days following the launch, erasing roughly $100 billion in market capitalization. Google had apparently not conducted adequate factual accuracy testing before the public demonstration.

The Bard incident illustrates a governance failure distinct from the technical failures seen in earlier cases: it was a failure of accountability structure. Someone had to have been responsible for pre-launch testing. The promotional video had to have been approved. Either the review process did not include factual accuracy verification, or it did and the results were not escalated. Governance failures are process failures with named owners who did not own the process.

Accountability Structures for AI Risk

A risk framework without clear accountability is an organizational decoration. The question of who owns AI risk has been answered differently across industries and organizational sizes, but three structural models have emerged as dominant:

The Centralized Model: A dedicated AI risk function — reporting to the Chief Risk Officer or Chief Technology Officer — owns the framework, conducts assessments, and holds veto authority over high-risk deployments. IBM, which publishes annual AI ethics progress reports, uses a variant of this model through its AI Ethics Board. The advantage is consistency; the disadvantage is that central functions can become bottlenecks and lose context on specific business unit deployments.

The Federated Model: Each business unit owns AI risk for its deployments, with the central risk function providing the framework, taxonomy, and oversight. Microsoft's Responsible AI Standard operates this way: business units must comply with the standard, but responsibility for application sits with the product teams. The advantage is speed and contextual depth; the disadvantage is inconsistent application across units.

The Embedded Model: AI risk roles are built into product and engineering teams — "responsible AI leads" or "AI safety engineers" — who perform assessments as part of the build-deploy cycle. This model is increasingly common in organizations deploying AI at scale. It is the fastest but requires the highest level of training investment to ensure embedded roles maintain framework fidelity.

Board-Level Visibility: What Governance Means at the Top

The question of whether AI risk reaches the board is increasingly being answered by regulation. The SEC's 2023 cybersecurity disclosure rules require public companies to disclose material cybersecurity incidents and describe their cybersecurity risk management processes, and an AI incident that exposes data or compromises systems can fall within that scope. The EU AI Act requires that providers of high-risk AI systems ensure human oversight at a level appropriate to the risk; for enterprise-scale deployments, that oversight is hard to demonstrate without reporting structures that reach the board.

In 2023, Anthropic published a responsible scaling policy that included explicit board-level commitments: if the company's AI systems reached defined capability thresholds, specific governance and safety measures would be triggered regardless of commercial considerations. This was notable because it created a documented accountability structure in which the board — not product teams — held the trigger authority for certain risk responses.

For most organizations, the minimum board-level AI governance requirement is: a named executive who reports to the board on AI risk at a defined cadence; a threshold above which AI incidents require board notification; and a process for the board to understand and approve material AI deployments before they go live. This is not the board managing technical details — it is the board ensuring that someone below them is genuinely responsible and genuinely empowered.
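The three minimum requirements above can be written down as a small, checkable policy. The following is a minimal sketch under stated assumptions: the class names, the four-step severity scale, the 90-day cadence, and the example incidents are all hypothetical illustrations, not drawn from any regulation or company policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical four-step incident severity scale."""
    LOW = 1
    MODERATE = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class BoardGovernancePolicy:
    accountable_executive: str        # named executive who reports to the board
    reporting_cadence_days: int       # defined cadence for AI risk reports
    notification_threshold: Severity  # incidents at or above this reach the board


@dataclass(frozen=True)
class AIIncident:
    system: str
    summary: str
    severity: Severity


def requires_board_notification(incident: AIIncident, policy: BoardGovernancePolicy) -> bool:
    """Return True when the incident meets or exceeds the board threshold."""
    return incident.severity >= policy.notification_threshold


# Illustrative values only; each organization sets its own.
policy = BoardGovernancePolicy(
    accountable_executive="Chief Risk Officer",
    reporting_cadence_days=90,
    notification_threshold=Severity.HIGH,
)

incidents = [
    AIIncident("support-chatbot", "Incorrect refund policy quoted to a customer", Severity.MODERATE),
    AIIncident("credit-scoring", "Approval rates diverged sharply between groups", Severity.CRITICAL),
]

for incident in incidents:
    needs_board = requires_board_notification(incident, policy)
    route = "board notification" if needs_board else "management review"
    print(f"{incident.system}: {route} (owner: {policy.accountable_executive})")
```

The point of the sketch is that the threshold and the owner are explicit fields rather than judgment calls made during an incident. Where the threshold sits is a board decision; the code only makes it unambiguous.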

DOCUMENTED CASE: FTC vs. RITE AID, 2023

In December 2023, the FTC banned Rite Aid from using facial recognition AI for five years after finding the system had incorrectly flagged customers as shoplifters at a rate that disproportionately affected people of color. The FTC specifically cited Rite Aid's failure to implement "reasonable procedures" to prevent harm, including lack of staff training, absence of accuracy audits, and no process for customers to dispute incorrect flags. The order is significant: the FTC treated the governance and process failures as the primary violation — not the technical error rate itself.

Operationalizing the Framework: From Document to Discipline

Framework operationalization requires four organizational mechanisms that convert written policy into practiced behavior:

Training and capability building: Every person who makes decisions about AI deployment — product managers, engineers, procurement officers, legal counsel, senior executives — needs sufficient AI risk literacy to apply the framework to their decisions. Salesforce's Trailhead platform includes AI ethics training for non-technical employees. The EU AI Act mandates training requirements for staff deploying high-risk AI systems.

Gate reviews: Defined decision points in the AI development and deployment lifecycle where risk assessment is mandatory before proceeding. Google's internal process for AI products — the "Responsible Innovation Review" — is a formalized gate review. The output of each gate is a documented risk assessment and a deployment decision with named approver.

Incident response integration: AI risk incidents must be integrated into the broader organizational incident response process, with defined escalation paths, notification timelines, and post-incident review requirements. The EU AI Act requires providers of high-risk AI systems to report serious incidents to national authorities — which means the incident response process must be AI-literate enough to identify what constitutes a reportable AI incident.

Continuous monitoring: Automated monitoring of model performance, fairness metrics, and output distributions in production — with alert thresholds that trigger human review when metrics drift beyond acceptable bounds. This converts the risk register from a static document into a living system that reflects actual system behavior.
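The continuous monitoring mechanism can be illustrated with a simple threshold check. This is a sketch only: the metric names, baselines, and tolerances are hypothetical, and a production system would typically use statistical drift tests rather than a fixed absolute tolerance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricThreshold:
    name: str
    baseline: float       # value recorded when the system was approved
    max_abs_drift: float  # tolerated absolute deviation from the baseline


def metrics_needing_review(
    current: dict[str, float], thresholds: list[MetricThreshold]
) -> list[str]:
    """Return one message per metric that should trigger human review."""
    flagged = []
    for threshold in thresholds:
        value = current.get(threshold.name)
        if value is None:
            # A metric nobody reports cannot be monitored, so treat it as an alert.
            flagged.append(f"{threshold.name}: no current value reported")
            continue
        drift = abs(value - threshold.baseline)
        if drift > threshold.max_abs_drift:
            flagged.append(
                f"{threshold.name}: drift {drift:.3f} exceeds {threshold.max_abs_drift:.3f}"
            )
    return flagged


# Hypothetical thresholds agreed at the deployment gate review.
thresholds = [
    MetricThreshold("accuracy", baseline=0.92, max_abs_drift=0.03),
    MetricThreshold("approval_rate_gap", baseline=0.02, max_abs_drift=0.03),
]

# Hypothetical values observed in production this week.
current_metrics = {"accuracy": 0.90, "approval_rate_gap": 0.07}

alerts = metrics_needing_review(current_metrics, thresholds)
for alert in alerts:
    print("HUMAN REVIEW:", alert)
```

In this example accuracy stays within tolerance while the fairness gap does not, so only the fairness metric is escalated. Treating a missing metric as an alert is a deliberate design choice: silence from the monitoring pipeline should not be read as good news.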

The Maturity Model: Where Is Your Organization?

The four-level maturity scale used here follows the implementation tiers of NIST's Cybersecurity Framework, applied to AI risk management alongside the AI RMF: Partial (ad-hoc, reactive), Risk Informed (risk-aware but inconsistent), Repeatable (consistent processes applied across the organization), and Adaptive (continuously improving, proactively anticipating emerging risks).

Most large enterprises in 2024 operate at the Risk Informed level: they have acknowledged AI risk as a category, have designated someone to own it, and have applied frameworks inconsistently — comprehensively for high-profile deployments, ad-hoc for others. The gap between Risk Informed and Repeatable is primarily a governance gap: the difference between having a framework and having a governance structure that ensures the framework is applied.

Achieving the Repeatable level requires exactly the mechanisms described above: training, gate reviews, incident response integration, and continuous monitoring — applied consistently, not selectively. The Adaptive level requires additionally: feedback loops that capture near-misses, structured processes for learning from external incidents at peer organizations, and proactive engagement with emerging regulatory requirements before they become compliance deadlines.

LEADERSHIP IMPLICATION

The final question for any business leader reviewing their organization's AI risk framework is not "do we have one?" but "is the framework actually being used?" The tests are operational: Are gate reviews happening? Are risk registers being updated after model changes? Are AI incidents being escalated through a defined process? Is the board receiving AI risk reports? An organization that cannot answer yes to all four questions is at the Risk Informed level at best, and the next significant AI incident will be treated by regulators, courts, and press as evidence that the organization knew what it should have done and chose not to do it.
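The four operational tests lend themselves to a short self-assessment. The sketch below is illustrative: the field names are hypothetical labels for the four questions, and the answers would come from audit evidence rather than self-report.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OperationalTests:
    gate_reviews_happening: bool
    registers_updated_after_model_changes: bool
    incidents_escalated_via_defined_process: bool
    board_receives_ai_risk_reports: bool


def failed_tests(tests: OperationalTests) -> list[str]:
    """Return the names of the operational tests that were answered 'no'."""
    return [f.name for f in fields(tests) if not getattr(tests, f.name)]


def framework_in_use(tests: OperationalTests) -> bool:
    """The framework counts as operational only if all four answers are 'yes'."""
    return not failed_tests(tests)


# Hypothetical answers for one organization.
answers = OperationalTests(
    gate_reviews_happening=True,
    registers_updated_after_model_changes=False,
    incidents_escalated_via_defined_process=True,
    board_receives_ai_risk_reports=False,
)

if framework_in_use(answers):
    print("All four operational tests pass.")
else:
    print("Gaps to close:", ", ".join(failed_tests(answers)))
```

The all-or-nothing rule mirrors the text: a single "no" is enough to show that the framework is applied selectively rather than consistently.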

Lesson 4 Quiz

3 questions — free, untracked, retake anytime.

The 2023 FTC action against Rite Aid for its facial recognition AI system is most significant for AI governance because:

✓ Correct. The FTC's Rite Aid order is significant because regulators treated the absence of governance processes (training, auditing, dispute mechanisms) as the core violation. This signals that regulators evaluate the quality of the framework as much as the technical output of the system.

✗ The FTC's Rite Aid action is significant because it treated governance failures — absence of staff training, no accuracy audits, no customer dispute process — as the primary violation. This establishes that regulators are evaluating the quality of governance frameworks, not just technical performance metrics.

In the four-level maturity model covered in this lesson, what distinguishes the "Repeatable" level from the "Risk Informed" level?

✓ Correct. The gap between Risk Informed and Repeatable is fundamentally a governance gap: Risk Informed organizations have a framework and apply it inconsistently; Repeatable organizations have governance structures (training, gate reviews, incident response integration, monitoring) that ensure consistent application.

✗ The distinction is about consistency, not perfection. Risk Informed organizations acknowledge AI risk and apply frameworks selectively; Repeatable organizations have governance mechanisms that ensure the framework is applied consistently across all deployments. The gap is a governance gap, not a technical gap.

Anthropic's 2023 Responsible Scaling Policy was notable for AI governance because it:

✓ Correct. Anthropic's Responsible Scaling Policy was notable because it created explicit board-level accountability: if AI systems reached defined capability thresholds, specific governance measures were triggered regardless of commercial considerations. The result was a documented structure where the board, not product teams, held trigger authority.

✗ Anthropic's policy was significant because it elevated accountability to the board level: defined capability thresholds would trigger specific governance responses regardless of commercial considerations, with the board holding trigger authority. This is a model for how board-level AI governance accountability can be operationalized.

Lab 4: Governance and Operationalization

Design accountability structures and test whether your framework would actually work under operational pressure.

From Framework Design to Real-World Deployment

In this lab, you'll work through the governance and operationalization challenges that determine whether a framework functions in practice. Use the AI assistant to design accountability structures for your organizational context, work through gate review design, or stress-test your framework against specific incident scenarios.

The assistant understands the centralized, federated, and embedded accountability models; the four operational mechanisms (training, gate reviews, incident response integration, continuous monitoring); the four maturity levels (Partial, Risk Informed, Repeatable, Adaptive); and documented cases including Rite Aid (FTC 2023), Google Bard, and Anthropic's Responsible Scaling Policy.

Try asking: "My organization has about 800 employees and is starting to deploy AI across three business units. Which accountability model — centralized, federated, or embedded — is most appropriate for our scale, and what are the key risks of each?" or "Walk me through what a gate review process should look like for an AI deployment in healthcare — what are the mandatory checkpoints and who should be in the room?"


Module 7 Test

15 questions. Score 80% or higher to pass. Covers all four lessons.

1. The 2023 Samsung ChatGPT incident is primarily used in AI governance education to illustrate:

✓ Correct. The Samsung case illustrates the governance failure of deploying AI at scale without prior policy, making it a canonical example of framework absence rather than individual misconduct or vendor failure.

✗ Samsung illustrates what happens when organizations deploy AI tools without governance frameworks in place — the engineers behaved predictably given no policy existed. The lesson is organizational, not individual.

2. ISO/IEC 42001, published in December 2023, differs from the NIST AI RMF in that it:

✓ Correct. ISO/IEC 42001 creates a certifiable AI management system standard, meaning organizations can be formally certified against it, similar to ISO/IEC 27001 for information security. It requires AI risk management to be integrated into the broader management system rather than siloed.

✗ ISO/IEC 42001 is notable because it created a certifiable standard — organizations can be audited and certified against it, unlike the voluntary NIST framework. It requires AI risk management embedded in the overall management system, not isolated in data science teams.

3. The EU AI Act's prohibition on government social scoring AI and real-time biometric surveillance in public spaces represents which element of a risk controls library?

✓ Correct. The EU AI Act's prohibited-use category represents the extreme end of the controls spectrum: risks where regulators have determined that no mitigation makes deployment acceptable, so the appropriate "control" is prohibition rather than mitigation.

✗ The EU AI Act's prohibited uses represent a risk classification beyond what controls can address — the judgment that certain risks are inherently unacceptable regardless of what mitigations are applied. This is a risk classification decision, not a control-selection decision.

4. An AI hiring tool produces outcomes that are 85% accurate in aggregate but incorrectly rejects qualified Black candidates at twice the rate of white candidates. According to the COMPAS precedent and modern bias risk frameworks, this system:

✓ Correct. This is exactly the COMPAS finding generalized: a system can be accurate in aggregate while producing systematically discriminatory outcomes for specific groups. Aggregate accuracy is necessary but insufficient for fairness assessment.

✗ Aggregate accuracy is not a sufficient fairness metric. COMPAS established that a system can perform well overall while producing systematically different error rates for protected groups. An 85% aggregate accuracy with 2x false-rejection rates for Black candidates represents a clear bias and fairness risk requiring mitigation.

5. Canada's 2019 Directive on Automated Decision-Making was groundbreaking because it:

✓ Correct. Canada's Directive was pioneering because it mandated Algorithmic Impact Assessments (AIAs) for federal AI deployments, one of the first government-level mandates globally, requiring harm scoring across dimensions with safeguards calibrated to assessed risk level.

✗ Canada's Directive was among the first government-level mandates globally for Algorithmic Impact Assessments, requiring federal AI deployments to score potential harms and apply safeguards proportionate to those scores — a structured approach ahead of most other governments at the time.

6. The Dutch court's 2020 ruling against the SyRI welfare fraud detection algorithm established that:

✓ Correct. The SyRI ruling established that the absence of documented impact assessment for discriminatory harm is itself a rights violation, regardless of intent. Framework absence, when it enables rights violations, can result in prohibition of the system entirely.

✗ SyRI established that deploying a system affecting individual rights without documenting the assessment of discriminatory impact violates human rights law — intent is not a defense. The ruling struck down the system entirely, demonstrating that the cost of framework absence can be losing the capability altogether.

7. Which of the following best describes the relationship between the NIST AI RMF's four core functions (Govern, Map, Measure, Manage)?

✓ Correct. The NIST AI RMF explicitly describes its four functions as iterative and continuous: applied throughout the AI system lifecycle, not completed once at deployment. This distinguishes a living framework from a one-time assessment.

✗ NIST AI RMF describes its four functions as iterative and continuous across the full AI lifecycle. They are not sequential phases to complete before deployment, nor departmental assignments — they represent ongoing organizational activities that must be sustained as systems evolve.

8. The Google Bard promotional video incident in February 2023 (factual error about James Webb telescope, ~$100B market cap loss) illustrates which specific type of governance failure?

✓ Correct. The Bard incident was a governance accountability failure: someone was responsible for the promotional video, someone approved it, and either the accuracy review process did not exist or its results were not escalated. It illustrates that governance failures are process failures with named owners who did not own the process.

✗ Bard's error was a governance failure: a major public demonstration was approved and published without factual accuracy verification reaching the people who needed to act on it. This is an accountability structure failure — not a model capability failure, bias failure, or regulatory failure.

9. A risk that has a 0.005% probability of occurrence but would result in irreversible patient harm if a medical AI system failed is best handled using which risk assessment approach?

✓ Correct. NIST AI RMF specifically calls for heightened scrutiny of catastrophic, irreversible harms even when probability is low. Standard equal-weighted matrices systematically under-prioritize these scenarios; the Uber ATG case is a documented cost of that failure.

✗ Standard risk matrices fail for low-probability catastrophic events. NIST AI RMF recommends consequence-weighted scoring that elevates irreversible, catastrophic harms regardless of probability. The Uber ATG case demonstrated the cost of treating a low-probability catastrophic risk as low-priority because of its probability score.

10. An organization's AI risk register records risks at deployment but is never updated after the model goes live. The most significant problem this creates is:

✓ Correct. Data drift, where real-world input distributions shift away from training distributions, can silently transform a low-risk model into a high-risk one. A static risk register treats an evolving system as frozen, creating a dangerous gap between documented risk and actual risk.

✗ The core problem with static risk registers is data drift: as real-world conditions change, a model that was low-risk at deployment may become high-risk. Without scheduled reassessment, the register reflects historical risk, not current risk — and controls calibrated at deployment may be inadequate for the system's current behavior.

11. The FTC's December 2023 action against Rite Aid — a five-year ban on facial recognition AI — is most relevant to business leaders because:

✓ Correct. The Rite Aid action is a governance signal: regulators evaluated the quality of Rite Aid's AI governance processes, not just the technical performance of the system. Absence of training, auditing, and dispute processes was the basis for action, not error rate alone.

✗ The Rite Aid action established that regulators evaluate AI governance quality — training programs, audit mechanisms, dispute processes — not just technical outputs. Governance failures are independently regulable, regardless of whether the technical error rate would independently trigger enforcement.

12. In Microsoft's Responsible AI Standard, where does accountability for applying the standard primarily reside?

✓ Correct. Microsoft's Responsible AI Standard is a federated model: the central function sets the standard, but accountability for applying it rests with product teams. This enables speed and contextual depth while maintaining framework consistency.

✗ Microsoft operates a federated accountability model: the Responsible AI Standard is set centrally, but product and engineering teams are responsible for applying it to their specific deployments. This balances consistency with the contextual knowledge that business unit teams have about their specific products.

13. Which of the following is the best description of a "gate review" in AI risk governance?

✓ Correct. Gate reviews are defined lifecycle checkpoints (not periodic reviews or automated filters) where proceeding requires completing a documented risk assessment and obtaining approval from a named decision-maker. They convert framework requirements into mandatory steps in the development process.

✗ A gate review is a defined lifecycle checkpoint: before moving from design to build, or from testing to deployment, the team must complete a risk assessment and obtain named approval. This embeds framework requirements into the development process rather than treating them as separate governance activities.

14. An organization's AI risk framework covers all enterprise AI systems comprehensively. Risk registers are maintained and updated. Gate reviews are completed. Incident response is AI-literate. At which of the four maturity levels (Partial, Risk Informed, Repeatable, Adaptive) does this organization most likely operate?

✓ Correct. The described organization operates at the Repeatable level: consistent application across deployments, maintained registers, functioning gate reviews, and AI-literate incident response. Adaptive would additionally require proactive near-miss learning and anticipation of emerging regulatory requirements before they become compliance deadlines.

✗ Consistent comprehensive application, maintained registers, gate reviews, and AI-literate incident response describe the Repeatable level. Adaptive adds proactive near-miss learning, external incident learning, and regulatory anticipation. Most leading enterprises aspire to Repeatable; few have reached Adaptive.

15. A business leader reviewing an AI deployment proposal asks: "If the primary control applied to this model's fairness risk failed tomorrow, how would we know?" The team cannot answer this question. What does this most accurately indicate?

✓ Correct. Controls without monitoring mechanisms are assumptions, not controls. The inability to answer "how would we know if this failed?" reveals that the control has no detection capability, making it operationally equivalent to having no control. Deployment should not proceed until the monitoring gap is addressed.

✗ A control that cannot be monitored for failure is an assumption. If no one can describe how they would detect control failure, the control provides no real protection — only the appearance of protection. This is one of the most important diagnostic questions a business leader can ask about AI risk management, and the inability to answer it is a deployment blocker.