Three Samsung semiconductor engineers, working independently, each pasted proprietary source code and internal meeting notes into ChatGPT to get help with debugging and summarizing. The data left Samsung's environment immediately and, under ChatGPT's default consumer settings at the time, could be retained by OpenAI and used for model training. Samsung had no AI-use policy in place. No framework. No governance layer. The incidents were discovered internally and reported. Samsung responded by banning ChatGPT on corporate devices within weeks, a blunt instrument applied after the fact. The damage, in terms of IP exposure, was already done.
The Samsung case is not an indictment of the engineers. It is an indictment of an organization that deployed productivity AI to thousands of workers without first answering the question: what can go wrong, and who owns the answer?
The pace of AI adoption inside enterprises consistently outstrips the pace of governance development. A 2023 IBM Institute for Business Value survey found that 42% of enterprise-scale companies had deployed AI in production, while only 20% reported having a formal AI risk management process. The gap between those two numbers is where Samsung-style incidents live.
Ad-hoc risk management (blocking a tool after an incident, issuing a memo after a bias complaint, adding a disclaimer after a regulatory inquiry) is reactive by definition. It addresses specific symptoms rather than the underlying absence of a risk-aware deployment culture. A framework, by contrast, is a prospective instrument: it asks what could go wrong before deployment, not after.
The case for formal AI risk frameworks accelerated sharply after 2022. The EU AI Act (formally adopted in 2024) requires organizations deploying high-risk AI systems to maintain documented risk management systems throughout the system lifecycle. NIST's AI Risk Management Framework (AI RMF 1.0, published January 2023) provided U.S. organizations with a voluntary but widely adopted structure. The SEC began requiring material AI-related risk disclosures. Boards started asking questions that their chief data officers could not answer off the cuff.
A risk framework is not a policy document and it is not a compliance checklist. It is a structured, repeatable process for identifying, assessing, prioritizing, mitigating, and monitoring risks, applied consistently across AI initiatives regardless of business unit, vendor, or model type.
The NIST AI RMF organizes this around four core functions: Govern, Map, Measure, and Manage. Govern establishes the organizational culture, policies, and accountability structures. Map identifies the AI system context and the harms it could cause. Measure analyzes and quantifies those risks. Manage applies treatments and monitors outcomes. These are not sequential steps; they are iterative, and they apply at every stage of the AI lifecycle.
ISO/IEC 42001, published in December 2023, goes further by creating a certifiable management system standard for AI, the AI equivalent of ISO 27001 for information security. Organizations pursuing this certification must demonstrate that AI risk management is embedded in their broader management system, not siloed in the data science team.
The Samsung leak is a low-visibility example. Higher-visibility cases illustrate larger costs. In 2022, Air Canada's AI-powered chatbot incorrectly told a bereaved passenger that bereavement fares could be claimed retroactively, a policy that did not exist. When Air Canada argued the chatbot was a "separate legal entity" responsible for its own statements, a Canadian tribunal rejected the argument in a 2024 ruling. Air Canada paid. The absence of a framework for testing, auditing, and constraining chatbot outputs cost the company more than the price of building one would have.
In 2020, the Dutch government's SyRI (System Risk Indication) welfare fraud detection algorithm was struck down by a Dutch court for violating the European Convention on Human Rights, the first ruling of its kind in Europe. The government had deployed the system without a documented risk assessment for discriminatory impact. Framework absence cost the government the system entirely.
KEY PRINCIPLE
A framework does not prevent all AI failures. It ensures that when failures occur, the organization can demonstrate it exercised reasonable care, and it creates the institutional memory needed to prevent recurrence. Regulators, courts, and boards increasingly treat the presence or absence of a documented framework as a proxy for organizational intent.
Regardless of which reference standard an organization adopts, mature AI risk frameworks share six structural elements:
1. Scope definition: Which AI systems, use cases, and data flows are covered. Scope creep and scope gaps are equally dangerous.
2. Risk taxonomy: A shared vocabulary for categorizing AI risks (safety, fairness, privacy, security, reliability, explainability, regulatory compliance) so that teams across the organization assess risks consistently.
3. Risk assessment process: A defined methodology for rating likelihood and impact, ideally calibrated against historical incidents in similar systems.
4. Ownership and accountability: Named roles responsible for each category of risk, with escalation paths and board-level visibility for high-severity items.
5. Controls library: The catalogue of available mitigations (technical, procedural, contractual) that can be applied to reduce identified risks.
6. Monitoring and review cadence: Scheduled reassessments tied to model updates, data drift alerts, regulatory changes, and incident triggers.
LEADERSHIP IMPLICATION
Business leaders do not need to build frameworks themselves. They need to ask whether one exists, whether it covers the AI systems their organization is deploying, whether it is maintained actively, and whether accountability is real rather than nominal. Those four questions, asked in a board room or an executive review, tend to accelerate framework development more effectively than any consultancy mandate.
In this lab, you'll work through the foundational logic of AI risk frameworks: what they are, why they exist, and how to make the internal case for one. Use the AI assistant to stress-test your understanding, explore specific cases, or draft talking points for a leadership conversation.
The assistant understands NIST AI RMF, ISO/IEC 42001, the EU AI Act governance requirements, and documented real-world incidents (Samsung, SyRI, Air Canada, and others). Ask specific questions about framework structure or get help applying concepts to your industry context.
Amazon's machine learning team spent years building a recruiting AI intended to automate résumé screening. The system was trained on a decade of résumés submitted to Amazon, a dataset that overwhelmingly reflected historical hiring patterns in which men dominated technical roles. By 2015, internal audits found the system was systematically downgrading résumés from candidates who attended all-women's colleges and penalizing CVs that included the word "women's." Amazon scrapped the project in 2018. The tool had never been deployed externally, but four years of engineering effort and significant reputational exposure were the price of failing to identify fairness risk at the taxonomy stage.
The failure was not algorithmic; it was definitional. No one had formally named bias risk as a category to assess before training began. It emerged from the data, and was only discovered when auditors knew to look for it.
Risk identification is bounded by the vocabulary available to perform it. An organization that uses a risk taxonomy limited to "security" and "compliance" will systematically miss fairness, environmental, and transparency risks, not because those risks don't exist, but because the taxonomy provides no slot to record them. The Amazon case is a canonical example: the organization was sophisticated enough to build the system and audit it, but the audit framework did not include a bias-risk category until the system had already been found to exhibit bias.
Modern AI risk taxonomies are considerably richer. NIST's AI RMF describes seven characteristics of trustworthy AI: valid and reliable; safe; secure and resilient; accountable and transparent; explainable and interpretable; privacy-enhanced; and fair with harmful bias managed. The EU AI Act adds a category that functions as a threshold classifier: prohibited use, meaning risks so severe that no mitigation makes deployment acceptable. ISO/IEC 42001 maps risks against stakeholder groups, requiring organizations to identify which harms accrue to which parties.
Drawing on NIST AI RMF, the EU AI Act, and the OECD AI Principles, a complete working taxonomy for enterprise AI risk should cover eight categories:
1. Performance and Reliability Risk: The AI system produces incorrect outputs at a rate that causes harm. Includes model drift, distribution shift, and edge-case failures. Reference: Tesla Autopilot fatality investigations (NHTSA, 2016 to 2023), where NHTSA investigated 956 crashes involving Tesla's driver assistance systems.
2. Bias and Fairness Risk: The system produces outputs that systematically disadvantage protected groups. Documented across hiring (Amazon), housing advertising (HUD charge against Facebook's ad targeting algorithm, 2019), and criminal justice (COMPAS recidivism tool, ProPublica analysis, 2016).
3. Privacy and Data Risk: The system processes personal data in ways that violate individual rights or regulatory requirements. Includes both training-data privacy and inference-time privacy (the ability to reconstruct personal data from model outputs).
4. Security and Adversarial Risk: The system can be manipulated by adversarial inputs, data poisoning, model extraction, or prompt injection. Increasingly relevant as AI systems are connected to enterprise data and action capabilities.
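5. Transparency and Explainability Risk: The system produces decisions that affected parties cannot meaningfully understand or contest. The EU's General Data Protection Regulation (Article 22) restricts decisions based solely on automated processing and, read with the regulation's transparency provisions, is widely interpreted as giving individuals a right to meaningful information about the logic involved, which makes this a compliance risk, not just an ethical one.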
6. Operational and Dependency Risk: Over-reliance on AI outputs without adequate human oversight creates single points of failure. Also includes third-party model risk: when an organization deploys a foundation model it did not train, it inherits risks it cannot fully inspect.
7. Regulatory and Legal Risk: The deployment violates existing law or fails to anticipate regulatory requirements that have not yet been finalized. The EU AI Act creates liability exposure for prohibited and high-risk use cases with fines up to 7% of global turnover.
8. Reputational and Trust Risk: Public perception of AI system behavior causes brand damage disproportionate to direct operational harm. Air Canada's chatbot incident generated international coverage and a legal precedent; the direct financial cost was modest but the reputational signal was significant.
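One way to make the "no slot to record them" problem concrete is to encode the taxonomy and check an assessment against it. The following is a minimal sketch, not a prescribed implementation: the `RiskCategory` enum, the `unassessed_categories` helper, and the `narrow_review` example are all hypothetical names introduced here for illustration.

```python
from enum import Enum


class RiskCategory(Enum):
    """The eight working categories listed above."""
    PERFORMANCE_RELIABILITY = "Performance and Reliability"
    BIAS_FAIRNESS = "Bias and Fairness"
    PRIVACY_DATA = "Privacy and Data"
    SECURITY_ADVERSARIAL = "Security and Adversarial"
    TRANSPARENCY_EXPLAINABILITY = "Transparency and Explainability"
    OPERATIONAL_DEPENDENCY = "Operational and Dependency"
    REGULATORY_LEGAL = "Regulatory and Legal"
    REPUTATIONAL_TRUST = "Reputational and Trust"


def unassessed_categories(assessed):
    """Return the categories an assessment never considered, in taxonomy order."""
    assessed = set(assessed)
    return [category for category in RiskCategory if category not in assessed]


# Hypothetical review that looked only at security and compliance.
narrow_review = {RiskCategory.SECURITY_ADVERSARIAL, RiskCategory.REGULATORY_LEGAL}
gaps = unassessed_categories(narrow_review)

for category in gaps:
    print("Not assessed:", category.value)
```

A review scoped to two categories leaves six unexamined, including Bias and Fairness, which is the gap the Amazon case exposed.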
DOCUMENTED CASE: COMPAS, 2016
ProPublica's 2016 investigation of Northpointe's COMPAS recidivism prediction tool found that Black defendants were nearly twice as likely as white defendants to be incorrectly flagged as high risk for future offending. Northpointe disputed the methodology, but the case established that criminal justice AI systems could be simultaneously accurate in aggregate and discriminatory in effect, a finding that reshaped how fairness risk is defined across the field.
Taxonomy provides the categories; structured methods populate them. Organizations use four primary identification techniques:
Red-teaming: Deliberately adversarial probing of the AI system to identify failure modes. Microsoft and OpenAI both maintain dedicated red-team functions. In 2023, the U.S. government organized a public AI red-team exercise at DEF CON where over 2,200 participants probed major AI systems for vulnerabilities.
Algorithmic impact assessments (AIAs): Structured pre-deployment reviews modeled on privacy impact assessments. Canada's federal government mandated AIAs for all government AI deployments in 2019 via its Directive on Automated Decision-Making. The AIA requires organizations to score each deployment across harm dimensions and apply mandatory safeguards above certain thresholds.
Stakeholder harm mapping: Identifying every category of person who could be affected by the system and mapping specific harms for each group. This prevents the common error of assessing risks only from the operator's perspective.
Failure mode and effects analysis (FMEA): Adapted from engineering and quality management, FMEA asks "what can fail, how likely is it, and what is the effect?" for each component and decision point in the AI pipeline.
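FMEA is the most mechanical of the four techniques, so it lends itself to a short worked sketch. Conventional FMEA practice ranks failure modes by a risk priority number (severity times occurrence times detection, each rated 1 to 10); the detection rating is part of that convention rather than something stated above. The `FailureMode` class and the two example entries, including their ratings, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    description: str
    severity: int    # 1 (negligible effect) to 10 (catastrophic effect)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost always caught) to 10 (almost never caught)

    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")
        return self.severity * self.occurrence * self.detection


# Illustrative ratings only; a real FMEA would derive these in a workshop.
modes = [
    FailureMode("chat interface", "response timeout under load", 3, 5, 2),
    FailureMode("training data", "historical hiring bias learned by model", 8, 6, 7),
]

# Highest-priority failure modes first.
ranked = sorted(modes, key=FailureMode.rpn, reverse=True)
for mode in ranked:
    print(mode.rpn(), mode.component, "-", mode.description)
```

The hard-to-detect bias failure outranks the more visible timeout, which is the point of including a detection rating at all.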
LEADERSHIP IMPLICATION
When reviewing an AI deployment proposal, ask to see the risk taxonomy used and the identification methods applied. If the answer covers only security and compliance, the assessment is incomplete by definition. Explicitly ask: "Has bias risk been assessed? Has explainability been assessed against our regulatory obligations? Who performed the stakeholder harm mapping?" These questions are not technical; they are managerial. They signal that your organization expects a complete taxonomy before approval.
In this lab, you'll work with the AI assistant to apply the eight-category risk taxonomy to specific AI deployment scenarios. Describe a real or hypothetical AI use case from your industry, and the assistant will help you systematically identify which risk categories apply, which are highest priority, and what identification methods would be appropriate.
The assistant understands the NIST AI RMF risk categories, the EU AI Act prohibited and high-risk classifications, Canada's Algorithmic Impact Assessment methodology, and documented cases including Amazon hiring, COMPAS, and Facebook ad targeting.
On March 18, 2018, an Uber Advanced Technologies Group self-driving vehicle struck and killed Elaine Herzberg in Tempe, Arizona, the first pedestrian fatality caused by an autonomous vehicle. The NTSB's post-incident investigation revealed that the vehicle's system had detected Herzberg about 6 seconds before impact and classified her as an unknown object, then as a vehicle, then as a bicycle. Emergency braking never engaged, because it had been deliberately disabled to prevent false-positive braking events. A human safety driver was present but distracted.
Uber had identified braking-suppression-related risks internally. The decision to disable emergency braking was a risk prioritization failure: the team had weighted false-positive braking events (a reliability nuisance) more heavily than the low-probability, catastrophic-consequence scenario of a disabled safety system failing to stop the vehicle in time. The controls that existed were not applied because the risk had been under-ranked.
Every mature risk framework requires a method for moving from a list of identified risks to a prioritized set that determines resource allocation and control application. The standard instrument is a risk matrix: a two-dimensional grid plotting likelihood (probability of occurrence) against impact (severity of consequence). Risks in the high-likelihood, high-impact quadrant receive immediate attention; low-likelihood, low-impact risks may be accepted without mitigation.
The Uber case illustrates the critical limitation of standard risk matrices for AI systems: they systematically under-prioritize rare, catastrophic events. A risk that has a 0.01% probability of occurrence but a consequence of death scores low on probability and high on impact. In a matrix where probability is weighted equally with impact, however, it may be ranked below a risk that has a 40% probability of causing a minor operational delay.
AI-specific risk assessment frameworks address this through consequence-weighted scoring: multiplying impact by a severity modifier that elevates irreversible, life-altering, or legally consequential harms regardless of their estimated probability. NIST's AI RMF specifically calls out the need to treat catastrophic and irreversible harms with heightened scrutiny even when probability is uncertain or low.
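The ranking inversion described above is easy to reproduce with numbers. The sketch below assumes 1-to-5 scales for likelihood and impact and a severity modifier of 3 for irreversible harm; none of those values come from NIST or any other standard, and both function names are hypothetical.

```python
def matrix_score(likelihood: int, impact: int) -> int:
    """Plain risk-matrix score: likelihood x impact, each on an assumed 1-5 scale."""
    return likelihood * impact


def consequence_weighted_score(likelihood: int, impact: int, irreversible: bool,
                               severity_modifier: int = 3) -> int:
    """Scale impact up for irreversible harm before multiplying by likelihood.

    The default modifier of 3 is an illustrative assumption, not a standard value.
    """
    weighted_impact = impact * severity_modifier if irreversible else impact
    return likelihood * weighted_impact


# (likelihood, impact, irreversible): illustrative ratings for the two risks above.
fatal_failure = (1, 5, True)   # very rare, but the consequence is a death
minor_delay = (4, 2, False)    # fairly likely, minor operational delay

print("matrix:  ", matrix_score(*fatal_failure[:2]), "vs", matrix_score(*minor_delay[:2]))
print("weighted:", consequence_weighted_score(*fatal_failure),
      "vs", consequence_weighted_score(*minor_delay))
```

Under the plain matrix the fatal failure scores 5 and the minor delay scores 8, so the delay is ranked first. With the modifier applied the fatal failure scores 15 and moves to the top. The exact modifier matters less than the decision to treat irreversibility as an input to the score.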
A risk register is the operational document that records, for each identified AI risk: the risk category, a description of the specific risk, the assessment of likelihood and impact, the control(s) applied, the residual risk after control application, the risk owner, and the next review date.
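The fields listed above map directly onto a record type. This is a minimal sketch of one register entry, assuming 1-to-5 rating scales and a free-text residual rating; the `RiskRegisterEntry` class, its `is_overdue` helper, and every value in the example entry are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RiskRegisterEntry:
    category: str
    description: str
    likelihood: int        # assumed 1-5 scale
    impact: int            # assumed 1-5 scale
    controls: list[str]
    residual_risk: str     # e.g. "low", "medium", "high"
    owner: str
    next_review: date

    def is_overdue(self, today: date) -> bool:
        """True once the scheduled review date has passed."""
        return today > self.next_review


# Illustrative entry; names, ratings, and dates are invented for the example.
entry = RiskRegisterEntry(
    category="Bias and Fairness",
    description="Screening model ranks candidates lower on gendered terms",
    likelihood=3,
    impact=4,
    controls=["pre-deployment fairness audit", "human review of rejections"],
    residual_risk="medium",
    owner="Head of Talent Analytics",
    next_review=date(2025, 1, 15),
)

register = [entry]
overdue = [e for e in register if e.is_overdue(date(2025, 3, 1))]
print(len(overdue), "entries overdue for review")
```

Keeping the owner and review date on the same record as the rating means a missed review can be detected mechanically rather than remembered.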
In 2023, Goldman Sachs, under pressure from regulators regarding its use of AI in consumer financial products, disclosed in SEC filings that it maintained AI-specific risk registers reviewed quarterly by its Operational Risk Committee. This level of documentation, once optional, is rapidly becoming a regulatory expectation. The EU AI Act requires high-risk AI systems to maintain technical documentation and logs that are essentially formalized risk registers.
Risk registers serve a function beyond compliance: they create institutional memory. When a model is retrained, updated, or replaced, the risk register captures the risk history, preventing the scenario where a new team inherits a system and is unaware of the specific risks that previous mitigations were designed to address.
RISK ASSESSMENT PITFALL
Organizations frequently assess risks at deployment and treat the register as complete. AI systems degrade over time through data drift: the real-world distribution of inputs shifts away from the training distribution. A model that was low-risk at deployment may become high-risk six months later as market conditions, user behavior, or regulatory context changes. Risk registers must be scheduled for reassessment on a defined cadence, not treated as one-time documents.
A controls library is the catalogue of available risk treatments an organization can apply to reduce identified risks to acceptable residual levels. AI risk controls fall into four categories:
Technical controls: Changes to the AI system itself, including differential privacy in training, output filtering, confidence thresholds below which the system escalates to human review, adversarial robustness training, model cards and datasheets, explainability layers (SHAP, LIME), and monitoring dashboards for performance drift.
Process controls: Changes to how the system is used, including mandatory human review for high-stakes outputs, defined escalation procedures, required documentation before deployment, incident response protocols, and AI use policies (the control that Samsung lacked).
Contractual controls: Legal instruments that manage third-party AI risk, including data processing agreements with AI vendors, indemnification clauses covering AI-generated outputs, audit rights over third-party models, and model access restrictions in enterprise license agreements. After the Air Canada chatbot incident, insurers began explicitly excluding AI-generated content from standard product liability policies in some markets, making contractual risk transfer an active concern.
Governance controls: Structural oversight mechanisms, including AI ethics boards (Salesforce established its Office of Ethical and Humane Use in 2019), model review committees, mandatory impact assessments before deployment, and board-level AI risk reporting.
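Of the technical controls above, the confidence threshold is the simplest to illustrate. The sketch below routes any prediction under a cutoff to a person instead of acting on it automatically. The `route_decision` function and the 0.90 default are hypothetical; a real threshold would be set from validation data and the cost of each error type.

```python
def route_decision(prediction: str, confidence: float, threshold: float = 0.90) -> dict:
    """Act automatically only when the model's confidence meets the threshold.

    The 0.90 default is an illustrative assumption, not a recommended value.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        return {"action": "auto", "prediction": prediction}
    # Below the threshold the prediction is kept as context for the reviewer.
    return {"action": "human_review", "prediction": prediction}


print(route_decision("approve", 0.97))
print(route_decision("deny", 0.62))
```

Logging how often the `human_review` branch fires also gives a direct answer to whether the control is operating, since a rate that drops to zero is itself a signal worth investigating.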
Control selection should be proportionate to residual risk after a first-pass assessment. The EU AI Act provides a useful external calibration: it mandates specific controls for high-risk systems (human oversight, accuracy and robustness testing, transparency, and logging) and prohibits certain risk categories entirely (social scoring by governments, real-time biometric surveillance in public spaces with narrow exceptions).
For business leaders, the practical test is: if this control fails, what is the worst realistic outcome? If the answer involves irreversible harm to individuals (job loss from biased hiring AI, denial of credit, incorrect medical recommendation, safety system failure), the control tier must be elevated regardless of estimated probability. If the answer involves reversible operational disruption, a lighter control tier is proportionate.
Microsoft's Responsible AI Standard (published in its current form in 2022) provides one of the most detailed public examples of a controls library applied to specific risk categories. For each of its six responsible AI principles, Microsoft specifies the technical, process, and governance controls required at different severity levels, a model organizations can adapt rather than build from scratch.
LEADERSHIP IMPLICATION
Ask your AI teams to show you the risk register for any significant AI deployment and the controls applied against the top five risks. Then ask one question about each control: "If this control failed tomorrow, how would we know?" Controls that cannot be monitored for failure are not controls; they are assumptions. The ability to answer that question for each critical control is a meaningful signal of framework maturity.
In this lab, you'll work through the practical mechanics of risk assessment and controls selection with the AI assistant. You can describe a specific AI deployment scenario and work through a simplified risk register together, or explore the controls library in depth, asking about specific technical, process, contractual, or governance controls and when each is appropriate.
The assistant can reference the EU AI Act's mandatory controls for high-risk systems, Microsoft's Responsible AI Standard controls library, NIST AI RMF measurement and management guidance, and documented cases including the Uber ATG incident and Goldman Sachs AI risk register practices.
On February 6, 2023, Google published a promotional video for its new AI chatbot Bard in which the system answered a question about the James Webb Space Telescope by incorrectly stating that the telescope had taken "the very first pictures of a planet outside of our own solar system." NASA astronomers pointed out publicly that this was false: the first exoplanet images were taken in 2004 and 2008. The correction spread rapidly. Google's share price fell approximately 8% in the days following the launch, erasing roughly $100 billion in market capitalization. Google had apparently not conducted adequate factual accuracy testing before the public demonstration.
The Bard incident illustrates a governance failure distinct from the technical failures seen in earlier cases: it was a failure of accountability structure. Someone had to have been responsible for pre-launch testing. The promotional video had to have been approved. Either the review process did not include factual accuracy verification, or it did and the results were not escalated. Governance failures are process failures with named owners who did not own the process.
A risk framework without clear accountability is an organizational decoration. The question of who owns AI risk has been answered differently across industries and organizational sizes, but three structural models have emerged as dominant:
The Centralized Model: A dedicated AI risk function, reporting to the Chief Risk Officer or Chief Technology Officer, owns the framework, conducts assessments, and holds veto authority over high-risk deployments. IBM, which publishes annual AI ethics progress reports, uses a variant of this model through its AI Ethics Board. The advantage is consistency; the disadvantage is that central functions can become bottlenecks and lose context on specific business unit deployments.
The Federated Model: Each business unit owns AI risk for its deployments, with the central risk function providing the framework, taxonomy, and oversight. Microsoft's Responsible AI Standard operates this way: business units must comply with the standard, but responsibility for application sits with the product teams. The advantage is speed and contextual depth; the disadvantage is inconsistent application across units.
The Embedded Model: AI risk roles are built into product and engineering teams ("responsible AI leads" or "AI safety engineers") who perform assessments as part of the build-deploy cycle. This model is increasingly common in organizations deploying AI at scale. It is the fastest but requires the highest level of training investment to ensure embedded roles maintain framework fidelity.
The question of whether AI risk reaches the board has been answered definitively by regulation. The SEC's 2023 cybersecurity disclosure rules require public companies to disclose material cybersecurity incidents and describe their cybersecurity risk management processes. AI incidents are increasingly material cybersecurity events. The EU AI Act requires that providers of high-risk AI systems ensure human oversight at a level appropriate to the risk, which, for enterprise-scale deployments, extends to board-level reporting structures.
In 2023, Anthropic published a responsible scaling policy that included explicit board-level commitments: if the company's AI systems reached defined capability thresholds, specific governance and safety measures would be triggered regardless of commercial considerations. This was notable because it created a documented accountability structure in which the board, not product teams, held the trigger authority for certain risk responses.
For most organizations, the minimum board-level AI governance requirement is: a named executive who reports to the board on AI risk at a defined cadence; a threshold above which AI incidents require board notification; and a process for the board to understand and approve material AI deployments before they go live. This is not the board managing technical details; it is the board ensuring that someone below them is genuinely responsible and genuinely empowered.
DOCUMENTED CASE: FTC vs. RITE AID, 2023
In December 2023, the FTC banned Rite Aid from using facial recognition AI for five years after finding the system had incorrectly flagged customers as shoplifters at a rate that disproportionately affected people of color. The FTC specifically cited Rite Aid's failure to implement "reasonable procedures" to prevent harm, including lack of staff training, absence of accuracy audits, and no process for customers to dispute incorrect flags. The order is significant: the FTC treated the governance and process failures as the primary violation, not the technical error rate itself.
Framework operationalization requires four organizational mechanisms that convert written policy into practiced behavior:
Training and capability building: Every person who makes decisions about AI deployment (product managers, engineers, procurement officers, legal counsel, senior executives) needs sufficient AI risk literacy to apply the framework to their decisions. Salesforce's Trailhead platform includes AI ethics training for non-technical employees. The EU AI Act mandates training requirements for staff deploying high-risk AI systems.
Gate reviews: Defined decision points in the AI development and deployment lifecycle where risk assessment is mandatory before proceeding. Google's internal process for AI products, the "Responsible Innovation Review," is a formalized gate review. The output of each gate is a documented risk assessment and a deployment decision with named approver.
Incident response integration: AI risk incidents must be integrated into the broader organizational incident response process, with defined escalation paths, notification timelines, and post-incident review requirements. The EU AI Act requires providers of high-risk AI systems to report serious incidents to national authorities, which means the incident response process must be AI-literate enough to identify what constitutes a reportable AI incident.
Continuous monitoring: Automated monitoring of model performance, fairness metrics, and output distributions in production, with alert thresholds that trigger human review when metrics drift beyond acceptable bounds. This converts the risk register from a static document into a living system that reflects actual system behavior.
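The continuous-monitoring mechanism can be sketched with one widely used drift statistic, the population stability index (PSI), which compares a production distribution against a baseline over the same bins. PSI is a technique chosen here for illustration, not one named above. The function names, the example distributions, and the 0.2 alert threshold (a common rule of thumb) are all assumptions.

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions given as lists of proportions."""
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    psi = 0.0
    for e, a in zip(expected, actual):
        # Floor each proportion so an empty bin does not produce log(0).
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


def needs_human_review(expected, actual, alert_threshold=0.2):
    """Flag drift for review when PSI exceeds the (assumed) alert threshold."""
    return population_stability_index(expected, actual) > alert_threshold


# Share of model outputs falling in each of four score bands (illustrative).
baseline = [0.25, 0.25, 0.25, 0.25]   # measured at deployment
this_week = [0.10, 0.20, 0.30, 0.40]  # measured in production

print(round(population_stability_index(baseline, this_week), 3))
print(needs_human_review(baseline, this_week))
```

In this example the PSI is about 0.228, above the assumed threshold, so the shift would be routed to a person. The same pattern applies to fairness metrics: compute the metric on a schedule, compare it with the value recorded at deployment, and alert on the gap.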
AI risk management maturity can be described with the four implementation tiers of NIST's Cybersecurity Framework (the AI RMF itself does not define maturity levels): Partial (ad-hoc, reactive), Risk Informed (risk-aware but inconsistent), Repeatable (consistent processes applied across the organization), and Adaptive (continuously improving, proactively anticipating emerging risks).
Most large enterprises in 2024 operate at the Risk Informed level: they have acknowledged AI risk as a category, have designated someone to own it, and have applied frameworks inconsistently, comprehensively for high-profile deployments and ad-hoc for others. The gap between Risk Informed and Repeatable is primarily a governance gap: the difference between having a framework and having a governance structure that ensures the framework is applied.
Achieving the Repeatable level requires exactly the mechanisms described above (training, gate reviews, incident response integration, and continuous monitoring), applied consistently, not selectively. The Adaptive level additionally requires feedback loops that capture near-misses, structured processes for learning from external incidents at peer organizations, and proactive engagement with emerging regulatory requirements before they become compliance deadlines.
LEADERSHIP IMPLICATION
The final question for any business leader reviewing their organization's AI risk framework is not "do we have one?" It is "is the framework actually being used?" The tests are operational: Are gate reviews happening? Are risk registers being updated after model changes? Are AI incidents being escalated through a defined process? Is the board receiving AI risk reports? A framework that cannot answer yes to all four questions is at the Risk Informed level at best, and the next significant AI incident will be treated by regulators, courts, and press as evidence that the organization knew what it should have done and chose not to do it.
In this lab, you'll work through the governance and operationalization challenges that determine whether a framework functions in practice. Use the AI assistant to design accountability structures for your organizational context, work through gate review design, or stress-test your framework against specific incident scenarios.
The assistant understands the centralized, federated, and embedded accountability models; the four operational mechanisms (training, gate reviews, incident response integration, continuous monitoring); the four maturity levels (Partial, Risk Informed, Repeatable, Adaptive); and documented cases including Rite Aid (FTC 2023), Google Bard, and Anthropic's Responsible Scaling Policy.