On the night of March 18, 2018, a self-driving Uber Volvo SUV struck and killed Elaine Herzberg in Tempe, Arizona — the first pedestrian fatality attributed to an autonomous vehicle. The NTSB investigation found the perception system had detected Herzberg 5.6 seconds before impact, classified her successively as an unknown object, a vehicle, then a bicycle, never stabilizing on the correct classification of a pedestrian pushing a bicycle. The system's emergency braking had been deliberately disabled to prevent erratic behavior. The safety operator was watching a phone video at the moment of impact.
Failures in autonomous systems rarely arise from a single cause. Researchers at institutions including MIT's Computer Science and Artificial Intelligence Laboratory and the RAND Corporation classify them into three broad categories: sensor failures, algorithmic failures, and operational design domain (ODD) violations.
Sensor failures occur when the hardware providing situational awareness produces inaccurate, delayed, or absent data. LiDAR occlusion in rain, camera washout in direct sunlight, and GPS spoofing are canonical examples. In January 2022, a self-driving Waymo vehicle in San Francisco was confused by a construction zone that had altered lane markings overnight, illustrating how quickly real environments can exit the envelope of tested conditions.
Algorithmic failures emerge when the system's decision logic produces incorrect outputs even from valid sensor data. This category includes distribution shift — when runtime conditions differ statistically from training data — as well as adversarial inputs, edge-case brittleness, and reward misspecification in reinforcement-learning systems.
ODD violations occur when the system is operated outside the conditions for which it was certified. Many Level 2 systems are designed for highway use but are misused on surface streets with cyclists, children, and unpredictable pedestrian behavior.
Though not a physical autonomous system, the Knight Capital algorithmic trading failure is the canonical case study in autonomous system failure propagation. On August 1, 2012, a dormant code path labeled "Power Peg" was accidentally reactivated during a software deployment. The system autonomously executed 4 million trades in 45 minutes, accumulating a $7 billion position. Knight lost $440 million before engineers could halt it. Root causes: lack of deployment verification, no automated kill-switch, and no rate-limit safeguard. The firm was effectively destroyed.
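Two of the missing safeguards named above, a rate limit and an automated kill switch, can be sketched in a few lines. This is a minimal illustration under stated assumptions. It is not Knight's system or any exchange's risk API. `OrderGuard`, its thresholds, and its latching behavior are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class OrderGuard:
    """Hypothetical pre-trade safeguard: a sliding-window order-rate limit and a
    gross-exposure cap, either of which trips a latching kill switch."""
    max_orders_per_window: int
    window_seconds: float
    max_gross_exposure: float
    gross_exposure: float = 0.0
    halted: bool = False
    recent: deque = field(default_factory=deque, repr=False)

    def allow(self, notional: float, now: float) -> bool:
        """Return True if an order of this notional size may be sent at time `now`."""
        if self.halted:
            return False  # latched: only a human reset re-enables trading
        # Forget order timestamps that have slid out of the window.
        while self.recent and now - self.recent[0] >= self.window_seconds:
            self.recent.popleft()
        if len(self.recent) >= self.max_orders_per_window:
            self.halted = True  # runaway order rate
            return False
        if self.gross_exposure + abs(notional) > self.max_gross_exposure:
            self.halted = True  # position has grown past its cap
            return False
        self.recent.append(now)
        self.gross_exposure += abs(notional)
        return True


# A runaway loop tries to send 1,000 orders of $10,000 within one second.
guard = OrderGuard(max_orders_per_window=100, window_seconds=1.0,
                   max_gross_exposure=5_000_000)
accepted = sum(guard.allow(10_000, now=i * 0.001) for i in range(1000))
print(accepted, guard.halted)
```

The halt latches on purpose instead of just dropping excess orders. A runaway process that keeps generating orders should stay stopped until a human has diagnosed it.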
Aviation safety adopted James Reason's Swiss Cheese Model in the 1990s: each safety layer has holes (vulnerabilities), and an accident occurs when holes align across layers simultaneously. Autonomous systems inherit this model but add novel complexity. Unlike a pilot who can exercise situational judgment, an autonomous controller applies its decision function uniformly. If that function has a blind spot, no human intuition compensates.
The 2018 Uber case illustrates multi-layer failure: the perception algorithm failed to stabilize classification; the emergency braking system had been suppressed; the safety operator was inattentive; and Uber's safety protocols did not require two operators. Every defense layer had a hole that aligned that night.
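The Swiss Cheese Model can be made quantitative as an AND gate. If layers fail independently, the probability that all of them have a hole at once is the product of their individual hole probabilities. The sketch below uses entirely hypothetical numbers. It also shows why that product can be badly optimistic when a shared condition weakens several layers at the same time.

```python
from math import prod


def p_all_layers_fail(hole_probs):
    """Swiss-cheese AND gate: an accident needs a hole in every layer at once.
    Valid only if the layers fail independently."""
    return prod(hole_probs)


def p_all_layers_fail_shared_cause(q, probs_with_cause, probs_without_cause):
    """Mixture model: with probability q a shared condition is present (e.g. night
    driving with a fatigued operator) and raises every layer's hole probability."""
    return (q * prod(probs_with_cause)
            + (1 - q) * prod(probs_without_cause))


# Hypothetical per-layer hole probabilities: perception, automatic braking,
# safety operator, operational procedures.
q = 0.1
with_cause = [0.1, 0.1, 0.5, 0.1]
without_cause = [0.01, 0.01, 0.05, 0.1]

# Same marginal hole probability per layer, but treated as independent.
marginals = [q * c + (1 - q) * n for c, n in zip(with_cause, without_cause)]
naive = p_all_layers_fail(marginals)
correlated = p_all_layers_fail_shared_cause(q, with_cause, without_cause)
print(f"independent: {naive:.2e}, with shared cause: {correlated:.2e}")
```

With these illustrative numbers, the shared-cause model gives a probability roughly 15 times higher than the independence assumption, even though each layer's marginal hole probability is identical. A layer that has been deliberately disabled, like Uber's emergency braking, simply has a hole probability of 1.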
Hardware failures: sensor degradation, connector corrosion, calibration drift, electromagnetic interference, and exceeded thermal limits. These faults are often latent, going undetected until a high-stakes moment.
Software failures: memory leaks, race conditions, integer overflow, untested edge cases, stale model weights, and incorrect dependency versions in production deployments.
Human factors: automation complacency (Tesla Autopilot fatalities), mode confusion (Air France 447), alert fatigue, and trust miscalibration, meaning over-trusting or under-trusting the system.
Environmental factors: adversarial conditions outside the ODD, such as black ice, unusual lighting, construction zones, GPS-denied environments, and novel object types absent from the training distribution.
A 2016 RAND Corporation report, Driving to Safety, estimated that autonomous vehicles would need to drive about 275 million miles without a fatality to demonstrate, at 95% confidence, a fatality rate no worse than that of human drivers. Demonstrating a fatality rate 20% lower than that of human drivers would take roughly 8.8 billion miles, or about 400 years for a fleet of 100 vehicles driving around the clock. These figures created pressure to supplement test-mile accumulation with simulation, formal verification, and scenario-based testing.
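The 275-million-mile figure follows from a zero-failure calculation. If the true fatality rate were r per mile, the chance of driving n miles with no fatality is (1 - r)^n. Claiming the rate is below r at confidence C therefore requires (1 - r)^n ≤ 1 - C, that is, n ≥ ln(1 - C) / ln(1 - r). The sketch below rests on two assumptions: the human-driver rate used in the report is about 1.09 fatalities per 100 million miles, and a hypothetical fleet of 100 vehicles drives 24 hours a day at 25 mph.

```python
import math


def failure_free_miles(rate_per_mile: float, confidence: float) -> float:
    """Miles that must be driven with zero fatalities to show, at the given
    confidence, that the true rate is below rate_per_mile.
    Solves (1 - rate) ** n = 1 - confidence for n."""
    return math.log(1 - confidence) / math.log1p(-rate_per_mile)


def fleet_years(miles: float, vehicles: int = 100, mph: float = 25.0) -> float:
    """Calendar years for an assumed fleet driving 24 hours a day, every day."""
    return miles / (vehicles * mph * 24 * 365)


human_rate = 1.09e-8   # assumed: 1.09 fatalities per 100 million miles
miles = failure_free_miles(human_rate, confidence=0.95)
print(f"{miles / 1e6:.0f} million miles, {fleet_years(miles):.1f} fleet-years")
print(f"8.8 billion miles: {fleet_years(8.8e9):.0f} fleet-years")
```

The zero-failure case comes out at roughly 275 million miles, about 12.5 years for the assumed fleet. The same fleet arithmetic applied to 8.8 billion miles gives about 400 years.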
Understanding failure taxonomy is the prerequisite for designing against failure. The subsequent lessons address how safety engineers quantify, mitigate, and verify that autonomous systems remain within acceptable risk envelopes — and what happens when those envelopes are breached.
A warehouse autonomous delivery robot has struck a human worker, causing injury. You are the safety investigator. Use the AI assistant to conduct a structured failure mode analysis — apply the taxonomy from Lesson 1 to determine root causes and contributing factors.
Between 1985 and 1987, the Therac-25 radiation therapy machine administered massive overdoses to at least six patients, killing three. The root cause was a software race condition, a type of timing-dependent bug. Earlier Therac models had shared much of the same software but relied on hardware safety interlocks that blocked unsafe beam configurations. The Therac-25 removed those interlocks and depended on software-only checks. The software had never been formally verified. The cases demonstrated that informal testing is insufficient for safety-critical systems. The race condition required a precise, rare sequence of operator keystrokes that no tester had replicated. Experienced clinical staff, typing quickly in fast-paced workflows, reproduced it in routine use.
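The class of bug involved can be shown with a deliberately simplified, hypothetical model. It is not the Therac-25 code. A slow setup task reads the prescription once and configures the hardware, while a fast on-screen edit changes the prescription afterwards. Each schedule below is one explicit interleaving of the two tasks, so the outcome is deterministic. `Machine`, the mode names, and the `interlock` flag are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    prescribed_mode: str = "xray"   # what the operator's screen currently shows
    hardware_mode: str = "unset"    # what the beam hardware is actually set to


def setup_hardware(m: Machine) -> None:
    # Slow task: reads the prescription once and configures the hardware.
    m.hardware_mode = m.prescribed_mode


def operator_edits_to_electron(m: Machine) -> None:
    # Quick correction on screen; nothing re-runs setup_hardware afterwards.
    m.prescribed_mode = "electron"


def fire(m: Machine, interlock: bool) -> str:
    # The interlock re-checks the hardware against the prescription at the
    # moment of use instead of trusting the earlier setup step.
    if interlock and m.hardware_mode != m.prescribed_mode:
        return "blocked"
    return f"fired in {m.hardware_mode} mode"


def run(schedule, interlock: bool) -> str:
    m = Machine()
    for step in schedule:   # one explicit interleaving of the two tasks
        step(m)
    return fire(m, interlock)


slow_operator = [operator_edits_to_electron, setup_hardware]
fast_operator = [setup_hardware, operator_edits_to_electron]
print(run(slow_operator, interlock=False))  # fired in electron mode
print(run(fast_operator, interlock=False))  # fired in xray mode: wrong beam
print(run(fast_operator, interlock=True))   # blocked
```

Only the fast interleaving produces a mismatch, which is why test sessions at normal typing speed could miss it. The `interlock` check stands in for an independent verification at the moment of use. In the Therac-25's predecessors, hardware played that role.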
Testing explores a finite subset of possible system states. As Edsger Dijkstra observed in 1970, "Program testing can be used to show the presence of bugs, but never to show their absence!" Autonomous systems operate in continuous state spaces, where sensor readings, environmental conditions, and system states combine into effectively infinite configurations. For such systems, exhaustive testing is impossible.
A modern autonomous vehicle perception stack may process 50 sensor channels at 100Hz, generating state spaces that dwarf any feasible test suite. This is why the field has invested heavily in formal methods — mathematical techniques that prove properties of systems rather than merely testing instances.
Model Checking exhaustively explores all reachable states of a system model to verify that specified properties hold in every state. Tools like SPIN and UPPAAL have been used to verify communication protocols and real-time embedded systems. The limitation is state explosion: realistic system models may have more states than atoms in the observable universe, requiring abstraction techniques.
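An explicit-state model checker is, at its core, a search over every reachable state. The sketch below checks mutual exclusion for a hypothetical two-process lock. In the flawed version, a process observes the lock as free and only takes it in a later step. In the fixed version, test-and-set is a single atomic step. This is a toy in the spirit of SPIN, not SPIN's input language or algorithm.

```python
from collections import deque


def _with(pcs, i, value):
    """Return a copy of the program-counter tuple with process i updated."""
    new = list(pcs)
    new[i] = value
    return tuple(new)


def successors(state, atomic):
    """Yield every state reachable in one step by either process."""
    pcs, lock = state
    for i in (0, 1):
        if pcs[i] == "idle" and lock == 0:
            if atomic:
                yield _with(pcs, i, "crit"), 1     # test-and-set in one step
            else:
                yield _with(pcs, i, "ready"), 0    # saw the lock free...
        elif pcs[i] == "ready":
            yield _with(pcs, i, "crit"), 1         # ...and takes it later
        elif pcs[i] == "crit":
            yield _with(pcs, i, "idle"), 0         # release


def mutual_exclusion(state):
    return state[0] != ("crit", "crit")


def check(initial, atomic, invariant):
    """Breadth-first search of all reachable states. Returns a shortest
    counterexample trace if the invariant fails anywhere, else None."""
    parent = {initial: None}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state, atomic):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None


initial = (("idle", "idle"), 0)
print(check(initial, atomic=False, invariant=mutual_exclusion))
print(check(initial, atomic=True, invariant=mutual_exclusion))   # None
```

For the flawed lock, the search returns a shortest counterexample of five states, ending with both processes in the critical section. For the atomic version, it exhausts all three reachable states and returns `None`. Real tools add state compression, partial-order reduction, and temporal-logic properties to push back against state explosion.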
Theorem Proving uses mathematical logic to construct proofs that a system satisfies its specification. Interactive theorem provers like Coq and Isabelle/HOL have been used to verify software components of aerospace systems. In 2009, researchers at NICTA used Isabelle/HOL to formally verify the functional correctness of the seL4 microkernel, a foundational layer used in some autonomous system architectures.
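To give a flavor of machine-checked proof, here is a small example in Lean 4 rather than Coq or Isabelle. The function `clampSpeed` is hypothetical. The point is that each theorem holds for every natural-number input, not just the cases a test would sample. The proofs assume a recent Lean 4 toolchain in which the `omega` decision procedure for linear arithmetic is built in.

```lean
-- Hypothetical speed command clamped to a limit.
def clampSpeed (v limit : Nat) : Nat :=
  if v ≤ limit then v else limit

-- The output never exceeds the limit, for every possible input.
theorem clampSpeed_le_limit (v limit : Nat) : clampSpeed v limit ≤ limit := by
  unfold clampSpeed
  split <;> omega

-- In-range commands pass through unchanged.
theorem clampSpeed_eq_of_le (v limit : Nat) (h : v ≤ limit) :
    clampSpeed v limit = v := by
  simp [clampSpeed, h]
```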
Abstract Interpretation overapproximates the set of possible program behaviors to prove absence of certain classes of errors. The Astrée analyzer, developed from 2001 at INRIA, uses abstract interpretation and was used to verify absence of runtime errors in Airbus A340 and A380 primary flight control software.
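The idea behind abstract interpretation can be shown with the interval domain, one of the simplest abstract domains. Each variable is replaced by an interval guaranteed to contain all of its possible values, and arithmetic is performed on intervals. The sensor-scaling expression and its ranges below are hypothetical. Astrée uses far richer domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Abstract value: every concrete value lies in [lo, hi]."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        # The product's extremes occur at combinations of the endpoints.
        ends = [self.lo * other.lo, self.lo * other.hi,
                self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ends), max(ends))

    def contains(self, value: float) -> bool:
        return self.lo <= value <= self.hi


def check_division(gain: Interval, reading: Interval, offset: Interval):
    """Abstractly evaluate d = gain * reading + offset, the divisor of a later
    division, and report whether d == 0 is impossible for all inputs."""
    d = gain * reading + offset
    verdict = "possible division by zero" if d.contains(0.0) else "proven safe"
    return verdict, d


print(check_division(Interval(0.5, 2.0), Interval(0.0, 100.0), Interval(1.0, 3.0)))
print(check_division(Interval(0.5, 2.0), Interval(0.0, 100.0), Interval(-1.0, 3.0)))
```

With an offset range of [1, 3], the analysis proves the divisor can never be zero. Widening that range to [-1, 3] makes the analyzer report a possible division by zero. Because the domain overapproximates, such an alarm can sometimes be a false positive. A "proven safe" result, by contrast, is sound for every input in the stated ranges.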
ISO 26262, first published in 2011 and revised in 2018, is the international standard for functional safety in automotive systems. It defines Automotive Safety Integrity Levels (ASIL A through D), where ASIL D requires the most rigorous development and verification processes. A brake-by-wire system controlling emergency stopping is typically assigned ASIL D. The standard mandates both requirements-based testing and independent safety analyses including FMEA (Failure Mode and Effects Analysis) and FTA (Fault Tree Analysis).
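ISO 26262-3 assigns an ASIL to each hazardous event from three ratings: severity (S0 to S3), probability of exposure (E0 to E4), and controllability (C0 to C3). The sketch below uses a commonly cited additive shortcut that reproduces the standard's classification table. The example hazard ratings are hypothetical, and a real assessment should be checked against the table itself.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Map ISO 26262-3 hazard ratings (S0-S3, E0-E4, C0-C3) to an ASIL.

    Uses the widely cited additive shortcut, which reproduces the standard's
    classification table: S + E + C of 10 -> D, 9 -> C, 8 -> B, 7 -> A,
    anything lower -> QM (quality management only).
    """
    if not (0 <= severity <= 3 and 0 <= exposure <= 4 and 0 <= controllability <= 3):
        raise ValueError("ratings out of range")
    if 0 in (severity, exposure, controllability):
        return "QM"  # class 0 in any dimension: no ASIL is assigned
    total = severity + exposure + controllability
    return {10: "ASIL D", 9: "ASIL C", 8: "ASIL B", 7: "ASIL A"}.get(total, "QM")


# Hypothetical rating: loss of emergency braking at highway speed is
# life-threatening (S3), such driving situations are frequent (E4), and the
# driver can rarely control the outcome (C3).
print(asil(3, 4, 3))  # ASIL D
```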
Different industries have developed parallel safety certification frameworks. Aviation uses DO-178C for airborne software, defining five criticality levels (Level A through E). Level A software, whose failure could cause catastrophic loss of the aircraft, requires the most stringent development: 100% modified condition/decision coverage (MC/DC), formal reviews, and independent verification. The Boeing 737 MAX MCAS accidents of 2018–2019 involved flight control software that acted on input from a single angle-of-attack sensor. The consequences of that single-point failure had not been adequately analyzed during certification.
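MC/DC requires showing that each condition in a decision independently affects the outcome. For every condition there must be a pair of tests that differ only in that condition and produce different results. The sketch below enumerates such pairs for a hypothetical three-condition guard, using the unique-cause form of MC/DC. Other accepted variants, such as masking MC/DC, are not shown.

```python
from itertools import product


def guard(a: bool, b: bool, c: bool) -> bool:
    # Hypothetical decision, e.g. armed and (sensor_a_trips or sensor_b_trips).
    return a and (b or c)


def mcdc_pairs(fn, n_conditions):
    """For each condition, list the test-vector pairs that differ only in that
    condition and flip the decision outcome (unique-cause MC/DC)."""
    pairs = {i: [] for i in range(n_conditions)}
    for v in product([False, True], repeat=n_conditions):
        for i in range(n_conditions):
            if v[i]:
                continue   # visit each pair once, from its False side
            w = v[:i] + (True,) + v[i + 1:]
            if fn(*v) != fn(*w):
                pairs[i].append((v, w))
    return pairs


def achieves_mcdc(fn, suite, n_conditions):
    """True if, for every condition, some independence pair lies in the suite."""
    tests = set(suite)
    return all(any(v in tests and w in tests for v, w in plist)
               for plist in mcdc_pairs(fn, n_conditions).values())


T, F = True, False
suite = [(T, F, F), (T, T, F), (T, F, T), (F, T, F)]
print(achieves_mcdc(guard, suite, 3))        # True
print(achieves_mcdc(guard, suite[:3], 3))    # False: nothing shows `a` matters
```

Four tests suffice here, which is N + 1 for N = 3 conditions. Exhaustive testing would need 2^3 = 8, and that gap grows exponentially with the number of conditions.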
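The IEC 61508 standard governs industrial automation and is the parent standard from which ISO 26262, IEC 62061 (machinery safety), and other domain-specific standards derive. Its Safety Integrity Levels (SIL 1–4) are defined by target failure measures. Low-demand functions use the average probability of failure on demand (PFD). High-demand and continuous functions use the probability of dangerous failure per hour (PFH). SIL 4 requires an average PFD between 10⁻⁵ and 10⁻⁴ in low-demand mode, or a PFH between 10⁻⁹ and 10⁻⁸ per hour in high-demand or continuous mode.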
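A common simplified estimate applies to a single channel that fails dangerously and undetected at a constant rate λ_DU and is proof-tested every T hours: PFD_avg ≈ λ_DU · T / 2. The sketch below applies that estimate and maps the result to the low-demand SIL bands. The failure rate and test intervals are hypothetical, and the estimate ignores repair time and diagnostic coverage.

```python
def pfd_avg_1oo1(lambda_du_per_hour: float, proof_test_hours: float) -> float:
    """Simplified average PFD for one channel with dangerous undetected failure
    rate lambda_du, proof-tested (and restored) every T hours. Repair time and
    diagnostic coverage are ignored."""
    return lambda_du_per_hour * proof_test_hours / 2


def sil_low_demand(pfd_avg: float):
    """IEC 61508 low-demand bands: SIL n covers 10^-(n+1) <= PFD_avg < 10^-n.
    Values below 10^-5 still claim at most SIL 4."""
    for sil, upper in ((4, 1e-4), (3, 1e-3), (2, 1e-2), (1, 1e-1)):
        if pfd_avg < upper:
            return sil
    return None  # PFD_avg of 0.1 or more: no SIL


yearly = pfd_avg_1oo1(2e-6, 8760)    # hypothetical lambda_du = 2e-6 per hour
monthly = pfd_avg_1oo1(2e-6, 730)
print(f"{yearly:.2e} -> SIL {sil_low_demand(yearly)}")
print(f"{monthly:.2e} -> SIL {sil_low_demand(monthly)}")
```

A yearly proof test gives a PFD of about 8.8 × 10⁻³ (SIL 2). Testing monthly brings it to about 7.3 × 10⁻⁴ (SIL 3). Meeting a SIL also requires systematic capability and architectural constraints, not just a sufficiently low failure probability.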
Traditional formal verification techniques were designed for deterministic, rule-based software. Neural networks used in perception and decision-making are fundamentally different: their behavior is learned from data rather than specified as rules, making them resistant to formal verification. This "verification gap" is one of the most active research areas in AI safety. Techniques being explored include abstract interpretation of neural network layers, formal robustness certification against adversarial perturbations (e.g., CROWN, α-β-CROWN), and neuro-symbolic hybrid architectures where verifiable symbolic reasoners supervise learned components.
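One of the simplest verification techniques for neural networks is interval bound propagation (IBP), a form of abstract interpretation. It pushes a box of possible inputs through each layer and checks that the lower bound on the correct class's margin stays positive. CROWN and α-β-CROWN compute tighter linear bounds than IBP, but they share the principle of bounding all perturbed inputs at once. The two-layer network below is hypothetical, not a trained model.

```python
def affine_bounds(W, b, lo, hi):
    """Per-output bounds of W x + b over the input box [lo, hi]: positive
    weights take the lower input bound for the minimum, negative weights the
    upper one (and vice versa for the maximum)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        out_lo.append(bias + sum(w * (lo_j if w >= 0 else hi_j)
                                 for w, lo_j, hi_j in zip(row, lo, hi)))
        out_hi.append(bias + sum(w * (hi_j if w >= 0 else lo_j)
                                 for w, lo_j, hi_j in zip(row, lo, hi)))
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotone, so it maps interval endpoints to endpoints."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def certify(x, eps, W1, b1, W2, b2, label):
    """True if every input within L-infinity distance eps of x is provably
    classified as `label` by the network  W2 . relu(W1 x + b1) + b2."""
    lo, hi = relu_bounds(*affine_bounds(W1, b1,
                                        [v - eps for v in x],
                                        [v + eps for v in x]))
    for other in range(len(W2)):
        if other == label:
            continue
        # Bound the logit margin z[label] - z[other] as a single affine map.
        w_diff = [a - c for a, c in zip(W2[label], W2[other])]
        margin_lo, _ = affine_bounds([w_diff], [b2[label] - b2[other]], lo, hi)
        if margin_lo[0] <= 0:
            return False   # not certified (the network may still be robust)
    return True


# Hypothetical 2-2-2 network weights.
W1, b1 = [[1.0, -1.0], [0.5, 1.0]], [0.0, 0.0]
W2, b2 = [[2.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
x = [1.0, 0.0]
print(certify(x, 0.1, W1, b1, W2, b2, label=0))   # True
print(certify(x, 0.5, W1, b1, W2, b2, label=0))   # False
```

The result is one-sided. True is a proof covering every input in the box. False only means this bound was too loose to decide.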
A medical device company is preparing to certify an autonomous surgical assistant robot for FDA approval. You are consulting on the safety verification strategy. The robot makes real-time tissue-cutting decisions based on computer vision.
On June 1, 2009, Air France Flight 447 crashed into the Atlantic Ocean, killing all 228 people aboard. The initiating event was ice crystal blockage of the Pitot tube airspeed sensors, a known failure mode, which corrupted all three airspeed sources at once. The autopilot, losing valid airspeed data, disconnected and handed control to the crew. At the same time, the flight control system reverted to alternate law, which removed the stall protection that normally limits the angle of attack. In the confusion, the pilot flying made sustained nose-up inputs, and the aircraft climbed into a high-altitude aerodynamic stall from which it never recovered. Triple-redundant sensors had all failed simultaneously, defeating a redundancy architecture designed for independent failures.
Redundancy is the provision of multiple independent means to perform a critical function. The goal is to ensure that single — and in some designs, dual — component failures do not produce system-level failures. However, the Air France 447 case illustrates a fundamental challenge: redundancy protects against independent failures but is defeated by common-cause failures — single events that disable multiple redundant components simultaneously.
Active redundancy (hot standby): all redundant components run simultaneously. Outputs are compared, and majority voting or signal selection determines the valid output. Recovery is fastest because there is no switchover latency. This approach is used in flight control computers; the Boeing 777 uses three independent primary flight computers.
Cold standby: backup components are powered off until needed, and switchover occurs when a primary failure is detected. This lowers power consumption but adds recovery latency, making it inappropriate for systems requiring continuous operation.
Diverse redundancy: redundant components use different hardware designs, software implementations, or vendors, reducing common-cause failure risk. The Airbus A320 flight control system uses two types of computer (ELAC and SEC) running different software written by different teams in different programming languages.
Physical separation: redundant components are physically separated so that a single physical event (fire, shrapnel, wiring harness damage) cannot disable all copies. Separation is mandatory in aviation and increasingly required for automotive ASIL D components.
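The signal selection used in active redundancy can be as simple as taking the median of three channels, which tolerates one arbitrarily wrong channel. The sketch below is a hypothetical triplex mid-value selector with a disagreement check. The tolerance and readings are illustrative. It also shows the limit illustrated by Air France 447: when all three channels are wrong in the same way, the voter has nothing to outvote.

```python
from statistics import median


def mid_value_select(readings, tolerance):
    """Triplex signal selection: output the median of three channel readings and
    flag any channel that disagrees with the median by more than `tolerance`."""
    if len(readings) != 3:
        raise ValueError("expected exactly three channels")
    selected = median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - selected) > tolerance]
    return selected, suspects


# Hypothetical airspeed channels in knots, with a 5-knot agreement tolerance.
print(mid_value_select([250.0, 251.0, 90.0], tolerance=5.0))   # (250.0, [2])
print(mid_value_select([90.0, 92.0, 91.0], tolerance=5.0))     # (91.0, [])
```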
Waymo's fifth-generation autonomous vehicle platform (2020) includes multiple overlapping sensor modalities: LiDAR (both short- and long-range), radar, and cameras. The architecture is designed so that no single sensor failure leaves the vehicle without situational awareness — camera failure is compensated by LiDAR and radar, radar failure by LiDAR and camera. This is functional redundancy through diversity: different physical principles ensure independent failure modes. Waymo has published that its system performs continuous self-diagnostics and can initiate a minimal-risk condition (safe pullover) if sensor health degrades below threshold.
Fault tolerance is the ability of a system to continue operating — possibly in a degraded mode — following component failure. The goal of graceful degradation is to ensure that failures produce proportionally reduced capability rather than catastrophic loss of function. A self-driving vehicle that loses one camera should reduce speed and avoid complex maneuvers, not immediately stop in a live traffic lane.
The key design principle is specifying the degradation hierarchy in advance: what capabilities are lost at each fault level, and what constraints apply. This requires engineers to reason about all possible fault combinations and ensure each combination maps to a defined safe state. The SOTIF standard (ISO 21448, Safety Of The Intended Functionality) specifically addresses cases where systems fail not from hardware faults but from the intended functionality being insufficient for the encountered situation.
Every safety-critical autonomous system must define its safe state: the condition the system reaches when it cannot continue normal operation safely. For an autonomous vehicle, the safe state hierarchy typically runs: (1) continue with reduced capability, (2) pull over and stop with hazard lights, (3) execute emergency braking to stop in the current lane. Which safe state is appropriate depends on traffic conditions, vehicle speed, and the nature of the failure.
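Specifying the degradation hierarchy in advance makes it checkable. The sketch below encodes a hypothetical policy that maps every combination of failed sensing modalities to one of the safe states listed above. It then exhaustively verifies two properties. First, every combination maps to a defined mode. Second, adding a fault never makes the vehicle less restrictive. The modality list and the policy itself are assumptions for illustration. Real policies would also weigh speed, traffic, and which sensor failed.

```python
from enum import IntEnum
from itertools import combinations


class Mode(IntEnum):
    """Safe-state hierarchy; a higher value is more restrictive."""
    NOMINAL = 0
    REDUCED_CAPABILITY = 1
    PULL_OVER = 2
    STOP_IN_LANE = 3


MODALITIES = ("camera", "lidar", "radar")


def degraded_mode(failed: frozenset) -> Mode:
    """Hypothetical policy keyed on how many sensing modalities have failed."""
    if not failed:
        return Mode.NOMINAL
    if len(failed) == 1:
        return Mode.REDUCED_CAPABILITY
    if len(failed) == 2:
        return Mode.PULL_OVER
    return Mode.STOP_IN_LANE


def check_policy(policy):
    """Exhaustively check every fault combination: each must map to a defined
    Mode, and adding one more fault must never relax the mode."""
    fault_sets = [frozenset(c) for k in range(len(MODALITIES) + 1)
                  for c in combinations(MODALITIES, k)]
    for faults in fault_sets:
        if not isinstance(policy(faults), Mode):
            return False, faults
        for extra in set(MODALITIES) - faults:
            worse = policy(faults | {extra})
            if not isinstance(worse, Mode) or worse < policy(faults):
                return False, faults
    return True, None


print(check_policy(degraded_mode))   # (True, None)
```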
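Many pressurized-water reactors use a fail-safe design principle: control rods are held above the core by electromagnets, so a loss of power releases them to fall by gravity and shut down the reactor. The boiling-water reactors at Fukushima Daiichi insert their rods hydraulically from below instead. Even so, the 2011 disaster revealed the limit of any shutdown-based safe state. The operating reactors shut down successfully following the earthquake, but the subsequent tsunami disabled the emergency diesel generators that powered the cooling systems. Because fuel keeps producing decay heat after shutdown, the loss of cooling led to fuel melt despite a successful initial safe-state transition.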
Common-cause failures — where a single event disables multiple redundant components — are the primary threat to redundancy-based safety architectures. IEC 61508 requires analysis of β-factor (the fraction of failures attributable to common cause) and mandates design measures to minimize it: physical separation, diversity in technology, diversity in supplier, and independent power supplies. In the Air France 447 case, all three Pitot tubes were of the same model, from the same supplier, mounted in adjacent positions on the same part of the fuselage — a β-factor of effectively 1.0 for the specific failure mode of ice crystal blockage.
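The β-factor model splits a channel's dangerous undetected failure rate λ into two parts: an independent part (1 - β)λ and a common-cause part βλ that takes down every channel at once. The sketch below uses common simplified average-PFD expressions for a 1-out-of-3 group, ignoring repair time and diagnostic coverage. The failure rate and proof-test interval are hypothetical.

```python
def pfd_avg_1oo3(lambda_du: float, proof_test_hours: float, beta: float) -> float:
    """Simplified average PFD of a 1-out-of-3 group under the beta-factor model.

    Independent part: all three channels must fail independently within the
    proof-test interval, ((1 - beta) * lambda * T)**3 / 4.
    Common-cause part: one shared event fails every channel, beta * lambda * T / 2.
    Repair time and diagnostic coverage are ignored.
    """
    lt = lambda_du * proof_test_hours
    return ((1 - beta) * lt) ** 3 / 4 + beta * lt / 2


lam, T = 1e-6, 8760   # hypothetical: 1e-6 dangerous failures per hour, yearly test
for beta in (0.0, 0.01, 0.1, 1.0):
    print(f"beta={beta:<4}: PFD_avg = {pfd_avg_1oo3(lam, T, beta):.2e}")
```

With these numbers, β = 0.1 makes the group well over a thousand times more likely to fail on demand than fully independent channels. β = 1 collapses three channels into the reliability of one, which is the situation the Pitot tubes illustrate.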
You are a safety architect for a startup developing an autonomous air taxi operating in urban airspace. The vehicle must navigate buildings, other aircraft, birds, and weather. You need to design a sensor redundancy architecture that handles common-cause failures.
On May 7, 2016, Joshua Brown was killed in Williston, Florida, when his Tesla Model S, operating in Autopilot mode, struck a tractor-trailer that had turned across the highway. Tesla's system, designed for highway lane-keeping rather than autonomous driving, failed to distinguish the white side of the trailer against a bright sky. The NTSB investigation found that Autopilot was engaged for 37 minutes of the trip. During that time, the system detected Brown's hands on the steering wheel for only 25 seconds, despite repeated visual and audible warnings. Tesla subsequently tightened its hands-on-wheel alerts, disabling Autosteer for the rest of a drive after repeated ignored warnings. The case became foundational to ongoing debates about human-machine handoff design in Level 2 automation.
Runtime monitoring refers to the continuous, automated assessment of a system's operational state against defined safety specifications while the system is executing. Unlike pre-deployment verification, runtime monitoring operates on actual sensor data, actual environmental conditions, and actual system behavior — catching deviations that no pre-deployment analysis could have anticipated.
Runtime monitors fall into two broad types. Safety monitors detect when the system has entered or is approaching an unsafe state. Performance monitors detect when the system's output quality has degraded below the level required for safe operation. A perception monitor that tracks object detection confidence and triggers degraded-mode operation when confidence falls below a threshold is a performance monitor. A monitor that detects that the vehicle has crossed a lane boundary without a turn signal is a safety monitor.
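Both monitor types can be expressed compactly. In the sketch below, the performance monitor averages detection confidence over a short window, so a single low-confidence frame does not trigger degraded mode. The safety monitor is a direct check of the lane-boundary rule. The class name, function name, window, and threshold are all hypothetical.

```python
from collections import deque


class ConfidenceMonitor:
    """Performance monitor: flags degraded perception when the moving average
    of detection confidence falls below a threshold."""

    def __init__(self, threshold: float, window: int):
        self.threshold = threshold
        self.scores = deque(maxlen=window)

    def update(self, confidence: float) -> bool:
        """Record one frame's confidence; return True if degraded mode is needed."""
        self.scores.append(confidence)
        return sum(self.scores) / len(self.scores) < self.threshold


def lane_departure_violation(crossed_boundary: bool, turn_signal_on: bool) -> bool:
    """Safety monitor: crossing a lane boundary without signalling is a violation."""
    return crossed_boundary and not turn_signal_on


perf = ConfidenceMonitor(threshold=0.6, window=3)
print([perf.update(c) for c in (0.9, 0.8, 0.5, 0.3, 0.2)])
print(lane_departure_violation(crossed_boundary=True, turn_signal_on=False))
```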
The SAE J3016 standard defines six levels of driving automation, from no automation (Level 0) through full automation (Level 5). Levels 2 and 3 are particularly challenging because they involve shared or time-shared responsibility between human and machine, and the handoff between them is a documented failure point.
Level 3 automation (Conditional Driving Automation) allows the driver to disengage from monitoring but requires them to respond to a take-over request (TOR) within a defined time. Research at Stanford and TU Delft has found that, following extended periods of automation, drivers take 15–40 seconds to regain full situational awareness. That duration is incompatible with emergency response requirements in many driving scenarios. Audi abandoned its Level 3 Traffic Jam Pilot in 2020 without ever activating it for customers, citing the lack of a regulatory framework and concerns about liability during handoff transitions. Mercedes-Benz's Drive Pilot later became the first Level 3 system to receive regulatory approval in Germany, in 2021.
A 2019 study by the Insurance Institute for Highway Safety found that Tesla Autopilot users were more likely to engage in secondary tasks (phone use, eating) while driving than users of adaptive cruise control alone. A parallel MIT AgeLab study found that drivers using Tesla Autopilot took their eyes off the road for significantly longer glances than drivers in manual mode. These findings directly contributed to NHTSA's Standing General Order (June 2021) requiring manufacturers to report all crashes involving driver-assistance systems — which has collected data on over 900 Autopilot-involved crashes between 2021 and 2023.
Effective runtime monitoring requires not just detection but communication — the system must convey safety-relevant information to human supervisors in ways that produce appropriate responses. Alert design failures produce two failure modes: alert fatigue (so many alerts that operators habituate and ignore them) and startle response (so few alerts that sudden critical warnings cause panic and inappropriate response).
Air traffic control research by the FAA has documented that automation alerts contribute to pilot error in approximately 15% of incidents where crews responded incorrectly to system warnings. Standard design guidelines now require alerts to be specific (what failed), actionable (what response is needed), timely (early enough to allow response), and prioritized (distinguishing warnings from cautions and advisories).
For autonomous vehicle remote operations centers — used by Waymo, Nuro, and others for teleoperations — operators typically monitor between 5 and 15 vehicles simultaneously. Research on attention and multiple-target tracking suggests this is near the upper limit of human cognitive capacity, especially during low-frequency, high-consequence exception events that define the safety-critical supervisory role.
Simplex architecture: a runtime safety monitor runs in parallel with the primary controller. If the monitor detects unsafe behavior, it overrides the primary and activates a safe backup controller. This pattern is used in high-assurance robotics, where the backup is formally verified even if the primary is not.
Out-of-distribution (OOD) detection: neural networks behave unpredictably on inputs far from their training distribution. Runtime OOD detectors raise an alert when input data is anomalous, triggering human handoff or conservative fallback behavior before the network produces a dangerous output.
Conformal prediction: a statistically rigorous method for producing prediction intervals with guaranteed coverage, assuming calibration and runtime data are exchangeable. In autonomous systems it provides uncertainty bounds on perception outputs; if uncertainty exceeds a threshold, the system switches to a conservative operating mode.
Shadow mode: a candidate software version runs in parallel with the production system, receiving the same inputs but not controlling the vehicle. Its outputs are logged and compared, allowing regressions to be detected before deployment. Used extensively by Waymo and Tesla.
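A runtime safety monitor with a backup controller combines naturally with conformal uncertainty bounds. The monitor-plus-verified-backup pattern is often called the Simplex architecture. The sketch below first calibrates a normalized split-conformal quantile offline. At runtime, it scales the interval by a per-input estimate `sigma_hat` and switches to the backup controller when the interval is too wide. Several elements are hypothetical: `sigma_hat` (for example, an ensemble's spread), the calibration scores, the commands, and the width limit. The coverage guarantee is marginal and holds only if runtime data are exchangeable with the calibration set.

```python
import math


def conformal_quantile(scores, alpha):
    """Split conformal calibration: the ceil((n + 1)(1 - alpha))-th smallest
    nonconformity score. Assumes calibration and runtime data are exchangeable."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf   # too few calibration samples for this alpha
    return sorted(scores)[k - 1]


def choose_controller(primary_cmd, backup_cmd, sigma_hat, q, max_half_width):
    """Simplex-style switch. The perception interval is estimate +/- q * sigma_hat
    (normalized conformal score); if it is too wide, hand control to the backup."""
    half_width = q * sigma_hat
    if half_width <= max_half_width:
        return primary_cmd, "primary"
    return backup_cmd, "backup"


# Hypothetical calibration scores |y - y_hat| / sigma_hat(x) from held-out data.
scores = list(range(1, 20))                    # 19 samples
q = conformal_quantile(scores, alpha=0.25)     # 15th smallest score
# Hypothetical acceleration commands in m/s^2: primary cruises, backup brakes.
print(choose_controller(1.0, -3.0, sigma_hat=0.1, q=q, max_half_width=2.0))
print(choose_controller(1.0, -3.0, sigma_hat=0.2, q=q, max_half_width=2.0))
```

The score is normalized by design. Plain split conformal with absolute residuals produces the same interval width for every input, so it could never trigger a per-input switch.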
In AI safety research, corrigibility refers to the property of an AI system that allows it to be safely corrected, modified, or shut down by its operators — even if the system has goals or preferences. Stuart Russell and colleagues at the Berkeley Center for Human-Compatible AI argue that a key safety property is uncertainty about human preferences: a system that is uncertain what humans want will defer to human correction rather than resist it. This theoretical framework has practical implications for runtime monitoring design — systems should be designed to surface uncertainty and request human input, not to confidently act on low-confidence assessments.
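The argument behind this view can be checked numerically. In the off-switch game studied by Russell's group (Hadfield-Menell and colleagues), a robot is unsure of the true utility U of its action. It can act directly, switch itself off, or defer to a human. The human, assumed to be perfectly rational, lets the action proceed only when U > 0. Deferring is therefore worth E[max(U, 0)], which is never less than max(E[U], 0). The utility distributions below are hypothetical.

```python
import random


def off_switch_values(utility_samples):
    """Expected value of each option in the off-switch game, estimated from
    samples of the robot's belief about the action's utility U."""
    n = len(utility_samples)
    act = sum(utility_samples) / n                               # E[U]
    defer = sum(max(u, 0.0) for u in utility_samples) / n        # E[max(U, 0)]
    return {"act": act, "switch_off": 0.0, "defer": defer}


rng = random.Random(0)
uncertain = [rng.gauss(0.5, 2.0) for _ in range(10_000)]   # hypothetical belief
certain = [0.5] * 100
print(off_switch_values(uncertain))
print(off_switch_values(certain))
```

The advantage of deferring shrinks to zero as the robot's uncertainty disappears, and the result depends on the assumption of a rational human.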
Runtime monitoring and human oversight are not alternatives to rigorous pre-deployment safety engineering — they are the final layer in a defense-in-depth strategy. The most robust autonomous systems combine formal verification of critical components, hardware redundancy with diversity, formal safety standards compliance, and active runtime monitoring with well-designed human handoff mechanisms. No single technique is sufficient; safety emerges from the interaction of all layers.
A logistics company is deploying a fleet of 50 autonomous freight trucks on interstate highways. A remote operations center (ROC) will have human operators available to intervene. You must design the runtime monitoring and human oversight system.