In 1896, one year after the Lumière brothers first projected moving pictures onto a public screen in Paris, critics predicted that cinema would remain a carnival novelty: too expensive, too technically fragile, too dependent on operators who understood the machinery. By 1915, D.W. Griffith's The Birth of a Nation was playing in dedicated theaters before audiences of thousands. The technology did not wait for consensus approval. It developed its own momentum, outpacing regulators, business models, and social frameworks simultaneously.
Autonomous vehicle technology is following a structurally similar arc, compressed into a shorter timeframe. In October 2020, Waymo launched the world's first fully driverless commercial robotaxi service, with no safety driver behind the wheel, in Chandler, Arizona. By late 2023, Waymo One was logging over 700,000 driverless miles per month across Phoenix, San Francisco, and Los Angeles. In parallel, Tesla's Full Self-Driving system had accumulated more than 500 million miles of supervised autonomous driving data by the same period. The infrastructure, the data flywheel, and the hardware are all scaling faster than the legal and ethical frameworks meant to govern them.
This course examines how that technology actually functions: the sensor stacks, the machine learning pipelines, the edge cases that have caused fatal accidents, and the regulatory responses now taking shape across the United States, Europe, and China. You will leave with a working technical vocabulary, a clear map of where the industry stands in 2024, and an honest sense of what remains unsolved — which, as you will see, is considerable.
On the night of March 18, 2018, an Uber Advanced Technologies Group test vehicle operating in autonomous mode struck and killed Elaine Herzberg as she walked her bicycle across a four-lane road in Tempe, Arizona. The car's sensors detected her 5.6 seconds before impact. The software classified her first as an unknown object, then as a vehicle, then as a bicycle, and each reclassification reset the system's prediction of her trajectory. The car never confidently predicted she would be in its path. The safety driver was watching a video on her phone. Herzberg was struck at 39 mph without any braking. The National Transportation Safety Board's final report, published in November 2019, identified the root cause not as hardware failure but as a perception pipeline that could not handle object classifications it had not been specifically trained to expect.
That accident — the first pedestrian fatality caused by an autonomous vehicle — illuminates the central challenge of self-driving technology: sensing is not seeing, and detection is not understanding. What follows is the engineering underneath those distinctions.
Every production autonomous vehicle system as of 2024 relies on some combination of three sensor types: LiDAR (Light Detection and Ranging), radar, and cameras. Each captures a fundamentally different kind of information about the world, and each has failure modes the others can compensate for — in theory.
LiDAR fires pulses of laser light and measures the time each pulse takes to return. From millions of these measurements per second, it constructs a dense three-dimensional point cloud of the vehicle's surroundings with centimeter-level accuracy. Waymo's fifth-generation Jaguar I-PACE test fleet used a custom LiDAR array with a 360-degree field of view and a range of over 300 meters. The system cost roughly $75,000 per unit in 2018; by 2023, solid-state LiDAR modules from manufacturers like Luminar and Ouster had dropped below $1,000 at volume. The principal weakness is performance in rain, snow, and fog — water droplets scatter laser returns in ways that corrupt the point cloud.
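The time-of-flight measurement described above reduces to one line of arithmetic. The following is a minimal sketch, not any vendor's API; the function name `range_from_round_trip` is an assumption made for illustration.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # metres per second, in vacuum


def range_from_round_trip(round_trip_s: float) -> float:
    """Distance to the reflecting surface for one laser pulse.

    The pulse travels out and back, so the one-way range is half
    the round-trip path length.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


# A return arriving 2 microseconds after emission corresponds to a
# target roughly 300 m away, the order of range quoted above.
print(f"{range_from_round_trip(2e-6):.1f} m")
```

Because light covers about 30 cm per nanosecond, centimeter-level accuracy implies timing resolution well below a nanosecond, which is part of why the hardware was historically expensive.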
Radar uses radio waves rather than light, which means it penetrates precipitation that defeats LiDAR and cameras alike. Radar also measures velocity directly through the Doppler effect — a critical advantage when the system needs to know not just where an object is but how fast it is moving relative to the vehicle. Automotive radar has been standard equipment on premium vehicles since the early 2000s (Mercedes-Benz introduced adaptive cruise control using radar on the S-Class in 1999). Its limitation is spatial resolution: radar returns are coarse compared to LiDAR point clouds, making it difficult to distinguish the shape or classify the type of an object.
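The Doppler relationship can be sketched in a few lines. This example assumes a 77 GHz carrier, a band commonly used for automotive radar; the function name and the sign convention (positive means closing) are illustrative choices, not taken from any particular radar's interface.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # metres per second


def radial_velocity_m_s(doppler_shift_hz: float, carrier_hz: float = 77e9) -> float:
    """Relative radial speed implied by a measured Doppler shift.

    For a reflected signal the shift is doubled (out and back), so
    v = shift * wavelength / 2. Positive values mean the gap is closing.
    """
    wavelength_m = SPEED_OF_LIGHT_M_S / carrier_hz
    return doppler_shift_hz * wavelength_m / 2.0


# A 10 kHz shift at 77 GHz corresponds to roughly 19.5 m/s of closing speed.
print(f"{radial_velocity_m_s(10_000.0):.2f} m/s")
```

Note that this yields only the component of velocity along the line of sight; an object crossing the vehicle's path at right angles produces almost no shift.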
Cameras provide the richest semantic information — color, texture, lane markings, traffic signs, and the fine-grained visual context that humans rely on. Tesla's Autopilot and Full Self-Driving systems use a camera-only architecture (eight cameras with overlapping fields of view) on the argument that if a human can drive using vision alone, a sufficiently capable neural network should be able to as well. The argument is contested: cameras require strong lighting conditions and sophisticated depth-estimation algorithms, since a 2D image does not inherently encode distance. Night performance, glare, and occlusion remain active research problems.
Tesla's camera-only approach eliminates LiDAR cost but places enormous demands on the neural network to infer 3D geometry from 2D images. Every other major autonomous vehicle developer — Waymo, Cruise, Aurora, Mobileye — uses LiDAR as a primary sensor. The industry has not reached consensus on which architecture is safer.
No single sensor type is sufficient for production autonomous driving. The engineering challenge is combining them into a unified, consistent world model — a process called sensor fusion. Fusion can happen at three levels: raw data (early fusion), independently processed feature representations (mid-level fusion), or final object detections (late fusion). Each has tradeoffs in computational cost, latency, and the ability to propagate uncertainty through the pipeline.
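To make the late-fusion case concrete, here is a minimal sketch that combines two independent range estimates by inverse-variance weighting. It assumes each sensor reports a Gaussian estimate with a known variance; the `RangeEstimate` type and the numbers are hypothetical, and production systems fuse far richer state than a single distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeEstimate:
    mean_m: float       # estimated distance to the object
    variance_m2: float  # the sensor's own uncertainty about that estimate


def fuse(estimates: list[RangeEstimate]) -> RangeEstimate:
    """Inverse-variance weighted average of independent estimates."""
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / e.variance_m2 for e in estimates]
    total = sum(weights)
    mean = sum(w * e.mean_m for w, e in zip(weights, estimates)) / total
    # The fused variance is smaller than any single input's variance.
    return RangeEstimate(mean, 1.0 / total)


lidar = RangeEstimate(mean_m=40.0, variance_m2=0.01)  # precise
radar = RangeEstimate(mean_m=41.0, variance_m2=0.25)  # coarse
fused = fuse([lidar, radar])
print(fused)
```

The fused estimate sits close to the more precise sensor and carries its uncertainty forward, which is the property downstream planning needs in order to act conservatively.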
Waymo's approach, documented in their 2020 paper "Scalability in Perception for Autonomous Driving," fuses LiDAR point clouds and camera images at the feature level using a shared encoder architecture they call the Multimodal Sensor Fusion (MSF) network. The system explicitly represents uncertainty in its object detections, which allows downstream planning modules to make conservative decisions when the perception system is not confident — a direct engineering response to the failure mode observed in the Tempe accident.
Effective fusion requires all sensors to share a common coordinate frame and precise time synchronization. If a LiDAR return and a camera frame are offset by even 50 milliseconds, a vehicle moving at 60 mph will have traveled 1.3 meters between measurements — enough to introduce significant errors in object localization. Hardware timestamping and GPS-synchronized clocks are standard solutions, but they introduce their own failure modes in GPS-denied environments like tunnels and dense urban canyons.
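The synchronization figure above can be checked directly. This is a small sketch of the arithmetic; the function name is an assumption for illustration.

```python
MPH_TO_M_S = 0.44704  # exact by definition of the mile and the hour


def displacement_m(speed_mph: float, offset_ms: float) -> float:
    """Distance the vehicle travels during a timestamp offset."""
    return speed_mph * MPH_TO_M_S * offset_ms / 1000.0


# 60 mph with a 50 ms offset between LiDAR and camera timestamps.
print(f"{displacement_m(60.0, 50.0):.2f} m")
```

The error scales linearly with both speed and offset, so halving the offset halves the localization error at any given speed.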
The Herzberg fatality occurred because the perception system's object-classification uncertainty caused repeated trajectory-prediction resets. Better sensor fusion — specifically, maintaining consistent object tracks across classification changes — was one of the primary recommendations in the NTSB report. Perception architecture is not an abstract engineering question; it is a safety-critical design decision with documented consequences.
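The idea of keeping a track alive across classification changes can be illustrated with a toy data structure. This is a sketch of the principle only, not the Uber system or the NTSB's recommended design; the `Track` class and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Track:
    label: str = "unknown"
    # Each entry is (time_s, x_m, y_m) in a shared coordinate frame.
    positions: list[tuple[float, float, float]] = field(default_factory=list)

    def update(self, t: float, x: float, y: float, label: str) -> None:
        # The label may change between frames; the motion history is
        # deliberately kept, so the velocity estimate is never reset.
        self.positions.append((t, x, y))
        self.label = label

    def velocity(self) -> Optional[tuple[float, float]]:
        """Average velocity over the whole history, or None if unknown."""
        if len(self.positions) < 2:
            return None
        (t0, x0, y0), (t1, x1, y1) = self.positions[0], self.positions[-1]
        dt = t1 - t0
        if dt <= 0:
            return None
        return ((x1 - x0) / dt, (y1 - y0) / dt)


track = Track()
track.update(0.0, 20.0, 0.0, "unknown")
track.update(1.0, 20.0, 1.2, "vehicle")
track.update(2.0, 20.0, 2.4, "bicycle")
print(track.label, track.velocity())
```

Even though the label changed twice, the track still reports steady lateral motion of about 1.2 m/s, which is the information a trajectory predictor needs.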
In this lab you will apply Lesson 1 concepts by reasoning through real sensor architecture decisions with an AI assistant trained on autonomous vehicle engineering. There are no wrong questions — the goal is to think carefully about tradeoffs.
In May 2016, a Tesla Model S operating in Autopilot mode collided with a tractor-trailer that had turned across its path near Williston, Florida, killing the driver, Joshua Brown. The car's camera failed to distinguish the white side of the trailer against a bright sky. The radar system detected the trailer but its configuration filtered out stationary overhead objects to avoid false positives from road signs and overpasses — so the trailer's height was treated as irrelevant. Neither system flagged a collision risk. Tesla's subsequent statement emphasized that Autopilot "is an assist feature that requires you to keep your hands on the steering wheel at all times." The National Highway Traffic Safety Administration investigated and closed the case without finding a safety defect, concluding that Brown had misused a Level 2 system by treating it as Level 4. That distinction — between what a system can do and what its driver-responsibility model implies — is not an engineering question. It is a legal and ethical one that the SAE taxonomy was designed to clarify.
The Society of Automotive Engineers published the J3016 standard in 2014 (revised 2021) to create a common vocabulary for automation levels. The six levels are defined not by the technology used but by who or what is responsible for monitoring the driving environment and performing the dynamic driving task.
| Level | Name | Who Drives? | Who Monitors? | Example (2024) |
|---|---|---|---|---|
| L0 | No Automation | Human | Human | Standard vehicle with no ADAS |
| L1 | Driver Assistance | Human + system (one axis) | Human | Adaptive cruise control or lane keep |
| L2 | Partial Automation | Human + system (both axes) | Human | Tesla Autopilot, GM Super Cruise |
| L3 | Conditional Automation | System | System (human on standby) | Mercedes Drive Pilot (Germany, Nevada) |
| L4 | High Automation | System | System (in defined conditions) | Waymo One (Phoenix, SF, LA) |
| L5 | Full Automation | System | System (all conditions) | Does not exist in production (2024) |
The critical divide is between Level 2 and Level 3. At Level 2, the human must monitor the driving environment at all times, even though the system handles both steering and speed. The system is not capable of requesting intervention — it simply stops working if the driver does not maintain engagement. At Level 3, the system monitors the environment and the human may disengage attention — but must be available to take over when the system requests it, typically within a defined response window (usually ten seconds).
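One way to internalize the taxonomy is to encode the responsibility rules as data. The sketch below is an illustrative reading of the table and the Level 2 versus Level 3 divide, not an official SAE artifact; the enum and function names are assumptions.

```python
from enum import IntEnum


class SAELevel(IntEnum):
    NO_AUTOMATION = 0
    DRIVER_ASSISTANCE = 1
    PARTIAL_AUTOMATION = 2
    CONDITIONAL_AUTOMATION = 3
    HIGH_AUTOMATION = 4
    FULL_AUTOMATION = 5


def human_must_monitor(level: SAELevel) -> bool:
    """Levels 0 to 2: the human watches the road at all times."""
    return level <= SAELevel.PARTIAL_AUTOMATION


def human_is_fallback(level: SAELevel) -> bool:
    """Levels 0 to 3: a human must be ready to take over when needed."""
    return level <= SAELevel.CONDITIONAL_AUTOMATION


for level in SAELevel:
    print(level.value, level.name, human_must_monitor(level), human_is_fallback(level))
```

Level 3 is the only row where the two answers differ: the human may stop monitoring but remains the fallback.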
Mercedes-Benz became the first manufacturer to receive regulatory approval for a Level 3 system in production vehicles: the Drive Pilot system, approved in Germany in December 2021 and in Nevada in January 2023. Crucially, Mercedes has accepted legal liability for accidents that occur while Drive Pilot is engaged — a precedent-setting acknowledgment that Level 3 changes the responsibility model in ways that Level 2 does not.
Tesla's Full Self-Driving (Supervised) is, as of 2024, a Level 2 system by SAE definition despite the name. The driver is legally responsible for the vehicle's behavior at all times. Between 2016 and 2023, NHTSA opened more than 40 investigations into Tesla Autopilot/FSD incidents. The naming of Level 2 systems continues to be a regulatory flashpoint, with NHTSA proposing in 2023 that the term "self-driving" be prohibited in marketing materials for systems below Level 3.
Every autonomous system, regardless of level, operates within a defined Operational Design Domain (ODD) — the specific conditions under which the system is designed to function. An ODD includes geographic boundaries, road types, speed ranges, weather conditions, and time of day. Waymo One's ODD in Phoenix in 2023 covered approximately 180 square miles of mapped territory, excluded freeways above 45 mph, and had weather restrictions excluding heavy rain. Understanding ODD boundaries is as important as understanding the automation level: a Level 4 system operating outside its ODD is functionally a zero-automation system.
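An ODD can be thought of as a configuration object plus a membership test. The following sketch assumes a rectangular geofence for simplicity (real geofences are detailed polygons), and the coordinates are hypothetical placeholders; only the 45 mph limit and the heavy-rain exclusion come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationalDesignDomain:
    lat_range: tuple[float, float]   # (south, north), hypothetical box
    lon_range: tuple[float, float]   # (west, east), hypothetical box
    max_speed_limit_mph: float
    excluded_weather: frozenset[str]

    def permits(self, lat: float, lon: float,
                speed_limit_mph: float, weather: str) -> bool:
        """True only if every ODD condition holds at once."""
        return (
            self.lat_range[0] <= lat <= self.lat_range[1]
            and self.lon_range[0] <= lon <= self.lon_range[1]
            and speed_limit_mph <= self.max_speed_limit_mph
            and weather not in self.excluded_weather
        )


odd = OperationalDesignDomain(
    lat_range=(33.2, 33.7),
    lon_range=(-112.3, -111.6),
    max_speed_limit_mph=45.0,
    excluded_weather=frozenset({"heavy_rain"}),
)
print(odd.permits(33.45, -112.07, 35.0, "clear"))       # inside every limit
print(odd.permits(33.45, -112.07, 35.0, "heavy_rain"))  # weather exclusion
```

Every condition is a conjunction: failing any single one puts the vehicle outside its ODD, regardless of how comfortably it satisfies the rest.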
The concept of ODD explains why Level 5 does not exist: it would require an ODD with no restrictions whatsoever — any road, any weather, any location, at any speed. No engineering team has publicly claimed a timeline for achieving this. Waymo's 2023 roadmap, shared in investor materials, focuses exclusively on expanding Level 4 ODD coverage rather than pursuing Level 5.
Apply the SAE taxonomy to real scenarios. Practice classifying systems, identifying ODD boundaries, and thinking through who bears responsibility at each level. The assistant will push back if your classification is off — that friction is part of the learning.
In December 2022, a Cruise robotaxi in San Francisco picked up a passenger, drove approximately one block, and then stopped in the middle of a lane, unable to continue because a construction zone had altered the road in a way the vehicle's mapping and prediction systems could not reconcile. The passenger was told by a remote operator to exit the vehicle and wait. A second Cruise vehicle was dispatched. This was not a catastrophic failure (no one was hurt), but it illustrated the gap between what autonomous vehicles can do in designed conditions and what human drivers handle without conscious deliberation thousands of times per journey. The vehicle's perception system had detected the construction correctly. Its prediction module could not produce confident forecasts for the altered scene, so its planning module could not find a valid path and defaulted to the safest available action: stop.
Every production autonomous driving system separates the computational work into three stages: perception, prediction, and planning. These stages correspond roughly to the questions "what is around me?", "what will those things do next?", and "what should I do about it?" Understanding where each stage succeeds and fails is the foundation for understanding the limits of current autonomous systems.
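The three-stage separation can be shown as three functions composed in order. This is a deliberately toy sketch: each stage is a stand-in for a large learned or optimized component, and all names and thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedObject:
    distance_m: float         # gap ahead of the ego vehicle in its lane
    closing_speed_m_s: float  # positive when the gap is shrinking


def perceive(ranges_m: list[float], closing_speeds_m_s: list[float]) -> list[DetectedObject]:
    """'What is around me?' Here: wrap pre-processed measurements."""
    return [DetectedObject(d, v) for d, v in zip(ranges_m, closing_speeds_m_s)]


def predict(objects: list[DetectedObject], horizon_s: float) -> list[float]:
    """'What will those things do next?' Constant closing-speed extrapolation."""
    return [o.distance_m - o.closing_speed_m_s * horizon_s for o in objects]


def plan(predicted_gaps_m: list[float], min_gap_m: float) -> str:
    """'What should I do about it?' Brake if any predicted gap is too small."""
    return "brake" if any(g < min_gap_m for g in predicted_gaps_m) else "proceed"


def drive_step(ranges_m, closing_speeds_m_s, horizon_s=3.0, min_gap_m=10.0) -> str:
    return plan(predict(perceive(ranges_m, closing_speeds_m_s), horizon_s), min_gap_m)


print(drive_step([50.0], [15.0]))  # gap shrinks to 5 m within the horizon
print(drive_step([50.0], [5.0]))   # gap stays at 35 m
```

The value of the separation is diagnostic: when the output is wrong, each stage's intermediate result can be inspected to see where the error entered.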
Perception takes raw sensor data and outputs a structured representation of the world: a list of detected objects, their positions, their dimensions, their headings, and — increasingly — their identities (pedestrian, cyclist, vehicle type). Modern perception pipelines use convolutional neural networks (CNNs) for image-based detection and point cloud processing architectures like PointNet or VoxelNet for LiDAR data.
Waymo's 2022 open dataset includes over 1,950 segments of driving data used to train and benchmark perception models. Their published detection models achieve over 95% precision on vehicles and pedestrians in clear conditions — but performance degrades meaningfully in rain, at night, and with partially occluded objects. The metric that matters is not average performance but tail performance: how the system behaves in the 1% of situations it has seen least during training.
A critical failure mode is distribution shift: when the real-world distribution of scenarios differs from the training distribution. The Uber accident's core problem — an unknown object classification — was a distribution shift failure. The training data had not adequately represented pedestrians walking bicycles at night on multilane roads.
Researchers at universities including the University of Washington, the University of Michigan, and UC Berkeley have demonstrated that small physical perturbations to stop signs (stickers placed in specific patterns) can cause image-classification CNNs to misclassify them as speed limit signs. These "adversarial examples" exploit the non-human nature of neural network perception: the patterns that fool a network typically look like innocuous graffiti or wear to human observers. No production AV is publicly known to have been compromised this way in the wild, but the vulnerability category is real and documented.
Prediction takes the perception output and generates probabilistic forecasts of how detected objects will move over the next several seconds. A pedestrian at a crosswalk is predicted to have a high probability of entering the road; a vehicle approaching a red light is predicted to stop. Modern prediction models use recurrent neural networks (RNNs) or transformer architectures, incorporating not just current object states but their motion histories and contextual cues from the map — lane structure, signal states, and intersection geometry.
Waymo published the Waymo Open Motion Dataset in 2021, containing 570 hours of unique data and over 100,000 agent scenarios, specifically to advance research on prediction. The key challenge is that prediction is inherently uncertain: humans are not deterministic, and even expert drivers frequently make decisions that surprise other road users. Production systems must maintain multiple hypothesis tracks (the pedestrian might cross, might stop, might reverse) and plan for all of them simultaneously.
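Maintaining multiple hypotheses can be sketched as a small weighted set of futures. The example below is illustrative only: the hypothesis names follow the cross, stop, reverse example above, while the probabilities, speeds, and the `probability_in_lane` helper are assumptions rather than outputs of any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    name: str
    probability: float
    lateral_speed_m_s: float  # positive moves toward the ego lane


def probability_in_lane(hypotheses: list[Hypothesis],
                        distance_to_lane_m: float,
                        horizon_s: float) -> float:
    """Probability mass of the futures that reach the lane in time."""
    return sum(
        h.probability
        for h in hypotheses
        if h.lateral_speed_m_s * horizon_s >= distance_to_lane_m
    )


pedestrian = [
    Hypothesis("cross", 0.5, 1.4),
    Hypothesis("stop", 0.3, 0.0),
    Hypothesis("reverse", 0.2, -1.0),
]
risk = probability_in_lane(pedestrian, distance_to_lane_m=3.0, horizon_s=3.0)
print(f"probability of entering the lane within 3 s: {risk:.2f}")
```

A planner that only considered the single most likely hypothesis would discard exactly the information that justifies slowing down when the outcome is a coin flip.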
The Cruise construction-zone stoppage was a prediction failure in disguise: the system could not generate confident predictions about how the construction workers and equipment in the zone would behave, so it could not plan safely around them.
Planning takes the perception world model and prediction outputs and generates an executable trajectory: a sequence of steering, acceleration, and braking commands that move the vehicle toward its goal while respecting traffic laws, staying within road boundaries, and maintaining safe distances from other agents. Planning operates at two levels: route planning (which roads to take) and motion planning (exactly how to move through the immediate environment in the next few seconds).
Motion planning is the most computationally demanding stage. Traditional approaches like model predictive control (MPC) optimize over a finite time horizon, generating the trajectory that minimizes a cost function (travel time, comfort, safety margins). Increasingly, companies are exploring learned planning, where a neural network directly generates trajectories from perception inputs: an "end-to-end" approach that is more flexible but harder to interpret when it fails. Waymo disclosed in 2023 that their fifth-generation system uses a hybrid approach: structured planning for known scenarios, with learned components for novel situations.
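The cost-function idea behind finite-horizon planning can be illustrated with a brute-force search over a handful of constant accelerations. This is far simpler than real MPC, which optimizes whole control sequences and re-solves every cycle; the weights, margin, and candidate set here are arbitrary assumptions chosen for illustration.

```python
import math


def rollout_distance_m(speed_m_s: float, accel_m_s2: float,
                       horizon_s: float = 3.0, dt_s: float = 0.5) -> float:
    """Distance covered over the horizon; speed is clamped at zero."""
    distance_m = 0.0
    for _ in range(round(horizon_s / dt_s)):
        speed_m_s = max(0.0, speed_m_s + accel_m_s2 * dt_s)
        distance_m += speed_m_s * dt_s
    return distance_m


def cost(accel_m_s2: float, speed_m_s: float, gap_m: float,
         margin_m: float = 5.0, w_progress: float = 1.0,
         w_comfort: float = 0.5) -> float:
    distance_m = rollout_distance_m(speed_m_s, accel_m_s2)
    if gap_m - distance_m < margin_m:
        return math.inf  # safety margin is a hard constraint
    # Reward progress, penalize harsh acceleration or braking.
    return -w_progress * distance_m + w_comfort * accel_m_s2 ** 2


def best_accel(speed_m_s: float, gap_m: float,
               candidates=(-4.0, -2.0, 0.0, 1.0)) -> float:
    costs = {a: cost(a, speed_m_s, gap_m) for a in candidates}
    if all(math.isinf(c) for c in costs.values()):
        return min(candidates)  # nothing is safe: brake as hard as allowed
    return min(costs, key=costs.get)


print(best_accel(15.0, 40.0))    # obstacle 40 m ahead
print(best_accel(15.0, 1000.0))  # open road
```

Because the cost terms are explicit, an engineer can see why a given acceleration was chosen, which is the interpretability property that learned planners give up.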
When a traditional planning system makes an error, engineers can inspect the cost function and find the miscalibration. When a learned end-to-end system makes an error, the cause may be distributed across millions of weights with no human-interpretable explanation. This interpretability gap becomes a safety certification problem: regulators need to know why a system is safe, not just that it passed a test suite. As of 2024, no regulatory framework has resolved how to certify a black-box neural network for safety-critical driving decisions.
Apply the three-stage ML pipeline framework to real and hypothetical scenarios. Practice diagnosing which stage caused a failure, understanding what good prediction uncertainty looks like, and reasoning about end-to-end versus modular architectures.
On October 2, 2023, a Cruise robotaxi in San Francisco struck a pedestrian who had already been hit by a human-driven vehicle, then dragged her approximately 20 feet before stopping. The incident, captured on video and reported to the California DMV, led California regulators (the DMV, followed by the California Public Utilities Commission) to suspend Cruise's driverless operations in the state later that month. Within weeks, General Motors had suspended Cruise operations nationwide and initiated an internal investigation. The incident exposed two distinct failures: the vehicle's collision response (it did not brake immediately upon impact) and Cruise's reporting behavior (the company initially provided regulators with incomplete video footage, omitting the dragging sequence). By November 2023, Cruise's CEO had resigned and GM had written down $500 million in Cruise-related assets. A single accident, and the regulatory response it triggered, effectively ended the operational phase of one of the best-funded autonomous vehicle programs in the world. The question the industry is now grappling with is not whether autonomous vehicles can be safe, but how safety must be demonstrated, to whom, and under what governance structure.
Autonomous vehicle regulation in the United States is split between federal and state authority. The National Highway Traffic Safety Administration (NHTSA) has authority over vehicle safety standards but, as of 2024, has not issued binding federal regulations specifically for autonomous vehicles. Instead, NHTSA has published voluntary guidance documents — the most recent being AV 4.0 in 2020 — and relies on existing motor vehicle safety laws applied to new contexts.
States have filled the gap. California, Arizona, Nevada, Florida, and Texas have all enacted AV-specific legislation. California is the most rigorous: the DMV requires manufacturers to report all disengagements (instances where a safety driver takes over from the autonomous system) and all accidents involving autonomous vehicles on public roads. Waymo's 2023 California DMV disengagement report showed 0.0048 disengagements per thousand miles — one disengagement every 208,000 miles. Cruise reported 0.036 disengagements per thousand miles before its suspension. Human drivers have no equivalent metric, which makes direct comparison difficult.
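Disengagement rates are easier to compare when inverted into miles per disengagement. The sketch below applies that conversion to the two rates quoted above; the function name is an assumption for illustration.

```python
def miles_per_disengagement(rate_per_thousand_miles: float) -> float:
    """Invert a per-thousand-mile rate into average miles between events."""
    if rate_per_thousand_miles <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / rate_per_thousand_miles


for name, rate in [("Waymo", 0.0048), ("Cruise", 0.036)]:
    print(f"{name}: one disengagement every {miles_per_disengagement(rate):,.0f} miles")
```

On these figures the two fleets differ by a factor of 7.5, though each company decides what counts as a reportable disengagement, so the numbers are not strictly comparable.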
In China, the regulatory approach is more centralized. Beijing's Intelligent Connected Vehicle policy, active since 2021, requires AV operators to obtain city-specific permits and report safety data to municipal authorities. Baidu's Apollo Go robotaxi service, operating in Wuhan, Chongqing, and Beijing, had completed over 4 million rides by mid-2023 under this framework.
Waymo reported in October 2023 that its vehicles had been involved in 18 minor accidents over 7.1 million miles of driverless operation in San Francisco, a rate of roughly 0.25 crashes per 100,000 miles. The US average for human drivers is approximately 1.3 crashes per 100,000 miles. But the comparison is not straightforward: Waymo's ODD is restricted to mapped urban areas at moderate speeds, the vehicle mix and road types differ, and not all accident severity levels are comparable. The industry lacks an agreed methodology for like-for-like safety comparison.
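Normalizing crash counts to a common denominator is a single division, and it is worth doing explicitly because unit slips are easy to make. The sketch below works from the 18 crashes and 7.1 million miles cited above; the function name is an assumption for illustration.

```python
def crashes_per_100k_miles(crashes: int, miles: float) -> float:
    """Crash count normalized to a rate per 100,000 miles driven."""
    if miles <= 0:
        raise ValueError("miles must be positive")
    return crashes / miles * 100_000.0


waymo_rate = crashes_per_100k_miles(18, 7.1e6)
human_rate = 1.3  # per 100,000 miles, the figure quoted above

print(f"Waymo: {waymo_rate:.2f} crashes per 100,000 miles")
print(f"Human figure is about {human_rate / waymo_rate:.1f} times higher")
```

The ratio is only as meaningful as the match between the two populations, which is the methodological gap the paragraph above describes.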
Edge cases and long-tail events. The set of rare, unusual scenarios that autonomous vehicles encounter infrequently but must handle correctly is effectively unbounded. A mattress falling from a truck, a pedestrian in a costume, emergency vehicles approaching from multiple directions simultaneously — each represents a scenario that may not appear in training data. The industry's primary solution is simulation, but simulation cannot fully replicate the complexity and stochasticity of the real world.
Scaling ODD economically. HD mapping (the centimeter-accurate maps required by most current AV systems) is enormously expensive to create and maintain. Waymo's maps cover approximately 180 square miles in Phoenix and smaller areas in other cities. The United States, by contrast, spans millions of square miles. Systems that can operate without HD maps (relying on standard mapping data and real-time perception) are a major research focus, but no production system has demonstrated this at scale.
Regulatory harmonization. A vehicle approved to operate in California may not meet Arizona's standards. No international body has produced comprehensive binding safety certification requirements for autonomous vehicles. The United Nations Economic Commission for Europe's Working Party 29 (UNECE WP.29) published Regulation 157 for automated lane keeping in 2021, the first binding international regulation for any level of autonomy, but it covers only Level 3 highway driving at speeds below 60 km/h.
Public trust. A 2023 AAA survey found that 68% of Americans reported being afraid to ride in a fully self-driving vehicle, up from 55% in 2022. The Cruise incident reinforced public skepticism. Industry advocates argue that the appropriate benchmark is human driver safety (approximately 1.35 fatalities per 100 million vehicle miles in the US in 2022), not perfection. Critics argue that algorithmic systems require a higher bar than human error because they scale across millions of simultaneous deployments.
As of early 2024, Waymo is the only company operating a fully driverless commercial robotaxi service in the United States. Cruise is suspended. Aurora is preparing to launch driverless commercial trucking on Texas interstates in 2024. Tesla continues accumulating FSD miles under driver supervision. The gap between what is technically possible in constrained ODDs and what is needed for broad commercial deployment remains substantial — measured not in years of incremental progress but in unsolved research problems. That is an honest description of where the field stands, and why it matters to understand it clearly.
Apply Lesson 4 concepts by reasoning through regulatory decisions, safety measurement frameworks, and deployment ethics. The assistant will engage with nuance — there are genuinely contested questions here where experts disagree, and thinking through them carefully is the goal.