Lesson 1 · Module 5

How a Car Learns to See

The sensor stack that turns raw light and radio waves into a drivable understanding of the world.

What does it actually mean for a machine to "see" well enough to drive?

In October 2023, Waymo's fully autonomous robotaxis — operating with no safety driver — completed over one million paid passenger trips in San Francisco and Phoenix. Riders hailed rides through an app, climbed into a Jaguar I-PACE fitted with a roof-mounted sensor dome, and arrived at their destinations without a human hand ever touching the wheel. The vehicle's visual AI processed roughly twenty camera feeds, lidar pulses, and radar returns simultaneously, every hundredth of a second.

The Sensor Stack

No single sensor gives a self-driving car everything it needs. Production systems like Waymo's fifth-generation Driver and Cruise's AV combine three complementary modalities, each filling gaps the others leave.

Cameras are the richest source of visual information — color, texture, lane markings, traffic-light state, facial expressions on pedestrians, text on signs. They are cheap, high-resolution, and the most human-like sensor. Their weakness: performance degrades in heavy rain, glare, or darkness, and they produce no direct depth measurement. Everything 3-D must be inferred by the AI from a 2-D projection.

Lidar (Light Detection and Ranging) fires rapid pulses of laser light and times their return. The result is a precise 3-D point cloud — a real-time geometric map of every nearby surface, accurate to centimeters at 100+ meters. Waymo's custom Laser Bear Honeycomb sensors spin 360° and fire millions of pulses per second. Lidar is nearly unaffected by lighting but is expensive and can be confused by heavy precipitation.
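The ranging principle described above comes down to one line of arithmetic: distance is the speed of light multiplied by half the round-trip time. The following is a minimal Python sketch of that calculation. The function name and the sample timing are illustrative assumptions, not taken from any real lidar driver.

```python
# Speed of light in vacuum, metres per second (air is close enough for a sketch).
SPEED_OF_LIGHT_M_S = 299_792_458.0

def lidar_range_m(round_trip_s: float) -> float:
    """Distance to a surface from the round-trip time of one laser pulse."""
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse travels out and back, so halve the total path length.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

# A return arriving about 667 nanoseconds after firing is roughly 100 m away.
print(round(lidar_range_m(667e-9), 1))
```

The same formula shows why the timing electronics matter: one centimeter of range corresponds to only about 67 picoseconds of round-trip time.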

Radar uses radio waves rather than light. It penetrates fog, rain, and snow that blind cameras and scatter lidar. It also measures velocity directly via the Doppler effect, so the system knows at once whether an object ahead is stationary or moving at 60 mph. Radar resolution is low, though; it cannot read a stop sign or identify a pedestrian's pose.
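The Doppler measurement can be sketched in a few lines. The example below assumes a 77 GHz carrier, a frequency band widely used for automotive radar. The function name and the sample frequency shift are illustrative assumptions, not figures from any specific vehicle.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

def radial_speed_m_s(doppler_shift_hz: float, carrier_hz: float) -> float:
    """Relative speed along the radar beam, derived from the Doppler shift.

    A positive shift means the target is approaching.
    """
    if carrier_hz <= 0:
        raise ValueError("carrier frequency must be positive")
    # Two-way Doppler: shift = 2 * v * carrier / c, solved here for v.
    return doppler_shift_hz * SPEED_OF_LIGHT_M_S / (2.0 * carrier_hz)

CARRIER_HZ = 77e9        # assumed carrier frequency
MPH_PER_M_S = 2.236936   # unit conversion

speed = radial_speed_m_s(13_780.0, CARRIER_HZ)
print(f"{speed:.1f} m/s is about {speed * MPH_PER_M_S:.0f} mph")
```

Note that this yields only the component of velocity along the beam. An object crossing the road at right angles to the radar produces almost no Doppler shift.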

Why Three Sensors?

Cameras see richly but not in 3-D. Lidar measures space precisely but has historically cost thousands of dollars per unit. Radar sees through weather but blurrily. Fusing all three produces a perception layer more robust than any individual sensor, a philosophy called sensor redundancy.
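One simple way to see why fusion beats any single sensor is inverse-variance weighting: each sensor's estimate counts in proportion to how precise it is. The sketch below is a textbook illustration, not a description of any production system. The range values and noise levels are invented for the example.

```python
def fuse_estimates(estimates):
    """Inverse-variance weighted average of (value, std_dev) pairs.

    Returns the fused value and the standard deviation of the fused estimate.
    """
    weights = [1.0 / (std ** 2) for _, std in estimates]
    total_weight = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    fused_std = (1.0 / total_weight) ** 0.5
    return fused, fused_std

# Hypothetical range-to-obstacle readings in metres: (estimate, noise std dev).
readings = [
    (52.0, 5.0),    # camera: depth inferred from a 2-D image, so noisy
    (50.1, 0.05),   # lidar: centimetre-level ranging
    (50.6, 0.5),    # radar: coarse but weather-robust
]

fused, fused_std = fuse_estimates(readings)
print(f"fused range {fused:.2f} m, std dev {fused_std:.3f} m")
```

The fused uncertainty is smaller than that of the best single sensor. If the lidar entry is removed, as it might be in heavy precipitation, the function still returns a usable estimate from the remaining two, which is the point of redundancy.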

Key Terms

Lidar: A ranging sensor that emits laser pulses and times their return to build centimeter-accurate 3-D point clouds of the environment.

Point Cloud: A set of data points in 3-D space, each representing a surface location detected by lidar; visualized as a constellation of colored dots.

Sensor Fusion: The process of combining inputs from multiple sensors (camera, lidar, radar) into a single unified model of the environment.

Doppler Radar: Radar that measures the frequency shift of returned signals to compute the relative velocity of detected objects.

From Pixels to Perception

Raw sensor data is nearly useless on its own. A lidar point cloud is just a cloud of numbers; a camera frame is a 2-D array of colored pixels. The perception stack — layers of convolutional neural networks and transformer models — must classify every detected object, predict its future motion, and assign a confidence score. Waymo's models are trained on data collected from tens of millions of real-world miles, plus billions of additional miles generated in simulation.

Tesla takes a different approach: cameras only, no lidar. Its Full Self-Driving system relies on a custom AI chip (the FSD Chip, first deployed in 2019) processing eight camera feeds through a neural network trained on data from a fleet of millions of vehicles. Tesla argues that humans drive with eyes alone, so cameras should suffice. Critics argue that human visual processing evolved over millions of years; silicon networks trained for a few years need the redundancy that lidar provides.

Real Figure

As of early 2024, Waymo's vehicles had driven over 7 million fully autonomous miles on public roads. Tesla's FSD fleet had accumulated over 500 million miles of FSD-engaged driving, though always with a human driver behind the wheel, responsible for supervising the system and able to intervene.

Quiz — Lesson 1

How a Car Learns to See · 3 questions

Which sensor directly measures the velocity of surrounding objects using the Doppler effect?

Correct. Radar measures frequency shift of returned radio waves to compute relative velocity — a direct measurement cameras and lidar cannot provide.

Not quite. Radar uses the Doppler effect to measure relative velocity of objects directly. Cameras and lidar require AI inference to estimate motion over time.

What is a "point cloud" in the context of autonomous vehicles?

Correct. Lidar fires millions of laser pulses per second and plots each return as a 3-D point, creating a real-time geometric map called a point cloud.

A point cloud is the 3-D set of data points produced when lidar pulses return from surfaces. It gives the vehicle a centimeter-accurate spatial model of its immediate environment.

Tesla's Full Self-Driving approach differs from Waymo's primarily because Tesla relies on:

Correct. Tesla uses a camera-only approach, arguing human drivers navigate with eyes alone. Waymo combines cameras, lidar, and radar for sensor redundancy.

Tesla's FSD system uses cameras only — no lidar. It processes eight camera feeds through a custom AI chip. The debate over camera-only vs. multi-sensor fusion is one of the key philosophical divides in AV development.

Lab 1 — The Sensor Debate

Explore trade-offs between lidar, radar, and camera-only autonomous vehicle designs

Your Mission

You are an AV systems designer advising a startup that must choose its sensor stack for a robotaxi operating in both sunny Arizona and foggy San Francisco. Consider cost, weather resilience, depth accuracy, and regulatory factors.

Start by asking: "What are the biggest real-world failure modes for camera-only self-driving systems?" Then explore trade-offs and build toward your recommendation.

AI Lab Assistant

Sensor Design

Welcome to Lab 1. I'm here to help you think through autonomous vehicle sensor design. You're advising a robotaxi startup that needs to operate in both sunny Phoenix and foggy San Francisco. What questions do you have about the sensor trade-offs — cameras, lidar, radar, or how they fuse together?

Lesson 2 · Module 5

Teaching Cars to Recognize the Road

How computer vision models learn to classify objects, read signs, and predict what pedestrians will do next.

How does a neural network know that a moving blur at the side of the road is a child about to run into traffic?

At 9:58 p.m. on March 18, 2018, a Volvo XC90 operated by Uber's Advanced Technologies Group struck and killed Elaine Herzberg, who was walking a bicycle across a multi-lane road in Tempe, Arizona. The National Transportation Safety Board's investigation found that the vehicle's perception system had detected Herzberg six seconds before impact — classifying her first as an unknown object, then as a vehicle, then as a bicycle — cycling between categories because it had no stable class for "pedestrian not in a crosswalk." The system never generated an alert. The safety driver was looking at a device in their lap.

The crash became a watershed moment in AV development. It demonstrated that perception accuracy in controlled test conditions does not translate to reliability across all real-world distributions — a problem researchers call distribution shift.

Object Detection and Classification

Modern AV perception systems use object detection networks — most commonly variants of YOLO (You Only Look Once) and transformer-based architectures — to draw bounding boxes around every detected entity in a scene and assign a class label with a confidence score. Classes include: car, truck, motorcycle, pedestrian, cyclist, traffic cone, stop sign, traffic light, and dozens more.

The challenge is not detecting a pedestrian in ideal conditions; a human walking on a sidewalk in daylight is easy. The challenge is detecting a pedestrian at night, partially occluded by a parked car, crossing at an unexpected location, while the AV is moving at 40 mph. The Uber crash revealed the cost of setting detection confidence thresholds too high: the system would discard an uncertain detection rather than act on it. A threshold set too low has the opposite problem, with the vehicle braking for obstacles that are not there. Overconfident and under-confident classification are both dangerous, in different ways.

NTSB Finding, 2019

The NTSB determined that Uber's system had been programmed to suppress false positives by requiring high confidence before triggering emergency braking. This threshold caused the system to hesitate fatally. After the crash, Uber and the broader industry revised guidance on how to balance false-positive suppression against reaction time.
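The trade-off between suppressing false positives and reacting to uncertain detections can be made concrete with a toy filter. This is a sketch with invented labels, scores, and thresholds. It does not reproduce Uber's software or any real detector's output format.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float  # model score in the range 0 to 1

def actionable(detections, threshold):
    """Keep only the detections the planner is allowed to react to."""
    return [d for d in detections if d.confidence >= threshold]

# One hypothetical camera frame.
frame = [
    Detection("car", 0.97),
    Detection("pedestrian", 0.55),    # uncertain: dark scene, partly occluded
    Detection("plastic bag", 0.30),   # probably a false positive
]

strict = actionable(frame, threshold=0.90)
lenient = actionable(frame, threshold=0.50)
print([d.label for d in strict])
print([d.label for d in lenient])
```

With the strict threshold the uncertain pedestrian is silently dropped. With the lenient one the pedestrian is kept while the plastic bag is still ignored. Lowering the threshold further would start triggering on the bag, which is the false-positive cost the strict setting was trying to avoid.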

Semantic Segmentation

Beyond bounding boxes, advanced perception uses semantic segmentation — assigning a class label to every single pixel in the camera frame. The road is one color, the sidewalk another, buildings another. This gives the AV a much richer spatial understanding: it can see exactly where the drivable surface ends, where a puddle begins, and precisely how wide the lane is at a given point.
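At its final step, per-pixel labeling is a choice of the highest-scoring class for each pixel. The sketch below shows only that last step on a tiny made-up score array. The class list and scores are assumptions for illustration, and a real network would produce the scores from the image.

```python
CLASSES = ["road", "sidewalk", "building"]  # illustrative label set

def segment(scores):
    """Pick the highest-scoring class for every pixel.

    scores[row][col] holds one score per entry in CLASSES.
    """
    label_map = []
    for row in scores:
        label_row = []
        for pixel_scores in row:
            best = max(range(len(CLASSES)), key=lambda i: pixel_scores[i])
            label_row.append(CLASSES[best])
        label_map.append(label_row)
    return label_map

# A made-up 2 x 3 "image" of per-pixel class scores.
scores = [
    [[0.1, 0.2, 0.7], [0.1, 0.3, 0.6], [0.2, 0.1, 0.7]],
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.2, 0.7, 0.1]],
]
labels = segment(scores)
for row in labels:
    print(row)

# Counting "road" pixels in the bottom row gives the drivable width in pixels.
road_width_px = labels[-1].count("road")
print(road_width_px)
```

Because every pixel carries a label, questions such as "where does the drivable surface end?" become simple lookups on the label map rather than guesses from box edges.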

Waymo and Mobileye use segmentation networks running in parallel with object detection. The outputs are fused with lidar point clouds to produce a labeled 3-D occupancy grid — essentially a real-time map of the world divided into cells, each marked: free, occupied, unknown.
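The occupancy grid described above can be illustrated with a small 2-D version. This sketch assumes the fusion step has already produced two lists of ground-plane points, one for space observed to be free and one for detected surfaces. The grid size, cell size, and point coordinates are invented.

```python
FREE, OCCUPIED, UNKNOWN = ".", "#", "?"

def build_grid(width, height, cell_m, free_xy, occupied_xy):
    """Build a 2-D occupancy grid from points given in metres.

    Cells start as UNKNOWN; occupied evidence overrides free evidence.
    """
    grid = [[UNKNOWN for _ in range(width)] for _ in range(height)]

    def mark(points, state):
        for x, y in points:
            col, row = int(x // cell_m), int(y // cell_m)
            if 0 <= row < height and 0 <= col < width:
                grid[row][col] = state  # points outside the grid are ignored

    mark(free_xy, FREE)
    mark(occupied_xy, OCCUPIED)
    return grid

grid = build_grid(
    width=4, height=3, cell_m=1.0,
    free_xy=[(0.5, 0.5), (1.5, 0.5), (2.5, 0.5)],
    occupied_xy=[(2.5, 0.5), (3.2, 1.7)],
)
for row in grid:
    print(" ".join(row))
```

Keeping "unknown" distinct from "free" matters: a planner can treat unobserved cells cautiously instead of assuming they are empty.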

Predicting Behavior

Detecting what is present is only half the task. The AV must predict what each agent will do next. A pedestrian stepping off the curb is likely about to cross. A vehicle with its turn signal on is likely about to change lanes. A ball rolling into the road suggests a child may follow.

Waymo's prediction models — described in a 2022 paper from the Waymo Research team — output a probability distribution over possible future trajectories for each agent over the next eight seconds. The planning system then selects driving actions that maintain safety margins across the most probable scenarios. This is called probabilistic prediction, and it is one of the most active research areas in autonomous driving.
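A planner can use such a distribution by adding up the probability of every predicted future that would bring an agent too close to the vehicle's intended path. The sketch below is a simplified illustration and not Waymo's method. It uses four one-second steps instead of an eight-second horizon, and all paths, probabilities, and the safety margin are invented.

```python
def min_gap_m(ego_path, agent_path):
    """Smallest distance between ego and agent at matching time steps."""
    return min(
        ((ex - ax) ** 2 + (ey - ay) ** 2) ** 0.5
        for (ex, ey), (ax, ay) in zip(ego_path, agent_path)
    )

def collision_risk(ego_path, predictions, margin_m):
    """Total probability of predicted futures that violate the safety margin."""
    return sum(
        prob for prob, path in predictions if min_gap_m(ego_path, path) < margin_m
    )

# Two hypothetical futures for one pedestrian, as (probability, path) pairs.
pedestrian = [
    (0.7, [(20, 5), (20, 5), (20, 5), (20, 5)]),    # stays on the curb
    (0.3, [(20, 5), (20, 3), (20, 1), (20, -1)]),   # steps into the road
]

keep_speed = [(0, 0), (10, 0), (20, 0), (30, 0)]   # ego positions, 1 s apart
slow_down = [(0, 0), (5, 0), (10, 0), (15, 0)]

print(collision_risk(keep_speed, pedestrian, margin_m=2.0))
print(collision_risk(slow_down, pedestrian, margin_m=2.0))
```

Holding speed leaves a 30 percent chance of violating the margin, while slowing removes it. Selecting the action with acceptable risk across the likely futures is the idea behind planning on top of probabilistic prediction.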

Distribution Shift: When a model encounters inputs that differ statistically from its training data, causing unreliable predictions; for example, a pedestrian model trained on daytime images struggling at night.

Semantic Segmentation: Per-pixel classification of an image, labeling each pixel with a category such as road, sky, pedestrian, or building.

Occupancy Grid: A 2-D or 3-D grid map where each cell is labeled as free, occupied, or unknown, built by fusing sensor inputs in real time.

Probabilistic Prediction: Outputting a range of possible future behaviors for each detected agent, weighted by likelihood, rather than a single fixed forecast.

Quiz — Lesson 2

Teaching Cars to Recognize the Road · 3 questions

What is "distribution shift" in AV perception, and why did it matter in the 2018 Uber crash?

Correct. The Uber system was trained mostly on scenarios where pedestrians cross at crosswalks. Elaine Herzberg was crossing mid-road at night — outside the training distribution — causing unstable, cycling classification.

Distribution shift means the model encounters conditions statistically unlike its training data. Uber's model hadn't robustly learned "pedestrian crossing mid-road at night," which is why it cycled between object classes without acting.

Semantic segmentation differs from standard object detection because it:

Correct. Semantic segmentation classifies every pixel — not just regions of interest — giving a dense spatial understanding of the entire scene including road edges, surfaces, and obstacles.

Semantic segmentation labels every pixel in the image rather than just drawing boxes. This gives the AV precise boundary information: exactly where the drivable road surface ends, for example.

Waymo's prediction models output a probability distribution over future agent trajectories spanning approximately how many seconds ahead?

Correct. Waymo's 2022 research described prediction horizons of ~8 seconds — long enough for meaningful driving decisions but short enough to remain physically plausible.

Waymo's prediction models forecast likely agent trajectories across an 8-second horizon. One second is too short for planning; 30+ seconds is too uncertain to be useful for most driving maneuvers.

Lab 2 — Perception Failures

Investigate how and why AV vision systems make dangerous misclassifications

Your Mission

You are an AI safety researcher reviewing perception system failures. Using the 2018 Uber ATG crash as your starting case study, probe what kinds of scenarios cause AV vision systems to fail — and what engineering solutions are being deployed.

Begin by asking: "Why did Uber's system classify Elaine Herzberg as an unknown object, then a vehicle, then a bicycle, and what architectural flaw caused this cycling behavior?" Then explore what modern systems do differently.

AI Lab Assistant

Perception Safety

Welcome to Lab 2. We're examining AV perception failures — specifically why computer vision systems misclassify objects in ways that can be dangerous. The 2018 Uber crash is a powerful case study. What aspect of the perception failure would you like to dig into first?

Lesson 3 · Module 5

HD Maps and Localization

Why knowing where you are — to within centimeters — is as important as knowing what surrounds you.

GPS is accurate to a few meters. A car lane is three and a half meters wide. So how do AVs know exactly which lane they are in?

By 2023, Mobileye's Road Experience Management (REM) system had collected over eight billion kilometers of anonymized driving data from camera-equipped vehicles (taxis, trucks, and consumer cars) to build and continuously update a centimeter-level HD map of public roads in more than 40 countries. Every equipped vehicle acts as a mapping probe, uploading road geometry observations in the background. The aggregate becomes a map accurate enough to localize a vehicle to within 10 centimeters, far beyond what GPS alone can provide.

Why Standard GPS Falls Short

Consumer GPS — the kind in a smartphone or a standard car navigation system — achieves accuracy of roughly 3–5 meters under good conditions. That sounds precise, but a standard highway lane is 3.7 meters wide. A 5-meter GPS error could place a vehicle in the adjacent lane or on the shoulder. For a human driver glancing at a navigation screen, that imprecision is acceptable. For an autonomous vehicle making its own steering decisions at 70 mph, it is not.
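The comparison above is simple arithmetic. A vehicle centered in its lane has only half a lane width of margin on each side, so any position error larger than that leaves the lane ambiguous. The short sketch below uses the figures from this lesson. The function name is an illustrative assumption.

```python
LANE_WIDTH_M = 3.7  # standard highway lane width used in this lesson

def lane_is_ambiguous(position_error_m, lane_width_m=LANE_WIDTH_M):
    """True if the error could place a lane-centred vehicle in another lane."""
    return position_error_m > lane_width_m / 2.0

for label, error_m in [
    ("consumer GPS, worst case", 5.0),
    ("consumer GPS, best case", 3.0),
    ("HD-map localization", 0.10),
]:
    print(f"{label}: ambiguous = {lane_is_ambiguous(error_m)}")
```

Even the best-case 3-meter error exceeds the 1.85-meter half-lane margin, while a 10-centimeter error leaves the lane assignment unambiguous.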

Differential GPS (DGPS) and Real-Time Kinematic (RTK) GPS systems use ground-based correction stations to achieve sub-meter or even centimeter accuracy — but they are expensive, require specialized hardware, and lose precision in urban canyons where tall buildings block satellite signals.

HD Maps

High-Definition maps go far beyond standard navigation maps. An HD map records not just road centerlines and speed limits but: precise lane geometry (width, curvature, grade), lane markings (solid, dashed, double yellow), traffic sign positions and types, traffic light locations and phases, curb positions, crosswalk locations, and overhead clearances. This data is stored as a 3-D geometric model accurate to centimeters.

AV companies including Waymo, Cruise, Mobileye, and Baidu use HD maps as a prior — a known, trusted baseline — against which real-time sensor data is matched. The vehicle knows roughly where it is from GPS; it then aligns its lidar point cloud to the stored HD map geometry to refine its position to centimeter accuracy. This process is called map-based localization or point cloud registration.
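The refinement step can be illustrated with a heavily simplified registration. The sketch below handles translation only, works in 2-D, and assumes each scanned landmark has already been matched to its counterpart in the map. Production systems also solve for rotation and must find the correspondences themselves, for example with iterative closest point methods. All coordinates here are invented.

```python
def centroid(points):
    """Mean (x, y) of a list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def estimate_offset(scan_points, map_points):
    """Translation that moves the live scan onto the stored map landmarks."""
    sx, sy = centroid(scan_points)
    mx, my = centroid(map_points)
    return (mx - sx, my - sy)

# Landmark positions stored in the HD map (map coordinates, metres).
map_landmarks = [(10.0, 2.0), (12.0, -2.0), (15.0, 2.0)]
# The same landmarks as placed by the live scan using the rough GPS pose.
scan_landmarks = [(9.2, 2.3), (11.2, -1.7), (14.2, 2.3)]

dx, dy = estimate_offset(scan_landmarks, map_landmarks)
gps_guess = (100.0, 50.0)
refined = (gps_guess[0] + dx, gps_guess[1] + dy)
print(f"correction ({dx:.2f}, {dy:.2f}) m, refined position ({refined[0]:.2f}, {refined[1]:.2f})")
```

The offset between where the scan places the landmarks and where the map says they are is exactly the error in the rough GPS pose, so applying it refines the position estimate.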

Limitation: Map Freshness

HD maps become stale. A construction zone that reroutes lanes, a new traffic signal, a recently painted crosswalk — none of these are in yesterday's map. AVs must detect and handle discrepancies between their HD map and real-time sensor data. Waymo's system flags map conflicts and falls back to sensor-only navigation in affected zones. Keeping HD maps current is one of the largest operational costs in the industry.
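Conflict detection of this kind can be pictured as a comparison between the features the map expects and the features the sensors report. The sketch below is a toy illustration with made-up feature names. Real systems compare geometry with tolerances rather than exact labels, and the fallback policy shown is an assumption.

```python
def map_conflicts(map_features, observed_features):
    """Compare what the map promises with what the sensors report."""
    expected, seen = set(map_features), set(observed_features)
    missing = sorted(expected - seen)      # in the map, not on the road
    unexpected = sorted(seen - expected)   # on the road, not in the map
    return missing, unexpected

def choose_mode(map_features, observed_features):
    """Fall back to sensor-only driving when the map looks stale."""
    missing, unexpected = map_conflicts(map_features, observed_features)
    return "sensor-only" if (missing or unexpected) else "map-based"

stored = ["crosswalk@Main/1st", "lane:2", "signal@Main/1st"]
live = ["construction-cones", "lane:2", "signal@Main/1st"]

print(map_conflicts(stored, live))
print(choose_mode(stored, live))
```

Logged conflicts also serve as a signal for where the map needs to be resurveyed.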

Simultaneous Localization and Mapping (SLAM)

For areas without pre-built HD maps — or when map data is unreliable — AVs use SLAM algorithms. SLAM builds a local map from sensor data in real time while simultaneously estimating the vehicle's position within that map. It is computationally intensive and less precise than map-based localization, but it provides a fallback in unmapped territory.

Tesla's camera-only approach uses a form of visual SLAM: the neural network builds a local 3-D understanding of the environment from camera parallax and optical flow, estimating position and obstacles together. This is one reason Tesla can operate in areas with no HD map, while Waymo and Cruise must pre-map every deployment zone before service begins.

HD Map: A centimeter-accurate 3-D map of road geometry, lane markings, signs, and signals used as a localization prior by autonomous vehicles.

Point Cloud Registration: The process of aligning a live lidar scan to a stored HD map to refine the position estimate to centimeter accuracy.

SLAM: Simultaneous Localization and Mapping. Algorithms that build a local map and estimate position within it concurrently from sensor data.

RTK GPS: Real-Time Kinematic GPS. A high-precision positioning system using ground correction stations to achieve centimeter-level accuracy.

Quiz — Lesson 3

HD Maps and Localization · 3 questions

Why is standard consumer GPS insufficient for autonomous vehicle lane-keeping?

Correct. A 3–5 meter GPS error in a 3.7-meter lane could place the vehicle in an adjacent lane. Autonomous driving requires centimeter-level accuracy, which requires HD maps and map-based localization.

The key issue is scale: consumer GPS error (~5 m) is comparable to lane width (~3.7 m). This makes lane-level position unreliable for autonomous steering decisions. HD maps and lidar registration close this gap.

What is "point cloud registration" used for in AV localization?

Correct. The vehicle matches the geometry of its live lidar scan to the known geometry stored in the HD map. Small offsets between them reveal the vehicle's precise position relative to the map.

Point cloud registration is a geometric matching process: the vehicle's live lidar scan is compared to the HD map's stored 3-D geometry. Differences reveal the vehicle's exact position relative to the map's coordinate system.

What is the main operational advantage of Tesla's visual SLAM approach over Waymo's HD map approach?

Correct. Because Tesla builds its local map on-the-fly from camera data, it can navigate in areas with no pre-built HD map. Waymo must pre-map every zone it operates in before deployment.

Tesla's visual SLAM builds a local 3-D model from camera data in real time, enabling navigation in areas that haven't been pre-mapped. Waymo's map-based approach requires expensive prior mapping of every operational zone.

Lab 3 — Mapping the Future

Explore how HD maps are built, maintained, and when they fail

Your Mission

You are a city planner evaluating whether to approve a robotaxi service for your mid-size city. The company needs three months to build HD maps before service can begin. Your city does frequent road construction. Probe the AI about the mapping pipeline, freshness requirements, and risk management strategies.

Start by asking: "How does an AV company actually build an HD map of a new city, and how long does it take?" Then explore: what happens when the real road differs from the map?

AI Lab Assistant

HD Mapping

Welcome to Lab 3. You're a city planner evaluating a robotaxi HD mapping proposal. I can explain how companies like Waymo and Mobileye build and maintain HD maps, what happens when maps go stale, and what safeguards modern AV systems use. What would you like to explore first?

Lesson 4 · Module 5

Safety, Crashes, and the Road Ahead

Real-world AV crash records, regulatory responses, and what visual AI still cannot reliably do.

Are self-driving cars actually safer than humans, and what does the evidence show?

In June 2021, NHTSA issued a Standing General Order requiring all AV operators to report any crash involving a Level 2 or higher automated driving system within 24 hours. Between July 2021 and May 2023, NHTSA received 392 reports of crashes involving Teslas with Autopilot or FSD engaged — the highest volume of any manufacturer, partly because Tesla had by far the largest fleet of Level 2 vehicles. Waymo reported 18 crashes during the same period across its smaller fleet of fully autonomous vehicles. The raw numbers are difficult to compare without normalizing for miles driven and operational conditions, but the data gave regulators — and the public — their first systematic view of where automated driving systems were involved in collisions.
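Normalizing by exposure is a one-line calculation, and it can reverse the impression given by raw counts. The sketch below uses two entirely hypothetical fleets. The numbers are invented for illustration and are not the Tesla or Waymo figures.

```python
def crashes_per_million_miles(crashes: int, miles: float) -> float:
    """Crash rate normalized by exposure."""
    if miles <= 0:
        raise ValueError("miles must be positive")
    return crashes / (miles / 1_000_000)

# Hypothetical fleets: A is large with many reports, B is small with few.
fleet_a = crashes_per_million_miles(crashes=300, miles=600_000_000)
fleet_b = crashes_per_million_miles(crashes=12, miles=4_000_000)

print(f"Fleet A: {fleet_a:.1f} crashes per million miles")
print(f"Fleet B: {fleet_b:.1f} crashes per million miles")
```

Fleet A files 25 times as many reports yet has the lower rate. Even a rate is not the whole picture, since highway miles and dense urban miles carry different risk, which is why operational conditions must be matched as well.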

What the Crash Data Shows

Waymo published an independent safety report in 2023 comparing its crash rate per million miles to the average human driver on comparable roads. The report, authored in part by researchers at the Virginia Tech Transportation Institute, found that Waymo's vehicles had a significantly lower rate of police-reported crashes and injury crashes than human drivers in equivalent urban environments. However, AVs had a higher rate of minor rear-end collisions — where human drivers following a Waymo vehicle were surprised by its cautious, abrupt braking behavior.

This finding points to a challenge not of the AV's visual AI but of its interaction with human drivers who have not adapted to machine driving patterns. AVs follow rules precisely; human drivers follow norms flexibly. The gap between legal driving behavior and expected driving behavior creates collision risk at the human-machine interface.

Cruise Recall, San Francisco 2023

In October 2023, California's DMV suspended Cruise's robotaxi permit after a Cruise vehicle struck a pedestrian who had just been hit by a human-driven vehicle. The Cruise AV then dragged the pedestrian 20 feet before stopping, because its perception system classified the pedestrian as having cleared the vehicle's path. Cruise recalled all 950 robotaxis from U.S. roads. General Motors later disclosed that Cruise had shared incomplete information with regulators about the sequence of events, leading to criminal and civil investigations. The incident demonstrated that perception failures in edge cases can have catastrophic consequences even for vehicles with otherwise strong safety records.

What Visual AI Still Cannot Reliably Do

Despite dramatic progress, current AV visual AI has documented weak points:

Unusual object categories: Objects the training data never included — a large piece of furniture on a highway, a horse-drawn carriage, a flock of birds — can be misclassified or ignored. Tesla's FSD was documented in 2022 approaching a stopped train broadside on a crossing, apparently not recognizing it as an obstacle.

Adversarial conditions: Researchers from the University of Washington and collaborating universities showed, in a study published in 2018, that attaching specific sticker patterns to a stop sign could cause classification networks to misread it as a speed limit sign with high confidence, a "physical adversarial example." Real-world graffiti on signs can have similar effects.

Social and contextual cues: A human driver can read body language: the pedestrian making eye contact before stepping off the curb, the delivery driver about to open their door. Current AV perception models work primarily on geometric and visual features, not on social intent inference.

Extreme weather: Even lidar is significantly degraded by heavy snow accumulation on sensors and dense fog. The operational design domain of every current AV excludes certain weather conditions — Waymo does not operate in heavy rain above certain thresholds.
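An operational design domain can be thought of as a set of limits that current conditions are checked against. The sketch below shows that idea with hypothetical limits. The thresholds, field names, and road types are assumptions and do not reflect any operator's published ODD.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperationalDesignDomain:
    max_rain_mm_per_h: float
    max_speed_mph: float
    road_types: frozenset

    def violations(self, rain_mm_per_h, speed_mph, road_type):
        """List every condition that falls outside the validated envelope."""
        problems = []
        if rain_mm_per_h > self.max_rain_mm_per_h:
            problems.append("rain")
        if speed_mph > self.max_speed_mph:
            problems.append("speed")
        if road_type not in self.road_types:
            problems.append("road type")
        return problems

# Hypothetical limits for illustration only.
odd = OperationalDesignDomain(
    max_rain_mm_per_h=7.5,
    max_speed_mph=65.0,
    road_types=frozenset({"urban", "highway"}),
)

print(odd.violations(rain_mm_per_h=2.0, speed_mph=30.0, road_type="urban"))
print(odd.violations(rain_mm_per_h=12.0, speed_mph=30.0, road_type="unpaved"))
```

An empty list means the vehicle is inside its validated envelope. Any entry is a reason to pause service or hand over control.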

The Regulatory Landscape

As of 2024, the United States has no federal AV-specific regulations — AVs are regulated state-by-state. California, Arizona, and Texas have the most permissive frameworks. California's DMV oversees permit applications; companies must submit safety cases and report crashes. The European Union is developing harmonized AV type-approval rules under UNECE Working Party 29, which will require systematic safety validation of perception systems including computer vision.

The core regulatory challenge is that visual AI is a statistical system — it is not correct 100% of the time, by definition. Regulators accustomed to evaluating mechanical systems with binary pass/fail criteria are adapting to evaluate probabilistic AI systems that improve over time but can never be certified as perfectly safe.

The Core Trade-Off

The question is not whether AVs will ever crash — they will. The question is whether they crash less often, less severely, and less randomly than human drivers. Early evidence from Waymo's deployed fleet suggests the answer is trending toward yes. But that answer must be earned mile by mile, in every new city, weather condition, and edge case the world can produce.

SGO (Standing General Order): A 2021 NHTSA directive requiring AV operators to report crashes involving automated driving systems within 24 hours.

Operational Design Domain (ODD): The specific conditions (weather, geography, speed range, road type) within which an AV system is designed and certified to operate safely.

Physical Adversarial Example: A real-world object modified with patterns that cause a vision model to misclassify it; demonstrated on stop signs and camera inputs in multiple published studies.

Quiz — Lesson 4

Safety, Crashes, and the Road Ahead · 3 questions

What does NHTSA's 2021 Standing General Order (SGO) require of AV operators?

Correct. The SGO created the first systematic national database of AV-involved crashes, enabling regulators to identify safety patterns across manufacturers and platforms.

NHTSA's SGO requires operators to report crashes involving Level 2 or higher automated driving systems within 24 hours of the company becoming aware of them — creating the first national AV crash database.

The 2023 Cruise incident in San Francisco revealed a failure in which specific aspect of AV perception?

Correct. After a human driver struck the pedestrian, the Cruise vehicle also hit her — and then its perception system assessed the path as clear and moved forward 20 feet, dragging the pedestrian. The classification error was in the post-impact scene understanding.

The Cruise vehicle struck a pedestrian already downed by another car, then its perception system incorrectly classified the scene as clear — causing it to pull forward and drag her 20 feet. This was a post-impact scene understanding failure, not a map or traffic-light error.

What is an "Operational Design Domain" (ODD) and why does every current AV have one?

Correct. No current AV can safely handle all conditions — heavy snow, unpaved roads, extreme heat affecting sensors. The ODD defines the boundary of validated safe operation, outside of which the system must disengage or alert the driver.

The ODD defines the envelope of conditions — weather, road type, speed, geography — within which the AV system has been validated to operate safely. Because visual AI is trained on specific data distributions, performance outside the ODD cannot be guaranteed.

Lab 4 — Regulating Visual AI in Cars

Build a policy framework for evaluating AV perception system safety

Your Mission

You are a policy advisor to a state transportation department drafting AV operating permit requirements. Using real incidents — Uber 2018, Cruise 2023, Tesla SGO data — build a set of evidence-based requirements for how AV companies must demonstrate that their visual AI perception systems are safe enough for public roads.

Start by asking: "What minimum perception performance standards should regulators require before issuing a fully autonomous robotaxi permit?" Then explore: how should regulators handle edge cases and ODD limitations?

AI Lab Assistant

AV Policy

Welcome to Lab 4. You're drafting AV permit requirements that specifically address visual AI perception safety. I can help you think through what real incidents reveal about regulatory gaps, what metrics matter, and how to handle the inherently probabilistic nature of AI safety evidence. What aspect of the policy framework shall we develop first?

Module 5 — Test

Self-Driving Cars and Visual AI · 15 questions · Pass at 80%

1. Which sensor uses laser pulses and return timing to build a centimeter-accurate 3-D point cloud?

Correct. Lidar fires laser pulses and times their return to produce 3-D point clouds accurate to centimeters.

Lidar (Light Detection and Ranging) uses timed laser pulses to build point clouds. Radar uses radio waves; ultrasonic uses sound; infrared measures heat.

2. Waymo's robotaxis completed over one million paid passenger trips with no safety driver in which U.S. cities by October 2023?

Correct. Waymo achieved this milestone in San Francisco and Phoenix, where it had received full driverless commercial operating permits.

Waymo's commercial driverless operations reached the million-trip milestone in San Francisco and Phoenix — its two active markets in 2023.

3. The main advantage radar has over camera and lidar in poor weather is:

Correct. Radio waves are largely unaffected by precipitation, and the Doppler frequency shift provides direct velocity measurement — two capabilities cameras and lidar lack.

Radar's key advantage is weather penetration and direct velocity measurement via Doppler. It has low spatial resolution — it cannot read signs or identify object shapes precisely.

4. In the 2018 Uber ATG crash in Tempe, the perception system failed primarily because:

Correct. The system classified Herzberg as unknown object, vehicle, then bicycle — cycling without stable classification — and the braking inhibitor prevented action. The NTSB cited this as a critical design failure.

The NTSB found that the system detected Herzberg 6 seconds before impact but cycled unstably between classifications. The braking system was suppressed to reduce false positives and never fired.

5. Semantic segmentation differs from object detection bounding boxes because it:

Correct. Semantic segmentation produces a dense per-pixel classification map, enabling precise boundary detection for road edges, obstacles, and drivable surfaces.

Semantic segmentation labels every pixel rather than just drawing boxes around objects. This dense labeling gives much more precise spatial information about scene layout.

6. "Distribution shift" in AV perception refers to:

Correct. When real-world inputs differ from the training distribution, model performance degrades in ways that may not be predictable or flagged to the system.

Distribution shift is a statistical concept: the model was trained on data from one distribution (e.g., daylight pedestrian crosswalks) and encounters another (night, mid-road crossing), causing performance to degrade.

7. Waymo's prediction models output trajectory probability distributions covering approximately how many seconds ahead?

Correct. Waymo's 2022 research paper described an 8-second prediction horizon — long enough for planning decisions, short enough to maintain physical plausibility.

Waymo predicts over an 8-second horizon. One second is too short for meaningful planning; 20 seconds involves too much uncertainty for the predictions to usefully constrain path planning.

8. Why is standard consumer GPS (3–5 m accuracy) insufficient for AV lane-keeping on a typical highway?

Correct. With lane widths ~3.7 m and GPS errors up to 5 m, position uncertainty spans more than one lane — making GPS alone insufficient for autonomous steering decisions.

The core issue is scale: GPS error (~5 m) is larger than lane width (~3.7 m). This means GPS alone cannot tell an AV which lane it is in. HD maps and lidar registration close this gap to centimeters.

9. Mobileye's REM (Road Experience Management) system collects mapping data primarily from:

Correct. REM turns every equipped vehicle into a passive mapping probe, crowdsourcing HD map data at massive scale — over 8 billion km collected across 40+ countries by 2023.

REM is a crowdsourced system: vehicles equipped with Mobileye chips upload anonymized road geometry data in the background. This scales map coverage far beyond what a dedicated fleet could achieve.

10. SLAM (Simultaneous Localization and Mapping) allows AVs to navigate in areas with no pre-built HD map because it:

Correct. SLAM is a chicken-and-egg problem solved simultaneously: build the map and localize within it at the same time, using sensor data alone.

SLAM builds a local map and estimates position within it concurrently from sensor data — no pre-existing map required. This is why Tesla's visual SLAM approach can operate in unmapped areas.

11. NHTSA's 2021 Standing General Order requires AV operators to report crashes within:

Correct. The 24-hour reporting window was designed to enable rapid regulatory response and build a real-time national database of AV incidents.

The SGO requires reporting within 24 hours of the company becoming aware of a qualifying crash — enabling NHTSA to track AV safety patterns in near-real-time.

12. The 2023 Cruise incident resulted in all 950 Cruise robotaxis being recalled because:

Correct. The post-impact perception failure — dragging the pedestrian — combined with incomplete regulatory disclosure led California's DMV to suspend Cruise's permit and GM to voluntarily recall the fleet.

The recall followed a collision where the AV dragged an already-downed pedestrian because its perception system assessed the path as clear. The situation was compounded by Cruise sharing incomplete information with California's DMV.

13. A "physical adversarial example" in AV perception is:

Correct. Research published in 2018 by a team including University of Washington researchers showed that carefully designed sticker patterns on stop signs could cause classification networks to misidentify them with high confidence.

Physical adversarial examples are real-world objects modified with patterns that exploit neural network classification vulnerabilities — demonstrated on stop signs in published academic research.

14. The "Operational Design Domain" (ODD) of an AV system defines:

Correct. Every current AV has an ODD — the boundary of validated safe operation. Waymo, for example, excludes operation during heavy rain above certain intensity thresholds.

The ODD is the validated envelope of safe operation — specific weather limits, road types, speeds, and geographic areas. Outside the ODD, performance is untested and the system must disengage.

15. Tesla's Full Self-Driving approach differs architecturally from Waymo's primarily because Tesla uses:

Correct. Tesla uses cameras only — no lidar — arguing that human drivers navigate visually. Its FSD system is trained on hundreds of millions of miles of data from its consumer fleet, a fundamentally different data strategy than Waymo's.

Tesla's key architectural choice is cameras-only — no lidar. This allows broader geographic operation and reduces hardware cost but removes the direct 3-D measurement that lidar provides. The trade-off remains actively debated in the AV safety community.