Module 4 · Lesson 1

Training's Hidden Price Tag

Before a model answers a single question, it has already consumed enormous energy.

How much carbon does it actually cost to train a large AI model — and who is accounting for it?

In 2019, researchers Emma Strubell, Ananya Ganesh, and Andrew McCallum published a paper that made the AI community uncomfortable. They had done something few had bothered to do: they estimated the carbon cost of training large NLP models. Their headline finding, that training a single large transformer with neural architecture search emitted roughly 626,000 pounds of CO₂ equivalent, traveled far beyond academic circles. For many researchers, it was the first time they had confronted the arithmetic of their own work.

The paper did not claim that AI was uniquely villainous. It claimed that the field had been operating without transparency: without a reported cost, there could be no accountability.

Why Training Is So Energy-Intensive

Training a large neural network is fundamentally an optimization problem solved by repetition. A model with hundreds of billions of parameters is exposed to vast datasets (trillions of tokens), and its weights are nudged incrementally toward configurations that minimize prediction error. Each nudge requires a forward pass, a loss calculation, and a backward pass of gradient computation across every parameter. This cycle repeats for hundreds of thousands to millions of steps, each one processing a large batch of tokens.
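The cycle described above can be sketched on a toy problem. This is a minimal illustration in plain Python, assuming a one-parameter model y = w * x and a made-up three-point dataset; the function name train_linear, the learning rate, and the step count are hypothetical choices, not anything taken from the papers discussed in this lesson.

```python
def train_linear(data, steps=200, lr=0.05):
    """Fit y = w * x by repeating: forward pass, loss, backward pass, update."""
    w = 0.0
    mean_loss = 0.0
    for _ in range(steps):
        grad = 0.0
        loss = 0.0
        for x, y in data:
            pred = w * x            # forward pass
            err = pred - y
            loss += err * err       # squared-error loss
            grad += 2 * err * x     # backward pass: d(loss)/dw
        mean_loss = loss / len(data)
        w -= lr * grad / len(data)  # nudge the weight against the gradient
    return w, mean_loss


# Toy data generated by y = 3x, so the loop should recover w close to 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w, final_loss = train_linear(data)
print(f"learned w = {w:.4f}, final loss = {final_loss:.2e}")
```

A real training run performs the same four steps, but over billions of parameters and on thousands of accelerators at once, which is where the energy goes.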

The hardware doing this work — typically thousands of specialized GPUs or TPUs running in parallel — draws power continuously for weeks or months. The electricity to run those chips, plus the electricity to cool the data centers housing them, produces the carbon footprint researchers began measuring in 2019.

Compute efficiency has improved dramatically: models trained in 2024 accomplish far more per GPU-hour than those trained in 2019. But model scale has grown faster than efficiency gains, meaning absolute energy consumption has risen even as relative efficiency improved.

626K

lbs CO₂e to train one large transformer with NAS (Strubell et al., 2019)

~500t

Estimated CO₂e for GPT-3 training run (OpenAI / independent estimates, 2020)

1,287

MWh consumed training GPT-3 (Patterson et al., 2021 estimate)

~60×

More compute used by GPT-3 vs. GPT-2 (OpenAI scaling analysis)

The Patterson et al. Benchmark (2021)

In 2021, Google and UC Berkeley researchers led by David Patterson published "Carbon Emissions and Large Neural Network Training," offering a more systematic methodology. They computed energy consumption from chip specifications, PUE (Power Usage Effectiveness) of data centers, and the carbon intensity of the electricity grid used during training. Their analysis covered a range of landmark models and found that the choice of data center location — and therefore grid carbon intensity — was often more important than raw compute volume in determining total emissions.

Training the same model in a coal-heavy grid region could produce ten times more CO₂ than training it in a region powered by hydroelectricity or nuclear. This finding reframed the conversation: hardware efficiency matters, but energy source matters more.

Key Finding — Patterson et al. 2021

Training the 11B-parameter T5 model in a Google data center using TPUs and renewable-matched energy produced roughly 46 tonnes CO₂e. The same compute on a coal-heavy grid would have produced approximately 270 tonnes, nearly a 6× difference from location alone.
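The accounting described in this section reduces to a product of three inputs: IT energy, PUE, and grid carbon intensity. The sketch below assumes a hypothetical 1,000 MWh of IT energy and a PUE of 1.1, and uses two illustrative grid intensities (about 20 and about 700 gCO₂/kWh, the range given in this lesson's key terms). The function name is invented for illustration, and none of the inputs describe a real training run.

```python
def training_emissions_tonnes(it_energy_mwh, pue, grid_g_per_kwh):
    """Estimate tonnes of CO2e from IT energy, facility overhead, and grid mix."""
    facility_kwh = it_energy_mwh * 1000 * pue          # MWh to kWh, then add overhead
    return facility_kwh * grid_g_per_kwh / 1_000_000   # grams to tonnes


IT_ENERGY_MWH = 1000   # hypothetical training run
PUE = 1.1              # hypothetical hyperscale facility

hydro = training_emissions_tonnes(IT_ENERGY_MWH, PUE, 20)    # about 22 tonnes
coal = training_emissions_tonnes(IT_ENERGY_MWH, PUE, 700)    # about 770 tonnes
print(f"hydro-heavy grid: {hydro:.0f} t, coal-heavy grid: {coal:.0f} t")
print(f"ratio: {coal / hydro:.0f}x")
```

Because the grid term is a plain multiplier, the ratio between two locations equals the ratio of their carbon intensities when the compute and the PUE are held fixed.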

What Makes Measurement Difficult

Labs rarely publish the full details needed to reproduce emissions estimates independently. The critical inputs — exact GPU count, training duration, data center PUE, and grid carbon factor at the time of training — are often proprietary or simply unrecorded. Researchers like Jesse Dodge at AI2 have argued that papers should routinely report compute costs alongside accuracy benchmarks, the same way experimental papers report methodology. As of 2024, this remains voluntary and inconsistently practiced.

There is also the problem of failed runs. The published training run is not the only training run. Hyperparameter searches, ablations, and failed experiments consume additional energy that rarely appears in any carbon accounting. Strubell et al. noted this explicitly: the figure reported is a floor, not a ceiling.

PUE: Power Usage Effectiveness, the ratio of total data center energy to the energy used by IT equipment alone. A PUE of 1.0 is perfect; typical data centers run 1.1–1.5; older facilities can exceed 2.0.

Carbon Intensity: The grams of CO₂ emitted per kilowatt-hour of electricity generated, varying dramatically by grid mix: ~20 gCO₂/kWh (Norwegian hydro) to ~700+ gCO₂/kWh (coal-heavy grids).

NAS: Neural Architecture Search, an automated search for optimal network architectures that multiplies compute cost dramatically. The Strubell figure included NAS overhead.

The Accountability Gap

No regulatory body requires AI labs to disclose training emissions. The Strubell paper's lasting contribution was not the specific numbers — those are already outdated — but the demonstration that the numbers are calculable and that the field's silence on them was a choice, not a necessity.

Lesson 2 will examine what happens after training — the inference phase, where the cumulative energy cost of millions of daily queries may ultimately dwarf the training run itself.

Lesson 1 Quiz

Training's Hidden Price Tag — check your understanding

1. The 2019 Strubell et al. paper estimated that training one large transformer with neural architecture search emitted approximately how much CO₂ equivalent?

Correct. Strubell et al. reported ~626,000 lbs CO₂e for training with NAS — a figure that shocked many in the AI research community in 2019.

Not quite. The figure was ~626,000 lbs CO₂e — far larger than a flight or household, but specific to the NAS overhead. Review the stat grid in Lesson 1.

2. According to Patterson et al. (2021), which factor often had the greatest influence on total training emissions?

Correct. Patterson et al. found that grid carbon intensity — determined largely by data center location — could cause a tenfold difference in emissions for identical training runs.

Not quite. While model size matters, Patterson et al. found the grid's carbon intensity was often the dominant factor, sometimes creating a 10× difference in emissions.

3. What is PUE, and why does it matter for AI carbon accounting?

Correct. PUE captures the overhead energy (cooling, lighting, power conversion) beyond raw compute. A PUE of 1.5 means 50% extra energy spent beyond the chips themselves.

Not quite. PUE stands for Power Usage Effectiveness — the ratio of total facility energy to IT equipment energy. It captures cooling and overhead costs critical to accurate carbon accounting.

4. Why do Strubell et al. describe their published training figure as a "floor" rather than a "ceiling"?

Correct. The published figure represents only the successful final training run. All the exploratory compute — failed runs, searches, ablations — adds to the real total but goes unreported.

Not quite. The "floor" framing refers to the unreported overhead: failed experiments, hyperparameter searches, and ablations that consume real energy but aren't captured in the headline number.

Lab 1 — Training Carbon Calculator

Explore the variables that drive training emissions with your AI lab assistant.

Your Mission

You have been given compute budget to train a 70-billion-parameter model. Your lab assistant can help you estimate how different choices — data center location, hardware generation, grid mix — change the carbon cost of your training run.

Ask at least three substantive questions to complete this lab.

Try asking: "If I train in Virginia vs. Iceland, how different would my emissions be?" or "How does TPU v4 compare to A100 in emissions per FLOP?" or "What did Patterson et al. say about grid carbon intensity vs. hardware choice?"

AI Lab Assistant

Training Emissions

Welcome to Lab 1. I'm your training emissions assistant. You're planning a 70B-parameter model training run. Ask me how grid location, hardware choice, PUE, or training duration affect your carbon footprint — I'll help you think through the numbers and the tradeoffs.

Module 4 · Lesson 2

Inference at Scale

Training makes headlines. Inference pays the electricity bill — forever.

When a model serves billions of queries per day, does the cumulative energy cost of inference eventually exceed the training run?

Within two months of ChatGPT's public launch in November 2022, OpenAI was serving an estimated 100 million users, a milestone no consumer technology product had reached so quickly. Behind that milestone were data centers running inference around the clock. Each response required forward passes through GPT-3.5 Turbo, one per generated token: not the months-long training job, but a computation lasting seconds at most, happening simultaneously across millions of requests.

The training run was a one-time cost. Inference was a daily tax — one that scaled with every new user, every longer conversation, every additional product built on the API.

The Inference Energy Equation

Training and inference have fundamentally different cost profiles. Training is compute-bound and occurs once (or a handful of times per model generation). Inference is demand-bound and continuous. The energy cost of a single inference query is small, on the order of a watt-hour or less, but multiplied by billions of queries daily, it becomes substantial.

Researchers at Hugging Face and Carnegie Mellon published an analysis in 2022 estimating that inference energy consumption for a model like BLOOM (176B parameters) was roughly 1.0 Wh per query for a long-form generation task. At a hypothetical 100 million queries per day, that is 100 MWh daily — or about 36,500 MWh annually. Compare that to BLOOM's training run, which consumed approximately 433 MWh. Inference overtakes training in cumulative energy within five days of heavy use.
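The crossover arithmetic above is easy to reproduce. The sketch below uses only the figures quoted in this lesson (433 MWh of training energy, about 1 Wh per query, and a hypothetical 100 million queries per day); the function names are invented for illustration.

```python
import math


def daily_inference_mwh(wh_per_query, queries_per_day):
    """Daily inference energy in MWh."""
    return wh_per_query * queries_per_day / 1_000_000  # Wh to MWh


def days_to_crossover(training_mwh, wh_per_query, queries_per_day):
    """Days of serving until cumulative inference energy equals training energy."""
    return training_mwh / daily_inference_mwh(wh_per_query, queries_per_day)


daily = daily_inference_mwh(1.0, 100_000_000)
annual = daily * 365
days = days_to_crossover(433, 1.0, 100_000_000)

print(f"daily: {daily:.0f} MWh, annual: {annual:.0f} MWh")
print(f"crossover after {days:.2f} days, i.e. during day {math.ceil(days)}")
```

Changing the query volume moves the crossover proportionally: at one tenth of the traffic, the same calculation gives roughly 43 days.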

Model quantization, caching, and smaller specialized models are the primary levers for reducing per-query inference cost. But as model capability expands and query volumes grow, efficiency gains have not kept pace with demand growth.

433

MWh to train BLOOM 176B (Hugging Face, 2022)

~1 Wh

Per long-form BLOOM inference query (HF/CMU estimate)

~5 days

Until cumulative inference energy exceeds training energy at 100M queries/day

10×

Estimated energy of a ChatGPT query vs. a standard Google search (IEA, 2024)

Google's Search vs. AI Search Comparison

The International Energy Agency's report "Electricity 2024," published in January 2024, cited estimates that a single AI-powered search query uses approximately 10 times more electricity than a conventional keyword search. Google processes roughly 8.5 billion searches per day. If even a fraction of those migrate to generative AI responses, as Google's own Search Generative Experience and Bing's AI integration suggest they may, the aggregate energy impact on search infrastructure alone becomes significant.

Google reported in its 2023 Environmental Report that its data centers consumed 24.2 terawatt-hours of electricity, a figure that predates the full integration of generative AI into Search. Independent analysts expect the figure to rise substantially through 2025–2027 as AI inference workloads displace or supplement traditional search.
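To get a feel for the scale involved, the sketch below estimates the extra daily energy if some share of searches moved to AI-generated responses. It assumes a baseline of 0.3 Wh per conventional search, which is an assumption brought in for illustration rather than a figure stated in this lesson, together with the 10× multiplier and the 8.5 billion searches per day quoted above. The 10% migration share is purely hypothetical.

```python
def added_daily_mwh(searches_per_day, migrated_fraction,
                    base_wh=0.3, multiplier=10.0):
    """Extra MWh per day when a fraction of searches costs `multiplier` times more."""
    extra_wh_per_query = base_wh * (multiplier - 1)   # cost above a normal search
    migrated = searches_per_day * migrated_fraction
    return migrated * extra_wh_per_query / 1_000_000  # Wh to MWh


extra = added_daily_mwh(8_500_000_000, 0.10)
print(f"extra energy at 10% migration: about {extra:,.0f} MWh per day")
```

The result scales linearly with both the migration share and the assumed baseline, so it is best read as an order-of-magnitude guide.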

The IEA Warning — 2024

The IEA's 2024 electricity report projected that global data center electricity consumption could double by 2026, with AI inference workloads as the primary growth driver. The agency noted that without significant grid decarbonization, this growth could add materially to global emissions even if data centers operate at high efficiency.

Inference Optimization Strategies

Quantization reduces the numerical precision of model weights (e.g., from 32-bit or 16-bit floats to 8-bit or 4-bit integers), cutting memory use and computation per token at some accuracy cost. Research from Hugging Face, EleutherAI, and others has demonstrated that 4-bit quantized models retain most capability while using roughly 4× less memory than 16-bit weights (8× less than 32-bit) and significantly less compute.
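A minimal sketch of the idea, assuming the simplest scheme (symmetric, per-tensor, 8-bit) and written in plain Python for readability. Production libraries use more elaborate methods, so this should be read as an illustration of the principle. The helper names and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in q_weights]


def weight_memory_gb(num_params, bits_per_weight):
    """Memory needed to store the weights alone, in gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


weights = [0.5, -1.27, 0.03, 1.0]          # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"max round-trip error = {max_error:.4f}")

for bits in (32, 16, 8, 4):
    print(f"70B parameters at {bits}-bit: {weight_memory_gb(70e9, bits):.0f} GB")
```

The round-trip error is bounded by half the scale step, which is the accuracy cost paid for the smaller footprint.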

Distillation trains a smaller "student" model to mimic a larger "teacher" model. Microsoft's Phi series and Google's Gemma series are examples of models designed for efficient inference while maintaining strong performance. Distilled models can be deployed on edge hardware, eliminating data center round-trips entirely.

KV-cache optimization avoids redundant computation in attention: the keys and values computed for earlier tokens are stored and reused as each new token is generated, and prefix caching extends this reuse to requests that share a common prompt prefix. It is standard practice in production inference serving. These are engineering efficiencies that compound, but they operate against a backdrop of continuously growing model sizes and query volumes.
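One way to see the redundant work a KV cache avoids is to count it. The sketch below is a counting model only, not an attention implementation: it tallies how many per-layer key/value projections are needed to generate a response with and without a cache. The function name and the prompt and response lengths are hypothetical.

```python
def kv_projections(prompt_len, new_tokens, use_cache):
    """Count key/value projections needed to generate `new_tokens` tokens."""
    total = 0
    cached = 0
    for step in range(new_tokens):
        context = prompt_len + step      # tokens visible when producing the next one
        if use_cache:
            total += context - cached    # project only tokens not yet cached
            cached = context
        else:
            total += context             # re-project the whole context every step
    return total


without_cache = kv_projections(prompt_len=100, new_tokens=50, use_cache=False)
with_cache = kv_projections(prompt_len=100, new_tokens=50, use_cache=True)
print(f"without cache: {without_cache}, with cache: {with_cache}")
```

Without a cache the work grows roughly with the square of the sequence length; with one it grows linearly, at the price of holding the cached keys and values in accelerator memory.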

Inference: The process of running a trained model on new inputs to produce outputs. Unlike training, it does not update model weights, but at web scale its cumulative energy cost dwarfs the training run.

Quantization: Reducing numerical precision of weights (e.g., 32→4 bit) to cut memory and compute at inference time, with minimal accuracy loss at 8-bit and modest loss at 4-bit.

Distillation: Training a smaller model to replicate the behavior of a larger one, achieving inference efficiency without full retraining from scratch.

The Efficiency Paradox

Jevons' Paradox, the historical observation that efficiency gains can increase total resource use by making the resource cheaper and raising demand, appears to be operating in AI inference. Faster, cheaper inference enables more applications, more queries, and more integration into products, expanding total energy consumption even as per-query efficiency improves.
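The paradox comes down to a single ratio. In the sketch below, the efficiency gain and demand growth factors are hypothetical numbers chosen to show both outcomes; the function name is invented for illustration.

```python
def total_energy_change(efficiency_gain, demand_growth):
    """Ratio of new total energy to old total energy.

    efficiency_gain: factor by which energy per query falls (2.0 = half the energy)
    demand_growth:   factor by which query volume rises (3.0 = three times the queries)
    """
    return demand_growth / efficiency_gain


rebound = total_energy_change(efficiency_gain=2.0, demand_growth=3.0)
saving = total_energy_change(efficiency_gain=2.0, demand_growth=1.5)
print(f"demand outpaces efficiency: total energy x{rebound:.2f}")
print(f"efficiency outpaces demand: total energy x{saving:.2f}")
```

Total consumption falls only when the ratio is below 1, that is, when demand grows more slowly than efficiency improves.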

Lesson 2 Quiz

Inference at Scale — check your understanding

1. Based on Hugging Face/CMU estimates for BLOOM, at 100 million queries per day, approximately how quickly does cumulative inference energy exceed the energy cost of BLOOM's training run?

Correct. BLOOM's training consumed ~433 MWh. At ~1 Wh per query and 100M queries/day, inference uses ~100 MWh/day — surpassing training energy in about five days.

Not quite. At 100M queries/day and ~1 Wh each, the daily inference energy is ~100 MWh vs. the 433 MWh training run — making the crossover roughly five days.

2. The IEA estimated that a single AI-powered search query uses approximately how much more electricity than a conventional keyword search?

Correct. The IEA's 2024 report cited estimates of ~10× greater electricity per AI-powered search query compared to a conventional Google keyword search.

Not quite. The IEA estimated ~10× — a significant multiplier that matters at Google's scale of 8.5 billion searches per day.

3. What is "quantization" in the context of inference efficiency?

Correct. Quantization reduces weight precision (e.g., 32-bit → 4-bit integers), cutting memory and compute at inference time with minimal accuracy loss at moderate compression levels.

Not quite. Quantization specifically means reducing numerical precision of weights. Mimicking a larger model is distillation; caching is KV-cache optimization.

4. Jevons' Paradox, as applied to AI inference efficiency, predicts what outcome?

Correct. Jevons' Paradox describes how efficiency gains lower the cost of a resource, increasing demand and potentially raising total consumption — precisely what is observed as cheaper inference enables more AI-powered products.

Not quite. Jevons' Paradox predicts the counterintuitive outcome: efficiency lowers cost, demand rises, and total consumption may increase even though each unit is more efficient.

Lab 2 — Inference Energy Audit

Model the ongoing energy cost of running a deployed AI system.

Your Mission

You are advising a startup planning to deploy an AI assistant for customer service. The model will handle ~5 million queries per day. Your lab assistant can help you estimate total inference energy, compare optimization strategies, and think through the tradeoffs between cost, quality, and carbon footprint.

Try asking: "How do I estimate daily inference energy for a 7B parameter model?" or "Would distillation or quantization reduce our footprint more at 5M queries/day?" or "When does our inference footprint exceed our training footprint?"

AI Lab Assistant

Inference Audit

Hello! I'm your inference energy audit assistant. You're planning a 5-million-query-per-day deployment. Let's work through the energy math, explore optimization strategies, and figure out your cumulative carbon footprint over time. What would you like to start with?

Module 4 · Lesson 3

Beyond Carbon — Water, Land, and Hardware

The environmental cost of AI is not only about carbon. Cooling water, scarce land, and rare minerals complete the picture.

What environmental costs of AI infrastructure remain invisible when we focus only on carbon emissions?

In 2023, documents obtained through a public records request revealed that a Microsoft data center in West Des Moines, Iowa had drawn approximately 6.4 million gallons of water from the local utility in a single month, July 2022, while OpenAI's GPT-4 was being trained on Microsoft's infrastructure. The West Des Moines Water Works noted the spike explicitly in its internal records. The data center's cooling towers had consumed water at a rate that local officials described as significant, though the utility noted it remained within contracted limits.

The story, reported by Sharon Goldman and others, illuminated a dimension of AI infrastructure that had received almost no public attention: water. Evaporative cooling systems — the dominant cooling method in large data centers — consume water that does not return to the local watershed.

The Water Consumption Problem

Most large data centers use evaporative cooling: warm air from servers is passed over water-saturated surfaces, and the evaporation carries heat away. This is highly energy-efficient but consumes water that is lost to the atmosphere rather than recycled. The metric used is Water Usage Effectiveness (WUE) — liters of water consumed per kilowatt-hour of IT load.

Researchers at UC Riverside, led by Pengfei Li, published a 2023 study estimating that training GPT-3 consumed approximately 700,000 liters of fresh water at Microsoft's data centers, a little over a quarter of an Olympic swimming pool (which holds roughly 2.5 million liters). The study, "Making AI Less 'Thirsty'," also estimated that a conversation of 20–50 questions with ChatGPT consumes roughly 500 milliliters of water, about the volume of a standard water bottle.
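On-site cooling water can be approximated as energy multiplied by WUE. The sketch below pairs two figures quoted in this module, the 1,287 MWh estimate for GPT-3 training from Lesson 1 and the 700,000 liter water estimate, to back out the WUE they jointly imply. Pairing them is an assumption made for illustration, since the two studies may draw their energy boundaries differently, and the function names are hypothetical.

```python
def onsite_water_liters(it_energy_kwh, wue_l_per_kwh):
    """Cooling water consumed on site, given IT energy and WUE."""
    return it_energy_kwh * wue_l_per_kwh


def implied_wue(water_liters, it_energy_kwh):
    """WUE (liters per kWh) implied by a water estimate and an energy estimate."""
    return water_liters / it_energy_kwh


GPT3_TRAINING_KWH = 1_287 * 1000       # 1,287 MWh, the figure quoted in Lesson 1
GPT3_WATER_LITERS = 700_000            # the estimate quoted above

wue = implied_wue(GPT3_WATER_LITERS, GPT3_TRAINING_KWH)
at_typical_wue = onsite_water_liters(GPT3_TRAINING_KWH, 1.7)

print(f"implied WUE: {wue:.2f} L/kWh")
print(f"same energy at 1.7 L/kWh: {at_typical_wue:,.0f} liters")
```

The gap between the two results shows how sensitive any water estimate is to the WUE assumed for the specific facility.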

The water source matters enormously. Data centers in water-stressed regions — including parts of the US Southwest, northern Chile (where copper mining also strains water supplies), and parts of India — draw water from aquifers under pressure from agriculture, urban growth, and climate change. A data center in Norway drawing on abundant cold freshwater presents a fundamentally different environmental profile than one in Phoenix drawing from the Colorado River basin.

700K

Estimated liters of fresh water to train GPT-3 (Li et al., UC Riverside, 2023)

500ml

Estimated water per 20–50 ChatGPT exchanges (Li et al., 2023)

6.4M

Gallons drawn by Microsoft's Iowa data center in one month, July 2022

~1.7L

Typical WUE for large US data centers (liters per kWh of IT load)

Hardware: The Mining Footprint

AI hardware — GPUs, TPUs, and the high-bandwidth memory chips that support them — requires rare and strategically significant minerals. Cobalt, used in lithium-ion batteries powering UPS systems; tantalum, in capacitors; and the rare earth elements used in precision electronics all have extraction footprints that carbon accounting ignores entirely.

NVIDIA's H100 GPU — the dominant chip for AI training in 2023–2024 — requires sophisticated semiconductor manufacturing processes at TSMC's facilities in Taiwan, which themselves consume enormous volumes of ultrapure water (a semiconductor manufacturing requirement distinct from cooling water). The full lifecycle carbon cost of hardware manufacturing — called embodied carbon — can be substantial. A 2022 analysis in the journal Nature Electronics found that for some computing scenarios, manufacturing emissions exceeded operational emissions over the hardware's lifetime.

Hardware generations also turn over rapidly. The shift from A100 to H100 GPUs, then to H200 and Blackwell architectures, creates large volumes of high-value electronic waste. GPU clusters displaced by new hardware may be resold, but the embodied carbon of manufacturing is already spent regardless of secondary use.

Land Use — A Less Discussed Cost

Hyperscale data centers occupy tens to hundreds of acres. Microsoft's planned expansion in Goodyear, Arizona; Google's facilities in The Dalles, Oregon; and Meta's data center campuses represent significant land footprints. The Dalles, Oregon facility drew local controversy as Google sought additional water rights from the Columbia River while operating in a region with growing water stress concerns.

What Labs Are (and Aren't) Disclosing

Microsoft's 2023 Environmental Sustainability Report was notable for disclosing absolute water consumption: 6.4 million cubic meters globally in 2022, up 34% from 2021. The company attributed growth to expanded data center operations and committed to being "water positive" by 2030 — replenishing more water than it consumes globally. Critics noted that global replenishment accounting does not address local water stress in specific deployment regions.

Google's 2023 Environmental Report disclosed similar water metrics. Neither company breaks down water consumption by specific product or model training run, making independent verification of specific figures like the GPT-3 training estimate difficult. The UC Riverside study used public power consumption estimates and typical WUE figures to reconstruct the estimate — an approach that introduces uncertainty but is the best available given disclosure norms.

WUE: Water Usage Effectiveness, measured in liters of water consumed per kilowatt-hour of IT equipment load. A lower WUE indicates more water-efficient cooling. Industry average: ~1.7 L/kWh; best-in-class: ~0.2–0.5 L/kWh using ambient cooling.

Embodied Carbon: Carbon emissions from manufacturing hardware (mining, processing, fabrication, transport), distinct from operational emissions. Often omitted from AI carbon accounting despite being substantial for chips and servers.

Evaporative Cooling: Cooling method using water evaporation to remove heat. It is highly energy-efficient but consumes water that does not return to the local watershed.

The Accounting Gap

Current AI environmental reporting typically covers operational carbon (and sometimes renewable energy matching) but rarely addresses water consumption at the training-run level, embodied hardware carbon, or land use. A complete environmental accounting of AI would require all four dimensions: operational carbon, operational water, embodied hardware impacts, and land footprint.

Lesson 3 Quiz

Beyond Carbon — Water, Land, and Hardware

1. UC Riverside researchers (Li et al., 2023) estimated that training GPT-3 consumed approximately how much fresh water?

Correct. Li et al. estimated ~700,000 liters of fresh water consumed to train GPT-3, via evaporative cooling systems at Microsoft's data centers.

Not quite. The estimate was ~700,000 liters. Review the stat grid in Lesson 3 for the key water figures.

2. What is WUE, and what does a lower WUE indicate?

Correct. WUE (Water Usage Effectiveness) is liters of water consumed per kWh of IT equipment load. A lower number indicates more efficient cooling — best-in-class facilities using ambient cooling can reach 0.2–0.5 L/kWh.

Not quite. WUE stands for Water Usage Effectiveness — liters per kWh of IT load. Lower is better: it indicates the cooling system is using less water to remove each unit of heat.

3. What is "embodied carbon" in the context of AI hardware?

Correct. Embodied carbon covers the full manufacturing lifecycle — from raw mineral extraction through chip fabrication — and is often omitted from AI environmental accounting despite being substantial for advanced semiconductors.

Not quite. Embodied carbon refers to emissions from manufacturing the hardware itself: mining rare minerals, semiconductor fabrication, assembly, and transport — all before the chip is ever turned on.

4. Why is global water replenishment accounting insufficient to address local water stress from data centers?

Correct. Microsoft's "water positive" commitment involves global replenishment, but a data center drawing from the Colorado River basin in Arizona still stresses that local watershed — water replenished in a different region doesn't help communities near the facility.

Not quite. The core issue is geographic specificity: global accounting aggregates water impacts across all locations, masking severe local stress in the specific regions where data centers operate.

Lab 3 — Water & Hardware Footprint Audit

Think through the non-carbon environmental costs of a real AI deployment.

Your Mission

You are advising a city government in a water-stressed region that wants to attract a major AI data center. Your lab assistant can help you think through water consumption estimates, embodied hardware carbon, siting considerations, and what questions to ask prospective data center operators.

Try asking: "What water questions should a city ask before approving a data center?" or "How does evaporative cooling compare to liquid cooling in water stress regions?" or "What is embodied carbon and why should it appear in a data center's environmental review?"

AI Lab Assistant

Water & Hardware

Welcome to Lab 3. I'm your environmental footprint advisor for AI infrastructure. Your city is evaluating a data center proposal in a water-stressed region. Let's think through the water, hardware, and land dimensions that often get overlooked in standard carbon-focused environmental reviews. What aspect would you like to explore first?

Module 4 · Lesson 4

Measuring, Disclosing, and Reducing

Accountability requires measurement. Measurement requires standards. Standards require will.

What would a credible framework for AI environmental accountability actually look like — and who would enforce it?

When the Biden administration released its Executive Order on Safe, Secure, and Trustworthy AI in October 2023, Section 5.2 included a directive for the Department of Energy to evaluate the energy and water implications of AI. It was the first time a major regulatory document in the United States explicitly linked AI development to environmental infrastructure concerns. The order did not mandate emissions disclosure, but it directed agencies to study the problem — a precursor step to potential future requirements.

In the EU, the AI Act adopted in 2024 included environmental requirements for high-risk AI systems, but critics noted these were high-level obligations to report resource use rather than specific methodological standards.

The State of Current Disclosure

As of 2024, no binding international standard requires AI labs to disclose training emissions, water consumption, or hardware lifecycle impacts. Voluntary disclosure varies dramatically. Large public companies in the EU face broader sustainability reporting requirements under the Corporate Sustainability Reporting Directive (CSRD), which began phasing in for large companies in 2024, but these are general environmental frameworks not specific to AI.

The most specific voluntary standard for AI-related computing comes from the Green Software Foundation, which has produced the Software Carbon Intensity (SCI) specification, a method for calculating emissions per unit of software work. It has been adopted as an ISO standard (ISO/IEC 21031) but remains voluntary, and adoption among AI labs has been limited.
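The SCI specification defines the score as SCI = ((E × I) + M) per R, where E is energy consumed, I is grid carbon intensity, M is the share of embodied hardware emissions attributed to the software, and R is the functional unit. The sketch below applies that formula to a hypothetical inference service; every input value is made up for illustration, and the function name is not part of any official tooling.

```python
def sci_score(energy_kwh, grid_g_per_kwh, embodied_g, functional_units):
    """Software Carbon Intensity: ((E * I) + M) per R, in gCO2e per unit."""
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    operational_g = energy_kwh * grid_g_per_kwh      # E * I
    return (operational_g + embodied_g) / functional_units


# Hypothetical day of serving: 1,000 kWh on a 400 g/kWh grid, 50 kg of
# amortized embodied emissions, and one million queries as the functional unit.
score = sci_score(energy_kwh=1000, grid_g_per_kwh=400,
                  embodied_g=50_000, functional_units=1_000_000)
print(f"SCI = {score:.2f} gCO2e per query")
```

Because the score is a rate rather than a total, it allows comparison across systems of different sizes, but it says nothing about absolute emissions when query volume grows.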

Some labs have moved toward self-disclosure. Hugging Face's Model Cards include a "Carbon Footprint" section that encourages authors to report training compute and estimated CO₂. Google's research papers occasionally include compute and carbon estimates. These are steps, but they are inconsistent and unverified.

Key Disclosure Frameworks (2024)

Software Carbon Intensity (SCI) / ISO 21031 — Voluntary per-function-unit emissions metric developed by the Green Software Foundation.

EU CSRD — Requires large companies to disclose environmental sustainability data including energy and water; not AI-specific but applicable to major labs operating in Europe.

Hugging Face Model Cards — Voluntary community standard encouraging compute and CO₂ disclosure in model documentation.

US EO 14110, Section 5.2 — Directed DoE to study AI energy/water implications; no mandatory disclosure requirement as of 2024.

What Better Measurement Would Require

Researchers working on this problem — including Jesse Dodge at AI2, Sasha Luccioni at Hugging Face, and the authors of the MLCO2 Calculator — have converged on a set of minimum requirements for credible AI carbon reporting:

1. Compute disclosure: Training runs should report total GPU/TPU hours, chip type, and batch size. These are the primary inputs for third-party carbon estimation. Without them, estimates cannot be independently verified.

2. Grid carbon factor: The carbon intensity of the electricity grid at the time and location of training should be disclosed. This is available from utilities and national grid operators and eliminates the largest source of estimation uncertainty.

3. PUE disclosure: Data center PUE at the facility used for training should be reported. Industry average is ~1.4; hyperscale facilities can achieve ~1.1.

4. Inference reporting: Ongoing inference emissions should be estimated and disclosed, not just training runs. This requires establishing per-query energy baselines and reporting at regular intervals.

5. Failed run accounting: Exploratory compute — hyperparameter searches, ablations — should be included in total carbon accounting, even if reported separately.
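The five requirements above amount to a small set of inputs from which a third party could rebuild an estimate. The sketch below shows one way such a disclosure might be turned into numbers, with exploratory compute reported separately from the final run. The class name, field names, and every input value are hypothetical, and average chip power is treated here as a disclosed quantity, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TrainingDisclosure:
    chip_hours: float          # total GPU/TPU hours for this set of runs
    avg_chip_power_kw: float   # average measured draw per chip, in kW (assumed)
    pue: float                 # facility PUE
    grid_g_per_kwh: float      # grid carbon intensity at training time

    def energy_mwh(self):
        """Facility energy: chip energy scaled by PUE, in MWh."""
        return self.chip_hours * self.avg_chip_power_kw * self.pue / 1000

    def emissions_tonnes(self):
        """Estimated emissions in tonnes of CO2e."""
        return self.energy_mwh() * 1000 * self.grid_g_per_kwh / 1_000_000


final_run = TrainingDisclosure(1_000_000, 0.4, 1.1, 400)
exploratory = TrainingDisclosure(500_000, 0.4, 1.1, 400)   # searches, ablations, failures

total = final_run.emissions_tonnes() + exploratory.emissions_tonnes()
print(f"final run: {final_run.emissions_tonnes():.0f} t")
print(f"exploratory: {exploratory.emissions_tonnes():.0f} t")
print(f"total: {total:.0f} t")
```

In this made-up case, reporting only the final run would understate the total by a third, which is the gap the failed-run requirement is meant to close.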

Real Reduction Strategies in Practice

Temporal and geographic shifting: Google and DeepMind have developed systems that shift compute workloads to times and locations with lower grid carbon intensity — running batch training during periods of high renewable generation. Google's 2020 paper "Carbon-Intelligent Computing" described shifting ~30–40% of flexible compute loads to lower-carbon windows. This does not reduce compute, but it lowers the carbon intensity of each unit of compute.

Hardware efficiency: Moving from older GPU generations to current ones provides substantial efficiency gains. NVIDIA reports that its H100 delivers ~3.5× the training performance per watt compared to the A100. Labs that have upgraded hardware benefit from this even holding model scale constant.

Efficient architecture research: Work on sparse models (Mixture of Experts), state-space models, and architectural innovations that achieve similar capability with less compute per token — such as Mamba, RWKV, and Mistral's sliding window attention — represent genuine reductions in the compute required for a given capability level.

Renewable energy procurement: Microsoft, Google, and Meta have all made large Power Purchase Agreements (PPAs) for renewable energy. The credibility of these commitments depends on whether they represent additionality — new renewable capacity brought online — or simply offsetting existing grid consumption with certificates from elsewhere.
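Temporal shifting, the first strategy above, can be illustrated with a simple scheduler that picks the lowest-carbon contiguous window from an hourly forecast. This is a minimal sketch under stated assumptions, not a description of how Google's system works: the forecast values are invented, the job is assumed to be uninterruptible, and the function name is hypothetical.

```python
def best_window(hourly_intensity, job_hours):
    """Return (start_hour, mean_intensity) of the lowest-carbon contiguous window."""
    if job_hours <= 0 or job_hours > len(hourly_intensity):
        raise ValueError("job_hours must be between 1 and the forecast length")
    window_sum = sum(hourly_intensity[:job_hours])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(hourly_intensity) - job_hours + 1):
        # Slide the window one hour: add the new hour, drop the oldest.
        window_sum += hourly_intensity[start + job_hours - 1] - hourly_intensity[start - 1]
        if window_sum < best_sum:
            best_start, best_sum = start, window_sum
    return best_start, best_sum / job_hours


# Hypothetical forecast in gCO2/kWh, with a midday dip from solar generation.
forecast = [500, 480, 450, 300, 200, 180, 220, 400]
start, mean_intensity = best_window(forecast, job_hours=3)
naive_mean = sum(forecast[:3]) / 3

print(f"start at hour {start}, mean intensity {mean_intensity:.0f} g/kWh")
print(f"running immediately instead: {naive_mean:.0f} g/kWh")
```

The job uses the same energy either way; only the carbon attached to each kilowatt-hour changes, which matches the point made above that shifting lowers intensity without reducing compute.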

Additionality — The Critical Test

A renewable energy commitment is most meaningful when it causes new clean generation capacity to exist that would not have otherwise. Purchasing Renewable Energy Certificates (RECs) from existing hydroelectric plants displaces no fossil generation — it is accounting, not emission reduction. Power Purchase Agreements that fund new solar or wind installations represent genuine additionality and are the credible standard.

The Road Ahead

The trajectory is genuinely mixed. On one hand, the research community has moved from near-complete ignorance of AI's energy footprint (pre-2019) to active measurement, tooling, and policy engagement (2024). The MLCommons organization now includes energy efficiency in its MLPerf benchmarks. The Green Software Foundation, which developed the SCI specification, has built a community of practitioners. CodeCarbon, an open-source tool that Sasha Luccioni helped create, has been integrated into Hugging Face and used by hundreds of organizations to track emissions per training run.

On the other hand, the absolute scale of AI energy use is growing faster than the efficiency and disclosure ecosystem can track. The IEA's 2024 electricity report projects that global data center electricity consumption could roughly double by 2026. The models being trained in 2024 are orders of magnitude larger than those in 2019. And the commercial incentive to downplay environmental costs remains strong in an industry competing for massive capital investment.

The honest assessment: the tools for accountability exist. The will to apply them consistently and transparently is still being negotiated between researchers, companies, regulators, and the public.

SCI: Software Carbon Intensity, a metric expressing emissions per unit of software work (per API call, per query, per training run), enabling apples-to-apples comparison across systems. Now ISO/IEC 21031.

Additionality: In renewable energy procurement, the principle that a commitment should bring new clean generation capacity into existence, not merely purchase certificates from already-operating renewables.

PPA: Power Purchase Agreement, a long-term contract to buy electricity directly from a generator, often used to fund new renewable capacity and ensure additionality.
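
To make the SCI entry above concrete: the Green Software Foundation's specification expresses the score as SCI = ((E × I) + M) per R, where E is energy consumed, I is grid carbon intensity, M is embodied emissions, and R is the functional unit. The following is a minimal sketch of that formula. The function name and all input values are hypothetical, chosen only to illustrate a per-query score.

```python
def sci_score(energy_kwh, grid_g_per_kwh, embodied_g, functional_units):
    """SCI = ((E * I) + M) per R, in gCO2e per functional unit."""
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    operational_g = energy_kwh * grid_g_per_kwh  # E * I
    return (operational_g + embodied_g) / functional_units


# Hypothetical inference service measured over one reporting period.
score = sci_score(
    energy_kwh=100,              # E: energy consumed by the service
    grid_g_per_kwh=400,          # I: grid carbon intensity
    embodied_g=10_000,           # M: share of hardware embodied emissions
    functional_units=1_000_000,  # R: queries served in the period
)
print(f"{score:.3f} gCO2e per query")
```

Because the result is normalized by R, two systems serving very different traffic volumes can be compared on the same per-query basis.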

Lesson 4 Quiz

Measuring, Disclosing, and Reducing — check your understanding

1. The Software Carbon Intensity (SCI) specification was developed by which organization and what is its current standards status?

Correct. The Green Software Foundation developed SCI, which was adopted as ISO/IEC 21031 — a voluntary international standard for measuring emissions per unit of software work.

Not quite. SCI was developed by the Green Software Foundation and adopted as ISO/IEC 21031. It is voluntary, not mandatory, and has limited adoption in major AI labs so far.

2. Google's "Carbon-Intelligent Computing" system reduces AI's carbon footprint primarily by doing what?

Correct. Google's Carbon-Intelligent Computing shifts batch workloads to periods of high renewable generation — not reducing total compute but lowering its carbon intensity by matching it to cleaner grid windows.

Not quite. The system works by temporal and geographic load shifting — running batch training during periods when the grid has more renewable energy, reducing emissions per unit of compute.

3. What is "additionality" in the context of renewable energy procurement, and why does it matter?

Correct. Additionality means the renewable purchase actually causes new clean capacity to exist — versus buying certificates from existing hydro plants, which is accounting rather than genuine emission reduction.

Not quite. Additionality is the key test: does the renewable commitment fund new clean capacity, or merely purchase certificates from already-operating generation? Only the former represents genuine emission displacement.

4. Which of the following is NOT among the minimum requirements researchers like Jesse Dodge and Sasha Luccioni have identified for credible AI carbon reporting?

Correct. Mandatory offset purchase is not part of the disclosure framework researchers advocate. The framework focuses on transparency and measurement: compute, grid factor, PUE, inference reporting, and failed-run accounting.

Not quite. Mandatory offset purchase before publication is not part of the researcher-proposed framework. The focus is on accurate measurement and disclosure of actual emissions, not requiring specific mitigation actions.

Lab 4 — AI Environmental Policy Workshop

Design a credible disclosure framework for an AI lab or regulator.

Your Mission

You are drafting environmental disclosure requirements for a new AI governance body. Your lab assistant can help you think through what information should be mandatory, how to handle verification, what exemptions might be reasonable, and how to compare your framework to existing approaches like SCI, CSRD, and Model Cards.

Try asking: "What are the strongest arguments against mandatory compute disclosure?" or "How would I design a per-query carbon intensity benchmark for inference?" or "What did the 2023 White House Executive Order actually require on AI energy?"

AI Lab Assistant

Policy Workshop

Welcome to Lab 4. I'm your AI environmental policy advisor. You're designing disclosure requirements for a new AI governance body. We can work through what mandatory disclosures make sense, how verification would work, what the counterarguments from industry are, and how existing frameworks like SCI and CSRD compare. Where would you like to start?

Module 4 Test

The Carbon Cost of AI — 15 questions · 80% to pass

1. Which researchers published the landmark 2019 paper measuring the carbon cost of training large NLP models?

Correct. Strubell, Ganesh, and McCallum published "Energy and Policy Considerations for Deep Learning in NLP" (2019), the first systematic measurement of AI training carbon costs.

Not quite. The 2019 paper was by Strubell, Ganesh, and McCallum. Patterson et al. (2021) extended this work; Luccioni works on ongoing inference measurement; Li et al. (2023) focused on water.

2. The Strubell et al. (2019) headline figure included overhead from which computationally intensive process?

Correct. The ~626,000 lb figure included Neural Architecture Search overhead — automated exploration of optimal network configurations — which multiplied compute dramatically above a single training run.

Not quite. The large figure was driven by Neural Architecture Search (NAS) — automated search for optimal model architectures — which multiplies compute cost substantially.

3. According to Patterson et al. (2021), training the same model on a coal-heavy grid vs. a renewable-heavy grid could produce how much difference in emissions?

Correct. Patterson et al. found grid carbon intensity could create up to a 10× difference in total training emissions for identical compute, making location choice often more impactful than hardware choice.

Not quite. Patterson et al. found up to 10× difference from grid carbon intensity alone — making where you train sometimes more important than what hardware you use.

4. Based on Hugging Face/CMU estimates, approximately how much energy does a single long-form BLOOM inference query consume?

Correct. ~1 Wh per long-form query seems trivial, but at 100M queries/day it becomes 100 MWh/day — surpassing BLOOM's entire training energy (433 MWh) within five days.

Not quite. The estimate is ~1 Wh per long-form query. That figure's significance emerges at scale: 100M daily queries produce ~100 MWh/day, exceeding the energy of the full training run (433 MWh) within five days.

5. What does the IEA's 2024 electricity report project about global data center energy consumption by 2026?

Correct. The IEA projected global data center electricity consumption could roughly double by 2026, with AI inference workloads as the primary driver of growth.

Not quite. The IEA projected approximately a doubling by 2026, driven primarily by AI inference workloads — a significant concern for grid carbon intensity if decarbonization lags.

6. Distillation, as an inference efficiency strategy, works by doing what?

Correct. Distillation transfers the capability of a large teacher model into a smaller student model through a training process that targets the teacher's output distributions rather than ground truth labels alone.

Not quite. Distillation trains a smaller student to replicate a larger teacher. Converting weights to lower precision is quantization; attention caching is KV-cache optimization.

7. UC Riverside researchers estimated that a conversation of 20–50 questions with ChatGPT consumes approximately how much fresh water through data center cooling?

Correct. Li et al. (UC Riverside, 2023) estimated that 20–50 ChatGPT exchanges consume ~500 mL of water through evaporative cooling in data centers.

Not quite. The estimate is ~500 mL — about one standard water bottle — per 20–50 exchanges, via evaporative cooling at Microsoft's data centers.

8. A Power Purchase Agreement (PPA) is considered more credible than Renewable Energy Certificates (RECs) for carbon accounting primarily because:

Correct. The key advantage of PPAs is additionality — they typically fund construction of new renewable capacity, whereas RECs from existing hydro or wind represent accounting transfers without new clean generation.

Not quite. The core distinction is additionality: PPAs fund new renewable capacity, whereas RECs can be bought from already-operating generators, which is an accounting transfer that does not itself reduce emissions.

9. The EU Corporate Sustainability Reporting Directive (CSRD) is relevant to AI environmental accountability in what way?

Correct. CSRD applies broad environmental reporting requirements to large companies — including tech firms with EU operations — but is not specifically designed for AI and does not mandate training-run-level disclosure.

Not quite. CSRD is a general environmental sustainability reporting requirement for large companies, not an AI-specific standard. It covers tech firms operating in Europe but does not mandate training-run-level AI carbon disclosure.

10. Jevons' Paradox predicts that improving inference efficiency will most likely:

Correct. Jevons' Paradox: efficiency gains lower cost, demand rises, total consumption may increase. Cheaper inference enables more AI products, more queries, and more integration — expanding total energy use.

Not quite. Jevons' Paradox predicts the counterintuitive: efficiency gains lower the cost of a service, stimulating enough additional demand to potentially increase total consumption.

11. Google's "Carbon-Intelligent Computing" system achieves emissions reductions primarily through:

Correct. Carbon-Intelligent Computing shifts batch workloads to periods of high renewable generation — reducing emissions per unit of compute without reducing the compute itself.

Not quite. The system does temporal and geographic load shifting — running flexible batch jobs when and where the grid is cleaner — rather than reducing compute or purchasing offsets.

12. The Microsoft data center in West Des Moines, Iowa made news in 2023 because public records revealed what?

Correct. Public records showed Microsoft's Iowa data centers drew ~11.5 million gallons of water in July 2022, the month before GPT-4's training was reportedly completed, highlighting AI's water footprint.

Not quite. The Iowa story was about water: ~11.5 million gallons drawn in a single month, coinciding with the final stretch of GPT-4 training, brought data center water consumption into wider public view.

13. "Embodied carbon" in AI hardware refers to:

Correct. Embodied carbon covers the full manufacturing lifecycle — raw materials through fabrication — and can be substantial for advanced AI chips. It is almost never included in AI carbon accounting.

Not quite. Embodied carbon is the manufacturing footprint: mining rare minerals, semiconductor fabrication at TSMC, assembly, and transport — all occurring before operational use begins.

14. The White House Executive Order on AI (October 2023) addressed AI's environmental impacts by:

Correct. EO 14110 Section 5.2 directed the DoE to study AI's energy and water implications but did not impose mandatory emissions disclosure — a study mandate, not a reporting requirement.

Not quite. The EO directed the DoE to evaluate AI energy and water impacts — a study mandate that precedes but does not itself create disclosure requirements or emission limits.

15. Which of the following best describes the current state of AI environmental accountability as of 2024?

Correct. The honest assessment: measurement tools, SCI, Model Cards, and CodeCarbon exist — but application is voluntary and inconsistent while total AI energy use grows rapidly. The will to apply these tools consistently is still being negotiated.

Not quite. The field has moved from ignorance to active measurement tooling, but no binding standards exist, adoption is inconsistent, and absolute energy consumption is growing faster than the accountability ecosystem can track.