In 2019, researchers Emma Strubell, Ananya Ganesh, and Andrew McCallum published a paper that made the AI community uncomfortable. They had done something few had bothered to do: they measured the actual carbon cost of training large NLP models. Their headline finding, that training a single large transformer with neural architecture search emitted roughly 626,000 pounds of CO₂ equivalent, traveled far beyond academic circles. It was the first time many researchers had confronted the arithmetic of their own work.
The paper did not claim that AI was uniquely villainous. It claimed that the field had been operating without transparency. With no reported cost, there was no accountability.
Training a large neural network is fundamentally an optimization problem solved by repetition. A model with hundreds of billions of parameters is exposed to vast datasets (trillions of tokens) and its weights are nudged incrementally toward configurations that minimize prediction error. Each nudge requires a forward pass, a loss calculation, and a backward pass of gradient computation across every parameter. This cycle repeats billions of times.
The hardware doing this work, typically thousands of specialized GPUs or TPUs running in parallel, draws power continuously for weeks or months. The electricity to run those chips, plus the electricity to cool the data centers housing them, produces the carbon footprint researchers began measuring in 2019.
Compute efficiency has improved dramatically: models trained in 2024 accomplish far more per GPU-hour than those trained in 2019. But model scale has grown faster than efficiency gains, meaning absolute energy consumption has risen even as relative efficiency improved.
In 2021, Google and UC Berkeley researchers led by David Patterson published "Carbon Emissions and Large Neural Network Training," offering a more systematic methodology. They computed energy consumption from chip specifications, PUE (Power Usage Effectiveness) of data centers, and the carbon intensity of the electricity grid used during training. Their analysis covered a range of landmark models and found that the choice of data center location, and therefore grid carbon intensity, was often more important than raw compute volume in determining total emissions.
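The methodology described above reduces to a product of a few quantities: chip energy, data center overhead (PUE), and grid carbon intensity. The sketch below is a minimal illustration under stated assumptions, not the authors' own code. The function name, parameter names, and every numeric input are hypothetical and chosen only to show how the grid factor scales the result.

```python
def training_emissions_tco2e(chip_count, avg_chip_power_kw, hours,
                             pue, grid_kgco2e_per_kwh):
    """Estimate training emissions in tonnes of CO2e.

    All inputs are assumptions supplied by the caller:
      avg_chip_power_kw    average draw per chip, in kilowatts
      pue                  facility energy divided by IT energy
      grid_kgco2e_per_kwh  carbon intensity of the local grid
    """
    # Energy used by the chips themselves.
    it_energy_kwh = chip_count * avg_chip_power_kw * hours
    # PUE scales IT energy up to include cooling and other overhead.
    facility_energy_kwh = it_energy_kwh * pue
    # Convert kilograms of CO2e to tonnes.
    return facility_energy_kwh * grid_kgco2e_per_kwh / 1000.0


# Hypothetical run: same hardware and duration, two different grids.
run = dict(chip_count=1000, avg_chip_power_kw=0.3, hours=720, pue=1.1)
low_carbon = training_emissions_tco2e(**run, grid_kgco2e_per_kwh=0.08)
coal_heavy = training_emissions_tco2e(**run, grid_kgco2e_per_kwh=0.80)
print(f"low-carbon grid: {low_carbon:.1f} t, coal-heavy grid: {coal_heavy:.1f} t")
```

Because the grid factor enters as a plain multiplier, a tenfold difference in carbon intensity yields a tenfold difference in emissions for identical compute.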
Training the same model in a coal-heavy grid region could produce ten times more CO₂ than training it in a region powered by hydroelectricity or nuclear. This finding reframed the conversation: hardware efficiency matters, but energy source matters more.
Training the 11B-parameter T5 model in a Google data center using TPUs and renewable-matched energy produced roughly 46 tonnes CO₂e. The same compute on a coal-heavy grid would have produced approximately 270 tonnes, nearly a 6× difference from location alone.
Labs rarely publish the full details needed to reproduce emissions estimates independently. The critical inputs (exact GPU count, training duration, data center PUE, and grid carbon factor at the time of training) are often proprietary or simply unrecorded. Researchers like Jesse Dodge at AI2 have argued that papers should routinely report compute costs alongside accuracy benchmarks, the same way experimental papers report methodology. As of 2024, this remains voluntary and inconsistently practiced.
There is also the problem of failed runs. The published training run is not the only training run. Hyperparameter searches, ablations, and failed experiments consume additional energy that rarely appears in any carbon accounting. Strubell et al. noted this explicitly: the figure reported is a floor, not a ceiling.
No regulatory body requires AI labs to disclose training emissions. The Strubell paper's lasting contribution was not the specific numbers, which are already outdated, but the demonstration that the numbers are calculable and that the field's silence on them was a choice, not a necessity.
Lesson 2 will examine what happens after training: the inference phase, where the cumulative energy cost of millions of daily queries may ultimately dwarf the training run itself.
You have been given compute budget to train a 70-billion-parameter model. Your lab assistant can help you estimate how different choices (data center location, hardware generation, grid mix) change the carbon cost of your training run.
Ask at least three substantive questions to complete this lab.
Within two months of ChatGPT's public launch in November 2022, OpenAI was serving an estimated 100 million users, a milestone no consumer technology product had reached so quickly. Behind that milestone were data centers running inference around the clock. Each query required a forward pass through GPT-3.5 Turbo: not the months-long training job, but a millisecond-scale computation happening simultaneously across millions of requests.
The training run was a one-time cost. Inference was a daily tax, one that scaled with every new user, every longer conversation, every additional product built on the API.
Training and inference have fundamentally different cost profiles. Training is compute-bound and occurs once (or a handful of times per model generation). Inference is demand-bound and continuous. The energy cost of a single inference query is small, on the order of a watt-hour or less, but multiplied by billions of queries daily, it becomes substantial.
Researchers at Hugging Face and Carnegie Mellon published an analysis in 2022 estimating that inference energy consumption for a model like BLOOM (176B parameters) was roughly 1.0 Wh per query for a long-form generation task. At a hypothetical 100 million queries per day, that is 100 MWh daily, or about 36,500 MWh annually. Compare that to BLOOM's training run, which consumed approximately 433 MWh. Inference overtakes training in cumulative energy within five days of heavy use.
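The break-even arithmetic in the paragraph above can be checked in a few lines. This is a sketch using only the figures quoted in the text (1.0 Wh per query, a hypothetical 100 million queries per day, 433 MWh for training); the function and variable names are illustrative.

```python
def daily_inference_mwh(wh_per_query, queries_per_day):
    """Daily inference energy in MWh (1 MWh = 1,000,000 Wh)."""
    return wh_per_query * queries_per_day / 1_000_000


def breakeven_days(training_mwh, wh_per_query, queries_per_day):
    """Days of serving after which cumulative inference energy equals training energy."""
    return training_mwh / daily_inference_mwh(wh_per_query, queries_per_day)


daily = daily_inference_mwh(wh_per_query=1.0, queries_per_day=100_000_000)
annual = daily * 365
days = breakeven_days(training_mwh=433, wh_per_query=1.0,
                      queries_per_day=100_000_000)
print(f"daily: {daily:.0f} MWh, annual: {annual:.0f} MWh, break-even: {days:.2f} days")
```

The break-even point is inversely proportional to query volume: at one tenth of the traffic it would take ten times as long for inference to overtake training.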
Model quantization, caching, and smaller specialized models are the primary levers for reducing per-query inference cost. But as model capability expands and query volumes grow, efficiency gains have not kept pace with demand growth.
The International Energy Agency's 2024 report "Electricity 2024" cited estimates that a single AI-powered search query uses approximately 10 times more electricity than a conventional keyword search. Google processes roughly 8.5 billion searches per day. If even a fraction of those migrate to generative AI responses, as Google's own Search Generative Experience and Bing AI integration suggest, the aggregate energy impact on search infrastructure alone becomes significant.
Google reported in its 2023 Environmental Report that its data centers consumed 24.2 terawatt-hours of electricity, a figure that predates the full integration of generative AI into Search. Independent analysts expect the figure to rise substantially through 2025–2027 as AI inference workloads displace or supplement traditional search.
The IEA's 2024 electricity report projected that global data center electricity consumption could double by 2026, with AI inference workloads as the primary growth driver. The agency noted that without significant grid decarbonization, this growth could add materially to global emissions even if data centers operate at high efficiency.
Quantization reduces the numerical precision of model weights (e.g., from 32-bit floats to 8-bit or 4-bit integers), cutting memory use and computation per token at some accuracy cost. Research from Hugging Face, EleutherAI, and others has demonstrated that 4-bit quantized models retain most capability while using roughly 4× less memory than 16-bit weights and significantly less compute.
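To make the memory arithmetic concrete, the sketch below computes weight storage at several bit widths and shows a toy symmetric quantization round trip in pure Python. It is an illustration only: the 7-billion-parameter model size is hypothetical, the helper names are invented for this example, and production quantizers add per-group scales and other details omitted here.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Storage for the weights alone, ignoring scales and other metadata."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_symmetric(weights, bits=8):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


# Hypothetical 7B-parameter model at different precisions.
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(7e9, bits):.1f} GB")

# Round trip on a tiny example vector: the error per weight is at most scale / 2.
original = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_symmetric(original, bits=8)
restored = dequantize(q, scale)
```

The round trip shows where the accuracy cost comes from: each weight is snapped to the nearest multiple of the scale, and fewer bits mean a coarser grid.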
Distillation trains a smaller "student" model to mimic a larger "teacher" model. Microsoft's Phi series and Google's Gemma series are examples of models designed for efficient inference while maintaining strong performance. Distilled models can be deployed on edge hardware, eliminating data center round-trips entirely.
KV-cache optimization stores the attention keys and values already computed for earlier tokens so they are not recomputed at each generation step, and prefix caching extends the same idea to prompts shared across requests; both are standard practice in production inference serving. These are engineering efficiencies that compound, but they operate against a backdrop of continuously growing model sizes and query volumes.
Jevons' Paradox, the historical observation that efficiency gains can increase total resource use by making the resource cheaper and demand higher, appears to be operating in AI inference. Faster, cheaper inference enables more applications, more queries, and more integration into products, expanding total energy consumption even as per-query efficiency improves.
You are advising a startup planning to deploy an AI assistant for customer service. The model will handle ~5 million queries per day. Your lab assistant can help you estimate total inference energy, compare optimization strategies, and think through the tradeoffs between cost, quality, and carbon footprint.
In July 2023, documents obtained through a public records request revealed that a Microsoft data center in West Des Moines, Iowa had drawn approximately 6.4 million gallons of water from the local utility in a single month, the same month that Microsoft was training GPT-4. The West Des Moines Water Works noted the spike explicitly in its internal records. The data center's cooling towers had consumed water at a rate that local officials described as significant, though the utility noted it remained within contracted limits.
The story, reported by Sharon Goldman and others, illuminated a dimension of AI infrastructure that had received almost no public attention: water. Evaporative cooling systems, the dominant cooling method in large data centers, consume water that does not return to the local watershed.
Most large data centers use evaporative cooling: warm air from servers is passed over water-saturated surfaces, and the evaporation carries heat away. This is highly energy-efficient but consumes water that is lost to the atmosphere rather than recycled. The metric used is Water Usage Effectiveness (WUE): liters of water consumed per kilowatt-hour of IT load.
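Since WUE is defined as liters per kilowatt-hour of IT load, an on-site water estimate is a single multiplication. The sketch below assumes a hypothetical training run of 1,000,000 kWh and a hypothetical WUE of 1.8 L/kWh; neither number comes from a disclosed facility, and the function name is illustrative. It covers on-site cooling water only, not water used to generate the electricity.

```python
def cooling_water_liters(it_energy_kwh, wue_l_per_kwh):
    """On-site cooling water consumed, given IT energy and the facility's WUE."""
    return it_energy_kwh * wue_l_per_kwh


# Hypothetical inputs, for illustration only.
it_energy_kwh = 1_000_000
wue_l_per_kwh = 1.8

water_l = cooling_water_liters(it_energy_kwh, wue_l_per_kwh)
print(f"{water_l:,.0f} liters ({water_l / 1000:,.0f} cubic meters)")
```

The same energy figure produces very different water totals at different sites, because WUE depends on climate and cooling design.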
Researchers at UC Riverside, led by Pengfei Li, published a 2023 study estimating that training GPT-3 consumed approximately 700,000 liters of fresh water at Microsoft's data centers, a bit more than a quarter of an Olympic swimming pool (about 2.5 million liters). The study, "Making AI Less 'Thirsty'," also estimated that a conversation of 20–50 questions with ChatGPT consumes roughly 500 milliliters of water, about the volume of a standard water bottle.
The water source matters enormously. Data centers in water-stressed regions, including parts of the US Southwest, northern Chile (where copper mining also strains water supplies), and parts of India, draw water from aquifers under pressure from agriculture, urban growth, and climate change. A data center in Norway drawing on abundant cold freshwater presents a fundamentally different environmental profile than one in Phoenix drawing from the Colorado River basin.
AI hardware (GPUs, TPUs, and the high-bandwidth memory chips that support them) requires rare and strategically significant minerals. Cobalt, used in lithium-ion batteries powering UPS systems; tantalum, in capacitors; and the rare earth elements used in precision electronics all have extraction footprints that carbon accounting ignores entirely.
NVIDIA's H100 GPU, the dominant chip for AI training in 2023–2024, requires sophisticated semiconductor manufacturing processes at TSMC's facilities in Taiwan, which themselves consume enormous volumes of ultrapure water (a semiconductor manufacturing requirement distinct from cooling water). The full lifecycle carbon cost of hardware manufacturing, called embodied carbon, can be substantial. A 2022 analysis in the journal Nature Electronics found that for some computing scenarios, manufacturing emissions exceeded operational emissions over the hardware's lifetime.
Hardware generations also turn over rapidly. The shift from A100 to H100 GPUs, then to H200 and Blackwell architectures, creates large volumes of high-value electronic waste. GPU clusters displaced by new hardware may be resold, but the embodied carbon of manufacturing is already spent regardless of secondary use.
Hyperscale data centers occupy tens to hundreds of acres. Microsoft's planned expansion in Goodyear, Arizona; Google's facilities in The Dalles, Oregon; and Meta's data center campuses represent significant land footprints. The Dalles, Oregon facility drew local controversy as Google sought additional water rights from the Columbia River while operating in a region with growing water stress concerns.
Microsoft's 2023 Environmental Sustainability Report was notable for disclosing absolute water consumption: 6.4 million cubic meters globally in 2022, up 34% from 2021. The company attributed growth to expanded data center operations and committed to being "water positive" by 2030, replenishing more water than it consumes globally. Critics noted that global replenishment accounting does not address local water stress in specific deployment regions.
Google's 2023 Environmental Report disclosed similar water metrics. Neither company breaks down water consumption by specific product or model training run, making independent verification of specific figures like the GPT-3 training estimate difficult. The UC Riverside study used public power consumption estimates and typical WUE figures to reconstruct the estimate, an approach that introduces uncertainty but is the best available given disclosure norms.
Current AI environmental reporting typically covers operational carbon (and sometimes renewable energy matching) but rarely addresses water consumption at the training-run level, embodied hardware carbon, or land use. A complete environmental accounting of AI would require all four dimensions: operational carbon, operational water, embodied hardware impacts, and land footprint.
You are advising a city government in a water-stressed region that wants to attract a major AI data center. Your lab assistant can help you think through water consumption estimates, embodied hardware carbon, siting considerations, and what questions to ask prospective data center operators.
When the Biden administration released its Executive Order on Safe, Secure, and Trustworthy AI in October 2023, Section 5.2 included a directive for the Department of Energy to evaluate the energy and water implications of AI. It was the first time a major regulatory document in the United States explicitly linked AI development to environmental infrastructure concerns. The order did not mandate emissions disclosure, but it directed agencies to study the problem, a precursor step to potential future requirements.
In the EU, the AI Act adopted in 2024 included environmental requirements for high-risk AI systems, but critics noted these were high-level obligations to report resource use rather than specific methodological standards.
As of 2024, no binding international standard requires AI labs to disclose training emissions, water consumption, or hardware lifecycle impacts. Voluntary disclosure varies dramatically. Large public companies in the EU face broader sustainability reporting requirements under the Corporate Sustainability Reporting Directive (CSRD), which began phasing in for large companies in 2024, but these are general environmental frameworks not specific to AI.
The most specific voluntary standard developed for AI-related computing is the work of the Green Software Foundation, which has produced a Software Carbon Intensity (SCI) specification, a method for calculating emissions per unit of software work. This has been adopted as an ISO standard (ISO/IEC 21031) but remains voluntary, and adoption in AI labs has been limited.
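The SCI specification expresses its score as SCI = ((E × I) + M) per R, where E is energy consumed, I is grid carbon intensity, M is embodied emissions attributed to the software, and R is the functional unit. The sketch below applies that formula to a hypothetical inference service; the function name and all input values are assumptions made for illustration, not figures from any lab.

```python
def sci_score(energy_kwh, intensity_g_per_kwh, embodied_g, functional_units):
    """Software Carbon Intensity: grams of CO2e per functional unit.

    energy_kwh           E, energy consumed by the software
    intensity_g_per_kwh  I, carbon intensity of the electricity used
    embodied_g           M, share of hardware manufacturing emissions
    functional_units     R, e.g. number of queries served
    """
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    operational_g = energy_kwh * intensity_g_per_kwh
    return (operational_g + embodied_g) / functional_units


# Hypothetical service: 120 kWh at 400 g/kWh, 2,000 g embodied, 1M queries.
score = sci_score(energy_kwh=120, intensity_g_per_kwh=400,
                  embodied_g=2000, functional_units=1_000_000)
print(f"{score:.3f} gCO2e per query")
```

Because SCI is a rate rather than a total, it can fall while absolute emissions rise, which is why it complements rather than replaces total-emissions reporting.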
Some labs have moved toward self-disclosure. Hugging Face's Model Cards include a "Carbon Footprint" section that encourages authors to report training compute and estimated CO₂. Google's research papers occasionally include compute and carbon estimates. These are steps, but they are inconsistent and unverified.
Software Carbon Intensity (SCI) / ISO/IEC 21031: Voluntary per-functional-unit emissions metric developed by the Green Software Foundation.
EU CSRD: Requires large companies to disclose environmental sustainability data including energy and water; not AI-specific but applicable to major labs operating in Europe.
Hugging Face Model Cards: Voluntary community standard encouraging compute and CO₂ disclosure in model documentation.
US EO 14110, Section 5.2: Directed DOE to study AI energy/water implications; no mandatory disclosure requirement as of 2024.
Researchers working on this problem, including Jesse Dodge at AI2, Sasha Luccioni at Hugging Face, and the authors of the MLCO2 Calculator, have converged on a set of minimum requirements for credible AI carbon reporting:
1. Compute disclosure: Training runs should report total GPU/TPU hours, chip type, and batch size. These are the primary inputs for third-party carbon estimation. Without them, estimates cannot be independently verified.
2. Grid carbon factor: The carbon intensity of the electricity grid at the time and location of training should be disclosed. This is available from utilities and national grid operators and eliminates the largest source of estimation uncertainty.
3. PUE disclosure: Data center PUE at the facility used for training should be reported. Industry average is ~1.4; hyperscale facilities can achieve ~1.1.
4. Inference reporting: Ongoing inference emissions should be estimated and disclosed, not just training runs. This requires establishing per-query energy baselines and reporting at regular intervals.
5. Failed run accounting: Exploratory compute (hyperparameter searches, ablations) should be included in total carbon accounting, even if reported separately.
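The training-related items in the list above (compute, grid factor, PUE, and exploratory runs) can be captured in one small record from which a third party could recompute emissions. The sketch below is a hypothetical schema, not an existing standard: the class name, field names, and example values are all assumptions, and average chip power is an extra input that would have to be derived from the disclosed chip type. Inference reporting (item 4) is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class TrainingDisclosure:
    chip_type: str                 # item 1: hardware used
    final_run_chip_hours: float    # item 1: compute for the published run
    exploratory_chip_hours: float  # item 5: searches, ablations, failed runs
    avg_chip_power_kw: float       # assumption: derived from chip_type
    grid_kgco2e_per_kwh: float     # item 2: grid carbon factor
    pue: float                     # item 3: facility PUE

    def emissions_tco2e(self, include_exploratory=True):
        """Recompute emissions in tonnes CO2e from the disclosed inputs."""
        hours = self.final_run_chip_hours
        if include_exploratory:
            hours += self.exploratory_chip_hours
        energy_kwh = hours * self.avg_chip_power_kw * self.pue
        return energy_kwh * self.grid_kgco2e_per_kwh / 1000.0


# Hypothetical disclosure, for illustration only.
report = TrainingDisclosure(
    chip_type="example-accelerator",
    final_run_chip_hours=100_000,
    exploratory_chip_hours=50_000,
    avg_chip_power_kw=0.4,
    grid_kgco2e_per_kwh=0.4,
    pue=1.1,
)
final_only = report.emissions_tco2e(include_exploratory=False)
with_exploratory = report.emissions_tco2e()
print(f"final run: {final_only:.1f} t, including exploratory: {with_exploratory:.1f} t")
```

Reporting the final run and the exploratory compute as separate fields lets readers see both the headline figure and the fuller total.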
Temporal and geographic shifting: Google and DeepMind have developed systems that shift compute workloads to times and locations with lower grid carbon intensity, running batch training during periods of high renewable generation. Google's 2020 paper "Carbon-Intelligent Computing" described shifting ~30–40% of flexible compute loads to lower-carbon windows. This does not reduce compute, but it lowers the carbon intensity of each unit of compute.
Hardware efficiency: Moving from older GPU generations to current ones provides substantial efficiency gains. NVIDIA reports that its H100 delivers ~3.5× the training performance per watt compared to the A100. Labs that have upgraded hardware benefit from this even holding model scale constant.
Efficient architecture research: Work on sparse models (Mixture of Experts), state-space models, and architectural innovations that achieve similar capability with less compute per token, such as Mamba, RWKV, and Mistral's sliding window attention, represents genuine reductions in the compute required for a given capability level.
Renewable energy procurement: Microsoft, Google, and Meta have all made large Power Purchase Agreements (PPAs) for renewable energy. The credibility of these commitments depends on whether they represent additionality (new renewable capacity brought online) or simply offsetting existing grid consumption with certificates from elsewhere.
A renewable energy commitment is most meaningful when it causes new clean generation capacity to exist that would not have otherwise. Purchasing Renewable Energy Certificates (RECs) from existing hydroelectric plants displaces no fossil generation; it is accounting, not emission reduction. Power Purchase Agreements that fund new solar or wind installations represent genuine additionality and are the credible standard.
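The temporal shifting strategy described earlier amounts to choosing when a flexible job runs. The sketch below is a minimal illustration of that idea and not a description of any production scheduler: given an hourly forecast of grid carbon intensity, it picks the contiguous window with the lowest average. The function name and the forecast values are hypothetical.

```python
def best_start_hour(forecast_g_per_kwh, job_hours):
    """Return (start_index, average_intensity) of the cleanest contiguous window."""
    n = len(forecast_g_per_kwh)
    if job_hours <= 0 or job_hours > n:
        raise ValueError("job_hours must be between 1 and the forecast length")

    # Sliding-window sum: add the hour entering, drop the hour leaving.
    window_sum = sum(forecast_g_per_kwh[:job_hours])
    best_sum, best_start = window_sum, 0
    for start in range(1, n - job_hours + 1):
        window_sum += forecast_g_per_kwh[start + job_hours - 1]
        window_sum -= forecast_g_per_kwh[start - 1]
        if window_sum < best_sum:
            best_sum, best_start = window_sum, start
    return best_start, best_sum / job_hours


# Hypothetical hourly forecast in gCO2e/kWh; midday solar lowers intensity.
forecast = [500, 480, 450, 300, 200, 180, 220, 400]
start, avg = best_start_hour(forecast, job_hours=3)
baseline = sum(forecast[:3]) / 3  # running immediately instead
print(f"start at hour {start}: {avg:.0f} vs {baseline:.0f} g/kWh if run now")
```

The job uses the same energy in either case; only the carbon intensity of that energy changes, which matches the text's point that shifting does not reduce compute.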
The trajectory is genuinely mixed. On one hand, the research community has moved from near-complete ignorance of AI's energy footprint (pre-2019) to active measurement, tooling, and policy engagement (2024). The MLCommons organization now includes energy efficiency in its MLPerf benchmarks. The Green Software Foundation has created a community of practitioners. Sasha Luccioni's CodeCarbon tool has been integrated into Hugging Face and used by hundreds of organizations to track emissions per training run.
On the other hand, the absolute scale of AI energy use is growing faster than the efficiency and disclosure ecosystem can track. The IEA projects data center energy doubling by 2026. The models being trained in 2024 are orders of magnitude larger than those in 2019. And the commercial incentive to downplay environmental costs remains strong in an industry competing for massive capital investment.
The honest assessment: the tools for accountability exist. The will to apply them consistently and transparently is still being negotiated between researchers, companies, regulators, and the public.
You are drafting environmental disclosure requirements for a new AI governance body. Your lab assistant can help you think through what information should be mandatory, how to handle verification, what exemptions might be reasonable, and how to compare your framework to existing approaches like SCI, CSRD, and Model Cards.