Module 6 · Lesson 1

The Energy Cost of AI Itself

Before AI can help the climate, we must reckon with what AI consumes.

How much energy does training and running large AI models actually require — and who is measuring it honestly?

When Google published its 2024 Environmental Report, it disclosed that its total greenhouse gas emissions had risen 48% since 2019 — the opposite direction of its net-zero pledge. The primary culprit, the company acknowledged, was the surging electricity demand of its AI data centers. This was not a small company's growing pain. It was one of the world's most sophisticated engineering organizations admitting that the very tool it was deploying to help humanity could not yet account for its own footprint.

The Training Computation Problem

Training a large language model requires running billions of matrix multiplications across thousands of specialized chips, continuously, for weeks or months. A landmark 2019 study by Emma Strubell and colleagues at the University of Massachusetts Amherst estimated that training a single large transformer-based NLP model, including neural architecture search, could emit as much CO₂ as five average American cars over their entire lifetimes: roughly 284 tonnes. Later models are far larger.

GPT-3, released by OpenAI in 2020, was estimated to have required approximately 1,287 megawatt-hours of electricity during training. For context, that is roughly the annual consumption of 120 U.S. households. GPT-4's training costs have not been officially disclosed, but independent researchers and leaked estimates place energy use substantially higher. Anthropic, Google DeepMind, and Meta have similarly declined to publish precise training energy figures for their flagship models.
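
The household comparison above is simple unit arithmetic, and it is worth being able to reproduce. The sketch below assumes an average U.S. household uses about 10,500 kWh of electricity per year; that figure is an assumption added for illustration and is not stated in this lesson.

```python
# Training energy estimate for GPT-3 quoted in the text, in megawatt-hours.
GPT3_TRAINING_MWH = 1_287

# Assumption (not from the lesson): average annual electricity use of a
# U.S. household, in kilowatt-hours.
HOUSEHOLD_KWH_PER_YEAR = 10_500


def household_years(training_mwh: float,
                    household_kwh: float = HOUSEHOLD_KWH_PER_YEAR) -> float:
    """Convert training energy (MWh) into equivalent household-years of use."""
    return training_mwh * 1_000 / household_kwh  # MWh -> kWh, then divide


equivalent = household_years(GPT3_TRAINING_MWH)
print(f"Roughly {equivalent:.0f} household-years of electricity")
```

Changing the household assumption shifts the result proportionally, which is one reason such comparisons should always state their baseline.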

The lack of standardized disclosure is itself a governance problem. Without mandatory reporting, the field cannot accurately calculate its own impact or set meaningful reduction targets.

Real Data Point

The 2023 paper "Power Hungry Processing: Watts Driving the Cost of AI Deployment?" by Luccioni, Jernite & Strubell measured inference energy across 88 open-source models. They found that text generation tasks consumed up to 4,757 times more energy per query than simple classification tasks, demonstrating that model architecture and task type matter enormously for real-world consumption.

Inference at Scale: The Hidden Daily Cost

Training is a one-time (per model version) cost. Inference — running the model to answer queries — is continuous and accumulates at enormous scale. According to research published by the International Energy Agency in 2024, a single ChatGPT query consumes roughly 10 times the electricity of a standard Google Search. When multiplied across billions of daily queries, inference energy dwarfs training energy over a model's lifetime.
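
To see why inference can overtake training, a back-of-the-envelope estimate helps. This is a rough sketch only: the per-query energy (2.9 Wh) and the traffic level (10 million queries per day) are assumptions chosen for illustration, and it borrows the GPT-3 training estimate quoted earlier even though deployed models differ.

```python
TRAINING_MWH = 1_287           # GPT-3 training estimate quoted in this lesson
WH_PER_QUERY = 2.9             # assumption: illustrative per-query energy
QUERIES_PER_DAY = 10_000_000   # hypothetical traffic level


def days_until_inference_exceeds_training(training_mwh: float,
                                          wh_per_query: float,
                                          queries_per_day: float) -> float:
    """Days of serving after which cumulative inference energy equals training energy."""
    daily_inference_mwh = wh_per_query * queries_per_day / 1_000_000  # Wh -> MWh
    return training_mwh / daily_inference_mwh


break_even_days = days_until_inference_exceeds_training(
    TRAINING_MWH, WH_PER_QUERY, QUERIES_PER_DAY
)
print(f"Inference matches training energy after about {break_even_days:.0f} days")
```

Under these assumptions the break-even point arrives within a couple of months, after which every additional day of serving adds to a total that training no longer dominates.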

Microsoft, which integrated OpenAI models into Bing and its Office suite in 2023, reported in the sustainability report it published that year that its water consumption, driven largely by data center cooling, had increased by 34% from 2021 to 2022, reaching roughly 6.4 million cubic meters. Water stress is a co-consequence of compute intensity, particularly relevant in arid regions where many large data centers are sited.

10×

ChatGPT vs. Google Search electricity per query (IEA 2024)

48%

Google GHG emissions increase 2019–2023

6.4M m³

Microsoft water consumption in 2022

284t

CO₂ equivalent from training one large NLP model (Strubell et al.)

Hardware Efficiency and the Jevons Paradox

NVIDIA's H100 GPU, released in 2022, delivers roughly 3× the training throughput of the previous-generation A100. Its rated power draw is higher than the A100's, but throughput per watt still improves, which is a meaningful efficiency gain. Google's custom Tensor Processing Units (TPUs) similarly optimize matrix operations for lower energy per FLOP than general-purpose silicon. These hardware improvements are real.

However, economists and climate researchers warn of the Jevons Paradox: as computing becomes more efficient, it also becomes cheaper, which historically has caused total consumption to rise rather than fall. The history of computing supports this concern. CPU efficiency improved dramatically from the 1970s through the 2010s, yet global data center electricity use grew continuously. So far there is no strong evidence that AI hardware efficiency gains will outpace demand growth.

Key Principle

Sustainable AI requires both supply-side improvements (greener electricity, more efficient hardware) and demand-side discipline (choosing the right-sized model for each task, avoiding unnecessary inference, measuring and disclosing consumption). Neither alone is sufficient.

FLOPs: Floating-Point Operations, the standard unit for measuring computational work in neural network training.

Inference: Running a trained model to produce outputs. At scale, inference energy exceeds training energy over a model's deployed lifetime.

PUE: Power Usage Effectiveness, the ratio of total data center energy to IT equipment energy. A PUE of 1.0 is perfect; typical modern data centers run 1.1–1.5.

Jevons Paradox: The counterintuitive finding that increased efficiency in resource use often leads to increased total consumption due to lower costs driving higher demand.
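
The PUE definition above translates directly into a small calculation. The sketch below is illustrative: the function names and the example energy figures are hypothetical, chosen only to show how PUE scales IT energy up to total facility energy.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def facility_energy_kwh(it_equipment_kwh: float, pue_value: float) -> float:
    """Scale measured IT energy up to the energy the whole facility draws."""
    return it_equipment_kwh * pue_value


# Hypothetical example: servers draw 1,000 kWh, the facility draws 1,200 kWh.
example_pue = pue(1_200, 1_000)
overhead_kwh = facility_energy_kwh(1_000, example_pue) - 1_000
print(f"PUE = {example_pue:.2f}, overhead = {overhead_kwh:.0f} kWh")
```

When a cloud provider reports only server-level energy, multiplying by the facility's PUE gives a more complete picture of what a workload actually consumes.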

Lesson 1 Quiz

The Energy Cost of AI Itself — 4 questions

1. According to the IEA's 2024 research, how does the electricity consumption of a single ChatGPT query compare to a standard Google Search?

Correct. The IEA's 2024 analysis found a ChatGPT query consumes roughly 10× the electricity of a Google Search, making inference-at-scale a significant energy concern.

Not quite. The IEA's 2024 estimate is approximately 10 times more energy per query — a meaningful gap that compounds enormously at billions of daily queries.

2. What does the Jevons Paradox predict about AI hardware efficiency improvements?

Correct. The Jevons Paradox warns that efficiency gains lower costs, which historically increase demand — potentially causing total consumption to rise even as per-unit efficiency improves.

The Jevons Paradox specifically warns against assuming efficiency gains translate to total reductions. Lower cost per FLOP tends to expand demand, potentially increasing total energy use.

3. What was the primary reason Google cited for its greenhouse gas emissions rising 48% between 2019 and 2023?

Correct. Google's 2024 Environmental Report explicitly attributed the emissions increase primarily to the electricity demands of expanding AI infrastructure and data centers.

Google's 2024 Environmental Report identified AI data center electricity demand as the primary driver of its 48% emissions increase — a significant disclosure for a company with net-zero pledges.

4. What does PUE measure, and what does a PUE value of 1.0 represent?

Correct. PUE (Power Usage Effectiveness) = total facility energy ÷ IT equipment energy. A PUE of 1.0 means every watt goes directly to computing, with zero overhead — an engineering ideal.

PUE is Power Usage Effectiveness: total data center energy divided by IT equipment energy. A PUE of 1.0 would mean zero energy wasted on cooling, lighting, or infrastructure — the theoretical ideal.

Lab 1: Auditing AI's Energy Footprint

Interactive discussion — minimum 3 exchanges to complete

Your Mission

You are advising a mid-sized tech company that wants to deploy a large language model in its customer service pipeline. Before they commit, they've asked you to assess the energy implications. Use this lab to explore how to estimate, disclose, and reduce AI's energy footprint in a real deployment context.

Start here: "Our company wants to deploy a large LLM for customer support — maybe 50,000 queries per day. How should we think about the energy cost, and what questions should we be asking our cloud provider?"

AI Energy Advisor

Welcome to Lab 1. I'm your AI energy footprint advisor. This lab focuses on understanding and quantifying the real energy costs of deploying AI at scale — and what your organization can actually do about them. Ask me anything about inference energy, cloud provider questions, disclosure standards, or hardware choices. What's on your mind?

Module 6 · Lesson 2

Green Electricity & Data Center Siting

Where AI runs matters as much as how efficiently it runs.

How are technology companies procuring renewable energy for AI, and what does "24/7 carbon-free energy" actually mean in practice?

In 2022, Google announced it had achieved 100% renewable energy matching globally since 2017 — meaning that on an annual basis, the company purchased as many megawatt-hours of renewable energy certificates as it consumed in electricity. This sounds definitive. But the company simultaneously acknowledged a more demanding target: 24/7 Carbon-Free Energy, or CFE — matching clean power to consumption in every hour, in every grid region, by 2030. The gap between annual matching and hourly matching is vast, and reveals how much work remains.

Power Purchase Agreements and RECs

The standard corporate mechanism for claiming renewable energy use is the Renewable Energy Certificate (REC). One REC represents one megawatt-hour of electricity generated from a renewable source. Companies buy RECs to "match" their consumption — but a REC purchased in Texas can offset electricity consumed from a coal-heavy grid in Virginia on a dark, windless night. Critics, including environmental nonprofit Rocky Mountain Institute, call this "spreadsheet decarbonization."

Power Purchase Agreements (PPAs) are a stronger commitment: a company contracts directly with a renewable energy developer to purchase output from a specific project over 10–25 years. Microsoft, Google, and Amazon are among the largest corporate PPA signatories globally. According to BloombergNEF's 2023 Corporate Energy Market Outlook, these three companies collectively signed over 20 gigawatts of new renewable PPAs in 2022–2023. PPAs fund the actual construction of new renewable capacity — a meaningful climate contribution — but still don't guarantee that the electrons powering a given server at a given moment are clean.

The 24/7 CFE Standard

Google's 24/7 CFE initiative, launched in 2020 in partnership with the UN and other technology companies, sets a more rigorous standard: for every hour of electricity consumption, the company aims to procure an equal amount of carbon-free energy on the same regional grid. This requires dispatchable clean energy (storage, geothermal, hydro, nuclear) to cover hours when solar and wind are unavailable.

Google's 2024 report showed it achieved a global average of 64% CFE in 2023, meaning 36% of its hourly consumption was not matched with carbon-free energy and was effectively supplied by fossil-heavy grid power. Progress is uneven: its Singapore data centers achieved only 4% CFE because that grid relies heavily on natural gas. Its operations in Denmark and Finland, where grids are heavily renewable, performed far better.
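
The gap between annual matching and hourly matching is easiest to see with a toy calculation. The sketch below is a simplified illustration, not Google's published methodology: it uses four hypothetical hours of consumption and clean supply, and it counts clean energy in a given hour only up to that hour's consumption.

```python
def annual_match(consumption_mwh, clean_mwh):
    """Annual-style matching: total clean energy vs. total consumption, capped at 100%."""
    return min(1.0, sum(clean_mwh) / sum(consumption_mwh))


def hourly_cfe(consumption_mwh, clean_mwh):
    """Hourly matching: surplus clean energy in one hour cannot cover another hour."""
    matched = sum(min(used, clean) for used, clean in zip(consumption_mwh, clean_mwh))
    return matched / sum(consumption_mwh)


# Hypothetical data: flat demand, solar-heavy supply that vanishes at night.
consumption = [10, 10, 10, 10]
clean = [25, 15, 0, 0]

annual = annual_match(consumption, clean)
hourly = hourly_cfe(consumption, clean)
print(f"annual match: {annual:.0%}, hourly CFE: {hourly:.0%}")
```

The same purchases yield a 100% annual claim and a 50% hourly score, which is why dispatchable clean sources matter for the hours when solar and wind are unavailable.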

Case Study · Microsoft & Nuclear

In September 2024, Microsoft signed a 20-year agreement with Constellation Energy to purchase power from Three Mile Island Unit 1 in Pennsylvania, a nuclear plant Constellation plans to restart, specifically to supply its AI data centers. The deal underscored a broader industry recognition that intermittent renewables alone cannot meet 24/7 clean power demands for always-on computing infrastructure.

Geographic Arbitrage and Water Stress

Data center siting decisions carry large carbon and water consequences. A data center in Iceland, powered almost entirely by geothermal and hydroelectric energy, has a near-zero operational carbon footprint. The same workload run from a data center in Singapore or parts of the U.S. Midwest might have a carbon intensity five to ten times higher, depending on the grid mix.

Water cooling is a related concern. Evaporative cooling towers, the dominant technology in large data centers, consume enormous volumes of freshwater. A 2021 study in Nature Communications estimated that U.S. data centers withdrew roughly 1.7 billion liters of water per day. Microsoft's reported 34% increase in water use from 2021 to 2022 illustrates how AI scaling is accelerating this demand. Data centers in the American West, including major cloud regions in Arizona, Nevada, and Oregon, face increasing scrutiny from water authorities as drought conditions worsen.

Principle · Additionality

The strongest form of renewable energy procurement creates additionality — it funds new clean capacity that would not have existed otherwise. PPAs that finance new solar or wind projects contribute to additionality. Buying existing RECs from projects that were already running does not. Evaluating additionality is now a core criterion in serious corporate sustainability assessments.

REC: Renewable Energy Certificate, one certificate per MWh of renewable generation. Tradeable but geographically and temporally decoupled from actual consumption.

PPA: Power Purchase Agreement, a long-term direct contract (typically 10–25 years) with a renewable energy generator that funds new project construction.

24/7 CFE: 24/7 Carbon-Free Energy, the goal of matching every hour of electricity consumption with carbon-free generation on the same regional grid.

Additionality: Whether a renewable energy purchase funds new capacity that would not otherwise have been built. This is the strongest standard of climate contribution.

Lesson 2 Quiz

Green Electricity & Data Center Siting — 4 questions

1. What is the core weakness of using Renewable Energy Certificates (RECs) as a sustainability claim?

Correct. A REC bought in Texas can "offset" coal-powered consumption in Virginia — the physical electrons and the certificate are entirely disconnected, which is why critics call this "spreadsheet decarbonization."

The central weakness is temporal and geographic decoupling. A REC represents renewable generation somewhere, sometime — not necessarily where or when your data center actually consumed power. Physical grid electrons and REC accounting are separate systems.

2. What percentage of its hourly electricity consumption did Google match with carbon-free energy in 2023, according to its Environmental Report?

Correct. Google's 2024 report disclosed a global average of 64% CFE in 2023, with significant variation by region — from 4% in Singapore to much higher figures in Scandinavia.

Google's 2024 Environmental Report disclosed 64% global average 24/7 CFE for 2023 — meaning 36% of its hourly consumption was still backed by fossil-heavy grid power, underscoring how far even the most ambitious companies remain from true 24/7 clean power.

3. What distinguished Microsoft's 2024 agreement with Constellation Energy from a typical renewable energy purchase?

Correct. The Three Mile Island deal represented a recognition that dispatchable, always-on clean power sources like nuclear are necessary to achieve 24/7 CFE goals that intermittent solar and wind cannot fulfill alone.

Microsoft signed a 20-year nuclear power agreement with Constellation tied to the planned restart of Three Mile Island Unit 1 in Pennsylvania, specifically to provide dispatchable, always-on clean power for AI data centers that cannot rely solely on intermittent renewables.

4. Why does data center siting location matter significantly for carbon footprint?

Correct. Grid carbon intensity is the dominant variable. Iceland's near-100% renewable grid versus Singapore's gas-heavy grid means the same compute task can produce radically different emissions — a 5–10× difference is realistic.

Grid carbon intensity is the key variable. A data center in Iceland (geothermal/hydro) versus Singapore (natural gas dominant) can produce 5–10× different emissions for identical workloads. Siting decisions are effectively carbon decisions.

Lab 2: Evaluating Renewable Energy Claims

Interactive discussion — minimum 3 exchanges to complete

Your Mission

Your organization's sustainability team has received three vendor proposals for cloud AI services. Each vendor makes different renewable energy claims. You need to evaluate these claims critically and recommend a procurement approach that reflects genuine climate impact rather than marketing.

Start here: "We have three cloud vendors: Vendor A claims '100% renewable energy' via REC purchases. Vendor B has PPAs for new solar projects but only achieves 72% 24/7 CFE. Vendor C offers a data center in Iceland running on geothermal with 98% CFE but with 40ms higher latency. How should we evaluate these options?"

Renewable Energy Procurement Advisor

Welcome to Lab 2. I'm your renewable energy procurement advisor. We'll work through how to critically evaluate clean energy claims from cloud and AI vendors — moving beyond marketing language to actual climate impact. The scenario you're facing is very realistic: most organizations encounter exactly this mix of REC claims, PPA commitments, and geography-based options. What would you like to dig into first?

Module 6 · Lesson 3

Efficient AI: Model Design & Algorithmic Choices

The most sustainable compute is the compute you don't run.

How do model architecture decisions, quantization, and task-appropriate model sizing reduce AI's environmental footprint without sacrificing usefulness?

In February 2023, Meta released LLaMA, a family of openly released language models ranging from 7 billion to 65 billion parameters. The 13B version, according to Meta's paper, outperformed the 175-billion-parameter GPT-3 on most benchmarks while requiring a fraction of the compute for inference. This was not magic. It reflected years of accumulated improvement: better training data curation, more efficient attention mechanisms, and lessons from the scaling laws literature. The release forced a broader conversation: had the AI industry been over-parameterizing models out of competitive pressure rather than necessity?

Scaling Laws and the Chinchilla Insight

In 2022, researchers at DeepMind published "Training Compute-Optimal Large Language Models," quickly nicknamed the Chinchilla paper. Their central finding: most large language models up to that point had been significantly undertrained relative to their parameter count. The optimal trade-off, they argued, is to train a smaller model on more data rather than a larger model on less data.

Their Chinchilla model, at 70 billion parameters trained on 1.4 trillion tokens, outperformed the 280-billion-parameter Gopher model on nearly every benchmark. It used the same training compute budget as Gopher but, being four times smaller, requires substantially less compute for fine-tuning and inference. The implication for sustainability is direct: if a 70B model consistently beats a 280B model, deploying the 280B model at scale is not just computationally wasteful but environmentally wasteful.

Real Finding · Chinchilla Paper

The Chinchilla paper (Hoffmann et al., DeepMind, 2022) demonstrated that for a given compute budget, the optimal strategy is to scale model size and training tokens in roughly equal proportion. Most prior large models had spent their compute budgets on parameter count rather than on training tokens. A compute-optimal model trained on the same budget can be several times smaller (Chinchilla is 4× smaller than Gopher) and still match or exceed its predecessor.
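
The trade-off can be checked with the widely used rule of thumb that training compute is roughly 6 × parameters × tokens. The sketch below applies it to the two models discussed above. Gopher's token count (about 300 billion) is an assumption not stated in this lesson, and the 6ND and 2N-per-token formulas are approximations rather than measured values.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute: about 6 FLOPs per parameter per token."""
    return 6 * params * tokens


def inference_flops_per_token(params: float) -> float:
    """Approximate forward-pass compute: about 2 FLOPs per parameter per token."""
    return 2 * params


GOPHER_PARAMS, GOPHER_TOKENS = 280e9, 300e9          # token count is an assumption
CHINCHILLA_PARAMS, CHINCHILLA_TOKENS = 70e9, 1.4e12  # figures from the text

gopher_train = training_flops(GOPHER_PARAMS, GOPHER_TOKENS)
chinchilla_train = training_flops(CHINCHILLA_PARAMS, CHINCHILLA_TOKENS)
tokens_per_param = CHINCHILLA_TOKENS / CHINCHILLA_PARAMS
inference_ratio = (inference_flops_per_token(GOPHER_PARAMS)
                   / inference_flops_per_token(CHINCHILLA_PARAMS))

print(f"Gopher training FLOPs:     {gopher_train:.2e}")
print(f"Chinchilla training FLOPs: {chinchilla_train:.2e}")
print(f"Chinchilla tokens per parameter: {tokens_per_param:.0f}")
print(f"Gopher / Chinchilla inference cost per token: {inference_ratio:.0f}x")
```

Under this approximation the two training budgets are of similar size, while the per-token inference cost differs by about a factor of four. Because inference runs continuously, that is where the smaller model's savings accumulate.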

Quantization and Pruning

Once a model is trained, several techniques can dramatically reduce its inference cost:

Quantization reduces the numerical precision of model weights. Models are commonly stored as 32-bit or 16-bit floating-point numbers (FP32, FP16). Quantizing to 8-bit integers (INT8) cuts weight memory to a quarter of FP32 (half of FP16) and speeds inference significantly with minimal accuracy loss on most tasks. 4-bit quantization is increasingly viable. The open-source community developed tools like GPTQ and bitsandbytes that made 4-bit quantization of LLaMA-family models practical on consumer hardware, so a model that previously required server-class GPUs can run on a laptop.
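
To make the idea concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is a teaching illustration with hypothetical weight values, not the algorithm used by GPTQ or bitsandbytes, which apply more sophisticated schemes.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.82, -1.27, 0.05, 0.0, 0.64]   # hypothetical FP32 weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding error is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

fp32_bytes = 4 * len(weights)   # 4 bytes per FP32 value
int8_bytes = 1 * len(weights)   # 1 byte per INT8 value (plus one shared scale)
print(f"max error: {max_error:.4f}, memory ratio: {fp32_bytes / int8_bytes:.0f}x")
```

The worst-case error per weight is half the scale factor, which is why quantization tends to work well when weights are spread fairly evenly and struggles when a few outliers stretch the range.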

Pruning removes weights or attention heads identified as low-contribution during a structured analysis phase. Structured pruning can reduce model size by 30–50% with modest accuracy degradation for many real-world tasks. Knowledge distillation trains a smaller "student" model to mimic a larger "teacher" model, transferring learned behavior into a more efficient architecture. Hugging Face's DistilBERT, published in 2019 and distilled from Google's BERT, achieved 97% of BERT's performance on GLUE benchmarks at 40% fewer parameters and 60% faster inference.
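
Distillation's "mimic the teacher" objective is usually expressed as a loss between softened probability distributions. The sketch below shows the generic soft-target loss (KL divergence at a temperature T, scaled by T²) in plain Python. It is not DistilBERT's full training recipe, which combines several loss terms, and the logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return kl * temperature ** 2


teacher_logits = [4.0, 1.0, 0.5]       # hypothetical teacher outputs
close_student = [3.8, 1.1, 0.4]        # student that nearly agrees
far_student = [0.5, 4.0, 1.0]          # student that disagrees

loss_close = distillation_loss(close_student, teacher_logits)
loss_far = distillation_loss(far_student, teacher_logits)
print(f"close student loss: {loss_close:.4f}, far student loss: {loss_far:.4f}")
```

Minimizing this loss pushes the student toward the teacher's full output distribution, not just its top answer, which is how much of the larger model's behavior ends up in fewer parameters.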

Right-Sizing: Matching Model to Task

Perhaps the most actionable insight for practitioners is right-sizing: using the smallest model capable of achieving acceptable performance for a given task. A 175-billion-parameter model is not appropriate for classifying whether a customer support ticket is about billing or shipping. A fine-tuned BERT-class model with ~110M parameters can achieve near-identical accuracy on such classification tasks at 1/1,000th the inference cost.

A 2023 study from Hugging Face and Carnegie Mellon University ("Efficiency Benchmarks for NLP") found that for the majority of enterprise NLP tasks — classification, extraction, summarization of short texts — models in the 1–7B parameter range performed comparably to 70B+ models, while consuming 10–50× less energy per inference. The researchers recommended that organizations build explicit model selection criteria based on task complexity rather than defaulting to the largest available model.

97%

DistilBERT performance vs. BERT at 40% fewer parameters

4×

Chinchilla's size reduction vs. Gopher, which it outperforms

50×

Max inference energy reduction from right-sizing for classification tasks

60%

Faster inference for DistilBERT vs. BERT

Sustainable AI Design Checklist

1. Benchmark task requirements before selecting model size.
2. Apply quantization (INT8 minimum) for production inference.
3. Evaluate distilled alternatives before deploying full-scale models.
4. Monitor inference energy using tools like CodeCarbon or the ML CO₂ Impact calculator.
5. Cache frequent responses to avoid redundant inference computation.
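
The caching item in the checklist is often the quickest to implement. The sketch below uses Python's standard `functools.lru_cache`; `run_model` is a hypothetical stand-in for an expensive inference call, and the normalization step is one possible design choice rather than a required one.

```python
from functools import lru_cache

call_count = 0  # counts how many times the expensive model is actually invoked


def run_model(prompt: str) -> str:
    """Hypothetical stand-in for an expensive model inference call."""
    global call_count
    call_count += 1
    return f"answer to: {prompt}"


def normalize(prompt: str) -> str:
    """Collapse case and whitespace so trivially different prompts share a cache entry."""
    return " ".join(prompt.lower().split())


@lru_cache(maxsize=1024)
def _cached_answer(normalized_prompt: str) -> str:
    return run_model(normalized_prompt)


def answer(prompt: str) -> str:
    """Serve from cache when an equivalent prompt has been seen before."""
    return _cached_answer(normalize(prompt))


first = answer("Where is my order?")
second = answer("  where is my ORDER? ")
print(f"model invoked {call_count} time(s) for 2 requests")
```

Exact-match caching only helps with repeated questions, so it suits high-volume support queues better than open-ended conversation. It should also be skipped for prompts containing user-specific data.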

Quantization: Reducing numerical precision of model weights (e.g., FP32 → INT8 or INT4) to shrink memory use and accelerate inference with minimal accuracy loss.

Pruning: Removing low-importance weights or attention heads from a trained model to reduce size and inference cost.

Knowledge Distillation: Training a smaller "student" model to replicate a larger "teacher" model's outputs, compressing capability into a more efficient architecture.

Chinchilla Scaling Laws: DeepMind's 2022 finding that compute-optimal training scales model size and training data in equal proportion, yielding smaller models with competitive performance.

Lesson 3 Quiz

Efficient AI: Model Design & Algorithmic Choices — 4 questions

1. What was the central finding of DeepMind's "Chinchilla" paper (2022)?

Correct. The Chinchilla paper showed that scaling model size and training tokens roughly equally yields compute-optimal models — and that most prior models had been significantly undertrained relative to their parameter counts.

The Chinchilla insight was that prior large models had been undertrained. The 70B Chinchilla model, trained on 1.4 trillion tokens, outperformed the 280B Gopher — demonstrating that more training data for a smaller model beats more parameters with less data.

2. What does INT8 quantization do to a model's weights, and what is the primary benefit?

Correct. Quantization converts high-precision (FP32) weights to lower-precision formats (INT8 or INT4), reducing memory footprint and enabling faster matrix operations — with minimal accuracy loss on most practical tasks.

Quantization reduces numerical precision. Moving from FP32 to INT8 cuts weight memory to roughly a quarter and accelerates inference operations, since 8-bit arithmetic is faster and more parallelizable than 32-bit on modern hardware, with very small accuracy tradeoffs for most tasks.

3. Hugging Face's DistilBERT achieved what performance level compared to BERT, at what cost reduction?

Correct. DistilBERT's 2019 paper demonstrated that knowledge distillation could compress BERT's capabilities substantially: 97% of benchmark performance, 40% fewer parameters, 60% faster inference — a compelling efficiency case.

DistilBERT (Sanh et al., 2019) achieved 97% of BERT's performance on GLUE benchmarks with 40% fewer parameters and 60% faster inference — a landmark result for knowledge distillation demonstrating that most of a large model's capability can be compressed into a much smaller one.

4. For enterprise NLP classification tasks, what did the 2023 Hugging Face / CMU efficiency study find about model size and energy use?

Correct. The study found that for the majority of real enterprise NLP tasks, 1–7B models matched 70B+ performance while consuming 10–50× less inference energy — making right-sizing one of the most impactful sustainability interventions available.

The efficiency benchmarks study found that 1–7B parameter models matched 70B+ performance on most enterprise classification tasks while using 10–50× less energy per inference. Right-sizing to task complexity is one of the most impactful and immediately actionable sustainability choices available to AI practitioners.

Lab 3: Right-Sizing AI for Real Tasks

Interactive discussion — minimum 3 exchanges to complete

Your Mission

A logistics company is deploying AI across four internal workflows: (1) routing optimization, (2) customer email classification, (3) generating contract summaries, and (4) real-time driver safety alerts. They're planning to use GPT-4-class models for all four. Your job is to advise on right-sizing, quantization opportunities, and the energy implications of their current plan.

Start here: "Our logistics team wants to standardize on one large frontier model for all four use cases to simplify our stack. What's wrong with that approach from a sustainability standpoint, and what would you recommend instead?"

AI Efficiency Advisor

Welcome to Lab 3. I'm your AI efficiency advisor, focused on model right-sizing and algorithmic efficiency. We'll work through how to match model size and architecture to task requirements — one of the highest-leverage sustainability decisions available to any organization deploying AI. The logistics scenario you've described is extremely common: organizations defaulting to frontier models for everything. Let's dig into why that's both expensive and wasteful, and what a better approach looks like. What's on your mind?

Module 6 · Lesson 4

Measurement, Disclosure & Governance

You cannot manage what you do not measure — and the AI industry has largely avoided measuring.

What frameworks, standards, and regulations are emerging to require AI energy disclosure, and how should organizations prepare to report their AI's environmental impact?

When the U.S. Securities and Exchange Commission finalized its climate disclosure rules in March 2024 — requiring large public companies to report material climate-related risks and Scope 1 and 2 emissions — the rule did not explicitly mention AI energy consumption. But legal analysts immediately noted the implication: for technology companies where AI workloads constitute a significant portion of energy use, AI-driven emissions would be material climate-related risks requiring disclosure. The regulatory pressure, long anticipated, had arrived — even if AI wasn't named directly.

Current Measurement Tools

Several practical tools now exist for measuring AI energy consumption at the code level:

CodeCarbon is an open-source Python library developed by Mila (Montréal Institute for Learning Algorithms), Université de Montréal, and partners. It measures the energy consumption of Python code execution and converts it to estimated CO₂ equivalent based on the carbon intensity of the electricity grid where the computation is running. As of 2024, CodeCarbon has been downloaded over 400,000 times and is integrated into several cloud ML platforms.

ML CO₂ Impact Calculator (mlco2.github.io) allows researchers to estimate the emissions of training runs by inputting hardware type, cloud provider, region, and training duration. It draws on the ElectricityMaps API for regional grid intensity data. The tool was used in a 2022 NeurIPS paper that found emissions reporting was absent from the typical published ML paper: fewer than 5% of accepted papers reported training energy.

Experiment trackers such as Weights & Biases and MLflow have added energy tracking integrations that log kWh alongside loss curves, enabling teams to visualize the energy cost of hyperparameter experiments — often revealing that extensive search is consuming disproportionate energy relative to accuracy gains.

Disclosure Gap · NeurIPS 2022

An analysis of 1,700 papers accepted to NeurIPS 2022 found that fewer than 5% reported any training energy figure. The authors called this a "reproducibility crisis for sustainability" — without energy disclosure, it is impossible for the field to build cumulative knowledge about which approaches are computationally efficient, or to hold itself accountable for its footprint.

Emerging Regulatory Frameworks

The EU AI Act, formally adopted in 2024, includes provisions requiring high-risk AI systems to document their energy consumption in technical documentation submitted to regulators. While the implementing regulations are still being developed, the Act establishes energy use as a required disclosure element for systems above a defined risk threshold.

The EU Corporate Sustainability Reporting Directive (CSRD), effective for large companies from fiscal year 2024, requires detailed reporting under the European Sustainability Reporting Standards (ESRS). ESRS E1 on climate change explicitly requires disclosure of energy consumption by source, Scope 1–3 emissions, and targets — all of which encompass data center and AI workload energy.

In the United States, the Executive Order on AI directed the Department of Energy to develop methodologies for assessing AI energy and water consumption across federal agencies. The National AI Initiative also published a voluntary framework for AI sustainability reporting, though it lacks enforcement mechanisms.

Scope 3 and Supply Chain Emissions

For organizations using third-party AI APIs (OpenAI, Anthropic, Google, Microsoft), the AI-related emissions fall under Scope 3 — indirect emissions from purchased goods and services. Scope 3 accounting is the most contested and least standardized domain of corporate carbon reporting. The GHG Protocol's Scope 3 Technical Guidance recommends that companies include purchased cloud computing emissions, but few AI providers currently publish the granular data needed to calculate this accurately.

Amazon Web Services, Google Cloud, and Microsoft Azure have all developed carbon footprint reporting tools for enterprise customers — but these tools rely on average fleet carbon intensities rather than workload-specific measurements, which tends to underestimate AI-heavy workloads' actual footprint. Advocates including the Green Software Foundation have called for API-level energy reporting as a standard feature of cloud AI services.

What Good Disclosure Looks Like

A credible AI sustainability disclosure should include: (1) total kWh consumed by AI workloads, with training and inference reported separately; (2) carbon intensity of the electricity source (grid average or specific PPA); (3) water consumption for cooling; (4) hardware utilization rates; (5) the model-size choices made and the alternatives considered; (6) year-over-year trends. The Green Software Foundation's Software Carbon Intensity (SCI) specification provides a standardized formula: SCI = ((E × I) + M) per R, where E is energy consumed, I is grid carbon intensity, M is embodied hardware emissions, and R is the functional unit (query, user, transaction).
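
The SCI formula is straightforward to compute once the inputs are known. The sketch below is a minimal illustration with hypothetical daily figures; the function name and parameter names are assumptions chosen for readability, not part of the SCI specification.

```python
def software_carbon_intensity(energy_kwh: float,
                              grid_intensity_g_per_kwh: float,
                              embodied_g: float,
                              functional_units: float) -> float:
    """SCI = ((E * I) + M) per R, returned in grams CO2e per functional unit."""
    if functional_units <= 0:
        raise ValueError("functional_units (R) must be positive")
    operational_g = energy_kwh * grid_intensity_g_per_kwh   # E * I
    return (operational_g + embodied_g) / functional_units  # (O + M) per R


# Hypothetical day of a customer-support deployment:
sci_per_query = software_carbon_intensity(
    energy_kwh=120,                 # E: energy used serving queries
    grid_intensity_g_per_kwh=400,   # I: grams CO2e per kWh on the local grid
    embodied_g=2_000,               # M: share of hardware manufacturing emissions
    functional_units=50_000,        # R: queries served that day
)
print(f"SCI = {sci_per_query:.2f} gCO2e per query")
```

Because SCI is a rate per functional unit, it can fall while total emissions rise, so it works best when reported alongside absolute kWh and total emissions.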

Scope 1 / 2 / 3 Emissions: GHG Protocol categories. Scope 1 = direct emissions, Scope 2 = purchased electricity, Scope 3 = all other indirect emissions including supply chain and purchased services.

SCI: Software Carbon Intensity, a Green Software Foundation standard for measuring the carbon impact of software per functional unit of work, accounting for energy, grid intensity, and hardware embodied carbon.

CSRD: EU Corporate Sustainability Reporting Directive, which requires large EU companies to disclose detailed sustainability metrics including energy consumption under ESRS standards from 2024.

CodeCarbon: Open-source Python library for measuring the energy consumption and carbon footprint of computational workloads, developed by Mila and Université de Montréal.

Lesson 4 Quiz

Measurement, Disclosure & Governance — 4 questions

1. What did analysis of NeurIPS 2022 accepted papers find about AI energy reporting?

Correct. The analysis found fewer than 5% of NeurIPS 2022 papers disclosed any training energy data — a disclosure gap researchers called a "reproducibility crisis for sustainability."

The NeurIPS 2022 analysis found that fewer than 5% of accepted papers disclosed any energy figure for training. This near-total absence of measurement and reporting means the ML research community has essentially no cumulative data on the field's actual energy trajectory.

2. Under the GHG Protocol, in which "Scope" do emissions from using third-party AI APIs fall?

Correct. Emissions from purchasing AI API services (OpenAI, Anthropic, Google, etc.) are Scope 3 — indirect emissions embedded in purchased services. This is the hardest scope to measure and the least standardized in reporting frameworks.

Third-party AI API usage falls under Scope 3 — indirect emissions from purchased goods and services. This is significant because most organizations using AI APIs have essentially no visibility into the actual energy consumed to serve their requests, and providers don't currently publish the granular data needed to calculate this accurately.

3. What does the Green Software Foundation's Software Carbon Intensity (SCI) formula measure?

Correct. SCI = (E × I) + M per R — energy times grid carbon intensity, plus embodied hardware emissions, divided by a functional unit (per query, per user, per transaction). This enables apples-to-apples comparison across different systems and deployment scales.

The SCI formula is: (E × I) + M per R — where E is energy, I is grid carbon intensity, M is embodied hardware emissions, and R is the functional unit (query, user, etc.). It measures carbon per unit of useful work, enabling meaningful comparisons between systems of different scales and architectures.

4. What does the EU AI Act (2024) require regarding energy consumption for high-risk AI systems?

Correct. The EU AI Act includes provisions requiring high-risk AI systems to document their energy consumption in technical documentation filed with regulators — establishing energy as a required compliance element, not merely voluntary disclosure.

The EU AI Act, adopted in 2024, includes provisions requiring documentation of energy consumption for high-risk AI systems in technical filings to regulators. This makes energy use a formal compliance element — not a voluntary sustainability initiative — for the highest-risk AI applications in the EU market.

Lab 4: Building an AI Sustainability Report

Interactive discussion — minimum 3 exchanges to complete

Your Mission

A publicly traded retail company has just become subject to EU CSRD requirements. Their legal and sustainability teams have asked you — as their AI governance advisor — to help design a methodology for measuring and disclosing the carbon footprint of their three deployed AI systems: a demand forecasting model, a product recommendation engine, and a customer service chatbot. They have no existing measurement infrastructure.

Start here: "We have three AI systems in production and we have to start disclosing their environmental impact for CSRD. We've never measured any of this. Where do we even begin, and what data do we need to collect?"

AI Sustainability Reporting Advisor

Welcome to Lab 4. I'm your AI sustainability reporting advisor. This lab focuses on building practical measurement and disclosure frameworks for organizations that need to report AI energy consumption under CSRD, SEC climate rules, or voluntary standards like the Green Software Foundation's SCI. Your scenario — starting from zero with existing production systems — is exactly what most organizations face. Let's build a measurement approach together. What would you like to tackle first?

Module 6 Test

Building for Sustainability — 15 questions · Pass threshold: 80%

1. Which document disclosed that Google's greenhouse gas emissions rose 48% between 2019 and 2023, and what was the primary cause?

Correct.

Google's 2024 Environmental Report disclosed the 48% increase and attributed it primarily to AI data center electricity demand.

2. According to the 2019 Strubell et al. paper, training a single large transformer NLP model could produce roughly how much CO₂ equivalent?

Correct.

Strubell et al. estimated approximately 284 tonnes CO₂e — comparable to five average American cars over their full lifetimes.

3. What is the Jevons Paradox, and why is it relevant to AI hardware efficiency improvements?

Correct.

The Jevons Paradox holds that efficiency gains lower per-unit costs, which historically increases demand enough to raise total consumption — a major concern for whether AI hardware efficiency will actually reduce the sector's total energy use.

4. What distinguishes a Power Purchase Agreement (PPA) from a Renewable Energy Certificate (REC) in terms of climate impact?

Correct.

PPAs fund new renewable capacity (additionality) by contracting directly with developers. RECs are certificates that can be purchased from already-operating projects — a financial claim without necessarily driving new clean generation.

5. Google's 2024 report showed its Singapore data centers achieved what 24/7 CFE percentage, and why?

Correct.

Google's Singapore operations achieved only 4% CFE — among the lowest globally — because Singapore's grid runs predominantly on natural gas with very limited renewable generation.

6. What was distinctive about Microsoft's September 2024 energy agreement with Constellation Energy?

Correct.

Microsoft contracted for 20 years of nuclear power from Three Mile Island Unit 1 in Pennsylvania, which Constellation plans to restart, specifically to provide always-on, carbon-free electricity for AI data center loads.

7. The Chinchilla paper (DeepMind, 2022) compared Chinchilla (70B parameters) to Gopher (280B parameters). What did it find?

Correct.

Chinchilla (70B parameters, trained on 1.4 trillion tokens) outperformed Gopher (280B parameters) on nearly all benchmarks — demonstrating that training data volume matters as much as parameter count, and that optimal models can be far smaller than the largest ones.

8. What does quantization from FP32 to INT8 primarily achieve in a deployed AI model?

Correct.

INT8 quantization converts 32-bit floating-point weights to 8-bit integers, cutting the weights' memory footprint to roughly a quarter and enabling faster matrix operations, with minimal accuracy loss for most practical tasks.

9. What is knowledge distillation, and what real-world example demonstrated its effectiveness?

Correct.

Knowledge distillation trains a small "student" model to replicate a large "teacher" model's behavior. DistilBERT (Hugging Face, 2019) is the landmark example: it retains 97% of BERT's benchmark performance with 40% fewer parameters and 60% faster inference.

10. According to the 2023 Hugging Face / CMU efficiency study, what is the energy benefit of right-sizing models for classification tasks?

Correct.

The study found 1–7B models performed comparably to 70B+ models on most enterprise classification tasks while consuming 10–50× less energy per inference — making right-sizing one of the highest-leverage sustainability interventions available.

11. What tool did Mila and Université de Montréal develop to measure Python code's carbon footprint?

Correct.

CodeCarbon is the open-source Python library developed by Mila (Montréal Institute for Learning Algorithms) and Université de Montréal for measuring energy consumption and CO₂ equivalent of computational workloads.

12. Under the EU AI Act (2024), what must developers of high-risk AI systems document regarding energy?

Correct.

The EU AI Act requires that energy consumption be documented in technical documentation submitted to regulators for high-risk AI systems — making energy use a formal compliance requirement.

13. What is the Green Software Foundation's SCI formula, and what does each component represent?

Correct.

SCI = (E × I) + M per R: energy times grid carbon intensity, plus embodied hardware emissions, all divided by a functional unit (per query, user, transaction). This gives a comparable carbon-per-unit-of-work metric across different systems.

14. Why do cloud provider carbon footprint tools tend to underestimate AI-heavy workloads' actual environmental impact?

Correct.

Cloud carbon tools typically use fleet-average carbon intensities across all workloads. AI training and inference are far more compute-intensive than average cloud workloads, so averaging across all server types substantially underestimates the actual footprint of AI-specific operations.

15. A company wants to claim "100% renewable energy." Which approach provides the weakest actual climate contribution?

Correct.

Purchasing RECs from an already-operational legacy hydro dam provides no additionality — the dam would have generated the same power regardless. It is a financial claim of renewable matching without any actual new clean energy entering the grid.