What's Coming Next — Final Exam

1. Test set contamination is a particular concern for large language models because:

Correct. Internet-scale training data routinely includes academic papers, study guides, and Q&A forums that overlap with common benchmarks. Shi et al. (2023) found significant contamination in standard training corpora for several prominent benchmarks. This is an unsolved methodological problem for the field.

The contamination concern is specifically about training data overlap with test sets. Web-crawled training data is vast and includes academic content where benchmarks and their solutions appear. Unlike smaller models trained on curated data, LLMs' exposure is difficult to fully audit or rule out.
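As a rough illustration of what auditing for overlap can look like, here is a minimal sketch that measures how many of a benchmark item's word n-grams also appear in a training document. It is a simple n-gram overlap heuristic, not the detection method of Shi et al. (2023). The function names, the choice of n=5, and the example strings are all assumptions made for illustration.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_fraction(benchmark_item, training_doc, n=5):
    """Fraction of the benchmark item's n-grams that also occur in the training document."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0  # item shorter than n words: nothing to compare
    return len(item_grams & ngrams(training_doc, n)) / len(item_grams)


# Hypothetical strings, for illustration only.
question = "what is the capital of france and when was it founded"
crawled_page = "study guide what is the capital of france and when was it founded answer paris"
clean_page = "a recipe for onion soup that takes about one hour to prepare"

contaminated_score = overlap_fraction(question, crawled_page)
clean_score = overlap_fraction(question, clean_page)
print(contaminated_score, clean_score)
```

A high score only flags a document for closer inspection. Paraphrased or translated copies of a benchmark would slip past a check like this, which is part of why the exposure is hard to rule out.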

2. The "Bletchley Declaration" was signed in November 2023. Which of the following was diplomatically significant about it?

Correct. The U.S.–China co-signature on a document acknowledging frontier AI risks was the diplomatic headline of the Bletchley Summit, demonstrating that shared existential concerns can create limited cooperation even between rivals.

The key diplomatic achievement was U.S.–China co-signature — not a treaty, not universal participation, and not mandatory requirements — showing that great-power rivals can find common ground on AI safety framing.

3. What did the LLaMA release and leak (February–March 2023) and Meta's subsequent Llama 2 release (July 2023) demonstrate about the role of peer review in AI?

Correct. The lesson notes that LLaMA's absence of formal peer review didn't prevent its influence — but community evaluation and independent testing served as a de facto review process, validating (or challenging) claims through widespread practical use.

The lesson notes that widely deployed preprints like LLaMA can receive de facto peer review through community evaluation and independent testing, even without formal journal review.

4. Three questions to ask before inputting data into an AI tool are: "Who owns this data?", "Where does it go?", and:

Correct. The three questions are: Who owns this data? Where does it go? Could this harm anyone? — covering ownership, storage/use terms, and third-party consent.

The three data-handling questions from the lesson are: Who owns this data? Where does it go? Could this harm anyone? — the last covering whether the data involves people who haven't consented to third-party sharing.

5. Which of the five questions in the signal-vs-noise filter asks about who funded the research?

Correct. "Who benefits?" is the fifth question in the filter, asking about funding and incentive alignment. It doesn't invalidate findings, but it calibrates confidence, which matters all the more now that industry labs produce more notable machine learning models than academia.

The "Who benefits?" question covers funding, incentives, and who stands to gain from the framing — the fifth question in the filter.

6. Which of the following best describes how multimodal AI processes different input types?

Correct. The transformer architecture generalizes: any input type can be tokenized and processed through the same attention mechanism in a single model.

Not quite. Modern multimodal models convert every input type into tokens and process them with a single transformer; they don't require separate specialist models.
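To make "any input type can be tokenized" concrete, here is a toy sketch that turns a caption and a small image into one flat token sequence. Real models use learned subword tokenizers and learned patch embeddings; the whitespace split, the raw-pixel patches, and every name below are simplifying assumptions.

```python
def text_tokens(text):
    """Split text into word tokens, each tagged with its modality."""
    return [("text", word) for word in text.split()]


def image_patch_tokens(image, patch):
    """Cut a 2D grid of pixel values into patch x patch tiles, one token per tile.

    Assumes the image height and width are multiples of `patch`.
    """
    tokens = []
    for top in range(0, len(image), patch):
        for left in range(0, len(image[0]), patch):
            tile = tuple(
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            )
            tokens.append(("image", tile))
    return tokens


# Hypothetical 4x4 grayscale image and caption.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
sequence = text_tokens("describe this picture") + image_patch_tokens(image, patch=2)
print(len(sequence))
```

The point is only that both modalities end up in the same sequence, so one attention mechanism can relate any text token to any image patch.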

7. The Stanford AI Index 2024 (released April 2024) found which nuanced conclusion about AI performance?

Correct. The Stanford AI Index 2024 found AI exceeded human performance on several narrow benchmarks but remained substantially below on complex reasoning — a nuance missing from most general coverage of AI capabilities.

The Stanford AI Index 2024 found AI had surpassed narrow benchmarks but remained well below human level on complex reasoning tasks — a crucial distinction lost in most headlines.

8. What is the "Chip 4 Alliance"?

Correct. The Chip 4 Alliance brings together the four key nodes of advanced semiconductor design and manufacturing — the U.S. (design/equipment), Japan (equipment/materials), South Korea (DRAM/logic fabs), and Taiwan (leading-edge logic fabs) — to coordinate export and supply chain policy vis-à-vis China.

Chip 4 refers to the U.S., Japan, South Korea, and Taiwan — the four critical nodes of the advanced semiconductor supply chain — coordinating export controls and supply chain resilience to manage China's access to leading-edge chips.

9. Which newsletter has been published weekly since 2016 and is specifically recommended as a "minimum viable stack" option?

Correct. Import AI by Jack Clark, published weekly since 2016, is specifically recommended as the minimum viable stack option — described as "genuinely excellent" and achievable in the 30-minute weekly commitment.

Import AI by Jack Clark — weekly since 2016 — is the specific minimum viable stack recommendation in the lesson.

10. What distinguishes a primary source from a secondary source in the context of AI information?

Correct. The distinction is about originality — primary sources are the original documents, secondary sources interpret them. Most people's AI knowledge is entirely secondary or tertiary.

The distinction is about originality: primary sources are original documents (papers, reports, filings, datasets), while secondary sources are someone's interpretation of those documents.

11. Anthropic's "computer use" feature, released in public beta in October 2024, expanded agentic reach because it allowed Claude to do what?

Correct. Computer use moved the boundary from "anything with an API" to "anything a human can do on a screen" — a qualitatively different capability scope.

Computer use specifically meant controlling GUIs — clicking, typing, navigating — which expanded reach beyond purpose-built APIs to all desktop software.

12. What did a UN Panel of Experts report note about the 2020 fighting in Libya?

Correct. The UN Panel of Experts report is widely cited as documenting one of the first reported cases of a lethal drone possibly engaging humans autonomously, a landmark in LAWS history.

The UN report noted Kargu-2 loitering munitions operating in autonomous mode — a significant threshold moment for lethal autonomous weapons in actual conflict.

13. What was Google's response to its involvement in Project Maven, and what broader significance did it have?

Correct. The Maven episode is a landmark in tech-industry ethics: employee protest — not legal obligation — drove Google's withdrawal, signaling that AI workforce values could meaningfully constrain corporate military contracting decisions.

Employee action — not legal or regulatory pressure — drove Google's Maven withdrawal. More than 4,000 staff signed a petition, and Google declined renewal in 2018. This made Maven a landmark case in AI ethics and corporate accountability.

14. Gemini 1.5 Pro's 1-million-token context window can hold approximately:

Correct. Gemini 1.5 Pro's million-token context was demonstrated to hold 11 hours of audio, about an hour of video, or 30,000 lines of code, enabling whole-document reasoning without chunking.

Not quite. One million tokens represents approximately 11 hours of audio, 30,000 lines of code, or about an hour of video, enabling reasoning across truly large multimodal inputs in one context.

15. Which stage of the research pipeline involves industrial labs testing whether an idea improves reliably with more compute and data?

Correct. Scaling Experiments is where ideas are tested at increasing compute and data — and where most fail.

The Scaling Experiments stage is where industrial labs test whether ideas improve reliably with more resources. Most ideas fail here.

16. Nvidia's data center revenue grew approximately how much between Q1 2023 and Q1 2024?

Correct. $3.6B to $18.4B — roughly 5× in 12 months.

Nvidia data center revenue grew approximately 5×: from $3.6B in Q1 2023 to $18.4B in Q1 2024.
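The "roughly 5×" figure can be checked directly from the two revenue numbers quoted above. This is a quick arithmetic sketch; the variable names are illustrative.

```python
q1_2023_billion = 3.6   # data center revenue quoted in the answer above
q1_2024_billion = 18.4  # data center revenue quoted in the answer above

growth_multiple = q1_2024_billion / q1_2023_billion
percent_increase = (growth_multiple - 1) * 100

print(round(growth_multiple, 2), round(percent_increase))
```

Note the difference between the two framings: a multiple of about 5.1 is the same thing as an increase of about 411%, not 511%.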

17. What is the maximum fine the EU AI Act can impose for violations involving prohibited AI practices?

Correct. The highest tier of EU AI Act fines, reserved for prohibited practices, is €35 million or 7% of global annual turnover, whichever is higher. It follows the same fixed-sum-or-percentage structure as GDPR's upper tier (€20 million or 4%) but sets a higher ceiling.

The EU AI Act's maximum penalty for prohibited AI practices is €35 million or 7% of global annual turnover — the highest tier, designed to deter the most serious violations.
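The "whichever is higher" rule is simple to express as a calculation. The sketch below computes only the statutory ceiling for the top tier, under the assumption that turnover is given in euros; the function name is hypothetical, and actual fines are set case by case below this cap.

```python
def max_prohibited_practice_fine(global_turnover_eur):
    """Upper limit of the top fine tier: EUR 35 million or 7% of turnover, whichever is higher."""
    fixed_cap = 35_000_000
    turnover_cap = global_turnover_eur * 7 / 100
    return max(fixed_cap, turnover_cap)


# Hypothetical companies, for illustration only.
small_company_cap = max_prohibited_practice_fine(100_000_000)    # 7% would be 7 million
large_company_cap = max_prohibited_practice_fine(2_000_000_000)  # 7% is 140 million
print(small_company_cap, large_company_cap)
```

The two caps cross at €500 million in turnover: below that the fixed €35 million applies, above it the percentage takes over.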

18. What was "Papers With Code" cited as a useful proxy for in Lesson 2?

Correct. Papers With Code tracks citation counts and associated code repositories — both are useful proxies for which results the research community finds credible enough to cite and attempt to replicate.

Papers With Code tracks citations and code repositories — useful proxies for which results others found credible and worth replicating.

19. SWE-bench measures AI agents' ability to do what, and why is the metric meaningful?

Correct. Real GitHub issues with real test suites — not toy problems. This makes SWE-bench one of the most rigorous measures of practical agentic software engineering capability.

SWE-bench uses real GitHub bug reports and the repository's own test suite — objective, real-world criteria that make it a rigorous capability benchmark.
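The pass/fail criterion can be sketched in a few lines. SWE-bench distinguishes tests that the fix should turn from failing to passing and tests that must keep passing. The function below is a simplified illustration of that rule, not the benchmark's own harness, and the test names and result format are hypothetical.

```python
def is_resolved(test_results, fail_to_pass, pass_to_pass):
    """True only if every listed test passes after the agent's patch is applied.

    test_results maps test name -> "passed" or "failed" (assumed format).
    fail_to_pass: tests that failed before the fix and must now pass.
    pass_to_pass: tests that passed before and must not regress.
    """
    must_pass = list(fail_to_pass) + list(pass_to_pass)
    return all(test_results.get(name) == "passed" for name in must_pass)


# Hypothetical test names and outcomes.
fail_to_pass = ["test_parse_empty_header"]
pass_to_pass = ["test_parse_basic", "test_roundtrip"]

good_patch = {"test_parse_empty_header": "passed", "test_parse_basic": "passed", "test_roundtrip": "passed"}
regressing_patch = {"test_parse_empty_header": "passed", "test_parse_basic": "failed", "test_roundtrip": "passed"}

print(is_resolved(good_patch, fail_to_pass, pass_to_pass))
print(is_resolved(regressing_patch, fail_to_pass, pass_to_pass))
```

A patch that fixes the reported bug but breaks an existing test does not count, which is what keeps the metric tied to the repository's own standards.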

20. Why did Amazon build EFA (Elastic Fabric Adapter) rather than simply using standard InfiniBand for its AI training clusters?

Correct. EFA provides RDMA-like performance over Ethernet, reducing AWS's dependence on Nvidia/Mellanox InfiniBand hardware — a strategic independence play.

EFA was built on standard Ethernet infrastructure to reduce dependency on Mellanox (now Nvidia) InfiniBand — while still delivering RDMA performance for GPU training workloads.