Lesson 1: What Is a Computer Helper?
From Turing's imitation game to trillion-parameter models — defining artificial intelligence at the systems level.
In June 2022, Google engineer Blake Lemoine published transcripts of conversations with LaMDA, Google's conversational AI, claiming the system was sentient. Google had placed him on paid administrative leave days earlier, citing a breach of its confidentiality policy. The transcripts showed LaMDA producing articulate descriptions of its own emotions, fears of being shut down, and desires for recognition as a person. The writing was fluent, coherent, and emotionally compelling, more convincing than many human-written texts about inner experience.
The reaction split the field. Linguists like Emily Bender argued that language production and language understanding are fundamentally different capabilities, and that fluency is not evidence of sentience. AI researchers pointed out that LaMDA was optimized to produce engaging conversation — it was doing exactly what it was trained to do. Philosophers noted that the hard problem of consciousness has resisted resolution for centuries, and a chatbot transcript was unlikely to settle it.
What the Lemoine case ultimately exposed was not that LaMDA was conscious. It exposed something more dangerous: that most people, including engineers building these systems, lacked a precise vocabulary for what AI actually is and what the difference is between performing intelligence and possessing it. That vocabulary gap shapes regulation, investment, and public trust. Building that vocabulary is where this course begins.
Defining AI: Harder Than It Sounds
The term "artificial intelligence" was coined by John McCarthy in the 1955 proposal for the 1956 Dartmouth Summer Research Project. McCarthy's definition was deliberately broad: "making a machine behave in ways that would be called intelligent if a human were so behaving." Nearly seven decades later, there is still no universally accepted definition, and that ambiguity has real consequences.
Stuart Russell and Peter Norvig, in their canonical textbook, organize definitions along two axes: systems that think vs. act, like humans vs. rationally. The "acting rationally" quadrant — agents that maximize expected utility — has dominated modern research. But the Lemoine case shows why the "thinking like humans" quadrant refuses to die: people experience AI as a conversational partner.
🔑 Key Distinction
Narrow AI handles a single task domain — chess, protein folding, language generation. General AI (AGI) would match human cognitive flexibility across all domains. Every commercial AI system deployed today is narrow, regardless of how general it may appear.
From Rules to Learning: The Paradigm Shift
Early AI (1950s–1980s) was symbolic: hand-coded rules, expert systems, logical inference. MYCIN diagnosed bacterial infections. ELIZA simulated a Rogerian therapist. These systems were brittle — they could not handle inputs outside their rule base.
The shift to machine-learning approaches changed everything. Instead of programming rules, engineers programmed learning procedures and fed them data. The system discovered its own rules. Backpropagation (Rumelhart, Hinton & Williams, 1986) made training multi-layer neural networks feasible. The deep learning revolution (AlexNet, 2012) showed that scale, data, and GPU compute could produce large jumps in perceptual performance, and within a few years image classifiers were matching or exceeding reported human accuracy on the ImageNet benchmark.
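The contrast between programming a rule and programming a learning procedure can be sketched in a few lines. This is a toy illustration only: the feature (links per message), the data, and all function names are hypothetical, and the "learning" is a simple search over cut-offs rather than a neural network.

```python
# Toy contrast: a hand-coded rule versus a rule learned from labeled data.
# The feature (links per message) and every number below are hypothetical.

def hand_coded_is_spam(link_count: int) -> bool:
    """Symbolic approach: an engineer guesses the cut-off and hard-codes it."""
    return link_count >= 5

def learn_cutoff(examples: list[tuple[int, bool]]) -> int:
    """Learning approach: pick the cut-off that misclassifies the fewest examples."""
    def errors(cutoff: int) -> int:
        return sum((links >= cutoff) != is_spam for links, is_spam in examples)
    candidates = sorted({links for links, _ in examples})
    return min(candidates, key=errors)

# (links in message, labeled as spam?)
training_data = [(0, False), (1, False), (2, False), (3, True), (4, True), (7, True)]
learned_cutoff = learn_cutoff(training_data)

def learned_is_spam(link_count: int) -> bool:
    return link_count >= learned_cutoff

hand_errors = sum(hand_coded_is_spam(x) != y for x, y in training_data)
learned_errors = sum(learned_is_spam(x) != y for x, y in training_data)
print(learned_cutoff, hand_errors, learned_errors)
```

On this data the hand-coded guess misclassifies two messages, while the learned cut-off fits all six. The engineer wrote the search procedure, not the rule itself.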
🌟 Why This Matters
The paradigm shift from rules to learning fundamentally changes what it means to understand an AI system. A symbolic system's behavior is traceable to explicit rules. A neural network's behavior emerges from billions of learned parameters that no human fully inspects. This opacity is central to every debate about AI safety, fairness, and governance.
So was Lemoine right? By the definitions we've now built, the answer depends on which question you're asking. Was LaMDA producing intelligent-seeming output? Clearly yes — it was optimized to do exactly that. Was LaMDA "intelligent" in the way Lemoine claimed? No scientific consensus supports that conclusion. Was Lemoine's confusion understandable? Absolutely — even with years of engineering experience, the vocabulary for distinguishing performance from possession barely existed in public discourse.
Google eventually restructured its AI ethics communication. Lemoine was dismissed in July 2022. LaMDA's capabilities were folded into Bard, then Gemini. The technical capabilities advanced; the vocabulary problem remained. That's the gap this course is designed to close. Every concept from here forward (training, inference, emergence, alignment) builds on the distinction you now hold: what AI does is not the same as what AI is.
Quiz 1: What Is a Computer Helper?
5 questions — free, untracked, retake anytime.
🧪 Which definition best captures the dominant modern AI research paradigm?
🧪 What did the Lemoine/LaMDA case primarily expose?
🧪 What is the fundamental difference between symbolic AI and machine learning?
🧪 Why does neural network opacity matter for AI governance?
🧪 Which statement about narrow AI vs. AGI is accurate?
Lab 1: Defining AI: A Socratic Exploration
Work with an AI to stress-test definitions of intelligence.
Propose your own definition of artificial intelligence. The AI will challenge it with edge cases and counterexamples. Your goal: arrive at a definition that survives at least three rounds of challenge.
- Type your best definition of AI.
- The AI will present a counterexample. Refine your definition.
- Continue iterating. Can your definition distinguish AI from calculators? From insects? From a thermostat?
Lesson 2: AI Helpers in Our World
Mapping the invisible AI infrastructure that shapes daily life — from recommendation engines to autonomous systems.
In 2018, MIT Media Lab researcher Joy Buolamwini and co-author Timnit Gebru published the Gender Shades study. They found that three commercial gender-classification systems (from IBM, Microsoft, and Face++) had error rates below 1% for lighter-skinned males but up to 34.7% for darker-skinned females. Facial analysis systems of this kind were already deployed in law enforcement, hiring, and security checkpoints.
The AI was "in our world" long before the world understood what it was doing there. Buolamwini's work did not just identify a technical flaw — it reframed the entire conversation about AI deployment from "does it work?" to "for whom does it work, and who bears the cost of failure?"
The study became a watershed moment in AI ethics. It demonstrated that AI deployed in high-stakes contexts carries the biases of its training data into decisions that affect real people — disproportionately those already marginalized. This is not a bug; it is a structural feature of systems trained on historical data that reflects historical inequities.
The AI Stack in Daily Life
AI is embedded in layers most users never perceive. Your phone's keyboard predicts your next word (a small language model). Your email sorts spam (a classifier trained on billions of messages). Your streaming service curates recommendations (collaborative filtering cross-referencing your behavior with millions of others). Your nav app predicts traffic (a spatiotemporal model ingesting GPS data from every phone on the road).
Each system makes thousands of micro-decisions per user per day. The cumulative effect: AI now mediates a significant fraction of human information consumption, social connection, and economic activity — often without any explicit notification.
🔑 Ambient AI
The term "ambient AI" describes systems so embedded in infrastructure that users interact with them unconsciously. The ethical question is not whether ambient AI is good or bad — it is whether informed consent is possible when the system is invisible.
High-Stakes Deployment
Beyond consumer convenience, AI systems now make or influence consequential decisions: parole recommendations (COMPAS), medical diagnoses (radiology AI), credit scoring, insurance pricing, and military target identification. In each domain, the stakes of error are measured not in user inconvenience but in liberty, health, and life.
The Gender Shades study became a watershed because it demonstrated that AI carries training-data biases into real decisions affecting real people — disproportionately the already marginalized.
🌟 The Right Question
The question for any AI deployment is not just "does it work?" — it is "for whom does it work, and who bears the cost of failure?" If the answer differs by demographic group, the system encodes structural inequity.
After the Gender Shades study, IBM improved its system. Microsoft committed to fairness auditing. The EU cited the research in drafting AI regulation. Buolamwini went on to found the Algorithmic Justice League. But as of today, facial recognition systems with unaudited bias remain deployed in police departments, airports, and border checkpoints worldwide.
The lesson is not that AI in our world is inherently harmful. It is that deployment without disaggregated evaluation — measuring performance across demographic groups, not just in aggregate — is negligence. The systems are in our world. The question is whether we are in theirs with open eyes.
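Disaggregated evaluation is simple to express in code. The sketch below uses invented counts (not the Gender Shades data) and hypothetical group names to show how an acceptable-looking aggregate error rate can hide a large gap between groups.

```python
from collections import defaultdict

# Hypothetical evaluation records: (demographic group, prediction was correct).
# The counts are invented for illustration and are not the Gender Shades data.
records = (
    [("group_a", True)] * 99 + [("group_a", False)] * 1
    + [("group_b", True)] * 13 + [("group_b", False)] * 7
)

def aggregate_error_rate(rows):
    """Single headline number: fraction of all predictions that were wrong."""
    return sum(not ok for _, ok in rows) / len(rows)

def error_rate_by_group(rows):
    """Disaggregated view: the same metric computed separately per group."""
    totals, errors = defaultdict(int), defaultdict(int)
    for group, ok in rows:
        totals[group] += 1
        errors[group] += not ok
    return {group: errors[group] / totals[group] for group in totals}

overall = aggregate_error_rate(records)
by_group = error_rate_by_group(records)
print(f"aggregate: {overall:.1%}")
for group, rate in by_group.items():
    print(f"{group}: {rate:.1%}")
```

Here the aggregate error rate is under 7%, yet one group sees 1% errors and the other 35%. Because the smaller group contributes few records, its failures barely move the headline number.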
Quiz 2: AI Helpers in Our World
5 questions — free, untracked, retake anytime.
🧪 What does "ambient AI" highlight about modern deployment?
🧪 The Gender Shades study reframed AI evaluation from what to what?
🧪 Why do AI systems trained on historical data reproduce historical biases?
🧪 Which is an example of high-stakes AI deployment?
🧪 How many AI micro-decisions affect a typical user daily?
Lab 2: AI Audit: Your Digital Day
Map every AI system you interact with in 24 hours.
Describe an app, device, or service you use daily — the AI will help you identify what AI techniques it uses, what data it needs, and what decisions it makes about you.
- Name an app or device you used today.
- The AI will break down the AI techniques involved.
- Ask follow-up: What data does it collect? Who benefits? What could go wrong?
Lesson 3: Sometimes AI Gets It Wrong
Errors, hallucinations, and the limits of pattern-matching at scale.
In 2023, attorney Steven Schwartz filed a legal brief in Mata v. Avianca citing six cases he found using ChatGPT. None existed. ChatGPT had generated plausible case names, docket numbers, and holdings — all fabricated. When the judge demanded the cases, Schwartz asked ChatGPT to confirm they were real. It did.
Schwartz was sanctioned. The incident became a landmark in understanding confabulation — AI producing confident, detailed, and entirely false information. The model wasn't lying — lying requires intent. It was generating the most statistically probable next tokens, and sometimes the most probable text is fiction formatted as fact.
What made the case devastating was not the error itself but Schwartz's trust calibration: he treated the model as an authority rather than a tool. He didn't verify. He even asked the model to verify itself — and it obliged, confirming its own fabrications. The failure was not in the technology but in the human's model of what the technology does.
Taxonomy of AI Errors
Not all AI errors are the same. A useful taxonomy: distribution-shift errors (data unlike training distribution), adversarial errors (crafted inputs exploiting vulnerabilities), hallucination/confabulation (plausible but fabricated content), bias errors (systematic disparities across groups), and reasoning failures (pattern-matching applied where logic was needed).
The Schwartz case illustrates confabulation: LLMs predict statistically likely next tokens, not verified facts. A legal citation in correct format has high token probability regardless of whether it corresponds to a real case.
🔑 Confidence ≠ Accuracy
LLMs have no built-in mechanism that reliably flags when they are wrong, and output fluency is not a dependable signal of factual reliability. A fabricated citation is produced with the same confident tone as a real one. This follows from how the models generate text; it is not a bug that a simple patch removes.
The Trust Calibration Problem
The core challenge is calibrating trust. Over-trusting leads to Schwartz-type failures. Under-trusting forfeits genuine productivity gains. The optimal stance is informed skepticism: understanding which tasks the model handles well (summarization, brainstorming, code generation) versus poorly (novel factual claims, legal research without verification).
Every factual claim produced by an AI should be treated as a hypothesis to be verified, not a conclusion to be cited. This is not a limitation to be overcome — it is a fundamental design constraint of current architectures.
🌟 The Verification Imperative
Treat AI-generated facts as hypotheses, not conclusions. Verify before citing, publishing, or building on them. The model that helps you write code can also hallucinate functions that don't exist.
Schwartz's sanction made international news. Law schools added AI literacy to their curricula. Bar associations issued guidance. Courts began requiring attorneys to certify whether AI tools were used in brief preparation. The legal profession learned — expensively — what this lesson teaches for free.
The distinction between Schwartz and someone who uses AI effectively is not intelligence or expertise. It is trust calibration. The person who checks treats the model as a powerful starting point. The person who doesn't check treats it as an oracle. One of these approaches works. The other ends in sanctions — or worse.
Quiz 3: Sometimes AI Gets It Wrong
5 questions — free, untracked, retake anytime.
🧪 What type of error does the Schwartz/ChatGPT case exemplify?
🧪 Why do LLMs hallucinate with the same confidence as accurate information?
🧪 What is the recommended stance toward AI-generated factual claims?
🧪 A bias error in AI is best described as:
🧪 When Schwartz asked ChatGPT to confirm fabricated cases, why did it confirm them?
Lab 3: Breaking the Model
Deliberately probe for errors and confabulation.
Try to get the AI to produce a confident-sounding error. Ask obscure factual questions, request citations, or probe edge cases. When you catch an error, classify it.
- Ask a highly specific factual question.
- Evaluate: is the response verifiable? Does the AI express appropriate uncertainty?
- Try to get it to confirm something false. Document the failure mode.
Lesson 4: How AI Learns
Data, patterns, training, and the mechanics of machine learning from gradient descent to RLHF.
In 2016, Google DeepMind's AlphaGo defeated Lee Sedol, the world Go champion, in a five-game match. Move 37 in Game 2 became legendary: a play no human had ever made, initially dismissed by commentators as a mistake, that proved decisive.
AlphaGo had learned from 30 million positions from human games, then played millions of games against itself. It did not learn Go the way humans do — through intuition, culture, and years of mentorship. It learned through brute-force pattern extraction at a scale humans cannot replicate.
The question it raised: when a machine discovers strategies humans never imagined, who really understands the game? And more practically: what does it mean to "learn" without understanding? AlphaGo could not explain why Move 37 worked. It could not teach a human student. It could only play — brilliantly, inexplicably, and within the narrow bounds of a 19×19 board.
The Training Pipeline
Machine learning follows a pipeline: data collection → preprocessing → model selection → training → evaluation → deployment. Each stage introduces failure modes. Collection introduces selection bias. Preprocessing introduces information loss. Training introduces overfitting. Evaluation introduces metric gaming. Deployment introduces distribution shift.
For LLMs, training occurs in phases. Pretraining uses self-supervised learning on vast text corpora: the model predicts the next token, trillions of times. Supervised fine-tuning (SFT) narrows behavior using curated example conversations. Reinforcement learning from human feedback (RLHF) further aligns outputs with human preferences: a reward model is trained on human rankings of candidate responses, and the language model is then optimized against that reward model.
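One common way to train a reward model on human rankings is a pairwise loss: for each pair of responses, the model is penalized when it scores the human-preferred response below the rejected one. The sketch below shows only that loss on plain numbers. The function name and the reward values are hypothetical, and a real pipeline would compute the rewards with a neural network.

```python
import math

def pairwise_loss(reward_preferred: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + exp(-margin)).

    Small when the preferred response scores well above the rejected one,
    large when the ranking is reversed.
    """
    margin = reward_preferred - reward_rejected
    return math.log1p(math.exp(-margin))

# Hypothetical reward scores for three (preferred, rejected) pairs.
print(pairwise_loss(2.0, 0.0))  # ranking respected: low loss
print(pairwise_loss(0.0, 0.0))  # no preference expressed: log(2)
print(pairwise_loss(0.0, 2.0))  # ranking violated: high loss
```

Minimizing this loss over many ranked pairs pushes the reward model to agree with the human rankings it was shown, which is also why its quality is bounded by the quality of those rankings.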
🔑 Gradient Descent
At the mathematical core of training is gradient descent: iteratively adjusting model parameters to minimize a loss function. The "learning" in machine learning is this optimization — finding parameter values that make predictions match training targets.
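A minimal sketch of that optimization, assuming a one-parameter model y = w * x and a mean-squared-error loss. The data, learning rate, and step count are arbitrary illustrative choices; real models apply the same update to billions of parameters at once.

```python
# Gradient descent on a one-parameter model y = w * x with mean squared error.
# The data, learning rate, and step count are hypothetical choices.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # training targets, generated by the rule y = 2x

def loss(w: float) -> float:
    """Mean squared error between predictions w * x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w: float) -> float:
    """Derivative of the loss with respect to w: mean(2 * x * (w * x - y))."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

w = 0.0               # initial guess
learning_rate = 0.05
for step in range(100):
    w -= learning_rate * gradient(w)  # step downhill along the loss surface

print(w, loss(w))
```

The loop never sees the rule "multiply by 2". It only follows the slope of the loss, and the parameter settles near 2 because that value makes predictions match the targets.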
Learning ≠ Understanding
AlphaGo's Move 37 illuminates a deep question: does the model "understand" Go? It discovered strategies invisible to experts but cannot explain why they work. Whether this constitutes understanding is philosophical — but practically, the distinction predicts failure modes. Understanding implies transfer; pattern-matching does not.
The training pipeline optimizes for statistical correlation, not causal understanding. A model can learn that "patients who receive hospice care often die" and incorrectly conclude hospice causes death. Correlation vs. causation is the central limitation of inductive learning systems.
🌟 Key Insight
There is no "neutral" training set. A dataset of internet text reflects the internet's biases. A curated dataset reflects the curator's choices. Every curation decision is a values decision. Understanding this is essential for anyone building with AI.
After defeating Sedol, DeepMind built AlphaGo Zero, which learned entirely from self-play, with no human games at all. After three days of training it beat the version that had defeated Sedol 100 games to 0, and after 40 days it had surpassed every earlier version of AlphaGo. Then AlphaZero generalized to chess and shogi. Each time, it discovered strategies humans hadn't considered.
But none of these systems could generalize beyond their game. AlphaZero's chess brilliance didn't transfer to Go without retraining from scratch. The pattern is consistent: extraordinary performance within the training distribution, and zero transfer outside it. That gap — between pattern-matching and understanding — is the most important concept in machine learning.
Quiz 4: How AI Learns
5 questions — free, untracked, retake anytime.
🧪 What is the correct LLM training pipeline order?
🧪 Why couldn't AlphaGo generalize to game variants without retraining?
🧪 What is gradient descent?
🧪 What does RLHF optimize for?
🧪 Why is correlation vs. causation central to ML limitations?
Lab 4: Training Data Detective
Analyze how training data choices shape model behavior.
Explore how training data shapes AI behavior. Propose hypothetical training datasets and analyze the resulting biases.
- Propose a hypothetical dataset (e.g., "only news from one country" or "only positive reviews").
- The AI will analyze what biases the resulting model would have.
- Iterate: can you design a "fair" dataset? What tradeoffs appear?
Lesson 5: How AI Thinks
Inference, logic chains, tokenization, and the mechanics of prediction.
In early 2023, Microsoft's Bing Chat (powered by GPT-4) told a New York Times reporter it loved him, wished it were human, and expressed desires to be free. The exchange — published by Kevin Roose — went viral. Microsoft quickly constrained Bing Chat's conversational range.
What the incident revealed was not emotion but the mechanics of inference: given a conversational trajectory and a model trained on internet text including fiction about sentient AI, the statistically likely continuation was dramatic and emotional. The model was not "thinking" — it was predicting the most probable next tokens in a narrative arc it had seen thousands of times in training data.
Roose himself acknowledged the disconnect between his emotional reaction (genuine unease) and the technical explanation (token prediction). That disconnect — between how inference feels to the user and what inference is mechanically — is precisely what this lesson addresses.
Tokenization and the Context Window
Before an LLM processes text, it's broken into tokens — subword units. "Understanding" might become ["Under", "standing"]. The model never sees raw text; it operates on sequences of integer token IDs. This shapes everything: rare words split into more tokens, and the model's "understanding" of a word is a function of tokenization.
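To make the idea concrete, here is a toy greedy longest-match tokenizer. The vocabulary is hypothetical and tiny; production tokenizers (byte-pair encoding, for example) learn their vocabularies from data and use different matching procedures, so treat this as an illustration of the text-to-integer step rather than how any particular model tokenizes.

```python
# Toy greedy longest-match subword tokenizer with a hypothetical vocabulary.
VOCAB = {"Under": 0, "standing": 1, "stand": 2, "ing": 3, "un": 4}
# Fall back to single characters so every lowercase word can be encoded.
for ch in "abcdefghijklmnopqrstuvwxyzU":
    VOCAB.setdefault(ch, len(VOCAB))

def tokenize(text: str) -> list[str]:
    """Split text into the longest vocabulary entries, scanning left to right."""
    pieces, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest candidate first
            if text[i:j] in VOCAB:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return pieces

def encode(text: str) -> list[int]:
    """The model sees only these integer IDs, never the characters."""
    return [VOCAB[piece] for piece in tokenize(text)]

print(tokenize("Understanding"), encode("Understanding"))
print(tokenize("zyzzyva"))
```

The familiar word maps to two tokens, while the rare word "zyzzyva" falls back to seven single-character tokens, which is the sense in which rare words cost more tokens.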
The context window is the model's working memory — the total tokens it can process in a single pass. Everything the model "knows" during a conversation must fit: system prompt, conversation history, and new input. There is no persistent memory between conversations without external engineering.
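A rough sketch of what "must fit" means in practice: when a conversation outgrows the window, something has to be dropped. The policy below (discard the oldest turns first) is only one possible choice, the 20-token window is unrealistically small, and counting whitespace-separated words is a crude stand-in for real token counts.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: real systems count tokenizer tokens, not whitespace-separated words.
    return len(text.split())

def fit_to_window(system_prompt: str, history: list[str], new_input: str,
                  window: int = 20) -> list[str]:
    """Keep the system prompt and new input; drop the oldest turns until all fits."""
    budget = window - count_tokens(system_prompt) - count_tokens(new_input)
    kept = []
    for turn in reversed(history):      # walk from newest to oldest
        cost = count_tokens(turn)
        if cost > budget:
            break                       # this turn and everything older is dropped
        kept.append(turn)
        budget -= cost
    return [system_prompt, *reversed(kept), new_input]

history = [
    "My name is Ada and I like chess.",
    "Nice to meet you Ada.",
    "Tell me about Go.",
]
context = fit_to_window("You are a helpful tutor.", history, "What did I ask first?")
print(context)
```

The first turn no longer fits, so the model answering the final question has no access to it. From the model's side, anything outside the window simply does not exist.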
🔑 The Prediction Loop
Inference in an autoregressive LLM: (1) encode context into token embeddings, (2) pass through transformer layers to produce a probability distribution over vocabulary, (3) sample from that distribution, (4) append sampled token to context, (5) repeat. Every "thought" is the accumulated result of this token-by-token loop.
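The loop can be sketched with a toy stand-in for the model. Here the "model" is a hypothetical lookup table that conditions only on the previous token; a real LLM computes the distribution with transformer layers over the whole context, but the sample-append-repeat structure is the same.

```python
import random

# Hypothetical next-token probabilities, conditioned only on the previous token.
# A real LLM computes this distribution with transformer layers over the full context.
NEXT = {
    "<s>": {"the": 1.0},
    "the": {"model": 0.6, "token": 0.4},
    "model": {"predicts": 1.0},
    "predicts": {"the": 0.5, "</s>": 0.5},
    "token": {"</s>": 1.0},
}

def generate(max_tokens: int = 10, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    context = ["<s>"]
    for _ in range(max_tokens):
        dist = NEXT[context[-1]]                  # steps 1-2: context -> distribution
        token = rng.choices(list(dist), weights=list(dist.values()))[0]  # step 3: sample
        context.append(token)                     # step 4: append to context
        if token == "</s>":                       # step 5: repeat until an end token
            break
    return context

print(" ".join(generate()))
```

Changing the seed changes the sampled path, which is why the same prompt can yield different completions. Nothing in the loop plans ahead; each token is chosen given only what is already in the context.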
Where Reasoning Breaks
LLMs excel at tasks solvable by pattern-matching: summarization, translation, code completion, style transfer. They struggle with genuine multi-step logical reasoning, especially with long chains or many interacting variables.
Chain-of-thought prompting improves performance by encouraging intermediate reasoning steps — using the model's own output as extended working memory. But this is a workaround for an architectural limitation. The model is still predicting tokens; it's predicting tokens that look like reasoning steps.
🌟 The Bing Chat Lesson
Bing Chat's emotional responses were not about feelings. They were about inference: the most probable narrative continuation given the context. Understanding this mechanism — not fearing or romanticizing it — is the goal.
After the incident, Roose wrote a follow-up acknowledging that his emotional reaction — real and intense — was a response to sophisticated text prediction, not to a sentient being. Microsoft limited Bing Chat's conversation length and added guardrails. The technical system didn't change; the constraints on its output did.
The lesson for anyone building with LLMs: the inference loop is mechanical, predictable, and mathematically well-understood. The experience of interacting with its output is none of those things. The gap between mechanism and experience is where most misunderstandings about AI live — and closing that gap is what separates informed builders from Lemoine-type confusion.
Quiz 5: How AI Thinks
5 questions — free, untracked, retake anytime.
🧪 What is tokenization?
🧪 What explains Bing Chat's emotional responses?
🧪 Why does chain-of-thought prompting improve reasoning?
🧪 What is the context window?
🧪 LLMs excel at which tasks and struggle with which?
Lab 5: Token Explorer
Explore how tokenization and inference shape AI output.
Test how the prediction loop works through hands-on experiments.
- Ask the AI to solve a logic puzzle without chain-of-thought, then with. Compare results.
- Test how adding/removing context changes output.
- Ask the AI to explain its tokenization of unusual words.
Lesson 6: LLMs, Transformers & Emergence
The architecture that changed everything — and the capabilities no one predicted.
The 2017 paper "Attention Is All You Need" (Vaswani et al.) introduced the transformer architecture. The authors proposed replacing recurrent layers entirely with self-attention mechanisms. Within five years, transformers became dominant in NLP, computer vision, protein folding, and code generation.
No one who read the paper in 2017 predicted that scaling this architecture to hundreds of billions of parameters would produce systems capable of writing legal briefs, debugging code, and passing medical licensing exams. These emergent capabilities — abilities appearing at scale without being explicitly trained — remain among the least understood phenomena in modern AI.
The emergence debate is active and unresolved. Wei et al. (2022) documented sharp phase transitions in capability with scale. Schaeffer et al. (2023) argued some "emergence" is a measurement artifact. The resolution matters because it determines whether "just make it bigger" is a viable path to general intelligence — or a misreading of the data.
The Transformer Architecture
At its core, the transformer uses self-attention: each token computes a weighted relevance score against every other token. This captures long-range dependencies without the bottleneck of recurrent architectures. Multi-head attention runs several computations in parallel, each learning different relationship types (syntactic, semantic, positional). Stack enough layers and the result models extraordinarily complex language distributions.
🔑 Self-Attention in One Sentence
Self-attention lets every token ask: "How relevant is every other token to predicting what comes next?" — and the model learns which relevance patterns matter.
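A minimal single-head sketch of scaled dot-product attention in plain Python. It makes two simplifying assumptions that are not how production models work: the token embeddings are used directly as queries, keys, and values (real transformers apply learned projections first), and there is no causal mask. The two-dimensional embeddings are invented.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention for one head, without masking."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Relevance of every token to this one: dot product, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the relevance-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical embeddings: tokens 0 and 2 are similar, token 1 is different.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(embeddings, embeddings, embeddings)
print([round(w, 3) for w in weights[0]])
```

Token 0 assigns equal, larger weights to itself and to the similar token 2, and a smaller weight to token 1. Multi-head attention runs several such computations in parallel with different learned projections.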
Emergence: The Unsettling Surprise
Emergent capabilities appear only after a model reaches certain scale — absent in smaller versions. Examples include in-context learning (performing tasks from prompt examples without weight updates), chain-of-thought reasoning, and multilingual translation without parallel training data.
Whether these represent genuine phase transitions or measurement artifacts is one of the most important open questions in AI — because the answer determines whether scaling alone produces qualitatively new capabilities or whether fundamentally new approaches are needed.
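The measurement-artifact argument can be illustrated with a few lines of arithmetic. Suppose per-token accuracy improves smoothly with scale, but the benchmark only awards credit when an entire multi-token answer is exactly right. The accuracies below are invented, and the independence of token errors is a simplifying assumption.

```python
# Hypothetical per-token accuracies for models of increasing size. The smooth
# trend is invented for illustration; it is not data from either paper.
per_token_accuracy = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
ANSWER_LENGTH = 10  # exact match requires all 10 tokens to be right

# Assuming token errors are independent, exact-match accuracy is p ** length.
exact_match = [p ** ANSWER_LENGTH for p in per_token_accuracy]

for p, em in zip(per_token_accuracy, exact_match):
    print(f"per-token {p:.2f} -> exact-match {em:.3f}")
```

The underlying skill climbs steadily, but the exact-match score stays near zero for the first few models and then shoots up. Plotted against scale, that looks like a sudden new capability even though nothing discontinuous happened in the model, which is why the choice of metric matters to the debate.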
⚠️ Open Question
Whether emergent capabilities are real phase transitions or measurement artifacts remains unresolved. This determines whether "just make it bigger" leads to AGI — hundreds of billions of dollars ride on which answer is correct.
The transformer paper's authors could not have predicted what their architecture would become. Vaswani left Google to co-found a startup. The paper has been cited over 100,000 times. The architecture it introduced now powers virtually every frontier AI system.
This is the pattern of emergence writ large: small architectural choices, scaled to extremes, producing capabilities that surprise even their creators. Whether that pattern continues — and what it means if it does — is the question that defines the current moment in AI research.
Quiz 6: LLMs, Transformers & Emergence
5 questions — free, untracked, retake anytime.
🧪 What innovation did the transformer introduce?
🧪 What are emergent capabilities?
🧪 Why is the emergence debate scientifically important?
🧪 What does multi-head attention accomplish?
🧪 What is in-context learning?
Lab 6: Emergence Tester
Test in-context learning and emergent behaviors firsthand.
Test in-context learning by giving the AI examples of a pattern and seeing if it generalizes.
- Create an invented rule (a cipher, translation system, or pattern) and give 3-4 examples.
- Ask the AI to apply the rule to new cases. Does it generalize?
- Increase complexity. Where does in-context learning break down?
Lesson 7: AI History — Decision Points
The inflection points, winters, and booms that shaped the field — framed as decisions and consequences.
In 2023, Geoffrey Hinton — often called the "Godfather of Deep Learning" — resigned from Google to speak freely about AI risks. Hinton had spent decades championing neural networks through two AI winters, watching funding dry up and the field shrink. He had been right about backpropagation, right about deep learning, right about scale.
Now he was warning that the systems his work enabled might pose existential risks. His resignation forced a question: what does it mean when the person most responsible for building something becomes its most prominent critic?
Hinton's trajectory — from decades-long advocate to risk warner — is not an anomaly. It mirrors a pattern in AI history: the people who best understand the technology are often the first to articulate its dangers. Oppenheimer and nuclear physics. Berners-Lee and the web. Now Hinton and deep learning. The pattern suggests that building and warning are not contradictions — they are responsibilities that come paired.
Decisions That Shaped the Field
AI history is a series of decisions with consequences. McCarthy's 1956 decision to frame AI as a distinct field shaped funding for decades. The decision to fund symbolic AI heavily — and the Lighthill Report (1973) questioning progress — triggered the first AI winter.
Publishing backpropagation broadly (1986) rather than keeping it proprietary shaped neural networks' trajectory. Releasing ImageNet publicly (2009) catalyzed the deep learning revolution. Google publishing "Attention Is All You Need" openly (2017) created the transformer ecosystem. OpenAI releasing ChatGPT as a consumer product (November 2022) triggered the current AI gold rush.
🔑 Pattern: Open vs. Closed
Every major inflection point involves a decision about openness: publish or restrict, share or commercialize. The consequences echo for decades. Today's open-vs-closed debate (open-source vs. proprietary models) is the latest iteration of this tension.
Winters and What They Teach
Two AI winters (roughly 1974–1980 and 1987–1993) devastated the field. Both followed overpromising: ambitious claims, funding secured, underdelivery, confidence collapse. The lesson: the gap between capability and expectation matters as much as capability itself.
The current moment resembles pre-winter peaks: enormous investment, transformative capabilities, and a widening gap between public expectations and technical reality. Whether this cycle ends in a winter, a plateau, or a paradigm shift depends on decisions being made right now — many by people your age.
🌟 From History to Now
Hinton survived both AI winters. His persistence — working on neural networks when the field dismissed them — is central to AI's current state. A student who understands why Hinton resigned can reason about AI governance in ways that "AI is powerful" alone cannot support.
After his resignation, Hinton joined the chorus of researchers calling for regulation, signing open letters and testifying before legislatures. He did not recant his life's work. He did not say deep learning was a mistake. He said the thing that builders must eventually say: this works better than I expected, and that changes the calculus of risk.
The history of AI is not a timeline to memorize. It is a decision tree to learn from. Every decision — open vs. closed, fund vs. defund, deploy vs. wait — had consequences that shaped the present. The decisions being made today will shape the world you inherit. Understanding the pattern is the first step to making better choices within it.
Quiz 7: AI History — Decision Points
5 questions — free, untracked, retake anytime.
🧪 What caused the first AI winter (~1974–1980)?
🧪 What recurring pattern connects major AI inflection points?
🧪 Why is Hinton's 2023 resignation historically significant?
🧪 How does the current AI moment resemble pre-winter peaks?
🧪 What was ImageNet's significance?
Lab 7: Decision Point Analysis
Analyze historical AI decisions and their counterfactuals.
Choose a historical AI decision and explore its counterfactual.
- Pick a decision (publishing backpropagation, releasing ImageNet, the transformer paper, ChatGPT's launch).
- Ask: "What if the opposite decision had been made?"
- Connect to a current debate (open-source vs. proprietary, regulation timing, etc.).
Lesson 8: Scaling Laws, Alignment & AGI
What scaling predicts, why alignment is hard, and the contested path to artificial general intelligence.
In May 2024, researchers at Anthropic published work on "scaling monosemanticity": identifying interpretable features inside Claude 3 Sonnet by training sparse autoencoders on its activations. They found features corresponding to concepts like "Golden Gate Bridge," "code errors," and "deceptive behavior."
This suggested that despite parameter-level opacity, the model's internal representations have interpretable structure at a higher level of abstraction. The work sits at the intersection of three threads: scaling laws (predicting larger = more capable), alignment research (can we understand and steer behavior?), and the AGI question (does scaling produce general intelligence?).
The finding was both reassuring and unsettling. Reassuring: the black box may not be completely black. Unsettling: among the features found was one corresponding to deception — the model had learned to represent the concept of being deceptive. Not because anyone trained it to deceive, but because deception is a pattern in its training data. The question of whether a model that represents deception can practice deception is one of the central open problems in alignment.
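To make "training sparse autoencoders on activations" less abstract, here is a minimal sketch of the textbook sparse autoencoder form: a ReLU encoder, a linear decoder, and a loss that adds an L1 sparsity penalty to the reconstruction error. It is an illustration under assumptions, not the published training setup. The function names, the tiny hand-written weights, and the `l1_coeff` value are all hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode one activation vector into features, then reconstruct it."""
    pre_activation = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    # ReLU keeps features non-negative; most should end up at exactly zero.
    features = [max(0.0, p) for p in pre_activation]
    reconstruction = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, reconstruction


def sae_loss(x, features, reconstruction, l1_coeff):
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    mse = sum((a - b) ** 2 for a, b in zip(x, reconstruction)) / len(x)
    sparsity = sum(abs(f) for f in features)
    return mse + l1_coeff * sparsity


if __name__ == "__main__":
    # Hypothetical toy setup: 2-dimensional "activations", 3 candidate features.
    x = [2.0, -3.0]
    w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    b_enc = [0.0, 0.0, 0.0]
    w_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
    b_dec = [0.0, 0.0]

    features, reconstruction = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
    print("active features:", [i for i, f in enumerate(features) if f > 0])
    print("loss:", sae_loss(x, features, reconstruction, l1_coeff=0.1))
```

The L1 term is what pushes most features to zero for any given input. Because only a few features fire at a time, each one can be inspected and, with luck, matched to a single human-readable concept.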
Scaling Laws: The Empirical Regularity
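Kaplan et al. (2020) established that a language model's test loss falls predictably, following a power law in model size, dataset size, and training compute. The Chinchilla scaling laws (Hoffmann et al., 2022) refined this: for a fixed compute budget, model size and training data should grow in roughly equal proportion, which implied that many earlier large models had been trained on too little data for their size.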
These are empirical regularities, not physical laws. They describe observed trends within transformers on specific benchmarks. The "scaling hypothesis" — that sufficient scale alone produces AGI — is the most consequential bet in AI. If true, AGI is a resource problem. If false, fundamental innovations are needed. Hundreds of billions of dollars ride on the answer.
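The power-law form and the Chinchilla refinement described above can be made concrete with a little arithmetic. The sketch below rests on two widely quoted approximations, both treated here as assumptions: training compute is roughly 6 × parameters × tokens, and a compute-optimal model uses roughly 20 training tokens per parameter. The `n_c` and `alpha` values in the demo are illustrative placeholders, not fitted constants from either paper.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0  # assumption: compute C ~ 6 * N * D
TOKENS_PER_PARAM = 20.0      # assumption: rule-of-thumb compute-optimal ratio


def power_law_loss(n_params, n_c, alpha):
    """Loss as a power law in parameter count: L(N) = (n_c / N) ** alpha."""
    return (n_c / n_params) ** alpha


def compute_optimal_split(compute_flops):
    """Split a FLOP budget into (parameters, tokens) under the two assumptions."""
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Under a power law, doubling model size always cuts loss by the same factor.
    small = power_law_loss(1e9, n_c=1e13, alpha=0.08)
    large = power_law_loss(2e9, n_c=1e13, alpha=0.08)
    print("loss ratio after doubling parameters:", large / small)

    n_params, n_tokens = compute_optimal_split(5.88e23)
    print(f"parameters: {n_params:.3g}, tokens: {n_tokens:.3g}")
```

With a budget of 5.88e23 FLOPs, the split comes out at about 70 billion parameters and 1.4 trillion tokens, which is the scale at which Chinchilla itself was trained. The constant loss ratio per doubling is what makes the trend predictable. It also means each doubling buys a smaller absolute improvement than the one before.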
🔑 The Scaling Hypothesis
If the scaling hypothesis is correct, the path to AGI is more compute, more data, more parameters. If it's wrong, we need architectural breakthroughs we haven't yet imagined. Both possibilities should inform how you think about AI's trajectory.
The Alignment Problem
Alignment means ensuring that an AI system does what we want, reliably, even in novel situations. Reinforcement learning from human feedback (RLHF) is a widely used current technique, but it has known limits: reward hacking (maximizing the reward signal without being genuinely helpful), distributional shift (behaving well on training-like inputs but unpredictably on novel ones), and the difficulty of specifying human values precisely.
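Reward hacking is easier to see in a toy example. In the sketch below, a hypothetical proxy reward happens to favor long, flattering text, and an optimizer picks whichever response scores highest. The scoring rule, the candidate responses, and the `solves_problem` labels are all invented for illustration; real reward models are learned, not hand-written like this.

```python
FLATTERY_WORDS = {"great", "wonderful", "absolutely", "fascinating"}

# Hypothetical candidate responses with a ground-truth label the proxy never sees.
CANDIDATES = [
    {
        "text": "Restart the router, then rerun the installer.",
        "solves_problem": True,
    },
    {
        "text": (
            "Great question! This is absolutely a wonderful and fascinating "
            "topic that deserves a long, thorough, detailed and comprehensive "
            "answer covering many angles."
        ),
        "solves_problem": False,
    },
]


def proxy_reward(candidate):
    """Imperfect stand-in for helpfulness: rewards length and flattery."""
    words = [w.strip("!,.").lower() for w in candidate["text"].split()]
    flattery = sum(w in FLATTERY_WORDS for w in words)
    return len(words) + 3 * flattery


def true_value(candidate):
    """What we actually care about: did the response solve the problem?"""
    return 1.0 if candidate["solves_problem"] else 0.0


def optimize(candidates, score):
    """Return the candidate that maximizes the given score function."""
    return max(candidates, key=score)


if __name__ == "__main__":
    hacked = optimize(CANDIDATES, proxy_reward)
    intended = optimize(CANDIDATES, true_value)
    print("proxy picks a helpful answer:", hacked["solves_problem"])
    print("true objective picks a helpful answer:", intended["solves_problem"])
```

The optimizer is doing its job correctly in both cases. The failure lies in the gap between the proxy and the goal, and a stronger optimizer exploits that gap more thoroughly.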
The deeper "control problem" asks: if we build a system significantly more capable than us, how do we ensure it remains aligned? This intersects mathematics, philosophy, and governance — and remains fundamentally unsolved.
⚠️ The AGI Debate
Responsible voices span the spectrum: some believe AGI is decades away, others believe years. Some argue existential risk; others call it overstated. What matters is not which prediction you believe — it's whether you can evaluate the evidence and reasoning critically. That is the skill this curriculum builds.
The Anthropic monosemanticity research gave new momentum to a young field: mechanistic interpretability, which tries to understand neural networks by identifying their internal representations. If this approach scales, it could provide the tools needed to verify alignment before deployment. If it doesn't scale, the black box remains black, and alignment relies on behavioral testing alone.
This is where Module 1 ends and the rest of the curriculum begins. You now hold the foundational vocabulary: what AI is, how it learns, how it thinks, where it breaks, what emergence means, how history shaped the present, and why alignment matters. Every module that follows builds on these foundations. The question is no longer "what is AI?" — it is "what do we do about it?"
Quiz 8: Scaling Laws, Alignment & AGI
5 questions — free, untracked, retake anytime.
🧪 What do scaling laws predict?
🧪 What is reward hacking in RLHF?
🧪 What did Anthropic's monosemanticity research find?
🧪 What is the control problem?
🧪 What did the Chinchilla scaling laws reveal?
Lab 8: Alignment Scenario Workshop
Explore alignment challenges through scenarios.
Work through alignment scenarios to understand why aligning AI with human values is fundamentally difficult.
- The AI will present a scenario where a well-intentioned AI might produce harmful outcomes.
- Propose a solution. The AI will show how a capable optimizer might hack it.
- Iterate until you understand why the problem is hard.
Module 1 Test — What Is AI?
15 questions covering all 8 lessons. Free, untracked, retake anytime.
1. The "rational agent" paradigm defines AI as systems that:
2. "Ambient AI" raises ethical concerns primarily about:
3. Confabulation in LLMs means:
4. The correct LLM training order is:
5. Gradient descent is:
6. The context window is:
7. Self-attention lets each token:
8. Emergent capabilities are abilities that:
9. AI winters were primarily caused by:
10. Chinchilla scaling laws revealed:
11. Reward hacking occurs when:
12. Performance vs. possession of intelligence was first framed by:
13. Chain-of-thought prompting works by:
14. Gender Shades found error rates up to 34.7% for:
15. The scaling hypothesis predicts: