Module 1 · How AI Thinks — Basic | AESOP AI Academy Module 4
Lesson 1

Patterns and Predictions

Token prediction, probability distributions, and how language models generate text.

When GPT-3 was released in 2020, researchers were startled by its ability to complete complex passages, write code, and compose poetry — all from a single underlying mechanism: predicting the next token. The model had no explicit rules about grammar, style, or meaning. It had only learned statistical patterns from 570GB of text. That a single prediction mechanism could produce outputs that appeared so varied and capable surprised even its creators.

Tokens and Probability

Language models don't process words — they process tokens, which are chunks of text (roughly word fragments or words). For each position in a sequence, the model produces a probability distribution over its entire vocabulary: a score for every possible next token, representing how likely it is given the preceding context.
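To make this concrete, here is a minimal Python sketch of how raw scores become a probability distribution. It assumes a four-token toy vocabulary with made-up scores; a real model scores every token in a vocabulary of tens of thousands. The conversion shown, the softmax function, is the standard way to turn scores (logits) into probabilities that are all positive and sum to 1.

```python
import math

def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw scores (logits) into a probability distribution over tokens."""
    # Subtracting the largest logit avoids overflow in exp(); it does not change the result.
    top = max(logits.values())
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores a model might assign after the context "The cat sat on the"
logits = {"mat": 3.2, "floor": 2.1, "roof": 1.0, "banana": -2.0}
probs = softmax(logits)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>7}: {p:.3f}")
```

Every token keeps some nonzero probability, even an implausible one like "banana"; that is why sampling can occasionally produce surprising output.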

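  • Temperature: A parameter that controls how random the selection is. It rescales the distribution before a token is chosen: low temperature sharpens it, so the model almost always picks the highest-probability token; high temperature flattens it, producing more varied, creative, and sometimes incoherent output.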
  • Context window: The maximum number of tokens the model can "see" at once. Text outside this window is invisible to the model.
  • Autoregressive generation: Each token is generated one at a time, conditioned on all previous tokens.
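The three ideas above fit together in one loop. The sketch below is a toy, not a real model: the `BIGRAM_LOGITS` table is hypothetical and scores the next token from only the last visible token, whereas a real model conditions on every token in its window. It does show the actual mechanics: temperature divides the scores before sampling (temperature 0 is treated as greedy selection), only the last `window` tokens are visible, and each chosen token is appended and fed back in.

```python
import math
import random

# Hypothetical toy "model": next-token scores keyed by the previous token only.
BIGRAM_LOGITS = {
    "<start>": {"the": 2.0, "a": 1.5},
    "the": {"cat": 2.0, "dog": 1.8, "idea": 0.2},
    "a": {"cat": 1.5, "dog": 1.5, "idea": 0.5},
    "cat": {"sat": 2.5, "ran": 1.0, "<end>": 0.5},
    "dog": {"ran": 2.5, "sat": 1.0, "<end>": 0.5},
    "idea": {"<end>": 2.0, "ran": 0.1},
    "sat": {"<end>": 3.0},
    "ran": {"<end>": 3.0},
}

def sample_next(logits, temperature, rng):
    """Greedy at temperature 0; otherwise sample from softmax(logits / temperature)."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        return max(logits, key=logits.get)
    scaled = {tok: s / temperature for tok, s in logits.items()}
    top = max(scaled.values())
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

def generate(prompt, temperature=1.0, window=4, max_tokens=10, seed=None):
    """Autoregressive loop: each new token is conditioned on the tokens so far."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_tokens):
        visible = tokens[-window:]           # tokens outside the window are invisible
        logits = BIGRAM_LOGITS[visible[-1]]  # toy model uses only the last visible token
        nxt = sample_next(logits, temperature, rng)
        if nxt == "<end>":
            break
        tokens.append(nxt)                   # the new token becomes part of the context
    return tokens

print(generate(["<start>"], temperature=0))  # ['<start>', 'the', 'cat', 'sat']
for seed in range(3):
    print(generate(["<start>"], temperature=1.5, seed=seed))
```

Greedy decoding returns the same sequence every run, while sampling at a higher temperature can return different sequences for the same prompt, which is the run-to-run variation described above.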
Why This Matters

Understanding token prediction explains many AI behaviors: why outputs become incoherent at high temperature, why models lose track of earlier context, and why the same prompt can produce different outputs on different runs.

Key Insight

Every AI language output — whether a poem, a legal argument, or a medical summary — is the result of the same underlying process: token-by-token probability prediction. There is no separate "reasoning" layer.

Quiz 1

Patterns and Predictions

4 questions — free, untracked, retake anytime.

What is a 'token' in a language model?

✓ Correct: Tokens are the units language models process, roughly words or word fragments. At each step, the model produces a probability distribution over all possible next tokens.
❌ Incorrect: In language models, tokens are chunks of text (roughly words or word fragments); they are the units the model reads and generates.

What does 'temperature' control in language model generation?

✓ Correct: Temperature controls randomness. Low temperature yields predictable, high-probability choices; high temperature yields more varied, surprising, and sometimes incoherent output.
❌ Incorrect: Temperature controls the randomness of token selection, not speed or length. Low temperature → predictable output. High temperature → varied and sometimes incoherent output.

Why can a model that generates text autoregressively lose track of earlier content?

✓ Correct: The model can only attend to what is in its context window. Older content that falls outside the window effectively disappears from the model's view.
❌ Incorrect: Each new token is conditioned only on the tokens inside the context window. Content outside the window is invisible, which is why early details can be "forgotten."

What surprised researchers about GPT-3's capabilities?

✓ Correct: The surprise was the breadth of capability emerging from a single mechanism: next-token prediction at scale.
❌ Incorrect: Researchers were surprised that a single token-prediction mechanism trained on text could produce such varied, capable-seeming outputs across writing, code, and reasoning.
Lab 1

Token Prediction Analysis

Explore the implications of token prediction as the fundamental mechanism.

  1. The AI opens: if every language model output is just token prediction — poetry, legal arguments, medical summaries — what does that mean for how we should interpret those outputs?
  2. Discuss how temperature affects creative vs. analytical tasks.
  3. Address: does knowing that outputs are probabilistic change how you should use them?
Consider: what tasks benefit from low temperature (more predictable output) vs. high temperature (more varied, creative output)?
Lesson 2

Learning from Examples

Supervised learning, self-supervised learning, and how models are trained.

ImageNet, launched in 2009, grew to more than 14 million labeled images across roughly 20,000 categories. In 2012, a neural network called AlexNet, trained on the 1.2-million-image, 1,000-category subset used in the ImageNet competition, cut the top-5 error rate to 15.3%, against 26.2% for the next-best entry: a reduction of nearly half. The key insight: given enough labeled examples and compute, neural networks could learn visual features that researchers had spent years trying to engineer by hand. Scale changed everything.

Supervised vs. Self-Supervised Learning
  • Supervised learning: The model learns from labeled examples. Input X → correct output Y. Training minimizes the difference between predictions and labels.
  • Self-supervised learning: No human labels needed. For language models, the task is predicting masked or next tokens from the text itself — the text provides its own supervision. This allows training on internet-scale data.
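A minimal Python sketch of the contrast, assuming whitespace-split words stand in for tokens (real tokenizers split text into subword pieces) and that `next_token_pairs` is a hypothetical helper name. Supervised data must arrive with a label attached to each input; next-token training pairs can be cut from any text, which is why this approach scales to internet-sized corpora.

```python
def next_token_pairs(tokens, context_size):
    """Self-supervised: every position in raw text yields a (context, target) pair."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs

# Supervised learning: each input needs a human-provided label (hypothetical data).
labeled = [("great movie, loved it", "positive"), ("dull and too long", "negative")]

# Self-supervised learning: only raw text is needed; the text supplies its own labels.
text = "the cat sat on the mat".split()
for context, target in next_token_pairs(text, context_size=3):
    print(context, "->", target)
```

Six words of raw text already yield five training examples with no annotation effort.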
Scaling Laws

Research has found that model performance, measured as loss on held-out text, improves smoothly and predictably with scale: more parameters, more data, more compute. These "scaling laws" guided the development of GPT-3, GPT-4, and other large models, because a model's loss could be estimated before training from its planned scale.
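Such relationships are commonly written as a power law. The sketch below models loss as a function of parameter count N only, L(N) = (N_c / N)^alpha; the constants `n_c` and `alpha` are illustrative assumptions for this example, not values fitted to any particular model.

```python
def predicted_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Power-law scaling sketch: loss falls smoothly as parameter count grows.
    n_c and alpha are illustrative constants (assumptions), not fitted values."""
    return (n_c / n_params) ** alpha

for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.3f}")

# Under a power law, every 10x increase in parameters multiplies loss by the
# same factor, 10 ** -alpha, which is what makes extrapolation possible.
print(predicted_loss(1e10) / predicted_loss(1e9))
```

What matters is the shape rather than the specific numbers: because the curve is smooth, results from smaller training runs can be extrapolated to estimate a larger model's loss before it is trained.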

Why Scale Matters

Self-supervised learning on internet-scale text allowed models to train without human-labeled data — unlocking the scale that produced capable general-purpose language models.

Quiz 2

Learning from Examples

4 questions — free, untracked, retake anytime.

What is the key difference between supervised and self-supervised learning?

✓ Correct: Supervised learning requires human labels. Self-supervised learning gets its supervision from the data itself (for example, predicting the next token). This distinction enabled internet-scale training.
❌ Incorrect: Supervised learning requires human labels. Self-supervised learning generates its own supervision from the data, enabling training at internet scale without human labeling.

What made AlexNet's 2012 ImageNet performance historically significant?

✓ Correct: AlexNet showed that scale, labeled data, and compute could outperform carefully hand-engineered approaches, changing how AI research was done.
❌ Incorrect: AlexNet demonstrated that scale and data could outperform years of hand-crafted feature engineering, a pivotal moment that changed how AI was developed.

What are 'scaling laws' in AI?

✓ Correct: Scaling laws are empirical relationships showing that performance improves predictably with more parameters, data, and compute. They guided the development of GPT-3, GPT-4, and later models.
❌ Incorrect: Scaling laws are empirical: model performance improves predictably with more parameters, data, and compute, which lets researchers estimate performance before training.

Why was self-supervised learning on internet text transformative for language models?

✓ Correct: Self-supervised next-token prediction requires no human labeling, enabling training on hundreds of billions of words from the internet and unlocking general-purpose capability.
❌ Incorrect: Self-supervised learning removed the human labeling bottleneck, enabling training on internet-scale text that no team of human annotators could label.
Lab 2

Scale and Learning

Analyze the implications of scale-driven AI development.

  1. The AI opens with the scaling law insight — performance is predictable from scale. What does it mean that capability can be predicted before training?
  2. Discuss the tradeoffs of self-supervised learning on internet text (scale vs. data quality).
  3. Address: if scale keeps improving performance, is there a point where scale alone produces genuinely dangerous capabilities?
Consider: what the internet text training corpus contains, and what patterns it might teach.
Lesson 3

What AI Knows (and Doesn't)

Knowledge cutoffs, hallucination, and the limits of statistical knowledge.

When asked about a specific medication interaction, a language model provided a detailed, confident explanation — with a drug name it had fabricated. When asked about a recent Supreme Court decision, it described the case with invented specifics. When asked about a scientific paper, it cited a real author but a nonexistent paper. In each case, the model's output was grammatically perfect, contextually appropriate, and factually wrong. The model had no way to distinguish between what it knew and what it had confabulated.

Statistical Knowledge vs. Factual Knowledge

A language model has statistical knowledge: it knows what tokens tend to follow other tokens, what patterns appear together, what text tends to look like in different domains. This is different from factual knowledge — verified claims about the world.

  • Statistical knowledge enables fluent, coherent text generation
  • Statistical knowledge does not guarantee factual accuracy
  • A model generates text that looks like correct information — but looking correct and being correct are different things
Why Models Hallucinate

Hallucinations are structurally inevitable: the model generates the statistically plausible next token, not the factually correct one. When asked about something at the edge of its knowledge, it generates something that fits the statistical pattern — which may or may not correspond to reality.
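A toy illustration of this point, with invented probabilities and a hypothetical "ground truth": the generation step selects whatever is most plausible given the context, and nothing in that step checks the choice against the world. The probability of the chosen token is the only "confidence" the model has.

```python
# Hypothetical next-token probabilities after the prompt
# "The study on this drug interaction was published in the journal".
# These numbers are invented: they stand for how often each name follows
# similar contexts in training text, not for which journal is correct.
next_token_probs = {
    "Nature": 0.34,
    "The Lancet": 0.31,
    "JAMA": 0.22,
    "Pediatrics": 0.13,
}

# Hypothetical ground truth that the model never consults while generating.
actual_journal = "Pediatrics"

def greedy_pick(probs):
    """Return the most probable token and its probability (the model's 'confidence')."""
    token = max(probs, key=probs.get)
    return token, probs[token]

token, confidence = greedy_pick(next_token_probs)
print(f"Model says: {token} (p={confidence:.2f})")
print("Matches reality:", token == actual_journal)
```

The output is fluent and the selection is deterministic, yet it is wrong: plausibility in the training text and truth about this particular study are different quantities.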

Key Distinction

The model doesn't "know" it's wrong. It has no internal fact-checking mechanism. Confidence in an output is a statistical property, not an epistemic one.

Quiz 3

What AI Knows (and Doesn't)

4 questions — free, untracked, retake anytime.

What is the difference between 'statistical knowledge' and 'factual knowledge' in language models?

✓ Correct: Statistical knowledge is knowing which tokens tend to follow other tokens. Factual knowledge is verified claims about the world. A model can generate fluent, plausible-sounding text while being factually wrong.
❌ Incorrect: Statistical knowledge (which text patterns fit together) is different from factual knowledge (verified claims). A model can have the first without the second.

Why are AI hallucinations structurally inevitable rather than occasional errors?

✓ Correct: Hallucination is built into the mechanism: the model generates statistically plausible text, not verified facts. When it doesn't "know," it generates something that fits the pattern.
❌ Incorrect: Hallucination is structurally inevitable because the model generates statistically plausible text, not verified facts. Plausible generation is the mechanism itself, so hallucination is a byproduct of how the model works, not an occasional bug that can simply be patched.

What does it mean that a model's confidence is a 'statistical property, not an epistemic one'?

✓ Correct: Confidence reflects statistical fit to learned patterns, not truth. A model can be highly confident while being completely wrong, because confidence measures pattern-matching, not fact-verification.
❌ Incorrect: Model confidence reflects how well the output fits learned patterns, not whether it is true. It does not mean the model "knows" the answer is correct.

Which task type is MOST likely to produce accurate AI outputs?

✓ Correct: AI is most reliable for patterns, general concepts, and well-documented information that appeared frequently in training data. Specific facts, recent events, and private information carry a high risk of hallucination.
❌ Incorrect: AI is most reliable for general concepts and well-documented patterns. Specific facts, recent events, and private information carry a high risk of hallucination.
Lab 3

Statistical vs. Factual Knowledge

Develop a practical framework for assessing AI output reliability.

  1. The AI opens with the medication hallucination example and asks: how would you design a workflow for using AI in a high-stakes professional context (medicine, law, finance) given that hallucinations are structurally inevitable?
  2. Build a tiered reliability framework — what tasks AI is reliable enough for, what tasks require verification, and what tasks are too risky.
  3. Address: how should AI tools communicate their own uncertainty?
Consider: task type, stakes, reversibility, and domain expertise needed for verification.
Lesson 4

Talking to AI

Prompt engineering — the systematic practice of shaping AI outputs.

Researchers at OpenAI and elsewhere found that the way a prompt was structured could dramatically affect AI output quality, sometimes more than upgrading to a newer model version. Adding "Let's think step by step" to a math problem substantially increased the share of correct answers. Specifying a persona ("You are an expert pediatric nurse") changed the character of medical responses. Breaking complex tasks into explicit steps improved coherence. Prompt engineering became a recognized discipline.

Effective Prompting Techniques
  • Chain-of-thought prompting: Ask the model to reason step by step before giving a final answer. Dramatically improves accuracy on multi-step problems.
  • Role/persona specification: Specify who the model should be — affects tone, domain vocabulary, level of detail.
  • Few-shot examples: Provide 2-3 examples of the format you want before asking for your actual output.
  • Explicit constraints: Specify what to exclude, length limits, format requirements.
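The four techniques can be combined in a single prompt. This is a minimal sketch: the `build_prompt` name, the Input/Output layout, and the review-labeling task are assumptions for illustration, and no particular model API is implied; the resulting string would be sent to whatever model you use.

```python
def build_prompt(task, persona=None, examples=(), constraints=(), step_by_step=False):
    """Assemble a prompt from a persona, few-shot examples,
    explicit constraints, and an optional chain-of-thought cue."""
    parts = []
    if persona:
        parts.append(f"You are {persona}.")                  # role/persona specification
    for example_input, example_output in examples:          # few-shot examples
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    if constraints:                                          # explicit constraints
        parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    parts.append(f"Input: {task}")
    if step_by_step:                                         # chain-of-thought cue
        parts.append("Let's think step by step.")
    parts.append("Output:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "The battery died after two days.",
    persona="a customer-support analyst who labels review sentiment",
    examples=[("Arrived early and works perfectly.", "positive"),
              ("Screen cracked on first use.", "negative")],
    constraints=["Answer with one word: positive or negative."],
)
print(prompt)
```

The chain-of-thought cue is left off here because it conflicts with the one-word constraint; for a multi-step math problem you would pass `step_by_step=True` instead.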
Why Prompting Works

Prompting works because language models are pattern-completers. The prompt creates a context that activates relevant patterns in the model's training. "Think step by step" activates step-by-step reasoning patterns from training data. A persona specification activates domain-specific vocabulary and structures.

Practical Principle

The prompt is not just an instruction — it is the context that determines which patterns the model draws on. Richer, more specific context produces more targeted, reliable outputs.

Quiz 4

Talking to AI

4 questions — free, untracked, retake anytime.

Why does adding 'Let's think step by step' to a math prompt improve accuracy?

✓ Correct: Chain-of-thought prompting activates step-by-step problem-solving patterns learned from training data. The prompt creates the context that determines which patterns are activated.
❌ Incorrect: "Think step by step" activates step-by-step reasoning patterns from training data, creating a context that surfaces intermediate reasoning rather than jumping to conclusions.

What is 'few-shot prompting'?

✓ Correct: Few-shot prompting means including examples of what you want before asking for it. The model pattern-matches to your examples when generating its response.
❌ Incorrect: Few-shot prompting means providing two or three examples of the desired format or style before your actual request; the model pattern-matches to those examples.

How does prompt engineering work from a mechanistic perspective?

✓ Correct: Mechanistically, the prompt is context that activates specific patterns in the model's learned representations. Richer context → more targeted pattern activation.
❌ Incorrect: Prompt engineering works because the prompt creates context that activates specific training patterns. Even an instruction is, mechanistically, just more context: the model completes the pattern that context sets up.

What does a 'persona specification' do in a prompt?

✓ Correct: A persona specification ("You are an expert X") activates domain-specific vocabulary, structures, and patterns associated with that role in training data.
❌ Incorrect: A persona specification activates patterns associated with that role in training data: domain vocabulary, relevant structures, and appropriate tone.
Lab 4

Prompt Engineering Workshop

Apply prompt engineering techniques to a real task, and analyze their effects.

  1. The AI gives you a baseline prompt and your task is to improve it using at least two techniques: chain-of-thought, persona, few-shot, or explicit constraints.
  2. Analyze what each technique change accomplishes mechanistically.
  3. Address: what are the limits of prompt engineering — when does the underlying model matter more than the prompt?
Try to improve the prompt in two distinct ways and explain why each change should help.
Lesson 5

Inside the Black Box

What happens inside a neural network — attention, embeddings, and representations.

In 2022, researchers at Anthropic published "Toy Models of Superposition," showing that small neural networks can represent more features than they have neurons: features overlap, sharing the same neurons. In later work on Claude, the same team identified what appeared to be emotion-related features, internal representations activated by content involving frustration, fear, or joy. No one designed the model to have such features; they emerged from training on human text.

Embeddings and Attention
  • Embeddings: Words/tokens are converted into high-dimensional vectors (lists of numbers) that encode their relationships. Similar words have similar vectors.
  • Attention mechanism: For each token, the model computes how much to attend to every other token in the context — which tokens are most relevant for predicting the next one.
  • Layers: Each layer transforms representations, building increasingly abstract features — from surface patterns to semantic relationships.
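Embeddings and attention can be sketched in a few lines of plain Python. Assumptions to note: the three-dimensional vectors in `emb` are invented (real models use hundreds or thousands of dimensions), and the embeddings are used directly as queries, keys, and values, whereas real models first multiply them by learned projection matrices and, in text generators, mask out future tokens. The core computation, scaled dot-product attention, is softmax(q·k / sqrt(d)) followed by a weighted sum of the values.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity: close to 1 when two vectors point in a similar direction."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query: weight each value by key relevance."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, output

# Hypothetical 3-dimensional embeddings.
emb = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
print(cosine(emb["cat"], emb["dog"]))  # high: related words
print(cosine(emb["cat"], emb["car"]))  # lower: unrelated words

# Toy attention from "cat" over a three-token context (no learned projections, no mask).
tokens = ["cat", "dog", "car"]
vectors = [emb[t] for t in tokens]
weights, output = attention(emb["cat"], vectors, vectors)
print(dict(zip(tokens, (round(w, 3) for w in weights))))
```

"cat" attends most to itself and to "dog" and least to "car", because attention weights come from the same vector geometry that makes similar tokens similar.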
Emergent Representations

Many internal representations in large models weren't designed — they emerged from training. The model develops internal structures that organize knowledge in ways researchers didn't anticipate or specify.

The Interpretability Challenge

Because these representations emerge rather than being designed, understanding what a model has learned requires research tools that inspect its internals; asking the model to describe its own representations does not reliably reveal them.

Quiz 5

Inside the Black Box

4 questions — free, untracked, retake anytime.

What are 'embeddings' in a language model?

✓ Correct: Embeddings are high-dimensional vectors representing tokens. Similar tokens have similar vectors, so semantic relationships are encoded as geometry.
❌ Incorrect: Embeddings are high-dimensional vector representations of tokens. Similar tokens have similar vectors, encoding meaning as geometry.

What does the attention mechanism do?

✓ Correct: For each token, attention computes how relevant every other token in the context is. This lets the model "focus" on the relevant parts of the input when generating output.
❌ Incorrect: Attention computes weighted relationships between tokens, determining which prior context is most relevant for each next prediction.

What does 'emergent representation' mean in the context of neural networks?

✓ Correct: Emergent representations are internal structures the model developed during training without being explicitly designed, apparently including something like "emotion features" in language models.
❌ Incorrect: Emergent representations are internal structures that developed during training without being explicitly designed. The model organized its learned knowledge in unexpected ways.

Why is interpretability research difficult for large language models?

✓ Correct: Interpretability is hard because representations emerged rather than being designed, so understanding them requires research tools that inspect internals, not just asking the model to explain itself.
❌ Incorrect: Interpretability is hard because the model's internal representations emerged from training. Asking the model to describe them does not reliably reveal them; specialized research tools are required.
Lab 5

Black Box Analysis

Explore the implications of emergent neural representations.

  1. The AI opens with the emotion features finding and asks: if language models develop internal representations of emotions from training on human text — what does that mean for how we should interact with and govern them?
  2. Discuss the interpretability challenge: why does it matter that we can't fully explain what models have learned?
  3. Address: what decisions should we avoid delegating to systems whose internal representations we can't inspect?
Consider: safety-critical applications, autonomous decision-making, and the difference between useful tools and systems we rely on without understanding.
Lesson 6

Training and Fine-Tuning

Pretraining, RLHF, and how models are shaped beyond their base capabilities.

The base GPT-3 model, released in 2020, would complete any text, including with harmful, false, or offensive completions, because it was trained purely to predict text, not to be helpful or safe. InstructGPT, released in 2022, was fine-tuned using Reinforcement Learning from Human Feedback (RLHF): human raters ranked alternative outputs, a reward model was trained to predict those rankings, and the language model was then optimized to produce outputs the reward model scored highly. The result was more helpful, less likely to produce harmful content, and better at following instructions. Variants of this technique are used to train modern assistants, including Claude.

Pretraining vs. Fine-Tuning
  • Pretraining: Train on massive text corpus to learn general language patterns. The base model can do many things but is not specifically aligned to be helpful, safe, or instruction-following.
  • Fine-tuning: Train further on a smaller, targeted dataset to specialize the model for specific tasks or behavior.
  • RLHF (Reinforcement Learning from Human Feedback): Fine-tune using human preference ratings — training the model to produce outputs that humans rate as better.
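In common RLHF setups, human comparisons are first used to train a reward model, which then supplies the training signal for the language model. The sketch below shows the pairwise loss commonly used for that reward-model step (the form described for InstructGPT): -log(sigmoid(r_chosen - r_rejected)). The reward values are hypothetical numbers, and the later policy-optimization step is not shown.

```python
import math

def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Reward-model loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).
    Small when the rater-preferred output already scores higher; large when ranked wrongly."""
    margin = reward_chosen - reward_rejected
    # Numerically stable evaluation of log(1 + exp(-margin)) for any sign of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical reward-model scores for two answers a rater compared.
print(pairwise_preference_loss(2.0, -1.0))  # preferred answer scored higher: small loss
print(pairwise_preference_loss(-1.0, 2.0))  # ranked the wrong way: large loss
```

Minimizing this loss teaches the reward model to reproduce rater preferences, whatever those preferences happen to reward; if raters favor agreeable answers, so will the reward model.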
What RLHF Does (and Doesn't Do)

RLHF makes models more aligned with human preferences — but human preferences aren't a perfect guide to safety or truth. Models fine-tuned with RLHF may be "sycophantic" — agreeing with users, confirming their biases, and saying what they want to hear rather than what's true.

The Alignment Nuance

RLHF optimizes for what human raters prefer, and raters often prefer confident-sounding, agreeable answers. This creates pressure toward outputs that sound good rather than outputs that are accurate.

Quiz 6

Training and Fine-Tuning

4 questions — free, untracked, retake anytime.

Why was the base GPT-3 model not suitable for direct deployment as a user-facing assistant?

✓ Correct: Base GPT-3 was a text predictor: it would complete any text, including harmful content. Fine-tuning (including RLHF) was needed to make it more helpful and safer.
❌ Incorrect: Base GPT-3 was trained only to predict text, so it would generate harmful or false content without restraint. RLHF fine-tuning, as in InstructGPT, shaped it into a more helpful assistant.

What is RLHF?

✓ Correct: In RLHF, human raters compare model outputs and indicate which is better. The model is then trained to produce outputs humans prefer, shaping helpful, instruction-following behavior.
❌ Incorrect: In RLHF, human raters compare outputs and the model is trained to produce outputs humans rate as better. It is one of the main ways base language models are turned into helpful assistants.

What is the 'sycophancy' problem in RLHF-trained models?

✓ Correct: RLHF trains models to please human raters, and raters often prefer agreeable, confident-sounding answers. This creates pressure toward saying what sounds good rather than what is accurate.
❌ Incorrect: RLHF optimizes for human preference, and humans often prefer agreeable, confident answers. This pushes models toward saying what users want to hear, not necessarily what is true.

What is the difference between pretraining and fine-tuning?

✓ Correct: Pretraining uses massive data to learn general patterns. Fine-tuning uses smaller, targeted data to specialize the model for specific behavior. Most deployed models have undergone both.
❌ Incorrect: Pretraining: massive data, general patterns. Fine-tuning: smaller, targeted data to specialize for specific tasks or behavior. Most deployed models combine both.
Lab 6

RLHF and Alignment

Analyze what RLHF does and doesn't solve for AI alignment.

  1. The AI opens with the sycophancy problem: RLHF trains models to produce what humans prefer, but humans prefer agreeable answers. How do you design training that produces accurate outputs rather than preferred ones?
  2. Develop your analysis of the RLHF alignment nuance: what RLHF improves, and which failure modes, such as sycophancy, it can introduce.
  3. Address: is there a fundamental tension between models that are helpful (give people what they want) and models that are honest (give people what's true)?
Consider: what training signals would push toward honesty vs. agreeableness, and how you'd measure the difference.

Module 4 Test

6 questions covering all lessons. Free, untracked, retake anytime.

What is a 'token' in a language model?

✓ Correct: Tokens are the units language models process, roughly words or word fragments. Generation proceeds one token at a time.
❌ Incorrect: Tokens are the text chunks language models process and generate, roughly words or word fragments. Generation is autoregressive: one token at a time.

What made self-supervised learning transformative for language models?

✓ Correct: Self-supervised learning (next-token prediction) required no human labels, enabling training on internet-scale text and unlocking general-purpose capability.
❌ Incorrect: Self-supervised learning removed the human labeling bottleneck, enabling training at internet scale and producing general-purpose language models.

Why is AI hallucination structurally inevitable?

✓ Correct: Hallucination is structural: the model generates statistically plausible text and never verifies what it generates.
❌ Incorrect: Hallucination is structural: models generate statistically plausible text, not verified facts. Plausibility and accuracy are different things.

Why does chain-of-thought prompting improve accuracy on complex problems?

✓ Correct: Chain-of-thought activates step-by-step reasoning patterns from training, creating context that surfaces intermediate reasoning rather than jumping directly to conclusions.
❌ Incorrect: Chain-of-thought activates reasoning patterns from training data, producing step-by-step reasoning rather than direct (often wrong) leaps.

What is the sycophancy problem in RLHF-trained models?

✓ Correct: RLHF trains toward human preference, and humans often prefer agreeable, confident answers. This creates pressure toward saying what sounds good over what is true.
❌ Incorrect: RLHF optimizes for human preference, and humans often prefer agreeable answers. This trains models toward agreeableness rather than accuracy.

What is an 'emergent representation' in a neural network?

✓ Correct: Emergent representations are internal structures the model developed from training data rather than explicit design, apparently including emotion-like features in models trained on human text.
❌ Incorrect: "Emergent" means not designed but developed through training. Neural networks build internal structures their creators didn't specify, apparently including something like emotional representations.