Token prediction, probability distributions, and how language models generate text.
When GPT-3 was released in 2020, researchers were startled by its ability to complete complex passages, write code, and compose poetry, all from a single underlying mechanism: predicting the next token. The model had no explicit rules about grammar, style, or meaning; it had only learned statistical patterns from its training text, including about 570GB of filtered web data. That one prediction mechanism could produce such varied and capable output surprised even its creators.
Language models don't process words directly; they process tokens, which are chunks of text such as whole words, word fragments, or punctuation. For each position in a sequence, the model produces a probability distribution over its entire vocabulary: a probability for every possible next token, representing how likely that token is given the preceding context.
Understanding token prediction explains many AI behaviors: why outputs become incoherent at high temperature, why models lose track of earlier context, and why the same prompt can produce different outputs on different runs.
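A minimal sketch of how a model's raw scores (logits) become a probability distribution, and how temperature reshapes it. The five-token vocabulary and its scores are invented for illustration; a real model scores tens of thousands of tokens. Dividing the scores by the temperature before the softmax is the common convention and is assumed here.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1.

    Dividing by the temperature first sharpens the distribution when
    temperature < 1 and flattens it when temperature > 1.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and scores for the context "The cat sat on the"
vocab = ["mat", "sofa", "roof", "moon", "banana"]
logits = [4.0, 2.5, 2.0, 0.5, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax(logits, temperature=t)
    print(f"T={t}: " + ", ".join(f"{w}={p:.2f}" for w, p in zip(vocab, probs)))

# Sampling from the distribution: repeated draws (or runs) can differ.
rng = random.Random(0)  # seeded only so this demo is reproducible
samples = [rng.choices(vocab, weights=softmax(logits, 1.5))[0] for _ in range(5)]
print(samples)
```

With these made-up scores, about 93% of the probability sits on "mat" at T=0.5, while at T=2.0 "banana" rises from under 1% to roughly 4%. Spreading probability onto unlikely tokens is why high-temperature output drifts toward incoherence, and sampling rather than always taking the top token is why the same prompt can produce different outputs.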
Every AI language output — whether a poem, a legal argument, or a medical summary — is the result of the same underlying process: token-by-token probability prediction. There is no separate "reasoning" layer.
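The token-by-token process can be sketched as a loop: get a distribution, pick a token, append it to the context, repeat. The `next_token_distribution` function below is a hypothetical stand-in for a trained model; it is a hand-written lookup table keyed only on the last token, whereas a real model conditions on the whole context up to its context-window limit.

```python
def next_token_distribution(context):
    """Hypothetical stand-in for a language model.

    Returns {token: probability} for the next token. This toy only looks
    at the last token; a real model conditions on the entire context.
    """
    table = {
        "<start>": {"the": 0.9, "a": 0.1},
        "the": {"cat": 0.6, "dog": 0.4},
        "a": {"cat": 0.5, "dog": 0.5},
        "cat": {"sat": 0.7, "ran": 0.3},
        "dog": {"ran": 0.8, "sat": 0.2},
        "sat": {"<end>": 1.0},
        "ran": {"<end>": 1.0},
    }
    return table[context[-1]]

def generate_greedy(max_tokens=10):
    """Autoregressive generation: each output token becomes new input."""
    context = ["<start>"]
    for _ in range(max_tokens):
        dist = next_token_distribution(context)
        token = max(dist, key=dist.get)  # greedy: take the most likely token
        if token == "<end>":
            break
        context.append(token)  # feed the prediction back in
    return context[1:]

print(generate_greedy())  # ['the', 'cat', 'sat']
```

Replacing the greedy `max` with temperature sampling is what lets repeated runs diverge; the loop structure stays the same.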
Quiz (4 questions):
What is a 'token' in a language model?
What does 'temperature' control in language model generation?
Why does autoregressive generation mean models can lose track of earlier content?
What surprised researchers about GPT-3's capabilities?
Explore the implications of token prediction.
Explore the implications of token prediction as the fundamental mechanism.
Supervised learning, self-supervised learning, and how models are trained.
ImageNet, introduced in 2009, grew to about 14 million labeled images across more than 20,000 categories. In 2012, a neural network called AlexNet, trained on the 1,000-category subset used in the ImageNet competition, reached a top-5 error rate of 15.3%, against 26.2% for the next-best entry: roughly 40% fewer errors than any competing approach. The key insight: given enough labeled examples and compute, neural networks could learn visual features that researchers had spent years trying to engineer by hand. Scale changed everything.
Research has found that a language model's loss (its error at predicting held-out text) falls smoothly and predictably as scale grows: more parameters, more data, more compute. These "scaling laws" drove the development of GPT-3, GPT-4, and other large models, because the loss a model would reach could be estimated before training from its planned scale alone.
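A sketch of how a scaling law supports prediction before training: fit a power law, loss ≈ a · N^(−alpha), to a few small runs on a log-log scale, then extrapolate to a much larger model. The "measurements" here are synthetic, generated from made-up constants (`A_TRUE`, `ALPHA_TRUE`) rather than real training runs, so the fit recovers them exactly; real measurements are noisy and the fitted constants differ by setup.

```python
import math

# Hypothetical constants, used only to generate synthetic "measurements".
A_TRUE = 10.0
ALPHA_TRUE = 0.08

def synthetic_loss(n_params):
    return A_TRUE * n_params ** (-ALPHA_TRUE)

def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - alpha * log(N)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, alpha)

# Small runs that are cheap to train...
small_sizes = [1e6, 1e7, 1e8]
small_losses = [synthetic_loss(n) for n in small_sizes]
a, alpha = fit_power_law(small_sizes, small_losses)

# ...used to forecast the loss of a far larger model before training it.
big = 1e11
predicted = a * big ** (-alpha)
print(f"alpha={alpha:.3f}, predicted loss at N={big:.0e}: {predicted:.3f}")
```

A power law becomes a straight line on log-log axes, which is why a simple linear fit is enough here.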
Self-supervised learning on internet-scale text allowed models to train without human-labeled data — unlocking the scale that produced capable general-purpose language models.
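Self-supervision works because raw text supplies its own labels: every prefix of a passage is an input, and the token that follows is the target. A minimal sketch, assuming whitespace splitting as a stand-in for a real subword tokenizer:

```python
def next_token_pairs(text):
    """Build (context, target) training pairs from unlabeled text.

    No human labeling is needed: the target is simply the next token.
    Whitespace splitting stands in for a real subword tokenizer.
    """
    tokens = text.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in next_token_pairs("the cat sat on the mat"):
    print(" ".join(context), "->", target)
```

By contrast, in supervised image classification such as ImageNet, each target label had to be attached by a human annotator, which caps how much data can be used.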
Quiz (4 questions):
What is the key difference between supervised and self-supervised learning?
What made AlexNet's 2012 ImageNet performance historically significant?
What are 'scaling laws' in AI?
Why was self-supervised learning on internet text transformative for language models?
Analyze the implications of scale-driven AI development.
Knowledge cutoffs, hallucination, and the limits of statistical knowledge.
When asked about a specific medication interaction, a language model provided a detailed, confident explanation — with a drug name it had fabricated. When asked about a recent Supreme Court decision, it described the case with invented specifics. When asked about a scientific paper, it cited a real author but a nonexistent paper. In each case, the model's output was grammatically perfect, contextually appropriate, and factually wrong. The model had no way to distinguish between what it knew and what it had confabulated.
A language model has statistical knowledge: it knows what tokens tend to follow other tokens, what patterns appear together, what text tends to look like in different domains. This is different from factual knowledge — verified claims about the world.
Hallucinations are structurally inevitable: the model generates the statistically plausible next token, not the factually correct one. When asked about something at the edge of its knowledge, it generates something that fits the statistical pattern — which may or may not correspond to reality.
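A toy illustration of why fluent output can be false. The bigram model below is a deliberately tiny stand-in for a real language model, trained on two invented sentences; it only learns which word tends to follow which. It gives a false sentence exactly the same probability as a true one, because both fit the learned pattern equally well.

```python
from collections import Counter, defaultdict
from fractions import Fraction

corpus = [
    "paris is the capital of france",
    "rome is the capital of italy",
]

# Count which word follows which ("<s>" marks the start of a sentence).
follow = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follow[prev][nxt] += 1

def sentence_probability(sentence):
    """Product of P(next word | previous word) under the bigram counts."""
    words = ["<s>"] + sentence.split()
    prob = Fraction(1)
    for prev, nxt in zip(words, words[1:]):
        counts = follow.get(prev, Counter())
        total = sum(counts.values())
        if total == 0:
            return Fraction(0)
        prob *= Fraction(counts[nxt], total)
    return prob

print(sentence_probability("paris is the capital of france"))  # 1/4 (true)
print(sentence_probability("paris is the capital of italy"))   # 1/4 (false)
```

Nothing in the counts records which statement is correct, so the model's "confidence" in the false sentence is identical to its confidence in the true one.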
The model doesn't "know" it's wrong. It has no internal fact-checking mechanism. Confidence in an output is a statistical property, not an epistemic one.
Quiz (4 questions):
What is the difference between 'statistical knowledge' and 'factual knowledge' in language models?
Why are AI hallucinations structurally inevitable, not just occasional errors?
What does it mean that a model's confidence is a 'statistical property, not an epistemic one'?
Which task type is MOST likely to produce accurate AI outputs?
Build a practical AI reliability framework.
Develop a practical framework for assessing AI output reliability.
Prompt engineering — the systematic practice of shaping AI outputs.
Researchers at OpenAI and elsewhere found that the way a prompt was structured could dramatically affect AI output quality — sometimes more than the underlying model version. Adding "Let's think step by step" to a math problem increased correct answers significantly. Specifying a persona ("You are an expert pediatric nurse") changed the character of medical responses. Breaking complex tasks into explicit steps improved coherence. Prompt engineering became a recognized discipline.
Prompting works because language models are pattern-completers. The prompt creates a context that activates relevant patterns in the model's training. "Think step by step" activates step-by-step reasoning patterns from training data. A persona specification activates domain-specific vocabulary and structures.
The prompt is not just an instruction — it is the context that determines which patterns the model draws on. Richer, more specific context produces more targeted, reliable outputs.
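A sketch of combining the techniques above into one prompt string: a persona line, a few worked examples shown before the real task (few-shot prompting), and a step-by-step cue. The function name `build_prompt`, its parameters, and the `Q:`/`A:` layout are hypothetical choices for illustration, not a standard format; the resulting text would be sent to whatever model API is in use.

```python
def build_prompt(task, persona=None, examples=(), step_by_step=False):
    """Assemble a prompt from common prompt-engineering components.

    persona: optional role description that steers vocabulary and register.
    examples: (question, answer) pairs shown before the task (few-shot).
    step_by_step: end with a cue that elicits step-by-step reasoning.
    """
    parts = []
    if persona:
        parts.append(f"You are {persona}.")
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}")
    answer_cue = "A: Let's think step by step." if step_by_step else "A:"
    parts.append(f"Q: {task}\n{answer_cue}")
    return "\n\n".join(parts)

prompt = build_prompt(
    "A shop sells pens at 3 for $2. How much do 12 pens cost?",
    persona="a patient math tutor",
    examples=[("What is 15% of 40?", "15% of 40 is 0.15 * 40 = 6.")],
    step_by_step=True,
)
print(prompt)
```

Each component adds context rather than commands: the worked example shows the expected answer pattern, and the final cue makes a step-by-step continuation the most plausible next text.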
Quiz (4 questions):
Why does adding 'Let's think step by step' to a math prompt improve accuracy?
What is 'few-shot prompting'?
How does prompt engineering work from a mechanistic perspective?
What does a 'persona specification' do in a prompt?
Apply and analyze prompt engineering techniques.
Apply prompt engineering techniques to a real task.
What happens inside a neural network — attention, embeddings, and representations.
In 2022, researchers at Anthropic published a paper, "Toy Models of Superposition," showing that neural networks can represent more features than they have neurons: features overlap, sharing the same neurons. Later work by the same team identified, inside Claude, what appeared to be "emotion features": internal representations activated by content involving frustration, fear, or joy. No one designed these features; they emerged from training on human text.
Many internal representations in large models weren't designed — they emerged from training. The model develops internal structures that organize knowledge in ways researchers didn't anticipate or specify.
Because these representations emerge rather than being designed, understanding what a model has learned requires research tools to inspect its internals — the model can't explain its own representations in words.
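A hand-built toy in the spirit of the superposition experiments: three features stored in only two dimensions. Each feature gets a direction in 2-D space, 120 degrees apart, and is read back with the ReLU(W^T W x) form used in those toy models. The weights are chosen by hand rather than learned and the bias term is omitted, so this is a simplified sketch, not a reproduction of the paper. A single active feature is recovered cleanly; two active at once interfere, which is why superposition depends on features being sparse.

```python
import math

# Three feature directions packed into 2 dimensions, 120 degrees apart.
angles = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]
W = [(math.cos(a), math.sin(a)) for a in angles]  # one 2-D direction per feature

def encode(features):
    """Compress a 3-feature vector into a 2-D hidden activation (W x)."""
    return (
        sum(f * w[0] for f, w in zip(features, W)),
        sum(f * w[1] for f, w in zip(features, W)),
    )

def decode(hidden):
    """Read each feature back out: ReLU(W^T h)."""
    return [max(0.0, hidden[0] * w[0] + hidden[1] * w[1]) for w in W]

def show(features):
    print(features, "->", [round(v, 2) for v in decode(encode(features))])

show([1, 0, 0])  # one active feature: recovered cleanly
show([0, 1, 0])
show([1, 1, 0])  # two active features: both come back at half strength
```

No single dimension here "means" any one feature, which hints at why inspecting individual neurons is not enough to read what a model represents.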
Quiz (4 questions):
What are 'embeddings' in a language model?
What does the attention mechanism do?
What does 'emergent representation' mean in the context of neural networks?
Why is interpretability research difficult for large language models?
Explore the implications of emergent neural representations.
Pretraining, RLHF, and how models are shaped beyond their base capabilities.
The base GPT-3 model, released in 2020, would complete any text, including with harmful, false, or offensive continuations, because it was trained purely to predict text, not to be helpful or safe. InstructGPT, released in 2022, was fine-tuned using Reinforcement Learning from Human Feedback (RLHF): after an initial round of supervised fine-tuning on human-written demonstrations, human raters compared and ranked different outputs, a reward model was trained to predict those preferences, and the language model was then optimized to produce outputs the reward model scored highly. The result was a model that was more helpful, better at following instructions, and less likely to produce harmful content. Variants of this technique are used to train modern assistants, including Claude.
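In the InstructGPT recipe, rater comparisons are first used to train a separate reward model that scores outputs. That step can be sketched with the pairwise loss used for such comparison data, -log(sigmoid(r_chosen - r_rejected)): the output raters preferred should score higher than the one they rejected. The reward values below are hypothetical numbers; in practice a neural network produces them, and the language model is afterwards optimized (for example with PPO) to earn high scores.

```python
import math

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), computed stably.

    Small when the preferred output scores well above the rejected one,
    large when the reward model ranks them the wrong way round.
    """
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # Equivalent form that avoids overflow for large negative margins.
    return -margin + math.log1p(math.exp(margin))

print(pairwise_preference_loss(2.0, -1.0))  # ranked correctly: low loss
print(pairwise_preference_loss(0.5, 0.5))   # no preference: log(2)
print(pairwise_preference_loss(-1.0, 2.0))  # ranked wrongly: high loss
```

Nothing in this objective checks whether the preferred answer was true; it only rewards agreement with the raters' choices, which is one route to sycophancy.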
RLHF makes models more aligned with human preferences — but human preferences aren't a perfect guide to safety or truth. Models fine-tuned with RLHF may be "sycophantic" — agreeing with users, confirming their biases, and saying what they want to hear rather than what's true.
RLHF optimizes for what human raters prefer, and raters often prefer confident-sounding, agreeable answers. This creates pressure toward outputs that sound good rather than outputs that are accurate.
Quiz (4 questions):
Why was the base GPT-3 model not suitable for direct deployment as a user-facing assistant?
What is RLHF?
What is the 'sycophancy' problem in RLHF-trained models?
What is the difference between pretraining and fine-tuning?
Analyze the alignment nuances of RLHF training.
Analyze what RLHF does and doesn't solve for AI alignment.
Review quiz (6 questions covering all lessons):
What is a 'token' in a language model?
What made self-supervised learning transformative for language models?
Why is AI hallucination structurally inevitable?
Why does chain-of-thought prompting improve accuracy on complex problems?
What is the sycophancy problem in RLHF-trained models?
What is an 'emergent representation' in a neural network?