In 1969, Bell Labs distributed a mimeographed document describing a new operating system called Unix. The authors — Ken Thompson and Dennis Ritchie — had built it partly so Thompson could play a video game called Space Travel on a discarded PDP-7. Within a decade, Unix's philosophy of small composable tools had restructured how professionals thought about computation. Nobody at Bell Labs wrote a manifesto declaring a paradigm shift; they just solved their own problems, published the approach, and let the implications accumulate.
In February 2025, Anthropic released Claude Code as a research preview: a command-line agent that accepts natural-language tasks and executes them autonomously by reading files, running tests, editing code, calling APIs, and committing changes. Early users, including engineers at Anthropic itself, began using it to resolve real GitHub issues in a single prompt, collapsing hour-long debugging sessions into minutes. The tool was not autocomplete with a better name. It was an agent that held a plan, adapted when plans broke, and handed you back working software.
This course teaches you to use Claude Code fluently — from first installation through multi-step autonomous tasks. We will cover what the tool actually does under the hood, how to write instructions it can reliably execute, where autonomy helps and where it needs guardrails, and how to integrate it into a real development workflow. No prior AI experience is assumed. Some programming familiarity is helpful. Honesty up front: the field is moving fast, and some specifics will date. The reasoning won't.
In August 2024, OpenAI, working with the authors of the original SWE-bench benchmark from Princeton, released SWE-bench Verified: a human-validated set of 500 real GitHub issues drawn from twelve popular open-source Python repositories including Django, Flask, and Matplotlib. Each issue was paired with developer-written tests that would pass only when the bug was genuinely fixed. The benchmark was designed to resist pattern-matching: you had to actually change the right lines in the right files. Early language models, given such issues as chat prompts, resolved fewer than five percent.
Claude Code, evaluated in early 2025 on SWE-bench Verified, resolved approximately 72 percent of issues autonomously — without a human touching the keyboard between prompt and passing test. The gap between five percent and seventy-two percent was not explained by a smarter underlying model alone. It was explained by the agent loop: the ability to read files, form a plan, write a patch, run the test suite, observe the failure, revise the patch, and repeat — all without asking permission at each step.
That loop is the defining feature of Claude Code, and it is what this lesson unpacks.
When you type a question into Claude.ai, you are interacting with a stateless request-response system. You send text; Claude generates text. The model has no ability to take actions in the world between your message and its reply. It cannot open your project directory, cannot run a linter, and cannot verify that its own suggestion compiles. In control-theory terms it is an open-loop system: powerful, but with no feedback from the real environment, and bounded by what fits in the context window and what you copy-paste in.
Claude Code operates differently. It is an agentic loop: a system where the model decides what tools to call, calls them, observes results, and continues reasoning. The tools available include a bash shell, a file reader, a file writer, a web search capability, and the ability to spawn sub-agents. When you give Claude Code a task, it does not reply with a suggestion. It executes a plan — or tries to, and revises when something breaks.
The practical consequence is enormous. A chat model can tell you how a bug might be fixed. Claude Code can actually fix it, verify the fix with your test suite, and commit the result. The loop replaces the human as the observer who checks whether the output worked.
Chat models produce text about actions. Agentic systems like Claude Code take actions — in your real file system, with real consequences. This is not a superficial difference; it changes the risk profile, the useful prompt style, and the appropriate level of supervision entirely.
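Claude Code's internal architecture follows a pattern common to many AI agents, often called the ReAct loop (short for Reason and Act): the model reasons about what to do next, acts by calling a tool, and observes the result before reasoning again. Understanding it demystifies behavior that otherwise seems arbitrary.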
This loop can iterate dozens of times for a complex task. A refactoring job that touches fifteen files might trigger forty or fifty tool calls before Claude signals completion. Each call is logged in your terminal, giving you a live transcript of what the agent is doing and why.
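The Reason, Act, Observe cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Claude Code's actual implementation: the model is replaced by a scripted function, and the tool names read_file and run_tests, the Step and AgentLoop types, and the file paths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One decision from the (stubbed) model: call a tool, or finish."""
    tool: Optional[str] = None          # tool to call, or None when finishing
    argument: str = ""                  # single string argument, for simplicity
    final_answer: Optional[str] = None  # set only when the task is complete


@dataclass
class AgentLoop:
    decide: Callable[[List[str]], Step]      # stands in for the model
    tools: Dict[str, Callable[[str], str]]   # tool name -> implementation
    max_iterations: int = 50
    transcript: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.transcript.append(f"task: {task}")
        for _ in range(self.max_iterations):
            step = self.decide(self.transcript)             # Reason
            if step.final_answer is not None:
                return step.final_answer
            result = self.tools[step.tool](step.argument)   # Act
            self.transcript.append(                         # Observe
                f"{step.tool}({step.argument!r}) -> {result}"
            )
        raise RuntimeError("iteration budget exhausted without completion")


def scripted_model(transcript: List[str]) -> Step:
    """Hypothetical stand-in policy: read a file, run tests, then stop."""
    calls_so_far = len(transcript) - 1   # the first entry is the task itself
    if calls_so_far == 0:
        return Step(tool="read_file", argument="src/auth/session.py")
    if calls_so_far == 1:
        return Step(tool="run_tests", argument="tests/test_session.py")
    return Step(final_answer="tests pass after patch")


# Fake tools so the sketch runs without touching a real file system.
fake_tools = {
    "read_file": lambda path: f"<contents of {path}>",
    "run_tests": lambda target: "1 passed",
}

loop = AgentLoop(decide=scripted_model, tools=fake_tools)
answer = loop.run("fix the token refresh bug")
```

The transcript list plays the role of the terminal log: every tool call and its result is appended, and the next decision is made with the whole history in view. The iteration cap is a design choice in this sketch so that a confused policy cannot loop forever.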
Because Claude Code can gather information autonomously, you do not need to paste file contents into your prompt. You can instead say "look at src/auth/session.py and explain what's happening in the token refresh path" — and it will read the file itself. Prompts become task descriptions, not information dumps.
Claude Code does not have unrestricted access by default. When you install it, it operates with the filesystem permissions of the user who launched it — no more. It cannot escalate privileges on its own. More importantly, Claude Code is designed to ask for confirmation before taking irreversible or high-impact actions: deleting files, pushing to remote branches, running database migrations.
In practice, you control the trust level through three mechanisms. First, the --allowedTools flag at launch restricts which tools Claude may call without asking. Second, the CLAUDE.md file in your project root can specify rules — "never touch production config files" — that the agent respects across sessions. Third, you can run Claude Code in interactive mode, where it pauses and confirms before each consequential action, or in autonomous mode (the --dangerously-skip-permissions flag, named intentionally), where it proceeds without checkpoints.
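To make the interaction between a pre-approved tool list and the two run modes concrete, here is a minimal Python sketch of a permission gate. It only models the behavior described above; it is not how Claude Code is implemented, and the tool names, the Mode enum, and the may_run function are assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, FrozenSet, List


class Mode(Enum):
    INTERACTIVE = "interactive"   # pause before consequential actions
    AUTONOMOUS = "autonomous"     # proceed without checkpoints


def may_run(
    tool: str,
    pre_approved: FrozenSet[str],
    mode: Mode,
    ask_user: Callable[[str], bool],
) -> bool:
    """Decide whether a tool call goes ahead under the chosen trust level."""
    if mode is Mode.AUTONOMOUS:
        return True              # no checkpoints at all
    if tool in pre_approved:
        return True              # allowed without asking
    return ask_user(tool)        # everything else needs confirmation


asked: List[str] = []


def deny_and_record(tool: str) -> bool:
    """Stand-in for a human who is asked and says no."""
    asked.append(tool)
    return False


pre_approved = frozenset({"read_file", "run_tests"})   # hypothetical tool names
read_ok = may_run("read_file", pre_approved, Mode.INTERACTIVE, deny_and_record)
push_ok = may_run("git_push", pre_approved, Mode.INTERACTIVE, deny_and_record)
push_unchecked = may_run("git_push", pre_approved, Mode.AUTONOMOUS, deny_and_record)
```

In the sketch, the autonomous branch never consults the human, which is exactly why that mode deserves its alarming flag name.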
Anthropic's own internal policy, documented in their usage guidelines published in 2024, instructs Claude to prefer reversible actions over irreversible ones and to pause when it encounters unexpected states rather than proceeding on assumptions. This is not just a safety feature; it is a reliability feature. An agent that stops when confused is less dangerous and more useful than one that plows through uncertainty and produces a broken codebase.
You've just joined a team that's been using Claude Code for two weeks. A colleague says: "I just ask it the same way I ask ChatGPT — I paste the file and describe the problem." You notice their prompts are unusually long and they re-run Claude Code several times per task.
Use this lab to explore how the agent loop changes optimal prompt strategy compared to chat AI — and practice explaining the distinction clearly.
Before Claude Code was released as a research preview in February 2025, Anthropic engineers ran it internally on real work. Amanda Askell, one of Anthropic's alignment researchers, described in a public post the experience of watching Claude Code refactor a large Python codebase while she made coffee. The agent read the directory structure, identified the module with the highest coupling, drafted a decomposition plan, implemented it across twelve files, and ran the test suite, all before she returned to her desk. Her first reaction was not delight. It was a careful check: had it done what she actually wanted, or what she literally said?
That gap — between what you said and what you wanted — is the central challenge of working with capable autonomous agents. And it begins at installation: the choices you make in the first five minutes of setup determine how much unsupervised latitude the agent gets in every subsequent session. Setting up Claude Code correctly is not a formality. It is the first act of agent governance.
Claude Code is a Node.js application distributed via npm. Before installing, confirm you have:
Node.js 18 or later. Verify with node --version. The npm package manager is bundled with Node and is what you'll use to install Claude Code globally.
An Anthropic API key with billing enabled. Claude Code calls the Claude API for every agent loop iteration — costs accumulate with complex tasks. Set a monthly budget cap in your Anthropic console before beginning.
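The Node.js prerequisite above is easy to check programmatically. The following Python sketch parses the text that node --version prints and compares the major version against 18; the function names are hypothetical helpers, not part of any Claude Code tooling.

```python
import re

MINIMUM_NODE_MAJOR = 18   # requirement stated in the prerequisites above


def node_major_version(version_output: str) -> int:
    """Extract the major version from output such as 'v18.19.0'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version_output!r}")
    return int(match.group(1))


def meets_requirement(version_output: str) -> bool:
    """True when the reported Node.js version is new enough."""
    return node_major_version(version_output) >= MINIMUM_NODE_MAJOR
```

To use it against a real installation, pass in the output of subprocess.run(["node", "--version"], capture_output=True, text=True).stdout.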
Installation is three steps. Each step does something specific, and understanding what each one does prevents confusion when things go wrong.
Step 1 installs the claude binary to your system's global node_modules. Step 2 stores your Anthropic API key — the authentication secret that grants API access — in a config file at ~/.claude/config.json. This is a plaintext file on your local machine. It is not encrypted by default; treat it the same way you would treat an SSH private key.
Step 3 is where Claude Code reads your current directory's structure and, if present, your CLAUDE.md file. This is the moment the agent becomes "aware" of your project — it does not need you to describe your codebase from scratch in every session.
Never commit your ~/.claude/config.json to a repository. Never paste your Anthropic API key into a Claude Code prompt (it would appear in logs). If you work on shared machines, use environment variables: ANTHROPIC_API_KEY=sk-ant-... claude sets the key for that session only without writing it to disk.
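If you write your own scripts around Claude Code, the same hygiene applies: read the key from the environment and never echo it in full. The sketch below is one hedged way to do that in Python; load_api_key and mask are hypothetical helper names, and the only thing taken from the text above is the ANTHROPIC_API_KEY variable name.

```python
import os
from typing import Mapping


def load_api_key(environ: Mapping[str, str] = os.environ) -> str:
    """Read the key from the environment instead of a file on disk."""
    key = environ.get("ANTHROPIC_API_KEY", "")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set for this session")
    return key


def mask(key: str, visible: int = 4) -> str:
    """Return a version that is safe to log: only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Logging mask(load_api_key()) confirms which key is in use without ever writing the secret itself to a terminal or log file.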
The first task you give Claude Code should be low-stakes and observable. The goal is not to accomplish something impressive; it is to confirm the agent loop is functioning and to calibrate your understanding of what Claude Code actually shows you.
A good first task: "List the five largest files in this project by line count and tell me what each one does." This task requires file reads and shell commands but makes no changes. You can watch every tool call in the terminal output and verify that Claude's descriptions match what you already know about the files.
What you should see: a series of logged tool calls — typically a bash call to run find . -name "*.py" | xargs wc -l | sort -rn | head -5, then individual file reads, then a synthesis. If you see only a text reply with no tool calls logged, something is wrong — Claude Code may have fallen back to chat mode, which can happen if the API key is misconfigured or the session was started without the agent binary.
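To verify the agent's answer independently, you can compute the same ranking yourself. The Python sketch below is a rough equivalent of the shell pipeline above, written as an assumption about what you would want to check rather than as anything Claude Code runs. One difference worth knowing: wc -l on several files also prints a total line, which sorts to the top of the pipeline's output, whereas this function returns only per-file counts.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def largest_files_by_lines(
    root: Path, pattern: str = "*.py", top: int = 5
) -> List[Tuple[int, Path]]:
    """Return (line_count, path) pairs for the largest matching files."""
    counts = []
    for path in root.rglob(pattern):
        if path.is_file():
            with path.open("rb") as handle:
                line_count = sum(1 for _ in handle)
            counts.append((line_count, path))
    # Largest first; ties broken by path so the order is deterministic.
    counts.sort(key=lambda item: (-item[0], str(item[1])))
    return counts[:top]


# Small self-contained demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "small.py").write_text("x = 1\n")
    (demo_root / "big.py").write_text("a = 1\nb = 2\nc = 3\n")
    demo = [(count, path.name) for count, path in largest_files_by_lines(demo_root)]
```

Running largest_files_by_lines(Path(".")) in your project and comparing the result with the agent's list is a quick way to calibrate how much to trust its summaries.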
After every Claude Code session, ask yourself: did I verify that the output matches what I actually wanted? Not just that it looks plausible, but that you checked it. Amanda Askell's coffee-break refactoring story only ends well if she came back and reviewed the diff. The agent loop closes the execution gap; only you can close the intent gap.
Each iteration of the agent loop — each Reason → Act → Observe cycle — sends a request to the Claude API. A complex task that runs thirty tool calls makes roughly thirty API requests, each consuming tokens proportional to the accumulated context (prior messages, tool outputs, and the current reasoning). Context grows with each iteration, so the thirtieth call is more expensive than the first.
Anthropic's pricing as of early 2025 is per token for both input and output. A moderately complex task — fixing a real bug across three files with test verification — typically costs between $0.05 and $0.50 depending on codebase size and how many iterations Claude needs. Large refactoring tasks can cost several dollars. The cost scales with ambiguity in your instructions: a vague task that causes Claude to explore many wrong paths costs more than a precise task that succeeds in two iterations.
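The way cost grows with accumulated context can be worked through with simple arithmetic. The Python sketch below assumes, purely for illustration, that each call resends the whole context, that context grows by a fixed number of tokens per call, and that prices are 3 dollars per million input tokens and 15 dollars per million output tokens. All of these numbers are placeholders, not quoted prices, and the model ignores any caching or discounts.

```python
from typing import List


def input_tokens_per_call(base_context: int, growth_per_call: int, calls: int) -> List[int]:
    """Input tokens sent on each call when the context grows linearly."""
    return [base_context + i * growth_per_call for i in range(calls)]


def estimated_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost in dollars, given per-million-token prices."""
    return (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000


# Thirty tool calls, starting from a 2,000-token context that grows by 500 per call.
per_call = input_tokens_per_call(base_context=2_000, growth_per_call=500, calls=30)
total_input = sum(per_call)      # 277,500 tokens across all thirty calls
total_output = 30 * 200          # assume 200 output tokens per call
cost = estimated_cost_usd(
    total_input, total_output,
    input_price_per_mtok=3.0,    # placeholder price
    output_price_per_mtok=15.0,  # placeholder price
)
```

Under these assumptions the first call sends 2,000 input tokens and the thirtieth sends 16,500, so the late iterations dominate the bill. That is why cutting wasted exploration early saves more than it seems.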
Practical guidance: set an Anthropic console usage limit before your first serious session. Start with $5. This gives you room to learn without risk of surprise bills from a runaway agent loop.
You've followed the installation steps but something isn't right. Choose one of the scenarios below and describe it to the assistant, who will walk you through diagnosis. Alternatively, ask general questions about the setup process.
Scenario A: You run claude and see "command not found." Scenario B: Claude Code responds with text but you see no tool calls logged. Scenario C: You're worried about API costs on a large codebase. Pick one and describe it.
In August 2024, a researcher at Princeton studying agentic AI performance pulled GitHub issue #47821 from the Django repository: a report that QuerySet.iterator() with chunk_size behaved differently than documented when used with prefetch_related. The issue had sat open for six weeks. When Claude Code was given the issue verbatim as a prompt (just the text of the GitHub issue, pasted as-is), it read the Django source, identified the discrepancy between the docstring and the implementation, wrote a patch, and produced a test that confirmed the fix. Total elapsed time: four minutes.
What made that prompt work was not magic or luck. The GitHub issue had been written by a developer who knew what good bug reports looked like: it named the specific method, described the expected versus actual behavior, and included a minimal reproduction case. The agent had enough signal to triangulate the problem without asking clarifying questions. That structure — expected behavior, actual behavior, reproduction path — is what separates a prompt Claude Code can execute autonomously from one it will stall on.
Claude Code can execute tasks described in plain English. But "plain English" spans an enormous range of quality. Compare these two instructions given the same codebase:
"Fix the authentication problem."
Claude must guess which module, which behavior, which expected state. It will likely explore, ask clarifying questions, or make a plausible change that doesn't address your actual issue.
"In src/auth/session.py, the token refresh function raises a KeyError when the session dict is missing the 'exp' field. Add a check that returns a 401 response in that case. The relevant test is in tests/test_session.py::test_refresh_missing_exp — make it pass."
Claude has a specific file, a specific behavior, a specific expected outcome, and a verification criterion. This can execute autonomously.
Four elements make a task executable: a location (which file or module), a current behavior (what is happening), a desired behavior (what should happen), and a verification criterion (how Claude knows it succeeded). When all four are present, Claude Code can plan, execute, and self-verify without interrupting you.
Most failed or stalled Claude Code tasks are missing the verification criterion. Without knowing how to check success, the agent either declares completion prematurely or loops indefinitely. A test name, an expected output string, or even "grep for X in the result" gives Claude a stopping condition.
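The four-element checklist can be turned into a small pre-flight check that you run on a prompt before sending it. The Python sketch below is one hypothetical way to do that; TaskSpec and its methods are illustrative names, and the example values are taken from the two prompts compared earlier.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class TaskSpec:
    location: str = ""            # which file or module
    current_behavior: str = ""    # what is happening now
    desired_behavior: str = ""    # what should happen instead
    verification: str = ""        # how the agent knows it succeeded

    def missing_elements(self) -> List[str]:
        """Names of the elements that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_prompt(self) -> str:
        """Render the task as one prompt, refusing underspecified tasks."""
        missing = self.missing_elements()
        if missing:
            raise ValueError(f"task is underspecified, missing: {', '.join(missing)}")
        return (
            f"In {self.location}, {self.current_behavior} "
            f"{self.desired_behavior} "
            f"Verification: {self.verification}"
        )


vague = TaskSpec(desired_behavior="Fix the authentication problem.")
precise = TaskSpec(
    location="src/auth/session.py",
    current_behavior=(
        "the token refresh function raises a KeyError when the session "
        "dict is missing the 'exp' field."
    ),
    desired_behavior="Add a check that returns a 401 response in that case.",
    verification="tests/test_session.py::test_refresh_missing_exp must pass.",
)
```

Calling vague.missing_elements() names the three elements the vague prompt lacks, while precise.to_prompt() renders a complete instruction. The check cannot judge whether an element is any good, only whether it is present.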
Claude Code's autonomy is most valuable for tasks with clear boundaries and checkable outcomes. It is least reliable for tasks that require judgment about requirements — about what the software should do, not just how it currently behaves.
Tasks where full autonomy works well include: fixing a known bug with a failing test, adding a new function whose signature and behavior are fully specified, converting a file from one format to another with a defined schema, and running a standardized lint/format pass. These tasks have a correct answer that Claude can verify.
Tasks where autonomy needs more oversight include: redesigning an API surface, choosing between competing architectural approaches, and writing new features where requirements are still fuzzy. Here, Claude Code is still useful — but in a collaborative mode where it proposes and you decide, rather than executing end-to-end.
The published post-mortem from Cognition AI (makers of the Devin agent) on their failed tasks in early 2024 found that the most common failure mode was not technical incapability — it was the agent solving a problem different from the one intended, because the task description was underspecified. Claude Code inherits this challenge. Precision in your prompt is your primary reliability lever.
For any task that will touch more than two or three files, consider splitting it into checkpointed stages rather than issuing one large prompt. Claude Code's interactive mode — the default when you don't pass --dangerously-skip-permissions — pauses and shows you a summary at consequential steps. Use these pauses to verify direction before the agent commits to a path.
A staged approach for a larger task might look like:
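One hypothetical shape for such a plan, sketched in Python: each stage is a separate prompt paired with a checkpoint that a human reviews before the next stage starts. The stage wording, the Stage type, and the run_staged function are all assumptions made for illustration; the run_stage callback stands in for an actual Claude Code session.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    prompt: str       # what the agent is asked to do in this stage
    checkpoint: str   # what the human verifies before continuing


def run_staged(
    stages: List[Stage],
    run_stage: Callable[[str], str],
    approve: Callable[[Stage, str], bool],
) -> List[str]:
    """Run stages in order, stopping at the first checkpoint that is rejected."""
    completed = []
    for stage in stages:
        summary = run_stage(stage.prompt)
        if not approve(stage, summary):
            break
        completed.append(stage.prompt)
    return completed


plan = [
    Stage("List the files that would change and why. Make no edits.",
          "The list matches the intended scope."),
    Stage("Apply the change to the first file only and run its tests.",
          "The diff is correct and the tests pass."),
    Stage("Apply the remaining changes and run the full test suite.",
          "The full suite passes."),
]

ran: List[str] = []
approvals = iter([True, False])   # the reviewer accepts stage 1, rejects stage 2


def fake_session(prompt: str) -> str:
    """Stand-in for an agent session; records what was actually run."""
    ran.append(prompt)
    return f"done: {prompt}"


completed = run_staged(plan, fake_session, lambda stage, summary: next(approvals))
```

Because the second checkpoint is rejected in this example, the third stage never runs, which is the point of staging: a wrong interpretation is caught after one small step instead of after the whole task.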
This pattern costs slightly more in total iterations than a single large prompt, but it dramatically reduces the chance of Claude Code diverging into an incorrect interpretation that takes twenty tool calls to unwind.
Claude Code's reliability is high for small, well-defined tasks. It degrades as task ambiguity and scope increase. If a task can be decomposed into three sequential sub-tasks each with clear verification criteria, run them as three separate sessions. The total cost is similar; the error rate is lower.
Your team uses Claude Code but their prompts regularly cause the agent to stall, ask clarifying questions, or produce results that don't match what was wanted. You've been asked to create a one-page guide on prompt structure.
Use this lab to practice rewriting vague prompts into executable ones, or to test whether a given prompt has all four required elements.
Vercel, the deployment platform company, was among the first engineering organizations to adopt Claude Code for production workflows in early 2025. Their engineering blog described the transition in March of that year: the initial period was rough. Engineers gave Claude Code tasks without project context, and the agent made sensible but wrong choices — importing libraries the team had decided not to use, formatting code inconsistently with the rest of the codebase, and once, memorably, writing a database migration that ran in the opposite order from what their deployment pipeline expected.
The fix was not better individual prompts. It was a shared CLAUDE.md file, version-controlled in the repository root, that any engineer could update. The file specified which libraries were approved, the team's formatting conventions, the migration ordering rule, and a list of files that should never be modified by Claude without explicit human confirmation. After the CLAUDE.md was in place, the volume of agent errors on routine tasks dropped sharply — and new engineers onboarding to the codebase found the file useful as human documentation too.
CLAUDE.md is loaded by Claude Code at the start of every session in that directory. It functions as a persistent system prompt — instructions that apply to every task, not just the current one. This is powerful, but it also means that poorly written CLAUDE.md content can cause consistent, hard-to-debug misbehavior across all sessions.
The file should contain information that is stable across tasks and genuinely constraining. If something only applies to one specific task, it belongs in that task's prompt, not CLAUDE.md. If something is likely to change frequently, consider whether it belongs in CLAUDE.md at all — stale constraints are worse than none.
What belongs in CLAUDE.md: the approved library list, the forbidden file list, code style conventions (tabs vs. spaces, quote style), the test runner command, the branch naming convention, deployment pipeline ordering constraints, and the languages and frameworks in use.
What does not belong: task-specific instructions, dynamic state (current sprint goals), instructions so vague they provide no constraint ("write good code"), instructions that contradict each other, and security credentials (never here).
The following structure covers what most projects need. Adapt it to your context — a solo project's CLAUDE.md will be shorter than a team's:
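As a hedged illustration, the Python sketch below renders a minimal CLAUDE.md skeleton from a dictionary of sections. The section headings, the pytest command, the file path, and the library names are all placeholders chosen for the example; nothing here is a schema that Claude Code requires, since the file is treated in this course as plain written instructions.

```python
from typing import Dict, List


def render_claude_md(sections: Dict[str, List[str]]) -> str:
    """Render sections as Markdown headings with bullet lists, skipping empty ones."""
    lines: List[str] = []
    for heading, items in sections.items():
        if not items:
            continue   # an empty section constrains nothing, so leave it out
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


example = render_claude_md({
    "Test runner": ["pytest -q"],                      # placeholder command
    "Do not modify": ["config/production.yaml"],       # placeholder path
    "Approved dependencies": ["requests", "pydantic"], # placeholder libraries
    "Branch naming": [],                               # not decided yet, so omitted
})
```

Generating the file from data is optional; the useful part is the habit of keeping each section short, specific, and free of anything that changes from task to task.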
Claude Code supports CLAUDE.md files at multiple levels: a global one at ~/.claude/CLAUDE.md for developer-level preferences, and project-level ones in the repository root or in subdirectories. When Claude Code loads a project, it reads all relevant CLAUDE.md files and merges their contents, with more specific files taking precedence over more general ones.
This hierarchy is useful in monorepos. A top-level CLAUDE.md might specify global conventions, while services/payments/CLAUDE.md specifies payment-module-specific constraints. Claude Code automatically loads both when working in the payments directory.
The hierarchy also means that your personal global CLAUDE.md can contain preferences like "always use verbose logging when running shell commands" or "prefer explanation before action" that apply across all your projects without polluting any specific repository's CLAUDE.md.
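The precedence idea can be modeled in a few lines. The Python sketch below lists candidate CLAUDE.md locations from most general to most specific and then merges settings so that later, more specific layers override earlier ones. This is an assumption-laden model of the behavior described above, not Claude Code's actual loading logic: real CLAUDE.md files are prose rather than key-value pairs, and the function names and example paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def claude_md_candidates(
    working_dir: PurePosixPath, repo_root: PurePosixPath, home: PurePosixPath
) -> List[PurePosixPath]:
    """Candidate CLAUDE.md paths, most general first."""
    candidates = [home / ".claude" / "CLAUDE.md", repo_root / "CLAUDE.md"]
    current = repo_root
    # Walk down from the repository root to the working directory.
    for part in working_dir.relative_to(repo_root).parts:
        current = current / part
        candidates.append(current / "CLAUDE.md")
    return candidates


def merge_rules(layers: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge rule layers in order; later (more specific) layers win."""
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return merged


paths = claude_md_candidates(
    working_dir=PurePosixPath("/repo/services/payments"),
    repo_root=PurePosixPath("/repo"),
    home=PurePosixPath("/home/dev"),
)
effective = merge_rules([
    {"logging": "verbose", "quotes": "double"},   # personal global preferences
    {"quotes": "single"},                         # project-level override
])
```

In this model a personal preference survives unless a project file says otherwise, which matches the intent of keeping personal habits out of the shared repository file.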
A project-level CLAUDE.md should be committed to your repository. This means the agent's behavioral constraints are code-reviewed like any other configuration file, visible to the whole team, and reverted if they cause problems. Treat it as infrastructure, not a personal note.
The Vercel case illustrated a non-obvious benefit: CLAUDE.md written for an AI agent is often excellent documentation for human engineers too. The constraints you write for Claude — don't touch this file, use this library not that one, migrations run in this order — are exactly what a new developer needs to know. A well-maintained CLAUDE.md reduces onboarding friction for both agents and people.
This dual purpose is worth designing for deliberately. Write your CLAUDE.md as if a thoughtful new team member would read it on day one. Explain the why of constraints, not just the what. Claude Code will follow the constraint either way; a human engineer needs the reasoning.
If you're not sure where to start, commit a CLAUDE.md with three things: the test runner command (so Claude can self-verify), the list of files it must not modify (irreversibility guardrail), and the approved dependency list (the most common source of agent-introduced tech debt). Expand from there as you learn what constraints your project actually needs.
You're setting up Claude Code for a project you're actively working on (or a hypothetical one if you prefer). Your job is to draft a CLAUDE.md that covers the minimum viable set of constraints — test runner, forbidden files, approved dependencies — and then expand it.
Describe your project to the assistant and work through what your CLAUDE.md should contain. The assistant will ask clarifying questions and suggest sections you may have overlooked.