In 1965, Gordon Moore observed that the number of transistors on a chip was doubling roughly every two years. His note in Electronics magazine was not a prophecy — it was a description of something already happening. Within a decade, that observation had reordered the computing industry entirely. Engineers who had been writing assembly instructions for room-sized machines found themselves holding pocket calculators. The skill that had made them valuable — manually managing every clock cycle — became a liability. The new premium was on architecture, judgment, and domain knowledge.
Something structurally similar is happening in software development right now. In March 2023, OpenAI released GPT-4 with a function-calling API. By late 2024, Anthropic's Claude, Google's Gemini, and a cascade of open-weight models — Mistral, Llama 3, Qwen — had made powerful language models accessible through a few lines of Python. GitHub Copilot crossed one million paid subscribers in 2023. Stack Overflow traffic fell 14% year-over-year in 2024 as developers routed questions to AI assistants instead. The bottleneck in software is no longer typing code. It is knowing what to build, understanding what the model is actually doing, and connecting the right tools reliably.
This course teaches you to build real AI-powered applications using Python, modern APIs, and the tooling ecosystem that has emerged around large language models. It is practical by design: every lesson ends in a hands-on lab where you write, test, and reason through actual code. You will leave with working knowledge of Python for AI, the OpenAI and Anthropic APIs, prompt engineering as a craft, vector databases, retrieval-augmented generation, and how to evaluate model outputs honestly. This course will not make AI feel magical. It will make it feel mechanical — and that is far more useful.
If you finish every module, here's who you become:
In November 2022, the week after ChatGPT launched, Andrej Karpathy — then at Tesla, soon to return to OpenAI — posted a tweet that became widely quoted: "The hottest new programming language is English." The joke landed because it was partly true. But what the joke elided is that English instructions still flow through Python. Every major AI API — OpenAI, Anthropic, Cohere, Mistral, Hugging Face — is consumed via Python SDKs. Every deployment pipeline, every evaluation harness, every vector database client is a Python library. The language of AI infrastructure is Python. If you cannot set up a clean Python environment, install dependencies without breaking things, and structure a project sensibly, you will spend more time debugging your workspace than building anything.
Most tutorials skip environment setup or treat it as a two-line afterthought. This creates compounding problems. The Python ecosystem uses version-specific package resolution. A library installed globally can silently shadow the version your project needs. API keys stored carelessly in source files get committed to public repositories — a well-documented and expensive mistake. In 2023, GitGuardian reported detecting over 10 million secrets leaked on GitHub, the majority being API keys and tokens.
The professional approach is: one virtual environment per project, dependencies pinned in a requirements.txt or pyproject.toml, secrets loaded from environment variables, and project structure that separates code, configuration, and data from the start.
You need exactly five things to start building AI applications. Everything else is optional until proven necessary.
pip freeze > requirements.txt after stabilizing..env file into environment variables. The .env file never gets committed. Simple and effective.Starting with good structure costs five minutes and saves hours. Here is the layout used throughout this course:
Here is the exact sequence. Run these commands in your terminal — this is the same sequence used by production teams:
Once your environment is configured, making an API call is genuinely simple. Here is the full working code to call OpenAI's gpt-4o-mini — a fast, cheap model ideal for development:
gpt-4o-mini costs $0.15 per million input tokens and $0.60 per million output tokens as of mid-2025. A 300-token response costs less than $0.0002. Use it freely during development. Switch to gpt-4o or Claude Sonnet only when capability requires it.
openai, anthropic) that wraps API calls in convenient Python functions.In this lab you'll work through setting up a Python AI project from scratch — virtual environments, dependency management, .env configuration, and making your first API call. Your AI lab assistant will guide you step by step and answer questions about any part of the process.
Work through these objectives in conversation. You can ask for clarification, request code examples, or ask the assistant to explain why each step matters.
When OpenAI released the GPT-3 API in 2020, only developers on a waitlist could access it. By the time GPT-4 launched in March 2023, access was open, the Python SDK had a clean interface, and the documentation was thorough enough that a competent developer could go from zero to a working application in an afternoon. What had changed wasn't just capability — it was the design of the API itself. The chat completions format, the structured message array with roles, the parameters for controlling randomness and length: these were design decisions that made the API predictable enough to build on seriously. Understanding those parameters isn't optional. Temperature, max_tokens, stop sequences — each has a direct effect on what your application does.
Every call to the OpenAI Chat Completions API sends a JSON object and receives a JSON object. The Python SDK handles the serialization, but you should understand the underlying structure because it's what you're actually controlling.
Most parameters you will leave at their defaults. These three you will actively tune for every application:
Streaming is not optional for production user-facing applications. A 500-token response at default speed takes 3–6 seconds to complete. With streaming, the user sees the first token in under a second. Here is the correct pattern:
The Anthropic SDK follows a nearly identical pattern to OpenAI's. The key structural difference is that Anthropic separates the system prompt from the messages array — it's a top-level parameter, not a message with role "system". Claude models also have a distinct context window: Claude 3.5 Sonnet supports 200,000 tokens of context, versus GPT-4o's 128,000.
| Feature | OpenAI (gpt-4o-mini) | Anthropic (Claude Haiku) |
|---|---|---|
| System prompt | Message with role="system" | Top-level system= parameter |
| Context window | 128,000 tokens | 200,000 tokens |
| Response object | response.choices[0].message.content | message.content[0].text |
| Streaming | stream=True, iterate chunks | stream=True, message_stream |
| Price (input/output) | $0.15/$0.60 per MTok | $0.80/$4.00 per MTok (Haiku 3.5) |
Use OpenAI gpt-4o-mini for development — it's the cheapest capable model. For production, benchmark both on your actual task. Claude models often perform better on long-document tasks and structured extraction. GPT-4o performs well on code generation and tool use. The right answer is always empirical, not tribal loyalty.
You'll work through experimenting with the key API parameters from Lesson 2 — temperature, max_tokens, stop sequences, and streaming. The lab assistant will guide you through concrete experiments and help you understand what each parameter change actually produces.
In September 2023, a group of researchers at DeepMind published a paper titled "Large Language Models as Optimizers" showing that asking a model to improve its own prompt — using the model to do prompt engineering — outperformed human-written prompts on several benchmarks. The paper wasn't evidence that prompt engineering was trivial; it was evidence that it was difficult enough that automation was worth pursuing. What this field calls "prompt engineering" is not the same as writing better sentences. It is a systematic craft: specifying behavior precisely, constraining output format, providing examples that constrain the inference space, and testing against failure modes. The developers who treat prompts as engineering artifacts — versioned, tested, iterated — produce more reliable AI applications than those who treat them as magic incantations.
Every well-engineered prompt has four components. Not every prompt needs all four, but knowing which to include and why is the skill.
Few-shot prompting — including 2–5 examples of the input/output pattern you want — is the single most reliable way to improve model performance on structured tasks. Examples constrain the inference space more precisely than verbal instructions alone.
For complex reasoning tasks, adding "Think step by step before giving your final answer" measurably improves accuracy. Google's 2022 paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (Wei et al.) showed this effect consistently across arithmetic, symbolic reasoning, and commonsense tasks. It works by encouraging the model to generate intermediate steps that serve as working memory.
Parsing free-text responses programmatically is fragile. OpenAI's response_format parameter (available on gpt-4o and gpt-4o-mini) forces valid JSON output. Combined with a precise prompt specifying the schema, this is reliable enough for production use:
A prompt is a program. Version it. When you change a prompt, you are changing application behavior — you need to know that the new version is better than the old one, and on which inputs it might be worse. The minimum viable prompt testing harness:
You'll build a complete prompt engineering workflow for a real task: classifying customer support emails into structured categories and extracting key information as JSON. You'll write the prompt, test it against failure cases, and iterate to handle edge cases.
In January 2024, a developer posted on Reddit that their startup had received an unexpected OpenAI bill of $14,000 for a single weekend. A bug in their conversation management code had allowed chat histories to grow unbounded — every new user message included the entire session history, which could run to 50,000 tokens per request. Multiplied by a viral launch that sent thousands of concurrent users into hour-long sessions, the cost was catastrophic. The root cause was not using the API incorrectly — the API was doing exactly what it was told. The problem was that nobody had thought through conversation memory management, token counting, or cost controls as engineering concerns. They are.
Language model APIs are stateless. Every API call is independent. There is no persistent memory between calls. The illusion of conversation — of the model "remembering" what was said earlier — is created entirely by including previous messages in the messages array on each new call. This means conversation history grows with every exchange, and every token in the history is billed. Managing this growth is mandatory for any multi-turn application.
AI APIs fail. Rate limits, transient network errors, and occasional 500s are facts of production life. Applications that don't handle these gracefully will crash at the worst possible moments. The pattern is: catch specific exceptions, retry with exponential backoff for transient errors, and fail fast for permanent errors.
Use the tiktoken library — OpenAI's official tokenizer — to count tokens before sending requests. This lets you enforce cost limits and prevent the history growth bug described in the opening scene:
Without logging, AI application failures are nearly impossible to diagnose. Every production AI application should log at minimum: the full messages array sent, the model response, token counts, latency, and any errors. The response object itself contains usage data:
Before shipping any AI feature: ① Conversation history has a token or message count limit. ② All API calls have retry logic with exponential backoff. ③ Input token counts are validated before expensive requests. ④ Usage data is logged per request. ⑤ A monthly cost alert is configured in your API provider's dashboard.
This is the capstone lab for Module 1. You'll build a complete, production-ready chatbot in Python that incorporates everything from the module: proper environment setup, a ConversationManager class with sliding window history, exponential backoff retry logic, tiktoken-based token counting before each call, and per-request cost logging. The lab assistant will guide you through each component and help you debug as you go.