Priya is three weeks into a machine learning internship at a mid-sized biotech startup. Her manager hands her a GitHub repo and says, "Get familiar with the training loop — we ship a new model version Friday." She opens the code. It's PyTorch. She's spent the last semester doing homework in TensorFlow because that's what her professor used. The syntax feels alien. She spends two hours reading docs instead of doing actual work.
On Slack, she DMs a friend who graduated a year ahead: "Did you ever learn PyTorch or did you just use TF?" The reply comes back fast: "TF is basically dead in research. PyTorch is what everyone uses. You can pick it up in a week, honestly — the mental model is way more intuitive."
That word — intuitive — is doing a lot of work. What does it actually mean for a framework to be intuitive? And why did the entire research community essentially vote with their commits?
TensorFlow launched from Google in late 2015 with enormous institutional momentum. It was fast, production-ready, and backed by the most powerful tech company in AI. By most metrics, it should have won. And for a while it did, at least at companies. But in academia and research labs, something else happened.
Facebook AI Research released PyTorch in 2017, and within about two years it had majority share in academic papers. By 2022, the split at major ML conferences like NeurIPS was roughly 75% PyTorch, 25% TensorFlow, and by 2024 that gap had only grown. Google's own DeepMind lab migrated significant work to JAX (a different design, but a cousin of PyTorch in its research-first spirit), and TensorFlow 2.x, which made eager execution the default, was largely a reactive redesign attempting to copy what PyTorch did first.
This isn't just trivia. It's a case study in how developer experience beats institutional muscle when the community is technical enough to choose. The researchers who built the best models chose PyTorch because it let them think and debug faster. And the models those researchers built became the foundations everyone else builds on. So the toolchain propagated.
TensorFlow (v1) used a "define-and-run" static graph — you described a computation graph, compiled it, then executed it. Errors were cryptic because execution was separate from definition. PyTorch used "define-by-run" (eager execution) — the graph builds dynamically as your Python code runs. This means you can use a Python debugger, print tensors mid-computation, and write loops that actually behave like Python loops. For researchers iterating on novel architectures, this was transformative.
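To make "define-by-run" concrete, here is a minimal sketch. The function name running_norms and the halving operation are made up for illustration; the point is only that each line executes immediately, so an ordinary Python loop and a mid-computation inspection both work.

```python
import torch


def running_norms(x: torch.Tensor, steps: int) -> list[float]:
    """Halve x repeatedly, recording its norm after every step."""
    norms = []
    for _ in range(steps):             # an ordinary Python loop
        x = x * 0.5                    # runs right away, no separate compile step
        norms.append(x.norm().item())  # inspect a value mid-computation
    return norms


norms = running_norms(torch.ones(4), steps=3)
print(norms)
```

You could drop a breakpoint() inside that loop and poke at x in a normal Python debugger, which is exactly what a static graph made awkward.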
Everything in PyTorch revolves around two concepts. Once you have these, the rest is API surface.
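Tensors and autograd. A tensor is PyTorch's n-dimensional array, and autograd is the engine that differentiates through operations on tensors. A tensor created with requires_grad=True tracks every operation done to it. Call .backward() on a scalar result (typically the loss) and PyTorch computes gradients for every such tensor in the computational graph, automatically. This is how learning happens.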
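A tiny example of requires_grad and .backward() in action. The function y = x^2 + 2x is an arbitrary choice for illustration; its derivative is 2x + 2, so at x = 3 the gradient should be 8.

```python
import torch

# A scalar tensor that autograd should track.
x = torch.tensor(3.0, requires_grad=True)

y = x ** 2 + 2 * x   # the operations are recorded as they run
y.backward()         # walk the recorded graph backwards

print(x.grad)        # tensor(8.)
```

In a real model, x is replaced by millions of weights and y by the loss, but the mechanism is the same.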
Here's the core training loop in PyTorch, stripped down to essentials:
for each batch: zero the gradients → run a forward pass → compute loss → call loss.backward() → optimizer.step()
That's it. Five steps, repeated thousands of times. Everything else — model architecture, data loading, evaluation — is scaffolding around this loop. Once you see that the loop never changes, PyTorch stops feeling complicated and starts feeling like a very clean contract.
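Here is one way those five steps might look as runnable code. Treat it as a sketch under assumptions: the dummy data (a noiseless line y = 3x + 1), the single linear layer, the batch size, and the learning rate are all picked for illustration, not taken from any particular codebase.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Dummy regression data (illustrative): y = 3x + 1, no noise.
X = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 3 * X + 1
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(100):
    for xb, yb in loader:
        optimizer.zero_grad()      # 1. zero the gradients
        pred = model(xb)           # 2. forward pass
        loss = loss_fn(pred, yb)   # 3. compute loss
        loss.backward()            # 4. backward pass (autograd)
        optimizer.step()           # 5. update the weights
    losses.append(loss.item())     # loss of the last batch in this epoch

print(f"first epoch: {losses[0]:.4f}, last epoch: {losses[-1]:.6f}")
print(f"learned weight: {model.weight.item():.3f}, bias: {model.bias.item():.3f}")
```

Swap in a bigger model or a different loss and the inner five lines stay the same.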
The torch.nn module is where you define neural network architectures. The core class is nn.Module — every model you write is a subclass of it. You define two things: the layers in __init__, and what happens on a forward pass in forward().
PyTorch ships with everything you'd need: linear layers (nn.Linear), convolutions (nn.Conv2d), attention mechanisms, batch normalization, dropout, and dozens more. You compose these like LEGO blocks. A simple two-layer classifier is maybe 10 lines. A transformer encoder is maybe 40, if you're building it from scratch — and you probably won't be, because Hugging Face (Lesson 2) ships those pre-built.
What matters right now: understand that nn.Module is the abstraction that lets you treat any neural network — from a 3-layer MLP to a 70-billion-parameter LLM — as a Python object with a forward() method. This uniformity is what makes the ecosystem composable.
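As a sketch of the pattern, here is a two-layer classifier. The class name TinyClassifier and the layer sizes are hypothetical; the structure (layers in __init__, computation in forward()) is the part that carries over to any model.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    def __init__(self, in_features: int, hidden: int, num_classes: int):
        super().__init__()
        # Layers are declared once, here.
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # What happens to a batch on the way through.
        return self.fc2(torch.relu(self.fc1(x)))


model = TinyClassifier(in_features=20, hidden=32, num_classes=3)
logits = model(torch.randn(8, 20))   # calling the model runs forward()
n_params = sum(p.numel() for p in model.parameters())

print(logits.shape)   # torch.Size([8, 3])
print(n_params)       # 771
```

Counting parameters with model.parameters() works the same way on this 771-parameter toy as on a much larger model, which is the uniformity the paragraph above is describing.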
Before your next project or interview, spend 45 minutes writing a bare PyTorch training loop from scratch — no tutorials, just the docs. Define a tiny model (2 linear layers), use a simple loss (MSELoss or CrossEntropyLoss), and train it on dummy data. You'll understand 80% of what real production code does. Most people skip this and feel confused forever.
The most common mistake people in our age range make with PyTorch isn't the syntax — it's the abstraction level. Lots of folks jump straight to PyTorch Lightning or Keras wrappers because they want to "write less code." That's fine for projects. But it means they can't read or debug the underlying training loop when something breaks. And things always break.
There's a real career difference between "I use PyTorch Lightning" and "I understand what Lightning is abstracting." The second person can debug a NaN loss. The first person opens a Stack Overflow tab. Both things are navigable — but knowing which one you are is important. If you're still learning the fundamentals, go deeper before you go higher.
Another common gap: not understanding the device abstraction. PyTorch tensors live on a specific device, CPU or GPU. An operation between tensors on different devices raises a RuntimeError, which is confusing the first time you see it. The fix is a .to(device) call on the model and on each batch, but you have to understand why it's necessary. This is also why free GPU access matters, which is Lesson 3's whole topic.
What does requires_grad=True do when set on a PyTorch tensor?
You're a junior ML engineer at a 12-person startup. A product manager just forwarded you a message from a new hire asking whether the team should migrate their training code from "raw PyTorch" to PyTorch Lightning to "save time." Your tech lead asked you to draft a recommendation. The lab AI will play your tech lead: opinionated, direct, willing to push back on weak reasoning.
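A minimal sketch of the device pattern, assuming a toy linear model and a random batch. It falls back to CPU when no GPU is present, so it runs anywhere.

```python
import torch
from torch import nn

# Pick the device once, near the top of the script.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)      # move the model's weights
batch = torch.randn(3, 4).to(device)    # move each batch before using it

out = model(batch)                      # both operands now share a device
print(out.device)
```

If you forget the second .to(device) while the model sits on the GPU, the forward pass is where the error shows up.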
Marcus is a senior studying computer science at DePaul, trying to build a portfolio project that stands out. He has an idea: a tool that reads Chicago city council meeting transcripts and automatically flags when members contradict their previous positions. Civic accountability, built with AI. He thinks he'll need six months to train a model capable of this.
His roommate, who just got a job at a data science consultancy, asks him: "Have you looked at Hugging Face? There are like 500,000 models on there. Someone probably already made something that handles legislative text."
Marcus searches. In 20 minutes he finds a fine-tuned BERT model trained on political speech. In 40 minutes he has it running locally. In three hours he has a working prototype. The six-month project became a weekend project — not because the AI got easier, but because the distribution of AI got way better.
Hugging Face started as an NLP chatbot company in 2016. By 2018, they'd pivoted to building developer tools around transformer models, releasing the Transformers library. That library became the de facto standard for working with pretrained language models. Then they launched the Hub — a repository for sharing models, datasets, and demo apps — and everything changed.
As of mid-2024, the Hugging Face Hub hosts over 750,000 models and 150,000 datasets, with thousands being uploaded weekly. The platform is community-run in the same way GitHub is — anyone can upload, fork, and build on others' work. But unlike GitHub, it's optimized for model artifacts: versioned model weights, inference APIs, and standardized metadata that makes it searchable.
The company has raised over $235 million in funding and is valued around $4.5 billion. But its real leverage is softer than that: it's the place where open-source AI momentum lives. When Meta releases Llama, it goes on Hugging Face. When Stability AI releases a new image model, it goes on Hugging Face. When a grad student fine-tunes something useful, it probably goes on Hugging Face.
The transformers Python library is the technical core of Hugging Face's value. It standardizes how you load and use pretrained models across hundreds of architectures — BERT, GPT-2, T5, Llama, Whisper, CLIP, and hundreds more. The API is consistent enough that switching from a sentence classifier to a text generator is essentially changing two lines of code.
The from_pretrained() pattern is what unlocks everything. Pass a Hub ID, get back a fully initialized model with weights trained on billions of tokens. You're not starting from scratch — you're starting from a very good starting point and adapting it to your specific problem.
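A sketch of what the from_pretrained() pattern can look like for a classifier. Assumptions to flag: the default Hub ID below is a public sentiment model chosen only as an example, the helper names load_classifier and classify are made up, and running load_classifier needs the transformers package plus a network connection (or a warm cache).

```python
import torch


def load_classifier(model_id: str = "distilbert-base-uncased-finetuned-sst-2-english"):
    """Fetch a tokenizer and a classification model from the Hub (or local cache)."""
    # Imported here so the rest of the file works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    return tokenizer, model


def classify(text: str, tokenizer, model) -> str:
    """Return the predicted label name for one piece of text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():                       # inference only, no gradients
        logits = model(**inputs).logits
    label_id = int(logits.argmax(dim=-1))
    return model.config.id2label[label_id]


# Usage (downloads weights on first call):
#   tokenizer, model = load_classifier()
#   print(classify("This library saved me six months.", tokenizer, model))
```

Pointing model_id at a different classification checkpoint is the whole migration, which is what the consistent API buys you.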
Once you have a pretrained model from Hugging Face, you have three main ways to make it useful for your specific problem:
Prompting: Just write better inputs. Works immediately, costs nothing, requires no training. Best when the model is already capable of the task and you just need to guide it. Worst when you need consistent structured outputs or domain-specific vocabulary the model doesn't know.
RAG (Retrieval-Augmented Generation): Connect the model to a vector database of your own documents. The model retrieves relevant chunks before generating. Best when you have a specific knowledge base the model wasn't trained on. Doesn't update the model's weights — it just gives it better context.
Fine-tuning: Actually train the model further on your data, adjusting weights. Best when you need the model to change its style, learn new facts persistently, or specialize deeply. Most expensive in time and compute. Hugging Face's PEFT library (Parameter-Efficient Fine-Tuning) lets you fine-tune massive models by only updating a small subset of parameters — LoRA adapters being the most popular technique.
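To see why LoRA counts as "parameter-efficient," the arithmetic helps. LoRA freezes a weight matrix of shape d_out x d_in and trains two small matrices of shapes d_out x r and r x d_in instead. The sketch below just counts parameters; the 4096 x 4096 layer and rank 8 are illustrative assumptions, not values from any specific model.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full matrix params, LoRA adapter params) for one weight matrix."""
    full = d_out * d_in               # the frozen original matrix
    adapter = rank * (d_out + d_in)   # the two trainable low-rank factors
    return full, adapter


full, adapter = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(full)             # 16777216
print(adapter)          # 65536
print(adapter / full)   # 0.00390625
```

Under those assumptions the trainable part is well under 1% of the matrix it adapts, which is why fine-tuning a large model can fit on modest hardware.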
A lot of people in the "I'm learning AI" crowd immediately jump to fine-tuning because it sounds more technical and impressive. Then they spend two weeks setting up a training pipeline for a task that good prompting would have solved in an hour. Fine-tuning is a last resort, not a first move. Exhaust prompting and RAG first — they're faster, cheaper, and often good enough.
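To make the RAG option less abstract, here is a toy version of the retrieve-then-prompt step. It is a deliberate simplification: a real system would use an embedding model and a vector database, while this sketch swaps in plain word-count cosine similarity so it runs with the standard library alone. The documents and function names are invented for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts, punctuation dropped."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 1) -> str:
    """Put the retrieved context in front of the question; no weights change."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 7 days depending on region.",
]
prompt = build_prompt("How long do refunds take?", docs)
print(prompt)
```

The string that comes out is what you would send to the model. Everything "learned" lives in the context, not in the weights.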
Hugging Face Spaces is a hosting platform for ML demos, built on Gradio or Streamlit. It's free for CPU-tier apps and cheap for GPU-backed ones. More importantly, it's where the community discovers what's possible — good Spaces get linked in newsletters, discussed in Discord servers, and sometimes go viral in the ML Twitter/X community.
If you're building a portfolio, a working Space is worth five GitHub repos of training scripts. Recruiters and researchers can run it without cloning anything. It demonstrates that you can go from model to deployed product, which is the gap most people can't cross. Put your project on a Space. Use Gradio — it takes 10 lines of Python.
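For a sense of scale, a Gradio app can be roughly this small. The classify function below is a stand-in that scores text by whether it ends in a question mark; it is purely hypothetical and exists so the sketch runs without a model. In a real Space you would replace it with your model's predictions and keep the file as app.py.

```python
def classify(text: str) -> dict[str, float]:
    """Placeholder scorer: swap in real model probabilities here."""
    is_question = text.strip().endswith("?")
    p = 0.9 if is_question else 0.1
    return {"question": p, "statement": 1.0 - p}


def build_demo():
    """Wrap classify in a simple text-in, label-out web UI."""
    import gradio as gr  # imported lazily so classify works without gradio

    return gr.Interface(fn=classify, inputs="text", outputs="label")


# To serve it locally or on a Space:
#   build_demo().launch()
```

The "label" output type accepts a dictionary of label names to confidences, which is why classify returns one.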
This week, search the Hugging Face Hub for a model relevant to something you care about — your field of study, a hobby, a problem you've seen. Download it, run it locally on three examples, and think about what you'd change. You don't have to build anything. Just form a relationship with the tool. The people who are comfortable with Hugging Face before they need it are the ones who ship things fast when they do.
You're consulting for a small edtech startup. They want to build a feature that automatically generates quiz questions from uploaded textbook chapters (PDF → text already handled). They have a $200/month compute budget and a one-developer team. The AI plays their CTO — skeptical, budget-conscious, and asking you to justify your Hugging Face model recommendation specifically.
Keisha is trying to fine-tune a small language model on a dataset of her own journaling entries — a personal project, nothing commercial, just curiosity about whether a model trained on her own writing might feel different from ChatGPT. She doesn't have a GPU. Her laptop has integrated graphics. She posts in an ML Discord asking how people run training without spending money.
The responses are a mess. One person says Google Colab is fine. Another says Colab disconnects too often to be useful. A third recommends Kaggle. A fourth says she should just get a Lambda Labs instance. Someone says Colab Pro is worth it. Nobody agrees, and nobody explains why any of these apply to her specific situation.
Here's what nobody told her clearly: each free tier has specific constraints that matter depending on what you're doing. The person who says Colab is fine is running 10-minute inference jobs. The person who says it disconnects is running 6-hour training runs. Both are correct. The framework for choosing isn't "which is best" — it's "what does my workload look like?"
Google Colab free tier typically gives you a T4 GPU (16GB of VRAM, about 15GB usable) and 12–16GB of RAM, though GPU availability isn't guaranteed. The catch everyone mentions is the disconnection policy: sessions terminate after at most 12 hours of runtime. If your browser tab is idle for too long, it also disconnects. Your files don't persist beyond the session unless you mount Google Drive or save explicitly.
For the right workloads, this is genuinely great. Running inference on a medium-sized model? 20 minutes, done. Fine-tuning a small model for a few epochs? Totally feasible in a single session. Experimenting with a new architecture or debugging a training loop? Perfect. The T4 is a real GPU — not a toy.
Where Colab fails: anything requiring more than one continuous session. Long training runs where you're saving checkpoints and resuming. Jobs that exceed 12 hours. Multi-GPU workloads (free tier is single GPU). Large model inference where 15GB VRAM isn't enough (a 7B-parameter model in float16 needs ~14GB — tight).
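The "~14GB" figure is just parameter count times bytes per parameter. A quick sketch of that arithmetic follows. Note the assumption baked in: this counts weights only, so activations, optimizer state, and any inference cache come on top, which is why 14GB on a 15GB card is tight.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str = "float16") -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


print(weight_memory_gb(7e9, "float16"))    # 14.0
print(weight_memory_gb(7e9, "int4"))       # 3.5
print(weight_memory_gb(13e9, "float16"))   # 26.0
```

Run the same function before picking a model and you'll know whether it can even load on a free-tier GPU.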
Kaggle's free GPU tier is consistently underrated. You get 30 hours of GPU per week (T4 or P100) with sessions up to 9 hours. Unlike Colab, Kaggle notebooks save their state and outputs — your files persist as dataset artifacts. You can also schedule notebooks to run without a browser window open, which Colab free tier doesn't support.
The 30-hours-per-week limit sounds restrictive but is actually generous if you're thoughtful. 30 hours is enough to fine-tune a 1B-parameter model with LoRA on a reasonable dataset. Many competition winners do serious work entirely on Kaggle free tier. The platform also has a massive dataset library built in, so if your project uses public data, you might not even need to upload anything.
The real Kaggle advantage: session persistence and scheduled runs. You can kick off a training job, close your laptop, and come back to results. This fundamentally changes what's feasible compared to Colab free, where you need to babysit the browser.
Use Colab when: you need fast iteration, you're debugging or experimenting, your job fits in one 12-hour session, or you want Drive integration for files.
Use Kaggle when: your training run exceeds one session, you need persistent outputs, you want to walk away and come back, or you're working with public datasets already on the platform.
Eventually, free tiers hit walls. Here's when it becomes worth paying:
You need a specific GPU: A100s come with 40GB or 80GB of VRAM, enabling models that simply can't fit in 15GB. If you're working with 13B+ parameter models without quantization, you might genuinely need one. Lambda Labs and Vast.ai offer A100s for $1–3/hour, often dramatically cheaper than AWS or GCP for the same hardware.
Your training run takes days: Anything over 30 hours of total GPU time per week exceeds what Kaggle gives you for free. If you're training a model seriously — not just experimenting — budget for it. A multi-day run on a $1/hour GPU might cost $20–50 total. That's reasonable for a project that matters.
You need reliability: Free tiers are interrupted. If you're running something for a deadline — a paper submission, a product launch, a client demo — pay for dedicated compute where you control the runtime.
Before spending any money on compute, do this: estimate your GPU-hours. How many training steps? At what batch size? How long does one training step (forward pass, backward pass, and optimizer update) take per batch? Multiply it out. If the total comes to under 25 hours, it fits within Kaggle's 30-hour weekly quota with some margin, as long as you checkpoint across sessions. Under 12 hours and it's a clean Colab run. Over 50 hours? Budget $30–80 on Lambda Labs or Vast.ai, which is cheaper than Colab Pro+ if it's just one job. The people who waste money on compute are the ones who didn't estimate first.
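The estimate is simple enough to script. In this sketch the thresholds (12, 25, and 50 hours) come from the paragraph above; the function names, the example workload, and the 1.2 seconds per step are assumptions for illustration. You would measure seconds per step yourself by timing a few batches. The text doesn't say what to do between 25 and 50 hours, so that branch is a guess.

```python
import math


def estimate_gpu_hours(num_examples: int, epochs: int, batch_size: int,
                       seconds_per_step: float) -> float:
    """Total training time in hours from a measured per-step time."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return total_steps * seconds_per_step / 3600


def suggest_platform(gpu_hours: float) -> str:
    if gpu_hours < 12:
        return "Colab free tier, single session"
    if gpu_hours < 25:
        return "Kaggle free tier, checkpointing across sessions"
    if gpu_hours <= 50:
        # Not covered by the rule of thumb; this branch is an assumption.
        return "Borderline: split across weeks on Kaggle or pay for a short rental"
    return "Paid GPU rental"


hours = estimate_gpu_hours(num_examples=50_000, epochs=5, batch_size=8,
                           seconds_per_step=1.2)
print(f"{hours:.1f} GPU-hours -> {suggest_platform(hours)}")
```

The useful habit is less the function than the measurement: time ten real steps on the target GPU before trusting any number you plug in.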
Here's the issue that trips up most people new to cloud notebooks: your environment is ephemeral. Every time you start a new Colab session, you get a fresh machine. Your pip installs from last session? Gone. Your downloaded model weights? Gone (unless you saved them to Drive). Your environment variables? Gone.
The fix is to treat setup as code. At the top of every notebook: install your dependencies, mount your storage, and load your data. Make the setup cell idempotent — running it twice shouldn't cause errors. This is the same discipline production engineers use for containerized environments, and the habit transfers directly.
Kaggle is better here because it lets you add packages to a "persisted" environment that survives sessions. But even there, getting into the habit of explicit, script-level environment setup is valuable. When you eventually move to a real server or a Docker container, you'll already think this way.
A friend is working on a capstone project: fine-tuning a 3B-parameter language model on a 50,000-example dataset for 5 epochs. They have no GPU, no budget, and a deadline 10 days away. They want your advice on which free platform to use and whether they'll need to spend money. The AI plays your friend — eager but inexperienced, with follow-up questions about specifics.
Darius has been "learning deep learning" for eight months. He's completed three courses, read two textbooks, and watched countless YouTube tutorials. He can explain backpropagation at a dinner party. He has not shipped a single project.
He posts in an online forum: "I feel like I know the theory but I can't turn it into anything real. Every time I try to start a project I get stuck on setup before I even get to the model."
The replies are split between "just build something" (useless advice) and "you need to learn X first" (the same trap he's in). What nobody says clearly is: the gap between understanding tools and using them is entirely about workflow, not knowledge. Darius doesn't need to learn more. He needs a repeatable process that gets him from idea to running code in under 30 minutes.
This is the actual workflow that experienced ML practitioners use for new projects — not the idealized version, but the one that accounts for time constraints, imperfect data, and free compute limits.
Step 1 — Define the task precisely. Not "I want an AI that understands text" but "I want a model that classifies customer support tickets into 8 categories with at least 85% accuracy on our test set." The more specific, the easier every subsequent decision becomes. Task type determines model family. Model family determines data format. Data format determines preprocessing. Vague tasks create vague projects that never ship.
Step 2 — Search the Hub before writing code. Go to huggingface.co/models. Filter by task. Look for models with the most downloads and recent updates. Check the model card — does it describe training data similar to your domain? Read the example code. Can you get a baseline running in under 15 lines? If yes, that's your starting point. If nothing close exists, now you know you're in fine-tuning territory.
Step 3 — Get something running on CPU first. Seriously. On your laptop, in whatever environment you have. Run a forward pass. Feed it a single example and inspect the output. Does it make sense? Is the output shape correct? Are you getting reasonable logits? This step costs you 20 minutes and saves you 3 hours of debugging on GPU where the iteration cycle is slower.
Step 4 — Move to Kaggle/Colab for real training. Set up your notebook with the setup-as-code discipline from Lesson 3. Install dependencies in the first cell. Load your data from the Hub or Drive. Confirm your GPU is active (torch.cuda.is_available()). Run one epoch, inspect the loss curve. Is it decreasing? Are you getting vanishing gradients? Now you're iterating on the real problem.
Step 5 — Ship something, even if small. A Hugging Face Space with your model. A GitHub repo with a working inference script. A demo notebook with clear output. The threshold should be: can someone else run this and see a result? Perfectionism at this stage is just a different name for not shipping. A working prototype beats a perfect plan every time.
Knowing the common failure modes saves hours. These are the real ones, not the textbook ones:
NaN loss: add gradient clipping (torch.nn.utils.clip_grad_norm_) as a first fix. Then check your loss function inputs for zeros or infinities.
CUDA out of memory: enable gradient checkpointing (model.gradient_checkpointing_enable() on Hugging Face models). Use mixed precision (torch.autocast(device_type='cuda'), the current spelling of torch.cuda.amp.autocast()). If you're still out of memory, you need a model with fewer parameters or more VRAM.
Device mismatch: define the device once (device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')), then consistently call .to(device) on every tensor and the model.
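Here is roughly where gradient clipping and a finiteness check sit inside a training step. The setup is contrived: a toy linear model with inputs scaled by 100 to provoke large gradients, and a max_norm of 1.0 chosen arbitrarily. clip_grad_norm_ returns the gradient norm measured before clipping, which is handy to log.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Deliberately large inputs so the gradients blow up (illustrative only).
x = torch.randn(32, 10) * 100
y = torch.randn(32, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)

# Fail fast instead of silently training on NaN or inf.
if not torch.isfinite(loss):
    raise ValueError(f"non-finite loss: {loss.item()}")

loss.backward()

# Rescale gradients in place so their combined norm is at most max_norm.
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))

optimizer.step()
print(f"grad norm before clipping: {norm_before.item():.1f}, after: {norm_after.item():.4f}")
```

Clipping goes between loss.backward() and optimizer.step(); put it anywhere else and it either has nothing to clip or clips too late.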
Most people we know in this space spend too much time reading about ML and not enough time with error messages. Error messages are where the real learning happens — they force you to understand the actual failure mode, not the idealized explanation. Every error you've debugged and understood is worth five tutorials you've passively watched. This is the discipline that separates people who ship from people who study indefinitely.
Here's how the three tools of this module actually connect in a real project. Say you want to build a system that identifies whether a Reddit post is asking for advice versus venting (a real classification task used in mental health research).
Hugging Face: Search the Hub for a RoBERTa-style model pretrained on mental-health text (something like mental-health-research/roberta-base-mental-health; treat that ID as a placeholder and check what actually exists), then load it with AutoModelForSequenceClassification.from_pretrained(). You now have a base model with domain-relevant pretraining. Passing num_labels=2 attaches a 2-class classification head, which you then train on your labeled posts.
PyTorch: Write your training loop. DataLoader feeds batches of tokenized text. Forward pass produces logits. Cross-entropy loss. Backward pass. AdamW optimizer step. You're doing this in raw PyTorch so you can inspect the loss at each step, plot it, and catch issues immediately. The loop is 30 lines.
Kaggle: Your dataset has 20,000 examples, you're running 3 epochs with LoRA on a 125M-parameter model. Estimated GPU time: ~4 hours. Kaggle free tier handles it in one session. You push the trained model back to the Hub with model.push_to_hub("your-username/reddit-advice-classifier"). Ship a Space. Done.
Knowing PyTorch syntax is table stakes. Knowing how to search the Hub efficiently is learnable in a weekend. Knowing when free compute is enough is just arithmetic. The actual skill — the one that makes someone genuinely useful in an ML context — is judgment under constraint: given a real problem, a real deadline, and a real resource limit, what is the fastest path to something working?
That judgment develops through doing, not reading. The people around you who are shipping things aren't smarter — they've just built the habit of starting with inadequate information and iterating. The tools in this module are specifically designed to lower the cost of starting: Hugging Face gives you a standing start with pretrained models, PyTorch gives you a debuggable loop, and free compute gives you a GPU without a credit card.
The only thing left is to start. Specifically — tonight, if you can.
Pick one small, specific problem you actually care about. Find a Hugging Face model related to it. Load it in a Kaggle notebook. Run it on three examples. Inspect the output. That's all. You don't need to train anything, fine-tune anything, or ship anything today. Just form a working relationship with the full pipeline — model to inference — in an environment with real GPU access. Most people never do this first step, which is why most people stay in tutorial mode indefinitely.
You're pitching a small AI project to a senior ML engineer at a company you want to intern at. They've agreed to review your project plan before you start building. The AI plays the engineer — experienced, direct, will push hard on vague answers. They want to see that you can connect tools to requirements, not just name-drop frameworks.