Module 4 · Lesson 1

The Research Stack: Finding Real Problems Before You Build

Most side projects die in the gap between "this seems like a good idea" and actual evidence that anyone needs it.
How do you use AI to compress weeks of research into a single focused session?

Marcus, a junior at Georgia Tech studying CS, spent three weeks building a Chrome extension that tracked study hours and sent weekly reports to users. He used React, hooked up a Firebase backend, and deployed to the Chrome Web Store. Total downloads after two months: eleven. Seven of those were his roommates and a cousin.

When a classmate asked him what problem the extension solved, Marcus said it helped students "stay accountable." When she pressed — accountable to whom? for what outcome? compared to what they were already doing? — he couldn't answer. He had built something nobody asked for, in a space crowded with better-funded competitors, without one hour of structured research.

The painful part: he had access to every AI tool he needed to figure this out in advance. He just didn't know what to do with them, or in what order.

Why the Research Stage Kills Most Projects

Here's the pattern that plays out constantly in dorm rooms and Discord servers: someone has an itch — a workflow that annoys them, a niche they're part of, a feature some app is missing — and they jump straight into building. This feels productive. It feels like momentum. The code editor is open, the coffee is hot, you're in the zone.

But "scratch your own itch" only works if you have the itch, enough other people have it too, and they can't already scratch it. That's a compound condition with three parts, and most people only verify the first one.

AI tools fundamentally change the research phase because they can help you do something humans are bad at: systematically pressure-test an idea before you're emotionally attached to it. They can surface competitor landscapes in minutes, synthesize Reddit threads from a dozen subreddits simultaneously, and help you design interview questions that don't accidentally lead the witness.

The catch is that AI is only as good as the questions you ask it. If you walk into a research session already believing your idea is good, you'll unconsciously prompt your way to confirmation. The tools don't protect you from yourself — you have to build that discipline in.

Peer Reality Check

A lot of people in the 18–24 range are building things right now with basically zero structured research. That's not a criticism: the tools to do this well didn't really exist at scale until 2022–2023, and most high schools never taught this workflow. You're not behind. But you can get ahead fast by just adding one research sprint before you open the code editor.

The Four Research Jobs AI Can Actually Do

Think of AI research tools as doing four distinct jobs. They're not all the same tool, and they're not all equally good at each job. Being explicit about which job you need helps you pick the right approach.

Job 1: Landscape Mapping
Understanding who already exists in your space — competitors, adjacent products, communities, and distribution channels. Tools like ChatGPT or Claude with web access, Perplexity AI, and You.com are strongest here. Ask for "a landscape overview of [space]" with follow-up prompts drilling into specific segments.
Job 2: Problem Synthesis
Extracting real complaints and unmet needs from public forums, reviews, and communities. Reddit (via AI-assisted search), App Store review scraping, and G2/Trustpilot synthesis prompts work well. The goal is identifying language people use naturally when frustrated — that language becomes your messaging later.
Job 3: Interview Design
Using AI to generate customer discovery questions that surface real behavior rather than hypothetical preferences. This is about designing the right research instrument. Claude and ChatGPT are good at critiquing leading questions and suggesting open-ended alternatives.
Job 4: Assumption Mapping
Explicitly listing every assumption your idea depends on and ranking them by risk. This is the one most people skip. Ask your AI tool: "What are the ten riskiest assumptions in this business model?" and be prepared to hear things you don't want to hear.

Marcus's mistake wasn't a lack of intelligence or skill. He was great at building. He just skipped all four of these jobs because nobody had told him they existed as a category. Once you see them as a workflow, you can systematically run through them before a single line of code is written.

Running a Research Sprint: The One-Session Framework

A research sprint is a focused 2–3 hour session with a clear deliverable: a one-page document that either confirms enough demand to prototype, or gives you permission to kill the idea early and cheaply. Here's the sequence:

  • Start with Perplexity AI and ask for a landscape overview of your target space. Don't filter the results — read everything, including the stuff that makes your idea look less original than you hoped. Ask: "What are the most common complaints users have about existing solutions in [space]?"
  • Move to Reddit via AI. Use ChatGPT with browsing or Perplexity to synthesize threads from relevant subreddits. Look for posts that describe a struggle, not just a preference. Struggling posts are gold; preference posts are noise.
  • Ask your AI to steelman your competition. Literally prompt: "Give me the strongest possible argument for why someone would choose [top competitor] over any new entrant." If the answer is compelling, that's information you need.
  • Use Claude to generate 8 customer discovery questions, then ask it to critique each one for leading language. Revise until the questions surface behavior, not opinions. ("Walk me through the last time you tried to solve this" beats "Would you use a tool that does X?")
  • Run the assumption map. Describe your concept in 2–3 sentences and ask: "What are the ten most dangerous assumptions embedded in this idea, ranked by probability of being wrong?" Sit with the top three. If you can't address them, your idea has structural risk.

By the end of this sprint, you should know: whether the problem is real, whether solutions already exist, what would have to be true for you to win, and what questions only real users can answer. That last category defines your next step — actual conversations with humans, not more AI research.
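The assumption map from the last step is easier to act on as a small ranked list than as a chat transcript. Here is a minimal sketch in Python, assuming you score each assumption yourself on two hypothetical 1 to 5 scales: `p_wrong` (how likely it is to be wrong) and `impact` (how badly the idea breaks if it is). The scales, the product score, and the example entries are illustrative assumptions, not part of the sprint framework itself.

```python
from dataclasses import dataclass


@dataclass
class Assumption:
    text: str
    p_wrong: int  # 1 (probably true) .. 5 (probably wrong); hypothetical scale
    impact: int   # 1 (minor) .. 5 (idea collapses); hypothetical scale

    @property
    def risk(self) -> int:
        # Simple product score; an assumption, not a standard formula.
        return self.p_wrong * self.impact


def top_risks(assumptions: list[Assumption], n: int = 3) -> list[Assumption]:
    """Return the n riskiest assumptions, highest risk first."""
    return sorted(assumptions, key=lambda a: a.risk, reverse=True)[:n]


# Illustrative entries for a study-tracking extension like Marcus's.
assumptions = [
    Assumption("Students want weekly reports on study hours", 4, 5),
    Assumption("Existing trackers do not already cover this", 5, 4),
    Assumption("Students will install a Chrome extension", 2, 3),
    Assumption("Accountability changes study behavior", 3, 5),
]

for a in top_risks(assumptions):
    print(f"{a.risk:>2}  {a.text}")
```

The three entries this prints are the ones to "sit with": if none of them can be tested by talking to real users, the scoring probably needs another pass.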

Practical Takeaway

Before your next project, block a single 2–3 hour calendar event called "Research Sprint." Use the five-step sequence above. Write a one-page output. If the output doesn't reveal at least two things you didn't already know about the space, your prompts weren't specific enough; try again with more constraints. The goal isn't to feel confident. The goal is to be less wrong.

What AI Research Cannot Do

Let's be honest about the limits, because overstating AI's research capability is its own trap. AI tools trained on static datasets can't tell you what's happening in very niche communities that don't have much online presence. They can hallucinate competitor details — especially funding rounds, product features, or pricing that has changed recently. And they absolutely cannot replace talking to actual potential users.

AI synthesizes existing public knowledge. It doesn't generate new signal. If your target audience is people who barely use the internet to talk about their problems — tradespeople, certain immigrant communities, older professionals — online AI research will return almost nothing useful and you'll have to do primary research the old-fashioned way: phone calls, community centers, actual conversations.

The rule: use AI to figure out what questions to ask humans, and use humans to answer those questions. In that order. AI can dramatically shrink the time from "vague idea" to "specific hypotheses I can test with real people" — but it can't skip the real-people step entirely.

Lesson 1 Quiz

The Research Stack — 5 questions
1. Marcus's Chrome extension failed primarily because:
That's the core issue. "Scratch your own itch" only works if enough others have the same itch and can't already scratch it — a three-part condition Marcus never verified.
The tech stack and distribution channel were fine. The underlying failure was skipping structured research before building. That's a process problem, not a technical one.
2. You're planning an app for freelance photographers. Which research job is Perplexity AI best suited for in your first session?
Right. Perplexity and similar web-search AI tools are strongest at landscape mapping — pulling together a broad overview of an existing space quickly. The other jobs need different tools or different approaches.
Think about what Perplexity actually does: it synthesizes publicly available information at scale. That maps most directly to understanding the existing landscape — who's already out there, what they do, how they're positioned.
3. In customer discovery interviews, "Walk me through the last time you tried to solve this" is stronger than "Would you use a tool that does X?" because:
Exactly. People are notoriously bad at predicting their own future behavior. What they've actually done in the past — specific, concrete, recalled — is far more predictive than what they say they would do in a hypothetical scenario.
The distinction is behavioral vs. hypothetical. Behavior-based questions pull out what people have actually done; hypothetical questions often just validate whatever the interviewer seems to want to hear.
4. You're building a tool for immigrant small business owners who rarely discuss their problems online. How should you adjust your AI research approach?
Good call. When the target audience leaves little online footprint, AI's landscape mapping and synthesis jobs mostly fail. You shift AI's role to helping you design better primary research — and then go talk to actual people. That's the right calibration.
Model power doesn't solve a data availability problem. AI synthesizes existing public information — if that information doesn't exist online, no model can generate it. You need a different approach for low-footprint communities.
5. The primary purpose of asking AI to "steelman your competition" during research is:
Right. Steelmanning forces you to genuinely understand why incumbents are winning — not to dismiss them. If the best argument for the existing solution is more compelling than your differentiation, that's critical information before you invest months of work.
Steelmanning isn't about finding weaknesses to exploit — it's about finding the strongest case for the other side. If that case is more compelling than yours, you need to know that now, not after launch.

Lab 1: The Research Sprint Advisor

Practice running a structured AI research session on a real idea

Your Role: Idea Owner. The AI's Role: Research Challenger.

You're going to bring a real project idea — something you've actually thought about building, or the closest thing to it. The AI will act as a research advisor who's seen hundreds of projects fail from insufficient pre-build research. It will push back, ask hard questions, and help you run the four research jobs before you fall in love with the idea.

Don't bring a polished pitch. Bring a rough thought. The messier, the better — that's what research is for.

Start by describing your side project idea in 2–3 sentences. Include who you think the target user is and what problem you think you're solving. Don't overthink it — rough is fine.
Let's run your idea through a proper research sprint before you write a single line of code. Give me a rough description — 2 or 3 sentences: what are you thinking of building, who's it for, and what problem does it solve? Raw is fine. I'd rather see the unpolished version.
Module 4 · Lesson 2

The Build Stack: AI as Your Fastest Developer and Harshest Critic

The build stage isn't where most side projects die — it's where they get fat, slow, and expensive. AI changes the speed equation dramatically, but only if you're disciplined about scope.
Which AI tools actually accelerate building, and which ones just make your codebase bigger?

Priya had done the research. She'd talked to twelve people in her target market — college athletes trying to manage NIL deals — and identified a real gap: nobody was helping them track their brand commitments, deadlines, and payments in one place. She had a validated problem, a clear user type, and genuine enthusiasm.

Then she sat down to build. Three months later, she had a half-finished dashboard with eight partially working features, a backend that had been rewritten twice, and a growing sense of dread. She'd used GitHub Copilot to speed-write code — and it had worked, sort of. The problem was that Copilot was great at generating more feature code, and she kept accepting its suggestions. Every session added scope. None removed it.

By month three, the MVP she'd planned as a six-week project had ballooned into something she was afraid to show people because it was simultaneously too complex and not functional enough.

The Build Stage Problem Isn't Speed — It's Scope

AI coding assistants have genuinely changed what a single developer can ship in a week. GitHub Copilot, Cursor, and Claude's Artifacts can write boilerplate, suggest completions, debug weird errors, and draft entire components faster than most developers can spec them. This is real. It's not hype.

But the same capability that accelerates building also accelerates scope creep. When code is cheap to generate, the psychological cost of adding another feature drops to near zero. "Maybe I should also add a notifications system" becomes a 20-minute Copilot session instead of a three-day detour — which sounds good, but it means you build six things nobody asked for in the time it used to take you to build one.

The constraint that kept MVPs minimal was the cost of building. Remove that constraint without replacing it with something else — a rigid feature list, a ruthless stakeholder, a launch deadline — and scope explodes.

The answer isn't to slow down. The answer is to use AI equally aggressively for scoping and cutting as you do for building. For every feature you consider adding, ask your AI to argue against it. Build that muscle deliberately.

What Peers Are Getting Wrong Right Now

The most common pattern in AI-assisted side projects in 2024–2025: people ship more code, faster, but take longer to actually launch something usable. More code, later launches. The tools are being used to build more — not to build smarter. If your six-week MVP is still going at month four, scope creep is probably the culprit, not technical difficulty.

The Right Build Stack by Project Type

Not every project needs the same tools. Here's a practical map based on what you're actually building:

Web App: Cursor + Claude
Cursor's AI-native editor with Claude Sonnet handles complex component logic well. Use for anything React/Next.js based. Strong at refactoring existing code, not just generating new code.
No-Code/Low-Code: Bolt.new + Lovable
Prompt-to-app tools that generate functional prototypes from descriptions. Best for validating UI/UX fast. Don't use these if you need a real production codebase, because they generate hard-to-maintain output.
API/Backend Logic: GitHub Copilot
Best for in-editor completions on backend code: route handlers, database queries, auth logic. Strong when you know what you want but don't want to write boilerplate. Weaker at architectural decisions.
Architecture Review: Claude (long context)
Paste your entire codebase structure and ask for a critical review. Claude is unusually good at identifying over-engineering, missing abstractions, and security gaps. Do this monthly at minimum.
Debugging: ChatGPT + Stack Trace
Paste the full error and surrounding context. Don't summarize the error; copy the exact message. ChatGPT o1/o3 is particularly strong at multi-step debugging with complex dependency chains.
Mobile Apps: Claude + React Native
For mobile prototypes, Claude handles React Native component generation well. Pair with Expo for rapid device testing. Don't try to ship native Swift/Kotlin with just AI assistance unless you already know those languages.

The pattern: use in-editor assistants (Cursor, Copilot) for incremental work inside an existing codebase, and use conversational AI (Claude, ChatGPT) for architectural decisions and full-system reviews. They're solving different problems. Mixing them up wastes both.

The Three-Question Filter Before Accepting Any AI-Suggested Feature

Priya's actual fix was simple: she started asking three questions before accepting any AI-generated code that added new functionality (as opposed to fixing existing functionality).

  • "Did at least one user in my research ask for this specific capability?" If the answer is no, it goes in a backlog document, not in the codebase. This sounds obvious. It's extremely hard to do when the AI has already written the code and it looks clean.
  • "Can someone accomplish their core job without this feature?" If yes, the feature is a nice-to-have. Nice-to-haves don't go in a v1. Period. Especially not an AI-generated nice-to-have that you didn't plan and didn't validate.
  • "What is the maintenance cost of this feature if it breaks?" AI-generated code breaks. Usually at inconvenient times. Every feature you add is a feature you'll eventually debug, update when dependencies change, and explain to future users. Ask Claude to estimate maintenance burden explicitly — it's surprisingly honest about this.

Using AI to argue against your own features is counterintuitive but it's one of the highest-leverage habits you can build. Try this prompt literally: "Here are the features I'm considering adding to my MVP. Argue against each one. Don't be polite." The pushback you get is more useful than any technical advice.
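The three-question filter can also be written down as a tiny triage function, which makes it harder to wave a feature through just because the code already exists. This is a sketch under stated assumptions: the `Feature` fields and the `triage` helper are hypothetical names, and the rule for question three (a feature stays in the backlog until its maintenance cost has been written down) is one possible reading, since the filter itself gives no pass/fail threshold for maintenance.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    requested_by_user: bool     # Q1: did a user in research ask for this?
    core_job_needs_it: bool     # Q2: is the core job impossible without it?
    maintenance_note: str = ""  # Q3: written estimate of upkeep if it breaks


def triage(feature: Feature) -> str:
    """Return "v1" or "backlog" for a proposed feature."""
    if not feature.requested_by_user:
        return "backlog"
    if not feature.core_job_needs_it:
        return "backlog"  # nice-to-have, so not in v1
    if not feature.maintenance_note.strip():
        # Assumed rule: Q3 must be answered in writing before building.
        return "backlog"
    return "v1"


# Illustrative entries for an NIL deal tracker like Priya's.
proposed = [
    Feature("Deadline tracking", True, True, "Date logic; low upkeep"),
    Feature("Payment tracking", True, True, ""),
    Feature("Notifications system", False, False, "Email provider changes"),
]

columns: dict[str, list[str]] = {"v1": [], "backlog": []}
for f in proposed:
    columns[triage(f)].append(f.name)

print(columns)
```

Payment tracking lands in the backlog here only because its maintenance note is empty; filling in the note moves it to v1, which is the point of forcing question three to be answered explicitly.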

Practical Takeaway

Create a two-column document right now: "In v1" and "Backlog." Before every build session, review which column you're working from. If you're in "In v1," use AI to build fast and clean. If AI suggests something new, write it in "Backlog" before you accept the code. Don't let the tool decide what's in scope — that's your job, and it's the only job AI can't do for you.

When AI Makes You a Worse Developer

There's a real skill degradation risk worth naming. If you're early in your programming journey (first or second year of serious coding), using AI assistants to write the hard parts of your code will slow your growth as a developer. You'll ship faster in the short term and understand less in the long term.

This isn't a reason to avoid AI tools. It's a reason to be deliberate. When you're learning, use AI to explain code and help you understand errors — not to write code you can't read. When you're building something you already understand at a conceptual level, use AI to speed the execution. The line between learning and building matters, and it shifts as your skills develop.

The developers who are going to be genuinely powerful in five years are the ones who can work with AI while maintaining enough foundational knowledge to catch AI's mistakes, understand what's happening under the hood, and make architectural decisions the AI can't make well. That foundation requires actually writing some hard code yourself, even when it would be faster to outsource it to the model.

Lesson 2 Quiz

The Build Stack — 5 questions
1. Priya's six-week MVP ballooned to three months primarily because:
Exactly right. When building becomes cheap, the psychological barrier to adding features collapses. Priya needed a replacement constraint — a rigid feature list, hard launch date, or active "no" process — that she didn't put in place.
The problem wasn't technical. Copilot generated code fine. The problem was that cheap code generation removed the natural scope constraint, and nothing replaced it.
2. You're building a Next.js web app and want AI help on refactoring a messy component file. The best tool choice is:
Right. Cursor (AI-native editor) is the right tool for in-codebase work like refactoring. Bolt.new generates new projects from scratch; it doesn't refactor existing ones. Perplexity is a research tool. ChatGPT and Claude are better for architecture reviews than file-level refactoring.
Match the tool to the job. Bolt.new builds new apps from prompts. Cursor works inside your existing codebase. Refactoring is an in-codebase job, so Cursor wins here.
3. AI suggests adding a user notification system to your MVP. According to the three-question filter, what's the first thing you should check?
Right. The first filter is whether real users asked for it. If not, it goes to backlog regardless of how clean the AI's implementation looks. Code quality is irrelevant if the feature isn't needed.
Timeline and code quality are secondary concerns. The first gate is validation: did any real user in your research express a need for this? If no, it doesn't matter how fast or clean it is.
4. You're a first-year CS student building a personal project. AI offers to write the authentication system for you. What's the most strategically sound approach?
Good judgment. Using AI to explain code so that you learn it, rather than to replace understanding, maintains your foundational skills. You'll be able to catch AI mistakes later, make architectural decisions, and actually understand what's running in your codebase. That compounds over time.
The right answer at the learning stage is to use AI as a teacher and reviewer, not as a code generator you outsource to. If you can't read and explain every line, you're building technical debt in your own skills.
5. The most effective use of Claude for a monthly codebase review is:
Exactly. Claude's long-context window makes it unusually good at looking at a whole system and identifying structural problems — over-engineering, missing abstractions, security gaps. That's a different and more valuable use than line-by-line code generation.
The specific value of Claude for architecture reviews is its long context — it can see your whole structure at once. Use that capability for structural critique, not just feature generation or test writing.

Lab 2: The Scope Defender

Use AI to pressure-test your feature list and defend your MVP scope

Your Role: Builder Defending Scope. The AI's Role: Skeptical Product Advisor.

You're going to describe what you're planning to build for your v1. The AI will challenge each feature using the three-question filter — did users ask for it, can they live without it, what's the maintenance cost? Your job is to defend your decisions or cut features that can't survive scrutiny.

This is supposed to be uncomfortable. If the AI doesn't convince you to cut at least one thing, you're either very disciplined or not being honest.

Describe your MVP feature list — list every feature you're planning to include in your first version. Be specific. Include things that feel obvious or "small." Nothing is too minor to list.
Alright, let's do a scope audit. Tell me every feature you're planning to ship in your v1. Don't curate the list — I want to see everything you're thinking, including things that feel "obviously necessary." I'll run each one through the three-question filter and we'll see which ones survive.
Module 4 · Lesson 3

The Test Stack: AI-Assisted QA Without a Testing Team

Testing is the stage most solo builders skip because it feels like a luxury. AI tools make it closer to mandatory — and much faster than you think.
How do you build something that actually works for real users when you're testing it alone?

Devon launched his scheduling tool for freelance videographers on a Tuesday. By Thursday, he'd gotten seven signups from a Product Hunt post, more than he expected. He was excited. Then the first message came in from a user in Toronto: "The timezone thing is broken. Every meeting is showing up 5 hours off."

Then another: "The mobile view cuts off the booking button on iPhone SE." Then a third: "Your email confirmation has my client's name as undefined." Within 48 hours, five of the seven users had churned. Devon had tested the app himself, on his MacBook Pro, in his timezone, on Chrome. He had essentially tested for one user: himself.

The frustrating part wasn't that bugs exist; bugs always exist. It was that all three of those issues were the kind AI could have caught in twenty minutes if he'd known to ask.

The Solo Builder Testing Problem

When you're the only person building and testing something, you have a fundamental blind spot: you know how it's supposed to work, so you unconsciously navigate around the broken parts. You never type into the field the way a real user would. You never try the mobile view on a phone you haven't been staring at for three months. You never try a timezone different from yours because it never occurs to you that it matters.

Professional QA teams exist to be people who don't know how it's supposed to work — and that ignorance is valuable. AI can partially replicate this by helping you generate test cases you wouldn't have thought of, simulate edge cases systematically, and review your code for common categories of bugs before you ship.

The key word is "partially." AI can help you catch a category of bugs — logic errors, obvious UX failures, common security issues, edge cases in data handling. It cannot replace someone actually using your app with genuine intent to accomplish something. Both are necessary. Neither is sufficient alone.

Peer Reality Check

Most people in their first or second year of shipping projects skip structured testing almost entirely. "I tested it" usually means "I clicked through it once and it worked for me." That's not testing — that's demonstration. The gap between demonstration and real-world robustness is where side projects get humiliated in public. You don't have to do professional QA. But you do have to do more than clicking through it once.

Four Categories of AI-Assisted Testing

AI testing assistance is most valuable in four distinct areas. These are not the same thing — each requires a different approach and different prompting strategy.

Edge Case Generation
Ask Claude or ChatGPT to generate a list of edge cases for any given feature. Prompt: "I have a [feature description]. What are the 15 most likely edge cases a real user might encounter, including bad inputs, unexpected device contexts, and network conditions?" Then manually test each one. This alone would have caught Devon's timezone bug.
Code Review for Bug Categories
Paste a function or component and ask AI to identify common bug patterns: race conditions, unhandled nulls, missing error states, improper async handling. Claude is particularly good at spotting "undefined" errors in template strings, exactly the kind that put "undefined" where a client's name belonged in Devon's confirmation emails.
Automated Test Generation
Copilot and Claude can write unit tests for functions you've already written. This is only useful if you actually run the tests — but having AI generate test scaffolding removes the friction that makes most solo builders skip this step. Even basic test coverage reveals brittleness you'd otherwise miss.
UX Error-State Auditing
Describe your UI flows to an AI and ask: "What happens if the user does X before Y? What if they leave the page mid-form? What if the API call fails?" AI can surface failure paths your happy-path testing would never cover. Map each failure path to a visible error state — not a blank screen or console error.

Devon's issues would have been caught by a combination of edge case generation (timezone and device contexts, including the small iPhone SE viewport) and UX error-state auditing (the undefined template variable). Neither required sophisticated tooling, just a prompt and 20 minutes.
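Two of those edge cases translate directly into checks you can run. The sketch below is not Devon's code: `local_label` and `confirmation_line` are hypothetical helpers, and it uses fixed UTC offsets from the standard library to stay self-contained. A real scheduler would more likely use IANA zone names (for example via `zoneinfo`) so that daylight saving time is handled.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def local_label(start_utc: datetime, utc_offset_hours: float) -> str:
    """Format a UTC-stored meeting time in the viewer's local offset."""
    if start_utc.tzinfo is None:
        raise ValueError("store meeting times as timezone-aware UTC")
    viewer_tz = timezone(timedelta(hours=utc_offset_hours))
    return start_utc.astimezone(viewer_tz).strftime("%Y-%m-%d %H:%M")


def confirmation_line(client_name: Optional[str]) -> str:
    """Build the email line without ever printing a missing name."""
    name = (client_name or "").strip()
    if not name:
        return "Your booking is confirmed."
    return f"Your booking with {name} is confirmed."


# Edge case table: one meeting, several viewer offsets.
meeting = datetime(2025, 3, 4, 19, 0, tzinfo=timezone.utc)
expected = {
    -5: "2025-03-04 14:00",   # five hours behind UTC
    0: "2025-03-04 19:00",
    5.5: "2025-03-05 00:30",  # half-hour offset, crosses midnight
    9: "2025-03-05 04:00",
}
for offset, label in expected.items():
    assert local_label(meeting, offset) == label

# Edge case table: missing, empty, and whitespace-only names.
for bad_name in (None, "", "   "):
    assert "None" not in confirmation_line(bad_name)
    assert confirmation_line(bad_name) == "Your booking is confirmed."
assert confirmation_line("Dana") == "Your booking with Dana is confirmed."
```

The tables are the useful part. Asking an AI for "15 edge cases" gives you rows to add; the assertions turn each row into something that fails loudly before a user finds it.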

Building a Pre-Launch Checklist with AI

One of the most practical things you can do with AI before any launch is generate a custom pre-launch testing checklist for your specific product. Generic checklists miss your particular architecture and use cases. AI-generated ones can be tailored.

  • Describe your app in detail to Claude, including the tech stack, key user flows, and any external services (payments, email, auth, APIs). Ask: "Generate a pre-launch testing checklist specific to this app. Include device/browser coverage, edge cases per feature, error state coverage, and security basics."
  • Add context about your user profile. "My users will primarily be on mobile, in multiple timezones, with variable internet connections." The checklist changes significantly when you specify this — Devon would have gotten timezone coverage if he'd included his geographic user profile.
  • Ask specifically about your external dependencies. Every third-party service is a potential failure point. "What can go wrong when Stripe's payment webhook is delayed? When Resend fails to deliver the confirmation email? When the Google Calendar API rate-limits me?" Get specific failure scenarios and verify you handle each one visibly.
  • Run the checklist with a real person. Not a developer. Give the checklist to a friend, roommate, or family member and watch them try to use the product without explaining anything. Don't explain. Watch. What breaks? What confuses them? AI can generate the checklist; only humans can run it authentically.

The goal of this process isn't a perfect product. There is no perfect product. The goal is eliminating the category of bugs that cause immediate churn — the ones that make first-time users feel like they got a half-finished product. Those are almost always the simple, catchable ones that a checklist would surface.
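The first three steps above amount to assembling one detailed prompt, and the most common failure is leaving a section out. Here is a small sketch of a prompt builder that refuses to run with missing context. The function name, the field names, and the example stack are assumptions for illustration; the checklist request itself is the one quoted in the first step.

```python
def build_checklist_prompt(stack: str, flows: list[str],
                           user_profile: str, services: list[str]) -> str:
    """Assemble a pre-launch checklist prompt, refusing incomplete context."""
    fields = {"stack": stack, "flows": flows,
              "user_profile": user_profile, "services": services}
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError(f"add context before prompting: {', '.join(missing)}")

    lines = [
        f"My app is built with: {stack}.",
        "Key user flows: " + "; ".join(flows) + ".",
        f"User profile: {user_profile}",
        "External services: " + ", ".join(services) + ".",
        "",
        "Generate a pre-launch testing checklist specific to this app. "
        "Include device/browser coverage, edge cases per feature, "
        "error state coverage, and security basics.",
    ]
    # One dependency question per service, so none is silently skipped.
    lines += [f"What can go wrong when {s} is slow, down, or rate-limits me?"
              for s in services]
    return "\n".join(lines)


# Hypothetical scheduling app; the stack is an assumed example.
prompt = build_checklist_prompt(
    stack="Next.js, Postgres",
    flows=["create availability", "client books a slot", "confirmation email"],
    user_profile="Primarily on mobile, in multiple timezones, "
                 "with variable internet connections.",
    services=["Stripe", "Resend", "Google Calendar API"],
)
print(prompt)
```

Paste the printed text into whichever assistant you use. The builder does not make the checklist better on its own; it only guarantees that the stack, flows, user profile, and dependencies are all present before you ask.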

Practical Takeaway

Before your next launch, spend one hour with Claude generating a custom pre-launch checklist. Include your tech stack, your user profile (device, location, connection), and every external service you depend on. Then physically check off every item. If you skip an item, write down why — that discipline alone will prevent most launch-day embarrassments. The checklist is not bureaucracy. It's the difference between a public win and a public apology post.

What AI Testing Misses (And What to Do About It)

AI-generated test cases are only as good as the prompts you write. If you don't mention that your users are on Android, you won't get Android test cases. If you don't mention your payment flow, edge cases in payment handling won't appear. The output quality is tightly coupled to input completeness — which means the developer's blind spots transfer directly to the testing checklist.

More fundamentally: AI can't test for why people don't use your product. It can find functional bugs. It can't tell you that your onboarding is confusing, that the value proposition isn't clear in the first 30 seconds, or that the visual design makes users distrust the product before they even enter their data. Those are human perception problems that require human testers.

The combination that actually works: AI for systematic technical coverage, humans for realistic usage and perception. Budget time for both. A one-hour AI testing session and a one-hour session watching a real person use your product will catch almost everything that matters before a launch. Most people do neither. Doing both puts you dramatically ahead of the field.

Lesson 3 Quiz

The Test Stack — 5 questions
1. Devon's three launch-day bugs (timezone, mobile layout, undefined name) could have been caught using which two AI testing approaches?
Exactly right. Edge case generation catches timezone and device variation. UX error-state auditing catches the undefined name in the template string — ask "what if this variable is empty or null?" and you find it immediately.
Think about which testing categories map to each bug. Timezone = edge case generation. Mobile layout = device edge case. Undefined name = UX error-state audit (what happens when a template variable is null?).
2. You ask AI to generate a pre-launch checklist for your app. To get the most useful output, you should include:
Right. AI checklist quality tracks directly with input specificity. Your tech stack determines which bugs are possible. Your user profile determines which device/location edge cases matter. Your external dependencies determine which failure scenarios to cover. Generic input = generic checklist = missed bugs.
AI output quality is bounded by input specificity. The more context you give about your actual architecture and user, the more targeted and useful the checklist becomes. Generic descriptions produce generic checklists that miss your specific risks.
3. The most important limitation of AI-generated test cases is:
This is the critical limitation. If you don't mention Android users, you get no Android test cases. AI amplifies your thinking — it doesn't compensate for gaps in your thinking. That's why human testing is irreplaceable for discovering problems you haven't imagined.
The limitation isn't technical — it's cognitive. AI works from what you tell it. Your blind spots become its blind spots. It also can't test for user perception and intent — why someone doesn't trust or understand your product.
4. A friend agrees to test your app. You've generated a checklist with AI. What's the best way to structure their testing session?
Right. The value of a human tester is their authentic ignorance — they don't know how it's supposed to work. If you explain it first, you eliminate that value. Watch silently. Take notes on confusion and failure points. Those moments are the signal.
Explaining how it works before testing undermines the test. You want to observe someone who doesn't know how it's supposed to work — because that's exactly who your real users will be.
5. AI testing tools are most effective for catching which category of bugs?
Correct. AI testing is a technical layer — it catches functional bugs that have right-or-wrong answers. Perception, onboarding, and product-market fit are human judgment calls that require real users in real contexts. Different tools for different problems.
AI is strong at the technical layer — bugs that have deterministic right-or-wrong answers. Perception, trust, and product-market fit are human phenomena that can only be observed by watching real humans use the product with real intent.

Lab 3: The Pre-Launch Auditor

Generate a custom testing checklist and identify your highest-risk failure modes

Your Role: Builder Preparing to Launch. The AI's Role: Harsh QA Partner.

Describe your project — what it does, your tech stack, who your users are, and what external services it depends on. The AI will generate a custom pre-launch testing checklist, identify your three highest-risk failure modes, and push you to define what happens in each failure case.

Be specific. Generic descriptions produce generic checklists that won't actually help you.

Describe your app: what it does, your tech stack (frontend, backend, database), your target user profile (device, location, technical level), and every external service or API you depend on (Stripe, email services, third-party APIs, etc.).
Pre-Launch Auditor
AI Lab
Let's build your pre-launch testing checklist. I need specifics to make this useful — a generic "web app with users" description will get you a useless generic checklist. Tell me: what does it do, what's the tech stack, who exactly are your users (device, location, technical comfort), and what external services does it depend on? The more specific you are, the more dangerous the checklist I can generate.
Module 4 · Lesson 4

The Iteration Stack: Reading Signals, Cutting Fast, and Knowing When to Pivot

Post-launch is where most side projects go to slowly die. The ones that survive have a system for reading what's actually happening — not just what feels like it's happening.
How do you use AI to make faster, more honest decisions about whether to persist, pivot, or quit?

Aisha launched a meal-planning tool for college students with dietary restrictions in October 2024. By January, she had 340 signups, a Discord server with 80 members, and a churn rate she didn't want to look at. She knew the product was getting stickier — people who stayed past week two were coming back daily. But she also knew that most people didn't stay past week two.

She had three data streams: Mixpanel event data showing where users dropped off, Discord conversations where engaged users were asking for specific features, and a spreadsheet of feedback emails she'd been half-ignoring because they took too long to synthesize. She was making product decisions based on the Discord conversations — the loudest, most engaged, most unrepresentative slice of her user base.

What she needed wasn't more data. She had plenty of data. She needed a faster way to synthesize it honestly, without the bias toward the loudest voices. That's exactly what AI does well — when you force yourself to feed it everything, not just the data that confirms what you already believe.

The Post-Launch Trap: Loud Users Aren't Representative Users

This is one of the most consistent patterns in early-stage products: the people who talk to you are almost never representative of your actual user base. Discord power users, people who email feedback, people who tweet at you — they're the highly engaged tail. They have strong opinions about features. They love what you're building. They'd be devastated if you shut down.

They are also a terrible sample for product decisions.

The users who silently churn — who sign up, don't come back, and never tell you why — are statistically the majority of your user base in most early products. Their silence is the loudest signal, and it's the one most founders unconsciously ignore because there's nothing pleasant to do with it. You can't engage it. You can't reassure it. You can only try to understand it by looking at behavioral data, and then changing something to see if that changes the pattern.

AI's role in the iteration stage is partly analytical — helping you synthesize data you already have — and partly structural — helping you design better systems to generate usable signal going forward.

Peer Reality Check

Almost everyone building something for the first time overweights qualitative feedback from power users and underweights quantitative data from the broader base. That's not stupidity — it's human nature. Positive conversations feel more real than anonymous event tracking numbers. The fix is to make behavioral data visible and to synthesize it regularly with AI assistance, so it competes for your attention on equal footing with the Discord conversations.

The Iteration Stack: Four AI Tools for Post-Launch

After launch, your AI tool usage should shift significantly. You're no longer building — you're learning. The tools that help you learn are different from the tools that help you build.

Data Synthesis
Claude + Analytics Export
Export your event data as CSV or JSON and paste it into Claude. Ask: "Based on this behavioral data, what are the three most likely reasons users are churning after their first session?" Claude's pattern recognition on messy datasets is genuinely useful — better than staring at a chart.
Feedback Synthesis
ChatGPT + User Emails
Paste batches of user feedback emails (remove names) and ask for thematic analysis: "What are the five most common themes across these messages? Which ones appear in negative feedback but not positive feedback?" This is where Aisha's backlog of emails becomes actionable.
Pivot Analysis
Claude — Structured Debate
Describe your current situation — metrics, qualitative feedback, your gut feeling — and ask Claude to make the case for persistence, pivot, AND shutdown. Then argue back. This structured debate surfaces assumptions you're defending without realizing it.
Experiment Design
ChatGPT — A/B Test Generator
Describe a hypothesis about why users are churning and ask ChatGPT to design a minimal experiment to test it. Ask for the output to spell out what to change, what to measure, what sample size you need, and how long to run it. This removes much of the guesswork from iteration planning.
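The experiment-design card mentions sample size, and that number is worth sanity-checking yourself. Below is a minimal sketch using the standard normal-approximation formula for comparing two proportions (for example, week-two retention in a control group versus a variant). The function name, the default significance level of 0.05, and the default power of 0.8 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect p_control vs. p_variant.

    Uses the two-sided normal approximation for a difference in proportions.
    """
    if p_control == p_variant:
        raise ValueError("The two rates must differ to size an experiment.")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p_control + p_variant) / 2

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control) + p_variant * (1 - p_variant))
    ) ** 2
    return math.ceil(numerator / (p_control - p_variant) ** 2)
```

As a rough guide, detecting a lift from 20% to 30% retention this way needs on the order of 300 users per group, and smaller lifts need far more. With only a few hundred signups in total, that means an A/B test can only confirm large effects.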

Aisha's specific fix: she exported three months of Mixpanel event data, pasted it into Claude, and asked for churn patterns. Claude identified that users who didn't complete their dietary profile in the first session had a 94% churn rate. The profile step took an average of 7 minutes, and almost nobody finished it on mobile. She'd been building recipe features while a 7-minute onboarding form was killing her retention. That was a ten-minute AI analysis session on data she'd had for three months.
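A finding like this is also easy to verify directly once you know what to look for. The sketch below compares churn between users who did and did not complete a given step on their first day. The tuple layout, the event name "profile_completed", and the definition of churn (no activity after day 14) are assumptions for illustration, not Mixpanel's export schema.

```python
from collections import defaultdict


def churn_by_step(events, step_event, churn_after_day=14):
    """Compare churn for users who did vs. did not fire `step_event` on day 0.

    events: iterable of (user_id, event_name, days_since_signup) tuples.
    A user counts as churned if they have no events after `churn_after_day`.
    """
    completed_step = defaultdict(bool)
    last_active_day = defaultdict(int)
    for user_id, name, day in events:
        last_active_day[user_id] = max(last_active_day[user_id], day)
        if name == step_event and day == 0:
            completed_step[user_id] = True

    totals = {True: 0, False: 0}
    churned = {True: 0, False: 0}
    for user_id, last_day in last_active_day.items():
        group = completed_step[user_id]
        totals[group] += 1
        if last_day <= churn_after_day:
            churned[group] += 1

    # None means there were no users in that group.
    return {
        ("completed" if group else "skipped"): (
            churned[group] / totals[group] if totals[group] else None
        )
        for group in (True, False)
    }
```

A large gap between the "completed" and "skipped" rates points at the step as a bottleneck. It does not show that the step causes the churn, so the next move is still an experiment.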

The Persist / Pivot / Quit Framework

One of the most uncomfortable moments in any side project is deciding whether what you're building is working well enough to keep going, needs a significant directional change, or should be shut down to free your time for something better. Most people make this decision based on vibes — how exciting it still feels, how much people in their life seem interested, how much they've already invested.

None of those things are actually signals about whether the product is working. Here's a more structured framework for making the persist/pivot/quit call:

  • Persist signals: Retention is improving week-over-week, even from a small base. At least one user segment has strong retention and can articulate why. The growth problem is distribution, not product: people who try it like it. If these conditions hold, persistence is usually right.
  • Pivot signals: Retention is flat or declining despite product changes. One segment of your users is engaged but it's not the segment you built for. You've discovered an adjacent problem you could solve better with what you've already built. Pivots work when you pivot to something your existing work gives you an advantage at — not when you pivot to something completely unrelated.
  • Quit signals: The core behavioral loop doesn't create value for any identifiable segment, and you've had enough users to know. You've been at it for 6+ months with no meaningful retention signal. The opportunity cost — in time, energy, and what else you could be learning — exceeds the expected value of continuing. Quitting is not failure. It's evidence-based resource reallocation.

Use Claude to run the framework on your actual data. Paste in your retention numbers, the feedback themes you've synthesized, and your gut feeling, then ask: "Based on this, make the case for each of persist, pivot, and quit. What's missing from my data to make a confident call?" The AI won't make the decision for you — but it will surface what you're avoiding looking at.
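The three signal lists can also be written down as a rough first-pass check, which forces you to state your numbers before the debate starts. This is a sketch, not a decision procedure: the ProjectSnapshot fields, the 0.4 threshold for "strong" retention, and the crude definition of "improving" (latest cohort beats the oldest) are all assumptions you should replace with your own.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProjectSnapshot:
    weekly_retention: List[float]        # overall retention per weekly cohort, oldest first
    segment_retention: Dict[str, float]  # latest retention per user segment
    target_segment: str                  # the segment you originally built for
    months_live: float


def first_pass_signal(snapshot, strong=0.4):
    """Return 'persist', 'pivot', 'quit', or 'unclear' as a conversation starter."""
    rates = snapshot.weekly_retention
    improving = len(rates) >= 2 and rates[-1] > rates[0]
    strong_segments = [
        name for name, rate in snapshot.segment_retention.items() if rate >= strong
    ]

    if improving and strong_segments:
        return "persist"
    if strong_segments and snapshot.target_segment not in strong_segments:
        return "pivot"  # the product works, but for someone other than the target
    if not strong_segments and snapshot.months_live >= 6:
        return "quit"
    return "unclear"
```

An "unclear" result is itself useful: it usually means the data needed for a confident call is missing, which is the same question the prompt above asks the AI to answer.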

Practical Takeaway

Set a standing calendar event every two weeks for an "iteration review." In that session: export your behavioral data, paste your user feedback into Claude or ChatGPT for synthesis, and run the persist/pivot/quit framework against your current numbers. Keep a running document of what you learn each session. The most dangerous thing in a side project is the slow drift — when you're not making decisions, time is still passing, and the opportunity cost is real. Make the call consciously, on a schedule, with data. That's what separates people who ship things that work from people who work on things that never ship.

When the Loop Closes: Research to Build to Test to Repeat

The full loop (research, build, test, iterate) isn't a linear sequence you run once. It's a continuous cycle, and each pass should be faster than the last. The research you do after your first launch is more valuable than the research you did before it, because now you have real behavioral data instead of hypotheses. The testing you do after your first bug reports is more targeted because you know where your architecture is brittle.

AI tools accelerate every stage of this cycle, but they accelerate different stages differently. Research: AI compresses background synthesis dramatically but can't replace user conversations. Build: AI generates code fast but shouldn't decide scope. Test: AI surfaces systematic edge cases but can't replace observed human use. Iterate: AI synthesizes signal honestly but can't make the judgment call about what to do next.

The through-line is that AI is a tool that extends your capacity without replacing your judgment. The judgment — about what's worth building, what's actually working, when to quit — is yours. And your judgment gets better with every loop you run. That's the actual compound return of building side projects: not the product itself, but the decision-making muscle you develop by shipping real things and watching real people react to them.

Lesson 4 Quiz

The Iteration Stack — 5 questions
1. Aisha was making product decisions based on her Discord community feedback. What was the primary problem with this approach?
Right. The problem isn't qualitative data — it's sample bias. Power users are the tail of your distribution, not the center. Building for them optimizes for retention of people who already love you, not acquisition of people who haven't committed yet.
The issue isn't the format of the feedback — it's who gives it. Discord power users are self-selected, highly engaged, and don't represent the silent churning majority. Their opinions are real but biased toward features for people who already love the product.
2. Aisha's AI analysis found that users who didn't complete their dietary profile in the first session had a 94% churn rate. This is most useful because:
Exactly. This is the value of behavioral data synthesis — finding a specific, testable bottleneck. It doesn't prescribe the solution (shorter profile? better mobile form? skip-for-now option?), but it makes the problem specific and addressable. That's actionable insight.
The finding doesn't prescribe a specific solution — it identifies a bottleneck. Now she needs to figure out why the profile isn't being completed and what the best fix is. But she's finally working on the right problem instead of adding features to solve the wrong one.
3. You have 200 feedback emails you haven't fully analyzed. What's the best prompt strategy for extracting useful signal from them with AI?
Right approach. The key insight is asking specifically what appears in negative but not positive feedback — that asymmetry reveals genuine pain points rather than general topics. Positive-only analysis tells you what people like; differential analysis tells you what's broken.
The most useful analysis is differential — what themes appear in negative feedback that don't appear in positive feedback? That asymmetry reveals actual problems. Analyzing only positives or only what you've already summarized misses the most actionable signal.
4. Your app has been live for 5 months. Retention is completely flat despite 3 product changes. One specific segment — freelance writers — has strong retention, but you built this for remote workers broadly. Which framework signal does this represent?
Classic pivot signal. You've found genuine retention in an adjacent segment — and pivots work best when you pivot toward something your existing work gives you an advantage at. Repositioning for freelance writers isn't starting over; it's focusing on where the product already works.
Flat overall retention despite changes is a problem, but there's a real signal here: one segment has strong retention. That's a pivot signal — the product works for someone, just not the someone you originally targeted. That's information to act on, not ignore.
5. What is the most honest reason to "quit" a side project according to the persist/pivot/quit framework?
Right. Quitting based on evidence — no retention signal from any segment, and a realistic assessment that the opportunity cost of continuing is too high — is a strategic decision, not a failure. Excitement, competition, and revenue timelines are all secondary to whether the core behavioral loop creates value for someone.
Quitting well isn't about feelings or competition — it's about evidence. The signal that justifies quitting is: no identifiable segment has meaningful retention despite enough users to know, and the cost of continuing (time, opportunity, energy) exceeds the realistic expected return.

Lab 4: The Iteration Strategist

Run the persist/pivot/quit framework against your real project data

Your Role: Founder at a Decision Point. The AI's Role: Honest Advisor Who Will Push Back.

You're going to describe the current state of your project: the metrics you have (or don't have), the qualitative feedback you've received, and your gut feeling about where things stand. The AI will make the case for persist, pivot, AND quit — then challenge you on what you're avoiding looking at.

This only works if you're honest about the numbers. Round numbers are fine. Unknown numbers are fine. Invented optimistic numbers are useless.

Describe your project's current state: how long it's been live, your core retention/usage numbers (even rough estimates), what qualitative feedback you've heard most, and your honest gut feeling about whether it's working. Include what you've been avoiding thinking about.
Iteration Strategist
AI Lab
Alright, let's run the framework. I'm going to make the case for persist, pivot, AND quit based on what you give me — and I'm going to push on whatever you seem to be glossing over. Give me the honest picture: how long has it been live, what are your usage numbers (rough is fine), what feedback patterns are you seeing, and what's the thing you've been not quite looking at directly?

Module 4 Test

AI Tools for Every Stage — 15 questions. Score 80% or higher to pass.
1. The "scratch your own itch" approach to side projects fails when:
The condition is compound: enough others must have the same problem AND not have existing solutions that work well enough. Missing either part is fatal.
Personal experience is actually an advantage. The failure mode is assuming your experience is representative without verifying it.
2. Which AI tool is best suited for the "landscape mapping" research job?
Right. Perplexity synthesizes real-time web information — ideal for understanding who's in your space and what they're doing.
Match tools to jobs. Copilot and Cursor are build tools. Bolt.new is a prototype generator. Landscape mapping needs a research tool with web access.
3. The primary purpose of "assumption mapping" in pre-build research is:
Correct. Assumption mapping is about surfacing what has to be true for your idea to work, and ranking by how likely each assumption is to be wrong — before you've built anything and become attached.
Assumption mapping is a strategic exercise, not a technical or legal one. It forces you to name what you're betting on before you've invested months building it.
4. A research sprint deliverable should tell you all of the following EXCEPT:
Technical feasibility is a build-stage question, not a research-stage question. The research sprint answers market and demand questions — problem reality, solution alternatives, and what hypotheses to test with real users.
Technical feasibility assessment happens at the build stage. Research sprints answer demand and competitive landscape questions. These are different phases with different questions.
5. AI coding tools create scope creep risk because:
Exactly. The friction of writing code manually limited how many features a solo builder would add. Remove that friction without a replacement constraint and scope expands unchecked.
The code isn't inherently hard to delete. The problem is psychological: when adding a feature is cheap and fast, the barrier to adding it drops to near zero. Nothing replaces the natural constraint unless you build it deliberately.
6. Bolt.new is most appropriate for which use case?
Right. Bolt.new's strength is speed-to-prototype. The generated code isn't production-quality, but for rapid concept validation it's much faster than building from scratch.
Bolt.new generates apps from prompts — it's a prototyping tool. For refactoring, security audits, or test writing, you need different tools suited to working inside an existing codebase.
7. The three-question filter before accepting AI-suggested features asks whether users requested it, whether the core job can be done without it, and:
The three questions: did users ask for it? can they live without it? what's the maintenance burden? Every feature you add is a feature you'll debug, update, and explain. That cost is real and compounds.
The three questions are grounded in user validation, necessity, and maintenance cost — not competitive analysis, sprint timelines, or vision alignment. Those matter, but they're not the filter.
8. Devon's three launch-day bugs all had one thing in common. What was it?
Right. Devon tested in his context — his machine, his browser, his timezone, his data. All three bugs lived outside that context. Systematic edge case generation would have expanded the testing surface to catch them.
These weren't external failures or load issues — they were standard bugs that only appeared outside Devon's specific testing context. That's a testing methodology problem, not a technical complexity problem.
9. To generate the most useful AI pre-launch checklist, you should include your target user's geographic location because:
Right. Geographic context drives timezone handling (Devon's bug), localization of dates/currency, and in some markets, legal compliance requirements. Specifying user location makes the checklist specific to actual risks.
Location matters because it determines real technical risks: timezone handling, date/currency formatting, and sometimes regulatory requirements. These are concrete bug categories, not abstract considerations.
10. AI testing tools cannot catch which category of product problem?
Correct. AI testing covers the technical layer — deterministic right-or-wrong bugs. Perception, trust, and value comprehension are human responses that can only be observed with real users in real contexts.
Null errors, missing error states, and race conditions are all technical bugs AI can catch systematically. Trust and value perception are human phenomena — you have to observe real people using the product to understand them.
11. Why are Discord power users a poor sample for most product decisions?
Right. Power users are the tail of the distribution, not the center. Building for them optimizes retention of already-retained users, not acquisition and retention of the typical new user who hasn't committed yet.
The problem is sample bias, not content or device preference. Power users opted in to a community — they're already more engaged than typical users. Their feedback skews toward features that deepen an experience most users haven't fully entered yet.
12. Aisha's key insight from AI analysis of her event data was actionable because it:
Right. The 94% churn rate tied to incomplete profiles gave her a specific, addressable problem. That's the value of behavioral data synthesis — moving from "retention is a problem" to "this specific step is where retention breaks."
The value wasn't validation or benchmarking — it was specificity. She moved from knowing she had a churn problem to knowing exactly where in the funnel it happened and approximately how bad it was. That's the difference between a problem and a target.
13. A "pivot signal" in the persist/pivot/quit framework is best described as:
Right. Pivots work when they're toward something your existing work gives you an advantage at. Finding strong retention in an adjacent segment is exactly this signal — you have a product that works, just not for the people you originally targeted.
Overall decline suggests quit, not pivot. Competition and founder excitement are inputs but not the primary signals. The pivot signal is finding that something you've already built works well for a different segment or problem than you intended.
14. When should AI be used to argue against adding a feature during the build stage?
Right. The habit should be routine, not reactive. Because code is cheap with AI, the natural constraint on scope is gone — you have to deliberately reinstall it by actively arguing against features every time, not just when you're behind.
The point isn't to only cut when you're in trouble. The point is that AI code generation removes the natural scope constraint, so you have to replace it with a deliberate process every single time you consider adding something.
15. The full research-build-test-iterate loop is best understood as:
Right. The loop doesn't end at launch — it continues indefinitely, and each pass benefits from what you learned in the previous one. AI compresses each stage but plays a different role in each: research synthesis, build acceleration, test coverage, and iteration signal processing.
The loop is not linear or pre-launch-only. It's a cycle. Your post-launch research is better than your pre-launch research because you have real data. Your second build sprint is more focused than your first. The loop compounds over time.