Priya is a junior at a state university, finishing a cover letter at midnight for a UX research internship at a product studio in Austin. She types the same prompt into both ChatGPT and Claude: "Write a professional cover letter for a UX research internship, emphasizing user empathy and a background in psychology." The ChatGPT version comes back polished, confident, a little generic: the kind of letter that reads like it could belong to any of fifty applicants. The Claude version is longer, more nuanced, and includes a paragraph that almost sounds like it's asking the hiring manager a question. Neither letter is wrong. But they're clearly not the same.
Priya doesn't know why they're different, so she combines them manually and submits. She gets an interview. But she's left wondering: was that the best she could have done? If she'd understood why those tools responded differently, she could have made a deliberate choice, not just a paste job.
Every major AI chatbot (ChatGPT, Claude, Gemini, Mistral, Llama-based products) is built on a transformer architecture. That part is roughly the same across the board. The reason they behave differently comes from three distinct layers: what data they were trained on, how they were fine-tuned, and what their RLHF (reinforcement learning from human feedback) optimized for.
OpenAI's GPT-4o was trained and fine-tuned with a heavy emphasis on being useful across a wide range of tasks quickly. It was also shaped by massive user feedback loops, which means it got really good at producing things that feel polished and satisfying, even if they lack depth. Anthropic built Claude with a framework called Constitutional AI, which trained the model to evaluate its own outputs against a set of principles. That makes Claude more likely to hedge, qualify, or push back on things it finds poorly framed. Google's Gemini was trained with especially strong emphasis on factual retrieval and integration with real-time information: it's a research engine that learned to talk.
Understanding these design philosophies isn't just trivia. It directly predicts how each tool will respond to different prompt structures, tones, and task types.
If you're using one AI for everything (cover letters, code debugging, brainstorming, fact-checking), you're almost certainly leaving capability on the table. Not because the other tools are "better," but because different tasks map to different strengths, and the strengths are real and consistent.
Here's a framing that's a little reductive but genuinely useful: think of the major AI tools as having distinct working styles, the way collaborators do.
ChatGPT (GPT-4o) is the fast, confident generalist. It produces output immediately, with high surface polish. It's trained to please, which means it will rarely tell you your idea is bad; it will build on it. That's great for brainstorming, drafting, and getting from zero to something. It's less great when you actually need someone to push back on your assumptions.
Claude (Claude 3.5/3.7 Sonnet, Claude Opus) is the careful analyst who writes in full sentences. It asks clarifying questions, adds nuance you didn't request, and sometimes writes more than you wanted. It's been trained to reason about ethics, uncertainty, and framing. If you give Claude a poorly constructed argument, it will often notice and say so, sometimes helpfully, sometimes annoyingly. It excels at long-form reasoning, complex editing, and tasks where you want a genuinely considered opinion, not just agreement.
Gemini (Google's model) is the researcher with a live internet connection. When you need a synthesis of current information (recent policy changes, recent science, what happened last week), Gemini is frequently more reliable than models with training cutoffs. Its prose is sometimes more mechanical, but its factual grounding is a genuine asset for anything that requires being current.
Smaller/open models (Mistral, Llama-based tools like Meta AI, Perplexity's model) are more unpredictable but often faster, cheaper, or available in contexts where the big three aren't. Their consistency depends heavily on the specific deployment, but they're worth knowing about.
| Tool | Strongest at | Watch out for | Prompt style it rewards |
|---|---|---|---|
| GPT-4o | Fast drafts, code, creative work, wide task range | Over-confident, rarely challenges bad premises | Direct, task-focused, specific format requests |
| Claude | Long reasoning, editing, nuanced argument, analysis | Can be verbose, sometimes over-qualifies | Context-rich, conversational, reasoning-forward |
| Gemini | Current info, research synthesis, Google integration | Less creative, can be dry prose | Research-style questions, "as of [date]" framing |
| Mistral/Llama | Speed, local deployment, customization | Inconsistent quality, less instruction-following | Concise, simple, structured prompts |
Here's what's actually happening among people your age right now: the overwhelming majority are using exactly one AI tool, usually whatever they first signed up for, which for most people was ChatGPT. Not because it's objectively the best tool for every situation, but because it was first, it's easy, and switching feels like extra effort with no obvious payoff.
That's understandable. It's also a mild but real disadvantage. The people getting the most out of AI right now (in classes, internships, personal projects) aren't necessarily using more sophisticated prompts. Some of them have just figured out that a task that produces mediocre output on one tool might produce excellent output on another. Priya's cover letter problem wasn't really about prompt engineering. It was about not knowing that Claude was probably the better choice for a nuanced, reflective writing task, while ChatGPT would have been better for quickly generating five structural variations she could pick from.
The practical takeaway from this lesson: Next time you get output you're not satisfied with, before you spend 20 minutes re-prompting, ask yourself: would a different tool do this better by design? Give it five minutes with a competitor and see.
Take a task you use AI for regularly: summarizing readings, drafting messages, brainstorming ideas. Run the same prompt on two tools. Don't try to make one better than the other. Just observe: what did each one emphasize? What did each one skip? What does that tell you about their training priorities?
Your peer is about to use AI for a task. They don't know which tool to use and they're going to ask you. Your job isn't to say "it depends"; it's to make a specific recommendation and defend it based on what you know about how these tools are designed.
The AI in this lab will play your peer: direct, a little skeptical, and willing to push back if your reasoning is weak. After you've given your recommendation, justify it. If you're not sure, say why, but make a call.
Marcus is a sophomore studying communications, and he's using ChatGPT to research a persuasion paper on social proof. He asks: "What are the most important studies on social proof from the last five years?" ChatGPT gives him five citations, complete with author names, journal titles, and publication years. Marcus skims them (they look legit) and drops them into his paper. His professor flags three of them as non-existent. The citations were fabricated.
Marcus is furious at ChatGPT. But here's the thing: ChatGPT didn't lie to him in the way a person lies. It did exactly what it was designed to do: predict the next most plausible token in the sequence. "Author Name. Journal Title. Year. Page numbers." looks exactly like what a real citation looks like. From the model's perspective, it generated a valid-seeming continuation of the text pattern. It had no way to know whether those specific papers existed. Marcus's mistake wasn't trusting AI; it was using ChatGPT for a task that requires factual accuracy about specific real-world objects.
Every response from any GPT model is a prediction. The model doesn't look things up. It doesn't have a database of facts it checks against. It generates text by predicting what tokens (words, parts of words) should come next given everything it's seen before, weighted by patterns learned during training. This is powerful. It's also a profound source of unreliability for specific factual claims.
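To make the prediction idea concrete, here is a toy sketch in Python. It is a heavy simplification: real GPT models use neural networks over subword tokens, not word-pair counts, and the names `train_bigrams` and `predict_next` and the tiny corpus are invented for illustration. What it does show is the behavior described above: the program returns the continuation it has seen most often, and nothing in it checks whether the result is true.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words followed it in the training text."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never followed."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text, standing in for the model's training data.
corpus = "the study found that the study showed that the effect was large"
model = train_bigrams(corpus)

# "study" followed "the" twice and "effect" once, so "study" wins.
print(predict_next(model, "the"))
```

Notice that there is no lookup step anywhere in the sketch. The output is the most pattern-consistent word, which is why a fabricated citation and a real one can look identical from the inside.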
GPT-4o is especially likely to produce confident, fluent, polished-sounding text, because that's what it was rewarded for. Human feedback during RLHF consistently rated fluent, confident responses higher than uncertain or hedging ones, which trained the model to produce exactly that: confident text, whether or not it's accurate. This isn't OpenAI being malicious. It's a known consequence of training for satisfaction.
The practical implication: ChatGPT is excellent for tasks where fluency and pattern are more important than factual precision. First drafts. Brainstorming. Restructuring existing text. Writing code from clear specifications. Creating outlines. All of these are "generate a plausible continuation of this pattern" tasks, which is exactly what GPT-4o is built for.
GPT-4o's strongest use cases fall into a few clear categories, each with prompting strategies that extract better output:
Fast structural drafting. When you need a first draft, outline, or structure quickly, GPT-4o is often the fastest path. The key prompt move here is to specify format explicitly. "Write a 5-section outline for an essay arguing X, using bullet points with 2 sub-points each" will produce better structure than "help me outline my essay." Format instructions are one of the things GPT-4o follows most reliably.
Creative iteration. GPT-4o is good at generating multiple versions of something. If you ask for five different approaches to the same email opening, it will actually give you meaningfully different versions. Contrast this with Claude, which tends to give you one considered version and explain why it made the choices it did. For creative tasks where you want raw options to pick from, GPT-4o's "give me variety" mode is an asset.
Code and technical scaffolding. GPT-4o writes competent code quickly for standard tasks. The caveat is that it will also confidently write wrong code. Always test. Never trust output for functions that touch real data, authentication, or payments without review.
Tone matching and rewriting. Paste in some existing text and ask GPT-4o to write something new in the same style. It's unusually good at picking up on tone, register, and voice, often better than other models for this specific task.
Give it explicit constraints: format, length, tone, audience, and any content limits. GPT-4o is trained to follow detailed instructions closely. The more specific your constraints, the less creative latitude it takes, and the more reliable the output becomes. Vague prompts give it room to hallucinate. Constrained prompts channel its prediction engine in a useful direction.
The dominant way people use ChatGPT right now is: open it, type what they want, and either accept or re-prompt once. Most people treat it like a more capable Google search. That works fine for a lot of tasks. It fails predictably for specific factual claims, for tasks that require genuine nuance or pushback, and for long documents where the model loses context.
The subtle thing most people miss is the confidence calibration problem. ChatGPT will give you the same tone whether it's very certain or completely guessing. It doesn't say "I'm not sure about this" the way Claude sometimes does. That means you have to calibrate yourself: you have to know which categories of claims to verify externally. Specific citations, specific statistics, specific dates, names of real people in niche fields: always check. General explanations of concepts, structural outlines, rewrites of existing content: generally reliable.
The practical takeaway: Use ChatGPT like a fast, confident collaborator who produces great rough drafts but sometimes makes up sources. The solution isn't to distrust it; it's to know which outputs need fact-checking and which don't.
Before sending your next ChatGPT prompt, add three constraints you weren't planning to include: (1) a specific output format, (2) a specific audience, and (3) one thing to avoid. Watch what changes. The additional specificity almost always improves output quality, not because ChatGPT needs more context philosophically, but because it needs a narrower target to aim at.
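One way to make constraint stacking a habit is to treat the prompt as a small template. The sketch below is only an illustration: the function name `build_constrained_prompt`, its parameters, and the example task are all assumptions, not part of any tool's API. It requires the three constraints named above (format, audience, one thing to avoid) and treats length and tone as optional.

```python
def build_constrained_prompt(task, output_format, audience, avoid,
                             length=None, tone=None):
    """Stack explicit constraints under a task description (hypothetical helper)."""
    lines = [task.strip(), "", "Constraints:"]
    lines.append(f"- Format: {output_format}")
    lines.append(f"- Audience: {audience}")
    # Length and tone are optional; they are only listed when provided.
    if length:
        lines.append(f"- Length: {length}")
    if tone:
        lines.append(f"- Tone: {tone}")
    lines.append(f"- Avoid: {avoid}")
    return "\n".join(lines)


# Example values are invented for illustration.
prompt = build_constrained_prompt(
    task="Write an outline for an essay arguing that cities should expand bike lanes.",
    output_format="5 sections, bullet points with 2 sub-points each",
    audience="a first-year urban studies seminar",
    avoid="statistics or citations you cannot verify",
    length="under 250 words",
)
print(prompt)
```

Because the three core constraints are required arguments, the function will not run without them, which is the point: you are forced to decide on a format, an audience, and an exclusion before you send anything.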
You have a task that you'd normally give to ChatGPT. Your job is to build a constraint-stacked prompt for it, specifying format, audience, length, tone, and at least one explicit exclusion. Then explain to your lab partner (the AI here) why each constraint is doing useful work.
The lab AI will evaluate your prompt design: are the constraints real and useful, or are they just padding? It will ask you to justify or revise. This is the kind of critical feedback ChatGPT itself almost never gives you, which is part of why it's worth practicing here.
Deja is a pre-law junior and she's been using Claude to help stress-test arguments for a moot court competition. She typed her argument into Claude and asked: "Is this argument strong?" She expected either validation or a list of counterarguments she could prepare for. Instead, Claude gave her a response that started with: "This argument has real structural clarity, but there are two premises where I'd push back before recommending it to a hostile examiner." Then it actually pushed back. Hard. On premises she thought were solid.
Her first reaction was annoyance: she hadn't asked for criticism. Her second reaction, after she worked through the feedback, was that Claude had just identified the two weakest points in her argument, which were exactly the two points the opposing team attacked during the competition. She finished in the top three. The lesson wasn't just "Claude is better for this." It was: Claude rewards prompts that invite genuine analysis rather than prompts that fish for confirmation.
Claude was built by Anthropic using a training approach called Constitutional AI, where the model is taught to evaluate its outputs against a set of explicit principles before committing to them. This creates a model that thinks about what it's saying more than it thinks about whether you'll like what it says. The practical consequence is that Claude is significantly more likely than ChatGPT to:
- Add unsolicited qualifications when it thinks your premise is shaky
- Point out ambiguity in your request before answering
- Produce longer responses because it includes reasoning, not just conclusions
- Decline or reframe requests it finds ethically problematic rather than just complying
This can feel annoying if you're used to ChatGPT's compliance. But it's a feature, not a bug, if you're using Claude for the right tasks. The trick is knowing how to structure prompts that work with this orientation instead of against it.
Give Claude context and let it reason. Unlike ChatGPT, which responds well to tightly constrained format instructions, Claude actually performs better when you give it context and invite it to reason. A prompt like "Here is my draft argument. I'm presenting to a skeptical audience. Identify the weakest points and explain why they're weak" will get you a more genuinely useful analysis than "List five weaknesses in this argument." The first invites reasoning. The second invites a list.
Ask for steelman and then pushback. Claude is exceptionally good at the intellectual move of "give me the strongest version of the opposing position, then tell me how to respond to it." This is genuinely hard for most AI tools because it requires the model to take a position it doesn't necessarily hold. Claude's training makes it comfortable doing this without compromising on accuracy.
Explicitly invite disagreement. Prompts that say "tell me what's wrong with this" or "where would this fail?" unlock Claude's critical mode better than neutral prompts. Because it's trained to be diplomatically honest rather than dishonestly diplomatic, giving explicit permission for criticism produces more direct responses.
Use Claude for long-document tasks. Claude has one of the longest context windows of any deployed model, and it actually uses context it was given earlier in a conversation more reliably than most competitors. For tasks like reviewing long documents, maintaining consistency across a multi-section piece, or having a sustained analytical conversation, Claude's memory of the conversation tends to be more reliable.
Claude's responses are often longer than you need. It will explain its reasoning when you just wanted the output. The fix: add a direct instruction like "Be direct and concise: give me the answer without explaining your reasoning unless I ask" or "Give me only the output, no meta-commentary." Claude takes these instructions seriously. It won't feel offended. It will comply.
It's worth being honest about where Claude underperforms, because the peer instinct is often to treat each new AI discovery as universally better. Claude is not the right choice when:
You need speed above quality. Claude's responses are frequently longer and more considered, and that takes time. For rapid-fire brainstorming or quick rewrites, ChatGPT is faster.
You want pure creative compliance. If you want the AI to just write what you asked for without questioning your concept, Claude's tendency to evaluate and push back can slow you down. Creative tasks where the directive is "just do it, I'll judge quality myself" are often faster on ChatGPT.
You need current information. Claude has a training cutoff and (in its default state without tools) no real-time web access. For anything where currency matters (recent developments, current stats, what happened last month), Gemini or ChatGPT with browsing is more reliable.
The practical takeaway: Use Claude specifically for tasks where you want a model that will evaluate its own output, challenge weak premises, and produce considered rather than immediate answers. Brief it with context. Invite disagreement. Explicitly request concision if you need it. The payoff for doing this well is real.
[Context] + [Task] + [Standard it should apply] + [Permission to critique] + [Output format]. Example: "I'm applying to a competitive graduate program in urban planning. Here is my statement of purpose. Evaluate it against what top programs say they're looking for. Tell me what's weak, what's missing, and what's strongest, then give me a revised opening paragraph." That prompt structure uses every dimension of Claude's strengths.
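The five-part structure can be written down as a reusable template. The following is a minimal sketch, assuming a hypothetical `AnalysisPrompt` class whose field names simply mirror the five parts; it is not tied to any Claude API and only assembles text you would paste into the chat. The field values reuse the urban planning example above, and the document text is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class AnalysisPrompt:
    """Hypothetical container for the five parts of the prompt structure."""
    context: str              # who you are and what the situation is
    task: str                 # what you want done
    standard: str             # the bar the work should be judged against
    critique_permission: str  # explicit invitation to criticize
    output_format: str        # what you want back

    def render(self, document: str) -> str:
        """Join the five parts into one prompt, followed by the text to review."""
        instructions = " ".join([
            self.context,
            self.task,
            self.standard,
            self.critique_permission,
            self.output_format,
        ])
        return f"{instructions}\n\n---\n{document.strip()}"


prompt = AnalysisPrompt(
    context="I'm applying to a competitive graduate program in urban planning.",
    task="Evaluate my statement of purpose, included below.",
    standard="Judge it against what top programs say they're looking for.",
    critique_permission="Tell me what's weak, what's missing, and what's strongest.",
    output_format="Then give me a revised opening paragraph.",
)
rendered = prompt.render("(paste your statement of purpose here)")
print(rendered)
```

Keeping the parts as separate fields makes it easy to see when one is missing, for example a prompt with a task but no standard to judge against.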
Bring something real: a paragraph from a paper, a professional bio, an argument you've been making, an idea for a project. Use the Claude-optimized prompt structure from the lesson: context, task, standard, permission to critique, output format. Then push back on the lab AI's feedback if you disagree. See what happens when you defend your choices vs. when you revise them.
The lab AI here is playing the role of a rigorous analyst: direct, specific, willing to be convinced but not easily. It won't validate for the sake of being nice.
Leo is a junior studying environmental policy, and he's trying to write an analysis of a federal rule that was finalized in late February 2025, less than two months ago. He opens ChatGPT and asks about the rule. ChatGPT gives a confident summary that's based on the proposed rule from 2023, not the final rule. The summary is authoritative-sounding and wrong. He opens Claude and asks the same question. Claude hedges: "My training data may not include the final version of this rule as finalized in early 2025. I'd recommend verifying with a primary source." At least Claude told him it didn't know. He opens Gemini with web access. Gemini pulls the actual Federal Register summary and gives him the accurate, current version. Right tool, right task.
Leo had been using all three tools for months. But he'd been treating them as interchangeable, just picking whichever one was already open on his laptop. This moment clarified something he'd understood abstractly but never internalized: the tool you use isn't just a style preference. It's a decision with real consequences for the quality of your work.
Gemini (Google's flagship model, available at gemini.google.com) was built with a genuinely different orientation than the other major tools. Google's core product is information retrieval at scale, and that DNA shows in Gemini's design. The model's strongest features include:
Real-time web access. Unlike ChatGPT and Claude in their default states, Gemini regularly incorporates current web search results. For anything that needs to be current (recent regulation, current market prices, recent scientific papers, what a company announced last month), Gemini is typically more reliable than models working only from training data.
Google ecosystem integration. Gemini integrates natively with Google Docs, Gmail, Drive, and Search. If you're working in the Google ecosystem (which most students are), Gemini can access your actual documents, summarize your emails, and work within your existing files. None of the other major tools do this natively without third-party connectors.
Multi-modal input. Gemini handles images, PDFs, and other file types well. Upload a dense policy document and ask it to extract the key provisions. Upload a chart from a paper and ask it to explain what it shows. These tasks are possible on other platforms too, but Gemini's file handling tends to be reliable in practice.
The prompt strategies that work well with Gemini lean into its research orientation. Ask it to compare current data. Ask it to summarize documents you upload. Ask it to fact-check claims against what it can currently find on the web. Ask it to find the most recent version of something. Prompts that leverage its real-time access are where Gemini outperforms models without it.
Beyond the big three, there are a handful of specialized tools that outperform general-purpose AI in specific contexts. You don't need to use all of them, but knowing they exist means you're not limited to ChatGPT when a better-fit tool exists.
Perplexity AI is essentially a research assistant that always shows its sources. Every claim is linked to a retrievable document. If you're doing research where you need to trace claims back to primary sources, and you don't want to manually verify every output, Perplexity's source-linking is a genuine practical advantage over models that generate without attribution.
GitHub Copilot / Cursor are coding assistants that are directly integrated into development environments. For programming tasks, these tools dramatically outperform asking a general AI assistant about code, because they have context about your entire codebase, can see your errors in real time, and are trained specifically on programming tasks at a depth that general models don't match.
NotebookLM (Google) is designed for working with a specific set of documents you provide. Upload your course readings, upload your research papers, and then ask questions about them specifically. Unlike asking a general model to "remember" a document, NotebookLM grounds all responses in exactly what you gave it, which dramatically reduces hallucination for document-based research tasks.
| Tool | Best for | Avoid for |
|---|---|---|
| ChatGPT | Fast drafts, brainstorming, creative variants, code scaffolding, tone matching | Fact-specific claims, citation generation, current events without browsing |
| Claude | Long reasoning, argument analysis, editing, complex writing tasks, ethical dilemmas | Speed-critical tasks, pure creative compliance, real-time information needs |
| Gemini | Current events, research synthesis, Google ecosystem tasks, document analysis | Nuanced creative writing, prolonged analytical reasoning |
| Perplexity | Research with verifiable sources, claim checking, academic topic overview | Long-form writing, creative tasks, code |
| NotebookLM | Analyzing documents you provide, course reading synthesis, reducing hallucination | Tasks not grounded in provided documents, creative generation |
Here's the honest version of what "having an AI stack" means at your stage: it's not about subscribing to ten different services and maintaining a complex workflow. Most people your age, navigating real tasks with real time constraints, need something simple enough to actually use consistently. The goal is decision clarity, not comprehensiveness.
A workable three-layer personal stack looks like this:
Layer 1: Your daily driver. One tool you use by default for most tasks. For most people, this is ChatGPT or Claude. Pick the one that fits your primary use cases (drafting and brainstorming → ChatGPT; analysis and editing → Claude). Don't switch daily drivers on every task; build fluency with one first.
Layer 2: Your research layer. One tool you reach for when currency or sources matter. Gemini or Perplexity. When a task involves claims about what's currently true (not patterns or concepts, but specific facts), you go here instead of your daily driver. Make this a habit, not a fallback.
Layer 3: Your specialist. One tool for a specific high-frequency task in your own life. If you code, this is Copilot or Cursor. If you study from readings, this is NotebookLM. If you write music, this is something else entirely. One specialist that makes a specific repetitive task meaningfully better.
The peer reality is that most people skip layers 2 and 3 entirely and use ChatGPT for everything, and then wonder why their AI-assisted research sometimes produces wrong information or why their study sessions don't feel as efficient as they should. The answer usually isn't "get better at prompting." The answer is usually "use the right tool."
The practical takeaway: Before next week, define your three layers. Write them down. Not an aspirational list, but a realistic one based on what you actually do. What tasks do you use AI for most? Which tool fits those tasks best? Which one do you reach for when facts matter? Which one would improve one specific thing you do regularly? That's your stack.
Ask yourself three questions before opening any AI tool: (1) Does this task require current facts? If yes, go to your research layer. (2) Does this task require deep reasoning or critique? If yes, use Claude. (3) Everything else? Daily driver. This takes about three seconds and will meaningfully improve your output quality over weeks of consistent application.
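The three questions form a simple decision rule, checked in order. As a rough sketch (the function name `pick_layer` and the returned labels are made up for illustration), it could be written like this:

```python
def pick_layer(needs_current_facts: bool, needs_deep_critique: bool) -> str:
    """Apply the three questions in order and return where the task should go."""
    # Question 1: current facts take priority over everything else.
    if needs_current_facts:
        return "research layer"
    # Question 2: deep reasoning or critique goes to Claude.
    if needs_deep_critique:
        return "Claude"
    # Question 3: everything else goes to the daily driver.
    return "daily driver"


# Example: summarizing a rule finalized recently needs current facts.
print(pick_layer(needs_current_facts=True, needs_deep_critique=False))
# Example: stress-testing an argument needs critique, not fresh facts.
print(pick_layer(needs_current_facts=False, needs_deep_critique=True))
```

The order matters in this sketch: a task that needs both current facts and critique is sent to the research layer first, on the assumption that reasoning over stale facts is the bigger risk.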
Tell the lab AI about your actual life: what you're studying or working on, what tasks you use AI for most, and what you're currently using. Then together, work through defining your three-layer stack: daily driver, research layer, specialist. The lab AI will ask you to justify each choice based on the tool's actual design strengths, not just habit or familiarity.
If you default to "ChatGPT for everything," be ready to defend that or revise it. The lab AI will push back on unjustified choices and validate well-reasoned ones. By the end, you should have a written three-layer stack you could actually show someone and explain.