Module 3 · Lesson 1

The Anatomy of a Great Prompt

Every outstanding AI-generated artwork starts with a carefully constructed sentence — not luck.
What separates a prompt that produces magic from one that produces mush?

In August 2022, Jason Allen submitted Théâtre D'Opéra Spatial to the Colorado State Fair Fine Arts Competition and won first place in the Digital Arts category. The image, generated with Midjourney, sparked a national debate about AI and art. What almost no one discussed was the craft behind it: Allen has said he spent over 80 hours iterating on prompts, upscaling, and refining before he submitted. The prompt itself was hundreds of words long and included specific artistic references, lighting descriptors, and compositional instructions.

The lesson wasn't that AI made art easy. It was that knowing how to direct the AI was itself a skill worth developing.

Why Prompts Are Not Just Descriptions

Most beginners treat a prompt like a Google search query — short, keyword-dense, hoping the AI figures it out. But AI image generators and language models are not search engines. They are instruction-following systems that respond to the precision, structure, and specificity of your input.

A weak prompt hands creative control entirely to the model's defaults. A strong prompt is a creative brief — it communicates subject, style, mood, technical parameters, and negative constraints all at once.

The Six-Layer Anatomy

Research into prompting patterns — including the published workflow guides from Midjourney's own team (2023) and OpenAI's DALL-E documentation — reveals that the most effective prompts tend to stack six distinct layers of information:

Prompt Anatomy — Six Layers
Subject
What is in the image? Be specific. "A woman" is weak. "A 1920s jazz musician holding a trumpet, mid-performance" is strong. Include number, action, identity.
Style
What visual language should it speak? Reference artists, movements, or media. "In the style of Gustav Klimt's golden period" tells the model far more than "decorative."
Mood
What feeling should it evoke? Atmospheric words shape color palette, contrast, and composition. "Melancholic dusk" produces a very different result from "vibrant celebration."
Lighting
How is the scene lit? "Golden hour side-lighting" or "harsh neon from below" will shift every element of the output. Lighting is the fastest way to upgrade a mediocre prompt.
Composition
How is the frame arranged? "Close-up portrait," "wide establishing shot," "bird's-eye view," "rule of thirds with subject off-center" — each sets a cinematic frame.
Technical
What format/quality signals apply? "8K resolution, photorealistic render, sharp focus, Canon 85mm lens" anchors the output in a specific visual standard.
Weak vs. Strong — Side by Side
❌ Weak Prompt
"a cat in a garden"
No style, no mood, no lighting, no composition. The AI fills every gap with its own defaults — you get an average, generic result.
✓ Strong Prompt
"a tabby cat sitting in an overgrown Victorian walled garden at golden hour, oil painting in the style of John Singer Sargent, warm amber light filtering through ivy, shallow depth of field, contemplative mood, detailed brushwork"
Every layer is addressed. Style, mood, lighting, and composition are explicit. The AI has precise creative direction.
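The weak and strong prompts above can also be compared programmatically. The following is a minimal sketch, not a tool any platform provides: the `SixLayerPrompt` class and its method names are hypothetical, and the assignment of each phrase to a layer is one reasonable reading of the strong prompt rather than the only one.

```python
from dataclasses import dataclass, fields


@dataclass
class SixLayerPrompt:
    """One field per anatomy layer. The class itself is a hypothetical helper."""
    subject: str
    style: str = ""
    mood: str = ""
    lighting: str = ""
    composition: str = ""
    technical: str = ""

    def missing_layers(self) -> list[str]:
        """Return the names of the layers that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Join the non-empty layers, in anatomy order, into one prompt string."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p)


weak = SixLayerPrompt(subject="a cat in a garden")

# Layer assignments below are an interpretation of the lesson's strong prompt.
strong = SixLayerPrompt(
    subject="a tabby cat sitting in an overgrown Victorian walled garden at golden hour",
    style="oil painting in the style of John Singer Sargent",
    mood="contemplative mood",
    lighting="warm amber light filtering through ivy",
    composition="shallow depth of field",
    technical="detailed brushwork",
)

print(weak.missing_layers())    # every layer except subject
print(strong.missing_layers())  # nothing missing
print(strong.render())
```

Checking `missing_layers()` before generating is a quick way to see which decisions are being left to the model's defaults.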
The 20% Rule — When Narrative Matters

Jason Allen's Colorado victory wasn't just about a long prompt. He later described how he used a narrative backstory — imagining a specific scene, a specific emotional purpose, a specific viewer — to guide every iterative refinement. The prompt became a script for a vision, not just a list of adjectives.

This is the 20% that separates competent prompters from skilled ones: knowing why you are making something, and letting that intention shape the words you choose.

Key Insight

The prompt is not a command. It is a collaboration brief. You are telling the AI what role it is playing, what world you are in, and what the emotional target is — before a single pixel is generated.

Key Terms
Prompt: A text input given to an AI model that directs the content, style, and form of its output.
Style Modifier: A word or phrase that references a named artist, art movement, or visual medium to anchor the output's aesthetic.
Negative Prompt: An instruction telling the AI what to exclude or avoid (e.g., "no text overlay, no blurry background").
Iteration: The process of generating, evaluating, and refining prompts through repeated attempts to reach a desired result.

Lesson 1 Quiz

The Anatomy of a Great Prompt — 4 questions
1. Which layer of prompt anatomy is most directly responsible for determining color palette and emotional tone?
Correct. Mood descriptors like "melancholic dusk" or "vibrant celebration" directly influence the AI's color and tonal choices throughout the image.
Not quite. While other layers contribute, the Mood layer most directly shapes emotional tone and color palette in AI image generation.
2. Jason Allen's award-winning Midjourney image at the 2022 Colorado State Fair demonstrated primarily that:
Correct. Allen spent over 80 hours iterating on prompts and refinements — demonstrating that directing AI is itself a practiced creative skill.
Incorrect. Allen's process involved extensive iteration and a detailed, hundreds-of-words-long prompt — far from effortless or simplistic.
3. What is the purpose of a "negative prompt"?
Correct. A negative prompt specifies what the AI should avoid — such as "no text overlay, no blurry background" — giving you more precise control over the output.
Incorrect. Negative prompts are instructions about exclusion — telling the AI what not to include in the generated result.
4. According to the lesson, the "20% rule" that separates competent from skilled prompters is:
Correct. Intentionality — knowing the emotional purpose behind your creation — is what allows skilled prompters to make better word choices and more meaningful refinements.
Incorrect. The 20% rule refers to intentionality: having a clear sense of why you are making something, and using that to guide every prompting decision.

Lab 1 — Build a Six-Layer Prompt

Practice constructing complete prompts using all six anatomy layers

Your Mission

You're going to build a rich, six-layer prompt from scratch — then analyze and improve prompts together with your AI coach. Start by describing an image you'd like to create, and your coach will help you layer in subject, style, mood, lighting, composition, and technical details.

Have at least 3 exchanges with the coach to complete this lab.

Try starting with: "I want to create an image of [something you care about]. Help me build a full six-layer prompt for it."
Prompt Coach
Lab 1 · Six-Layer Anatomy
Welcome to Lab 1. I'm your Prompt Coach for this session. We're going to build a complete, six-layer prompt together — Subject, Style, Mood, Lighting, Composition, and Technical. Tell me about an image you'd like to create, and I'll help you layer it up into something powerful. What subject or scene is on your mind?
Module 3 · Lesson 2

Style Modifiers & Artist References

The fastest way to give your AI image a distinct visual identity is to speak the language of art history.
How do professional prompters borrow from centuries of artistic tradition without copying anyone?

In early 2023, the concept artist and educator Karla Ortiz — one of the plaintiffs in a landmark lawsuit against Stability AI — demonstrated publicly how AI image generators had been trained on her work without consent. During that same period, Adobe launched Firefly, trained exclusively on licensed and public-domain images. Adobe's team published documentation showing that Firefly users achieved dramatically different aesthetic results depending on whether they referenced Art Nouveau, Baroque, or Brutalist style modifiers — even when the subject was identical.

The implication was clear: style modifier vocabulary is a skill set, and those who understood art history had a measurable advantage in directing AI outputs.

What Style Modifiers Actually Do

When you include an artist name or art movement in a prompt, you are activating a cluster of visual patterns the model has learned from thousands of works in that style. You're not just adding a word — you're selecting a visual grammar: a set of rules about color, texture, brushwork, composition, and subject matter.

Midjourney's published prompting guide (2023) refers to these as "style anchors" — the references that keep an output from drifting toward generic averages. Without them, models default to a kind of visual median: competent, unremarkable, nobody's style.

Modifier Categories to Know
Movement
Impressionist, Art Deco, Bauhaus, Ukiyo-e, Surrealist, Baroque
Artist
Vermeer, Moebius, Alphonse Mucha, Zdzisław Beksiński, N.C. Wyeth
Medium
oil on canvas, watercolor wash, charcoal sketch, linocut print, gouache
Era
Victorian, 1970s science fiction, fin-de-siècle, mid-century modern
Rendering
photorealistic, cel-shaded, painterly, hyperdetailed, low-poly
Platform
Artstation trending, cinematic still, editorial photography, concept art
Stacking Modifiers — How It Works

The real skill is not knowing individual modifiers — it's knowing how to stack them without creating conflicting instructions. A 2023 study published by researchers at Carnegie Mellon found that AI image models responded most coherently when style modifiers were ordered from broadest to most specific: movement → artist → medium → rendering.

The same study found that stacking more than four style modifiers produced diminishing returns and increased visual incoherence — the model tried to satisfy too many simultaneous constraints.

❌ Conflicting Stack
"photorealistic hyperdetailed oil painting charcoal sketch Art Nouveau watercolor Baroque cinematic"
Seven competing visual grammars. The model averages them into visual noise — no single style dominates.
✓ Coherent Stack
"Art Nouveau illustration, in the style of Alphonse Mucha, watercolor and ink on parchment, softly painterly"
Four modifiers, all pointing the same direction — decorative, organic, early-20th-century European printmaking.
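The ordering and the four-modifier ceiling described above can be expressed as a small check. This is a sketch under stated assumptions: `stack_modifiers`, `STACK_ORDER`, and `MAX_MODIFIERS` are hypothetical names, and the limit of four is taken from the lesson text, not from any platform's documented behavior.

```python
# Broadest-to-most-specific order described in the lesson.
STACK_ORDER = ["movement", "artist", "medium", "rendering"]
MAX_MODIFIERS = 4  # ceiling taken from the lesson, not a platform limit


def stack_modifiers(modifiers: list[tuple[str, str]]) -> str:
    """Order (category, text) pairs from broadest to most specific and join them."""
    for category, _ in modifiers:
        if category not in STACK_ORDER:
            raise ValueError(f"unknown modifier category: {category}")
    if len(modifiers) > MAX_MODIFIERS:
        raise ValueError(
            f"{len(modifiers)} modifiers given; more than {MAX_MODIFIERS} tends to conflict"
        )
    ordered = sorted(modifiers, key=lambda pair: STACK_ORDER.index(pair[0]))
    return ", ".join(text for _, text in ordered)


# The coherent stack from the lesson, supplied in a scrambled order.
coherent = stack_modifiers([
    ("medium", "watercolor and ink on parchment"),
    ("artist", "in the style of Alphonse Mucha"),
    ("rendering", "softly painterly"),
    ("movement", "Art Nouveau illustration"),
])
print(coherent)
```

Raising an error above four modifiers is a deliberately strict design choice for the sketch; a warning would suit exploratory work better.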
The Ethics of Artist References

The Karla Ortiz case raised a genuine ethical question: is it appropriate to use a living artist's name as a style reference? The debate continues in courts and communities. Several AI platforms, including Adobe Firefly and NightCafe, have moved toward discouraging or blocking references to specific living artists without consent.

As a practical guide: referencing art movements (Impressionism, Art Deco) or deceased historical artists is widely accepted. Referencing living working artists is ethically contested. When in doubt, describe the visual qualities directly: "loose impressionistic brushwork with warm tonality" rather than a specific living person's name.

Real-World Application

When the gaming studio Riot Games began exploring AI concept art tools in 2023, their art directors issued internal guidelines requiring artists to use movement and medium references only, never specific living artists, to avoid ethical and legal complications while still achieving precise aesthetic direction. The discipline of describing visual qualities rather than copying named styles became a core prompt-writing competency on the team.

Pro Tip

Build your own personal "modifier vocabulary list" — 10–15 style anchors that consistently produce results you love. Knowing that "Syd Mead retro-futurism" or "Constable pastoral landscape" reliably works for you is more valuable than knowing every modifier that exists.

Lesson 2 Quiz

Style Modifiers & Artist References — 4 questions
1. According to a 2023 Carnegie Mellon study, what happens when more than four style modifiers are stacked in a single prompt?
Correct. With too many competing style modifiers, the model tries to average them all and produces visually incoherent results rather than a strong unified style.
Incorrect. More style modifiers beyond four actually hurt output quality — the model can't coherently satisfy too many simultaneous visual grammars.
2. Adobe Firefly was specifically trained differently from Stable Diffusion in that it:
Correct. Adobe built Firefly specifically on licensed and public-domain content to avoid the copyright issues that affected other models, making it legally safer for commercial use.
Incorrect. Adobe Firefly was trained on licensed and public-domain images — a deliberate ethical and legal choice that differentiated it from models trained on scraped web content.
3. When referencing artistic styles ethically, which approach is most widely accepted?
Correct. Referencing historical movements or describing visual qualities directly is widely accepted. Using living artists' names without consent is ethically contested, as the Karla Ortiz case highlighted.
Incorrect. The ethical consensus is to reference movements or historical artists, or to describe visual qualities (e.g., "loose impressionistic brushwork") rather than using living artists' names.
4. Midjourney's documentation calls style modifiers "style anchors" because they:
Correct. Without style anchors, AI models default to a "visual median" — competent but generic. Style modifiers anchor the output to a specific aesthetic tradition.
Incorrect. Style anchors keep outputs from drifting toward the generic average the model would produce without specific aesthetic direction.

Lab 2 — Style Modifier Vocabulary

Explore and test style modifier combinations with your AI coach

Your Mission

Work with your coach to build a personal style modifier vocabulary. Pick a subject or theme you care about, then experiment with different movement, medium, era, and rendering modifiers. Ask for comparisons, explanations, and recommendations.

Complete at least 3 exchanges to finish this lab.

Try starting with: "I want to create [subject]. What style modifier combinations would give me a distinctive, coherent look?"
Style Coach
Lab 2 · Style Modifiers
Welcome to Lab 2. I'm your Style Coach. We're going to build your personal style modifier vocabulary — the specific combinations of art movements, media, eras, and rendering styles that produce results you love. Tell me about a subject or creative project you're working on, and we'll start identifying the right style anchors for it.
Module 3 · Lesson 3

Iteration: Refining Toward Your Vision

Your first prompt is a hypothesis. The real work begins when you look at the result and know what to change.
What does a professional iteration workflow actually look like — and why do most beginners skip it?

When Refik Anadol, the Turkish-American media artist, was commissioned to create Unsupervised for the Museum of Modern Art in New York (displayed January 2023), he and his studio trained a custom model on data from MoMA's collection, which spans more than 200 years of art. But the generative output wasn't simply turned on and displayed: Anadol's team spent months iterating on the input parameters, style weights, and temporal controls that shaped each flowing, morphing visualization.

The final installation was the result of thousands of refinement decisions. Anadol described the process in interviews as "a conversation with the machine" — each output teaching the team what to ask for next. Iteration wasn't a phase of the project — it was the whole project.

Why Your First Output Is Not the Goal

Most beginners treat the first AI output as either a success or a failure. Professionals treat it as information. What did the model interpret literally? What did it invent? What is surprisingly good that you should preserve? What is off that you need to correct?

This shift in mindset — from "hoping the AI gets it right" to "using what the AI gave me to understand what to ask for next" — is the single most important transition in becoming a skilled prompter.

A Professional Iteration Workflow
  1. Generate a seed output. Write your best initial six-layer prompt and generate. Don't overthink it; you need a starting point to respond to.
  2. Annotate what worked and what didn't. Be specific. "The lighting is wrong" is not actionable. "The light source is coming from the front when I want it from the left side" is.
  3. Isolate one variable at a time. Change the mood modifier only. Then the lighting only. Then the composition. Changing everything at once means you don't know what caused the improvement.
  4. Use inpainting or variation tools. Most platforms (Midjourney, DALL-E 3 via ChatGPT, Adobe Firefly) offer variation generation, producing alternatives that preserve what works while exploring changes.
  5. Save prompts that produce good results. Build a prompt library. A prompt that worked once is a template for future work.
  6. Know when to stop. The pursuit of perfection can loop infinitely. Define what "done" looks like before you start iterating so you know when you've arrived.
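Step 3 of the workflow above, isolating one variable at a time, can be sketched in a few lines. The helper names `vary_one_layer` and `changed_layers` are hypothetical, and the seed prompt's mood and lighting text are illustrative values, not taken from any tool.

```python
def vary_one_layer(base: dict[str, str], layer: str, options: list[str]) -> list[dict[str, str]]:
    """Return copies of `base` that differ from it only in `layer`."""
    if layer not in base:
        raise KeyError(f"unknown layer: {layer}")
    return [{**base, layer: option} for option in options]


def changed_layers(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List the layers whose text differs between two prompt versions."""
    return [name for name in before if before[name] != after[name]]


# Illustrative seed prompt; only the lighting layer will be varied.
seed = {
    "subject": "a landscape at night",
    "mood": "quiet and still",
    "lighting": "moonlight from the front",
}

variants = vary_one_layer(seed, "lighting", [
    "moonlight from the left side",
    "harsh neon from below",
])

for variant in variants:
    print(changed_layers(seed, variant), "->", ", ".join(variant.values()))
```

Because each variant differs from the seed in exactly one layer, any difference in the generated output can be attributed to that layer. Keeping the seed and its variants also gives you the start of the prompt library mentioned in step 5.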
The Variation Technique

One specific professional technique is diverge-then-converge iteration — used extensively by concept artists working with AI tools at studios like ILM and Sony Pictures Imageworks (as documented in the 2023 Visual Effects Society survey on AI adoption).

The process: generate 4–8 variations with intentionally different style or composition parameters, select the best 2, then begin converging — each iteration narrowing the target. This prevents the "tunnel vision" trap of fixating too early on one visual direction.
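The diverge-then-converge process above can be sketched as a selection step over human ratings. Everything here is an assumption for illustration: `select_best` is a hypothetical helper, the four directions are example modifiers, and the ratings stand in for your own judgment of each generated image.

```python
def select_best(candidates: list[str], ratings: list[int], keep: int = 2) -> list[str]:
    """Keep the `keep` highest-rated candidates, best first."""
    if len(candidates) != len(ratings):
        raise ValueError("each candidate needs exactly one rating")
    ranked = sorted(zip(ratings, candidates), key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in ranked[:keep]]


# Diverge: one base prompt pushed in intentionally different directions.
base = "a landscape at night"
directions = [
    "Ukiyo-e woodblock print",
    "charcoal sketch",
    "cinematic still, wide establishing shot",
    "watercolor wash, bird's-eye view",
]
candidates = [f"{base}, {direction}" for direction in directions]

# Hypothetical scores (1 to 5) assigned by a person after viewing each output.
ratings = [3, 1, 5, 4]

# Converge: carry only the two strongest directions into the next round.
finalists = select_best(candidates, ratings)
print(finalists)
```

Each finalist then becomes the seed for the next, narrower round of variations.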

Common Mistake

"I changed the prompt completely because the first one didn't work." This is the most common beginner error. If the output is 70% right, preserve the 70% — refine only what is wrong. Wholesale replacement discards all the information your first generation gave you.

Text-to-Image vs. Language Model Iteration

Iteration principles apply equally to text-based AI outputs — writing, code, analysis. In 2023, researchers at Stanford published findings showing that users who gave specific, targeted feedback to language models ("the second paragraph is too formal — rewrite it in a conversational tone") achieved better results in fewer exchanges than users who said "make it better" or rewrote the entire prompt.

The same principle applies to image generation: specific, targeted refinement beats wholesale replacement.

Iteration in Practice

At Penguin Random House's design studio, book cover designers who began incorporating AI tools in 2023 developed internal workflows that required a minimum of three iteration rounds before any AI-generated element could be considered for production use. The rule wasn't arbitrary — internal reviews found that outputs accepted after fewer than three refinement rounds almost always required costly post-production fixes.

Key Terms
Iteration: The practice of generating an output, analyzing it, and making targeted changes to improve toward a defined goal.
Inpainting: An AI tool feature that allows you to edit a specific region of a generated image while preserving the rest.
Variation Generation: Producing multiple alternative outputs from a single prompt to explore different interpretations before converging on one direction.
Diverge-then-Converge: A professional workflow: first explore many directions broadly, then progressively narrow toward the best one.

Lesson 3 Quiz

Iteration: Refining Toward Your Vision — 4 questions
1. Refik Anadol's MoMA installation "Unsupervised" (2023) is most relevant to the topic of iteration because:
Correct. Anadol's team spent months iterating on parameters and controls — he described it as "a conversation with the machine" where each output informed the next request.
Incorrect. The lesson from Anadol's project is that iteration wasn't just a phase — it was the entire creative practice. The final work emerged from thousands of refinement decisions.
2. According to Stanford research (2023), users who achieved the best results from language model refinements did so by:
Correct. "The second paragraph is too formal — rewrite it in a conversational tone" outperforms "make it better" because it gives the model precise, actionable direction.
Incorrect. Stanford's findings showed specific, targeted feedback produced better results in fewer exchanges than wholesale replacement or vague instructions like "make it better."
3. The "diverge-then-converge" iteration technique involves:
Correct. Diverge-then-converge prevents tunnel vision — by exploring many directions early, you discover possibilities you wouldn't have found by refining a single output from the start.
Incorrect. Diverge-then-converge means generating multiple varied outputs first (diverging), then selecting the most promising direction and refining it toward a final result (converging).
4. Why does the lesson recommend changing only one variable at a time during iteration?
Correct. Changing one variable at a time is a scientific approach to iteration — if you change everything at once and the output improves, you don't know what caused the improvement, making it impossible to replicate.
Incorrect. The reason is methodological: changing one variable at a time lets you identify what caused any change in output quality, building knowledge you can apply to future prompts.

Lab 3 — Practice Iteration

Take a weak prompt through three targeted refinement rounds

Your Mission

Start with a weak, underspecified prompt. Work with your coach to diagnose what's missing, then make one targeted change at a time over at least three iterations. Track what each change would improve.

Complete at least 3 exchanges with the coach to finish this lab.

Start with: "Here's a weak prompt I want to iterate on: 'a landscape at night.' Walk me through improving it one step at a time."
Iteration Coach
Lab 3 · Iteration Workflow
Welcome to Lab 3. I'm your Iteration Coach. We're going to practice the discipline of targeted refinement — making one deliberate change at a time and understanding why each change improves the output. Share a weak prompt you'd like to develop, and I'll guide you through the iteration process step by step.
Module 3 · Lesson 4

Prompting for Text, Music & Video

The six-layer anatomy adapts across every creative medium — once you understand the principle, you can apply it anywhere.
How does expert prompting change when you move from images to words, sounds, and motion?

In April 2023, Holly Herndon and Mat Dryhurst launched Holly+ — a public AI model trained on Herndon's voice, designed for collaboration. Rather than resisting AI music tools, they published detailed prompting guides for musicians who wanted to create new vocal performances in Herndon's style with her consent. Their documentation was among the first professional-grade prompt guides for AI music generation, and it revealed a crucial insight: prompting for music requires a completely different vocabulary than prompting for images — genre, tempo, key, instrumentation, emotional arc, and production style each play the role that style modifiers play in visual work.

In February 2024, OpenAI unveiled its Sora video model, which demonstrated that camera movement, duration, transition style, and narrative arc are the critical prompting dimensions for video: an entirely new layer that images don't require.

Prompting for Text (Language Models)

When prompting a language model for creative writing, five of the six layers translate directly (Lighting has no obvious counterpart in text): Subject becomes the topic or premise; Style becomes voice, register, and literary influences; Mood becomes emotional tone; Composition becomes structure (paragraph length, POV, narrative arc); Technical becomes format constraints (word count, reading level, whether to include dialogue).

Research published by Anthropic in their 2023 Constitutional AI documentation found that language model outputs improved most dramatically when users specified three elements explicitly: the audience, the purpose, and the tone. Without these, the model defaults to a "neutral journalistic voice" that suits nobody's creative vision.

Text Prompt Anatomy
Audience
Who is reading this? "For a 10-year-old curious about space" vs. "for a graduate seminar in astrophysics" produces completely different outputs from the same subject.
Purpose
What should the text accomplish? Persuade, entertain, instruct, move emotionally? The model needs a functional goal, not just a topic.
Voice / Register
First-person confessional, third-person omniscient, dry academic, warm conversational, lyrical poetic. Reference an author's voice if helpful: "in the style of Joan Didion's essays."
Structure
Three acts, five paragraphs, dialogue-heavy, fragmented vignettes, chronological, reverse-chronological. Structure shapes meaning as much as content does.
Constraints
Word count, reading level, must include / must avoid. "Under 300 words, no jargon, end on a question" is a complete brief.
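The five elements above can be assembled into a labeled brief. This is a minimal sketch: `build_text_brief` is a hypothetical helper, the labeled-line layout is one convention among many, and the topic, purpose, voice, and structure values are illustrative. The audience and constraints strings come from the examples above.

```python
def build_text_brief(topic: str, audience: str, purpose: str,
                     voice: str, structure: str, constraints: str) -> str:
    """Assemble a labeled brief, one element per line."""
    lines = [
        f"Write about: {topic}",
        f"Audience: {audience}",
        f"Purpose: {purpose}",
        f"Voice: {voice}",
        f"Structure: {structure}",
        f"Constraints: {constraints}",
    ]
    return "\n".join(lines)


brief = build_text_brief(
    topic="how black holes form",  # illustrative topic
    audience="a 10-year-old curious about space",
    purpose="instruct and entertain",
    voice="warm conversational",
    structure="five paragraphs, chronological",
    constraints="under 300 words, no jargon, end on a question",
)
print(brief)
```

Requiring every argument, with no defaults, forces each decision to be made explicitly before the brief is sent to a model.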
Prompting for Music (Suno, Udio, Holly+)

AI music tools launched publicly in 2023–2024, including Suno AI and Udio, respond to a distinct vocabulary. The critical dimensions are: genre (lo-fi hip hop, orchestral film score, Delta blues), tempo and energy (BPM ranges or descriptors like "driving" or "languid"), instrumentation (acoustic guitar, synth bass, string quartet), production era (1970s analog warmth, 2010s EDM production), and emotional arc (builds from melancholic to triumphant).

Holly Herndon and Mat Dryhurst's documentation for Holly+ also introduced the concept of vocal character descriptors — words describing the emotional quality of a voice performance itself, not just the song: "breathy and intimate," "operatic and powerful," "fragmented and hesitant."

Prompting for Video (Sora, Runway Gen-3)

Sora's technical report (February 2024) detailed a new prompting dimension that images don't require: temporal language — instructions about what happens over time. Camera movements (slow pan left, dolly zoom, handheld tracking), action sequences ("the figure walks toward the camera, then turns"), transitions ("cross-fade to dawn"), and duration all become critical prompt elements.

Runway's Gen-3 Alpha documentation (2024) added the concept of cinematographic references as the video equivalent of artist style modifiers: "in the visual language of Wong Kar-wai's In the Mood for Love" activates a specific palette, camera proximity, and temporal pacing that no list of descriptors could fully replicate.

❌ Weak Video Prompt
"a city at night"
No camera movement, no temporal arc, no cinematographic reference. The model defaults to a static shot of generic urban imagery.
✓ Strong Video Prompt
"slow dolly zoom into a rain-soaked Tokyo alley at 2am, neon reflections on wet cobblestones, a solitary figure disappearing around a corner, in the visual style of Blade Runner cinematography, 10 seconds, melancholic atmosphere"
Camera movement, duration, subject action, cinematographic reference, mood, and temporal specificity — all present.
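The strong video prompt above can be decomposed into its temporal and stylistic parts. The sketch below assumes a hypothetical `VideoPrompt` class; the field names are illustrative, and writing the duration as plain text inside the prompt is an assumption, since platforms may expose duration as a separate setting.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """Hypothetical container for the elements of a video prompt."""
    camera_move: str
    scene: str
    action: str
    reference: str
    duration_seconds: int
    mood: str

    def render(self) -> str:
        """Join the elements into a single comma-separated prompt."""
        if self.duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")
        return ", ".join([
            f"{self.camera_move} {self.scene}",
            self.action,
            self.reference,
            f"{self.duration_seconds} seconds",
            self.mood,
        ])


# The lesson's strong video prompt, split into its parts.
tokyo = VideoPrompt(
    camera_move="slow dolly zoom into",
    scene="a rain-soaked Tokyo alley at 2am, neon reflections on wet cobblestones",
    action="a solitary figure disappearing around a corner",
    reference="in the visual style of Blade Runner cinematography",
    duration_seconds=10,
    mood="melancholic atmosphere",
)
print(tokyo.render())
```

Keeping the camera move and the duration as separate fields makes it easy to vary the temporal elements while the scene stays fixed.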
The Transferable Principle

The specific vocabulary changes across media — but the underlying principle never does. In every medium, the difference between a weak prompt and a strong one is the same: specificity of intention + reference to established aesthetic vocabulary + clear constraints. Learn the vocabulary of each medium you work in, and your prompting will transfer.

Key Terms
Temporal Language: Prompt vocabulary that describes what happens over time in video, including camera moves, action sequences, transitions, and duration.
Vocal Character Descriptor: Words that describe the emotional quality of an AI voice or vocal performance (e.g., "breathy and intimate") rather than just the song's genre.
Cinematographic Reference: A film or director name used as a style anchor in video prompts, activating a specific palette, pacing, and camera language.
Register: The level of formality, tone, and vocabulary style appropriate to a specific audience and purpose in written text.

Lesson 4 Quiz

Prompting for Text, Music & Video — 4 questions
1. What new prompting dimension does video generation require that image generation does not?
Correct. Video requires temporal language — camera movements, action sequences, transitions, and duration — because the output unfolds over time rather than existing as a single frame.
Incorrect. The key new dimension for video is temporal language: instructions about what happens over time, including camera moves, action arcs, and duration.
2. Holly Herndon and Mat Dryhurst's Holly+ project was significant for AI prompting because:
Correct. Holly+ was notable for providing consented AI collaboration tools and publishing detailed guides that revealed how music prompting requires its own vocabulary: genre, tempo, instrumentation, vocal character, and emotional arc.
Incorrect. Holly+ was significant for offering a consented AI collaboration model and publishing professional prompting guides that mapped the unique vocabulary required for music generation.
3. Anthropic's Constitutional AI documentation (2023) found that language model outputs improved most dramatically when users explicitly specified:
Correct. Without audience, purpose, and tone, language models default to a neutral journalistic voice that serves nobody's creative vision — specifying these three dramatically narrows the output toward what's actually needed.
Incorrect. Anthropic's research found the three most impactful text prompt elements were audience (who is reading), purpose (what should it accomplish), and tone (what emotional register).
4. In video prompting, a "cinematographic reference" functions similarly to what element in image prompting?
Correct. Just as "in the style of Alphonse Mucha" anchors an image's visual grammar, "in the visual language of Wong Kar-wai's In the Mood for Love" activates a specific palette, camera proximity, and pacing in video generation.
Incorrect. A cinematographic reference is the video equivalent of a style modifier — it activates a whole cluster of aesthetic decisions (palette, camera work, pacing) just as an artist reference does for images.

Lab 4 — Cross-Media Prompting

Translate your creative idea into prompts for text, music, and video with your AI coach

Your Mission

Choose one creative concept — a scene, a feeling, a story — and work with your coach to develop three versions of a prompt for it: one for a language model, one for an AI music tool, and one for video generation. Discover how the vocabulary changes across media while the underlying principles stay the same.

Complete at least 3 exchanges to finish this lab.

Try starting with: "I want to create something around the feeling of [your concept]. Help me write prompts for text, music, and video versions of this idea."
Cross-Media Coach
Lab 4 · Multi-Medium Prompting
Welcome to Lab 4 — the cross-media prompting lab. I'm going to help you take a single creative concept and express it across three different AI media: a language model text prompt, an AI music prompt, and a video generation prompt. Each medium needs a different vocabulary, but the same creative intention can translate across all of them. Tell me about a concept, scene, or feeling you'd like to explore — and we'll build all three together.

Module 3 Test

Prompt Like a Pro Artist — 15 questions · 80% to pass
1. Which of the following best describes what a "prompt" is in the context of AI creative tools?
Correct. A prompt is a text input that gives the AI model direction — telling it what to create, in what style, and to what purpose.
Incorrect. A prompt is a text input — a creative brief that directs the AI's subject, style, mood, and technical parameters.
2. Jason Allen spent over 80 hours on his award-winning Midjourney submission primarily doing what?
Correct. Allen's extensive iteration — hundreds of words in his final prompt, numerous refinement rounds — demonstrated that skilled prompting is genuine creative work.
Incorrect. Allen spent over 80 hours iterating on prompts and refining outputs — his process was one of the first public demonstrations of prompting as a serious creative discipline.
3. In the six-layer prompt anatomy, which layer addresses how the frame is arranged — e.g., close-up, bird's-eye view, rule of thirds?
Correct. Composition directs how the scene is framed — shot type, camera angle, and spatial arrangement of elements within the image.
Incorrect. The Composition layer handles framing — establishing shots, close-ups, camera angles, and how subjects are arranged within the frame.
4. According to Midjourney's prompting guide, what is the function of "style anchors"?
Correct. Without style anchors, AI models default to a "visual median" — technically competent but belonging to nobody's specific aesthetic tradition.
Incorrect. Style anchors are style references that keep the output from drifting to the generic visual average the model produces when it gets no aesthetic direction.
5. The Carnegie Mellon study on style modifier stacking (2023) found that using more than four style modifiers:
Correct. Beyond four style modifiers, the model tries to average too many competing visual grammars, producing incoherent results rather than a strong unified style.
Incorrect. The study found that stacking more than four style modifiers led to diminishing returns and visual incoherence — less is more when stacking style references.
6. Karla Ortiz's lawsuit against Stability AI raised which ethical concern most directly relevant to AI prompting?
Correct. Ortiz's lawsuit alleged that her work had been used as training data without her consent. That raised the ethical and legal questions that platforms like Adobe Firefly later addressed by using only licensed training data.
Incorrect. Ortiz's case centered on the use of artists' work as AI training data without consent — a key concern that influenced how ethical prompters approach living artist references.
7. Which is the most ethical approach when seeking a specific visual style from an AI image generator?
Correct. Using historical movements (Art Nouveau, Impressionism) or describing visual qualities directly avoids the ethical complications of referencing living artists without their consent.
Incorrect. The ethical consensus is to use movement references, describe visual qualities, or reference historical artists — not to freely use living artists' names without their consent.
8. Refik Anadol described his MoMA "Unsupervised" project process as:
Correct. Anadol's description captures the essential nature of professional iteration — it is a dialogue, not a command, and each output teaches you what to ask for next.
Incorrect. Anadol described the process as "a conversation with the machine" — an iterative dialogue where months of refinements shaped the final installation.
9. What is "inpainting" in the context of AI image tools?
Correct. Inpainting allows targeted editing — you can fix a problematic area (like incorrect hands or wrong lighting on a face) without regenerating the entire image.
Incorrect. Inpainting is a targeted editing tool — it lets you modify a selected region of an existing generated image without changing the rest.
10. The "diverge-then-converge" technique was documented among concept artists at studios like ILM and Sony Pictures Imageworks. What does it prevent?
Correct. By first diverging broadly (generating many varied directions), then converging, artists avoid the creative trap of committing to the first plausible output rather than discovering the best possible one.
Incorrect. Diverge-then-converge prevents tunnel vision — the tendency to over-refine a single early direction before exploring what other approaches might have offered.
11. Penguin Random House's design studio required a minimum of how many iteration rounds before accepting AI-generated elements for production?
Correct. Their internal reviews showed outputs accepted after fewer than three refinement rounds almost always required significant post-production corrections — establishing three rounds as the minimum viable standard.
Incorrect. Penguin Random House required three iteration rounds minimum — fewer rounds consistently produced outputs that needed costly post-production fixes.
12. Anthropic's research on language model prompting found the three most impactful text prompt elements are:
Correct. Without these three, language models default to a neutral journalistic voice. Specifying audience (who's reading), purpose (what it should accomplish), and tone (emotional register) dramatically improves output quality.
Incorrect. Anthropic's research found audience, purpose, and tone to be the three highest-impact elements when prompting language models for creative or communicative writing tasks.
13. Holly Herndon and Mat Dryhurst's Holly+ project introduced which specific new concept to AI music prompting vocabulary?
Correct. Herndon and Dryhurst introduced vocal character descriptors: phrases like "breathy and intimate" or "operatic and powerful" that describe the quality of a voice performance, a layer of specificity beyond genre and instrumentation.
Incorrect. Holly+ documentation introduced vocal character descriptors — emotional quality words for voice performances (e.g., "fragmented and hesitant") that add a new layer to music prompt specificity.
14. OpenAI's Sora technical report (February 2024) highlighted what as the critical new prompting dimension for video?
Correct. Because video unfolds over time, temporal language becomes essential — instructions about what the camera does, how subjects move, how scenes transition, and how long everything lasts.
Incorrect. Sora's technical report highlighted temporal language as the critical new dimension for video: camera moves (dolly, pan, zoom), action arcs, transitions, and duration.
15. The core transferable principle across all creative media in prompting is:
Correct. Whether you are prompting for images, text, music, or video, the same formula applies: specificity of intention + aesthetic vocabulary references + clear constraints = strong outputs. The vocabulary changes from medium to medium, but the principle of intentional, specific, aesthetically grounded prompting transfers to all of them.
Incorrect. The transferable principle is: specificity of intention + aesthetic vocabulary references + clear constraints = strong outputs. This applies to every creative medium, with only the specific vocabulary changing.