In August 2022, Jason Allen submitted Théâtre D'Opéra Spatial to the Colorado State Fair Fine Arts Competition — and won first place in the Digital Arts category. The image, generated with Midjourney, sparked a national debate about AI and art. What almost no one discussed was the craft behind it: Allen spent over 80 hours iterating on prompts, upscaling, and refining before he submitted. The prompt itself was hundreds of words long and included specific artistic references, lighting descriptors, and compositional instructions.
The lesson wasn't that AI made art easy. It was that knowing how to direct the AI was itself a skill worth developing.
Most beginners treat a prompt like a Google search query — short, keyword-dense, hoping the AI figures it out. But AI image generators and language models are not search engines. They are instruction-following systems that respond to the precision, structure, and specificity of your input.
A weak prompt hands creative control entirely to the model's defaults. A strong prompt is a creative brief — it communicates subject, style, mood, technical parameters, and negative constraints all at once.
Research into prompting patterns, including the published workflow guides from Midjourney's own team (2023) and OpenAI's DALL-E documentation, suggests that the most effective prompts tend to stack six distinct layers of information: subject, style, mood, lighting, composition, and technical details.
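To make the layers concrete, here is a minimal sketch in Python that treats the six layers used in this lesson (subject, style, mood, lighting, composition, technical details) as fields of a small data structure. The class name `PromptLayers`, its methods, and the example wording are illustrative assumptions, not part of any tool's API; the output is plain text you would paste into a generator yourself.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptLayers:
    """Hypothetical container: one field per layer of the prompt."""
    subject: str
    style: str = ""
    mood: str = ""
    lighting: str = ""
    composition: str = ""
    technical: str = ""

    def build(self) -> str:
        # Join the non-empty layers in a fixed order, separated by commas.
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p)

    def missing(self) -> list[str]:
        # Layers left blank are decisions handed to the model's defaults.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


weak = PromptLayers(subject="a lighthouse")
strong = PromptLayers(
    subject="a lighthouse on a basalt cliff",
    style="19th-century oil painting",
    mood="solitary and calm",
    lighting="low golden-hour sun",
    composition="wide shot with the lighthouse in the left third",
    technical="16:9 aspect ratio",
)

print(weak.build())
print(weak.missing())
print(strong.build())
```

Calling `missing()` on a draft is a quick way to see which creative decisions you have not made yet.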
Jason Allen's Colorado victory wasn't just about a long prompt. He later described how he used a narrative backstory — imagining a specific scene, a specific emotional purpose, a specific viewer — to guide every iterative refinement. The prompt became a script for a vision, not just a list of adjectives.
This is what separates competent prompters from skilled ones: knowing why you are making something, and letting that intention shape the words you choose.
The prompt is not a command. It is a collaboration brief. You are telling the AI what role it is playing, what world you are in, and what the emotional target is — before a single pixel is generated.
You're going to build a rich, six-layer prompt from scratch — then analyze and improve prompts together with your AI coach. Start by describing an image you'd like to create, and your coach will help you layer in subject, style, mood, lighting, composition, and technical details.
Have at least 3 exchanges with the coach to complete this lab.
In early 2023, the concept artist and educator Karla Ortiz — one of the plaintiffs in a landmark lawsuit against Stability AI — demonstrated publicly how AI image generators had been trained on her work without consent. During that same period, Adobe launched Firefly, trained exclusively on licensed and public-domain images. Adobe's team published documentation showing that Firefly users achieved dramatically different aesthetic results depending on whether they referenced Art Nouveau, Baroque, or Brutalist style modifiers — even when the subject was identical.
The implication was clear: style modifier vocabulary is a skill set, and those who understood art history had a measurable advantage in directing AI outputs.
When you include an artist name or art movement in a prompt, you are activating a cluster of visual patterns the model has learned from thousands of works in that style. You're not just adding a word — you're selecting a visual grammar: a set of rules about color, texture, brushwork, composition, and subject matter.
Midjourney's published prompting guide (2023) refers to these as "style anchors" — the references that keep an output from drifting toward generic averages. Without them, models default to a kind of visual median: competent, unremarkable, nobody's style.
The real skill is not knowing individual modifiers — it's knowing how to stack them without creating conflicting instructions. A 2023 study published by researchers at Carnegie Mellon found that AI image models responded most coherently when style modifiers were ordered from broadest to most specific: movement → artist → medium → rendering.
The same study found that stacking more than four style modifiers produced diminishing returns and increased visual incoherence — the model tried to satisfy too many simultaneous constraints.
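The ordering and the four-modifier ceiling described above can be sketched as a small helper. This is an illustration under stated assumptions: the function name `stack_modifiers`, the `(kind, text)` pair format, and the choice to raise an error past four modifiers are hypothetical, and the result is an ordinary string rather than a call to any image tool.

```python
# Broadest-to-most-specific order described in the text above.
ORDER = ("movement", "artist", "medium", "rendering")
MAX_MODIFIERS = 4  # the text reports diminishing returns beyond four


def stack_modifiers(modifiers: list[tuple[str, str]]) -> str:
    """Sort (kind, text) pairs from broadest to most specific and join them."""
    unknown = [kind for kind, _ in modifiers if kind not in ORDER]
    if unknown:
        raise ValueError(f"unknown modifier kinds: {unknown}")
    if len(modifiers) > MAX_MODIFIERS:
        raise ValueError(
            f"{len(modifiers)} modifiers given; keep it to {MAX_MODIFIERS} or fewer"
        )
    # sorted() is stable, so two modifiers of the same kind keep their order.
    ranked = sorted(modifiers, key=lambda pair: ORDER.index(pair[0]))
    return ", ".join(text for _, text in ranked)


stacked = stack_modifiers([
    ("rendering", "soft film grain"),
    ("movement", "Art Deco"),
    ("medium", "gouache poster"),
])
print(stacked)  # Art Deco, gouache poster, soft film grain
```

Append the stacked string to your subject description, so the style anchors always arrive in the same order no matter how you jotted them down.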
The Karla Ortiz case raised a genuine ethical question: is it appropriate to use a living artist's name as a style reference? The debate continues in courts and communities. Several AI platforms — including Adobe Firefly and Nightcafe — have moved toward discouraging or blocking references to specific living artists without consent.
As a practical guide: referencing art movements (Impressionism, Art Deco) or deceased historical artists is widely accepted. Referencing living working artists is ethically contested. When in doubt, describe the visual qualities directly: "loose impressionistic brushwork with warm tonality" rather than a specific living person's name.
When the gaming studio Riot Games began exploring AI concept art tools in 2023, their art directors published internal guidelines requiring artists to use movement and medium references only — never specific living artists — to avoid ethical and legal complications while still achieving precise aesthetic direction. The discipline of describing visual qualities rather than copying named styles became a core prompt-writing competency on the team.
Build your own personal "modifier vocabulary list" — 10–15 style anchors that consistently produce results you love. Knowing that "Syd Mead retro-futurism" or "Constable pastoral landscape" reliably works for you is more valuable than knowing every modifier that exists.
Work with your coach to build a personal style modifier vocabulary. Pick a subject or theme you care about, then experiment with different movement, medium, era, and rendering modifiers. Ask for comparisons, explanations, and recommendations.
Complete at least 3 exchanges to finish this lab.
When Refik Anadol, the Turkish-American media artist, was commissioned to create Unsupervised for the Museum of Modern Art in New York (displayed January 2023), he and his studio trained a custom model on data from MoMA's collection, which spans more than 200 years of art. But the generative output wasn't simply switched on and displayed: Anadol's team spent months iterating on the input parameters, style weights, and temporal controls that shaped each flowing, morphing visualization.
The final installation was the result of thousands of refinement decisions. Anadol described the process in interviews as "a conversation with the machine" — each output teaching the team what to ask for next. Iteration wasn't a phase of the project — it was the whole project.
Most beginners treat the first AI output as either a success or a failure. Professionals treat it as information. What did the model interpret literally? What did it invent? What is surprisingly good that you should preserve? What is off that you need to correct?
This shift in mindset — from "hoping the AI gets it right" to "using what the AI gave me to understand what to ask for next" — is the single most important transition in becoming a skilled prompter.
One specific professional technique is diverge-then-converge iteration — used extensively by concept artists working with AI tools at studios like ILM and Sony Pictures Imageworks (as documented in the 2023 Visual Effects Society survey on AI adoption).
The process: generate 4–8 variations with intentionally different style or composition parameters, select the best 2, then begin converging — each iteration narrowing the target. This prevents the "tunnel vision" trap of fixating too early on one visual direction.
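Here is one way to sketch that diverge-then-converge loop at the level of prompt text. The function names `diverge` and `converge` are hypothetical, no image is actually generated, and the "best 2" selection is a human judgment that the example stands in for with hard-coded indices.

```python
def diverge(base: str, directions: list[str]) -> list[str]:
    """Create one candidate prompt per direction (the text suggests 4 to 8)."""
    if not 4 <= len(directions) <= 8:
        raise ValueError("use between 4 and 8 intentionally different directions")
    return [f"{base}, {direction}" for direction in directions]


def converge(candidates: list[str], keep: list[int], refinement: str) -> list[str]:
    """Keep the chosen candidates and narrow each with the same refinement."""
    return [f"{candidates[i]}, {refinement}" for i in keep]


base = "a night market in the rain"
round_one = diverge(base, [
    "ukiyo-e woodblock print",
    "1970s film photograph",
    "watercolor sketch",
    "low-poly 3D render",
])

# You would look at the outputs here; these indices stand in for your picks.
round_two = converge(round_one, keep=[1, 2], refinement="warm lantern light")
round_three = converge(round_two, keep=[0], refinement="shallow depth of field")
print(round_three[0])
```

Each round carries the earlier choices forward in the prompt text, so narrowing the target never throws away the direction you already selected.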
"I changed the prompt completely because the first one didn't work." This is the most common beginner error. If the output is 70% right, preserve the 70% — refine only what is wrong. Wholesale replacement discards all the information your first generation gave you.
Iteration principles apply equally to text-based AI outputs — writing, code, analysis. In 2023, researchers at Stanford published findings showing that users who gave specific, targeted feedback to language models ("the second paragraph is too formal — rewrite it in a conversational tone") achieved better results in fewer exchanges than users who said "make it better" or rewrote the entire prompt.
The same principle applies to image generation: specific, targeted refinement beats wholesale replacement.
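A small sketch of what "one targeted change" looks like in practice, assuming you keep your prompt as named layers instead of one long string. The helper names `refine` and `changed_layers` and the example layers are hypothetical; the point is that everything you did not touch survives into the next version.

```python
def refine(prompt: dict[str, str], layer: str, new_value: str) -> dict[str, str]:
    """Return a copy of the prompt with exactly one layer changed."""
    if layer not in prompt:
        raise KeyError(f"no such layer: {layer}")
    updated = dict(prompt)  # copy, so the earlier version stays intact
    updated[layer] = new_value
    return updated


def changed_layers(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List the layers whose text differs between two versions."""
    return [name for name in before if before[name] != after[name]]


v1 = {
    "subject": "a fox asleep in a library",
    "style": "storybook illustration",
    "lighting": "harsh overhead light",
}
v2 = refine(v1, "lighting", "soft window light at dusk")

print(changed_layers(v1, v2))  # ['lighting']
print(", ".join(v2.values()))
```

Keeping each version lets you trace which single change produced which difference in the output.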
At Penguin Random House's design studio, book cover designers who began incorporating AI tools in 2023 developed internal workflows that required a minimum of three iteration rounds before any AI-generated element could be considered for production use. The rule wasn't arbitrary — internal reviews found that outputs accepted after fewer than three refinement rounds almost always required costly post-production fixes.
Start with a weak, underspecified prompt. Work with your coach to diagnose what's missing, then make one targeted change at a time over at least three iterations. Track what each change would improve.
Complete at least 3 exchanges with the coach to finish this lab.
In 2021, Holly Herndon and Mat Dryhurst launched Holly+, a public AI model trained on Herndon's voice and designed for collaboration. Rather than resisting AI music tools, they published detailed prompting guides for musicians who wanted to create new vocal performances in Herndon's style with her consent. Their documentation was among the first professional-grade prompt guides for AI music generation, and it revealed a crucial insight: prompting for music requires a completely different vocabulary than prompting for images. Genre, tempo, key, instrumentation, emotional arc, and production style each play the role that style modifiers play in visual work.
OpenAI's Sora video model (unveiled February 2024) later demonstrated that camera movement, duration, transition style, and narrative arc are the critical prompting dimensions for video, an entirely new layer that images don't require.
When prompting a language model for creative writing, five of the six layers translate directly: Subject becomes the topic or premise; Style becomes voice, register, and literary influences; Mood becomes emotional tone; Composition becomes structure (paragraph length, POV, narrative arc); and Technical becomes format constraints (word count, reading level, whether to include dialogue). Lighting is the one layer this mapping leaves out.
Research published by Anthropic in their 2023 Constitutional AI documentation found that language model outputs improved most dramatically when users specified three elements explicitly: the audience, the purpose, and the tone. Without these, the model defaults to a "neutral journalistic voice" that suits nobody's creative vision.
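One way to make audience, purpose, and tone hard to forget is to build your writing prompt through a helper that refuses to proceed without them. This is a minimal sketch: the function name `writing_brief` and the line-per-element layout are assumptions for illustration, not a format any particular language model requires.

```python
def writing_brief(task: str, audience: str, purpose: str, tone: str) -> str:
    """Assemble a text prompt that states audience, purpose, and tone explicitly."""
    required = {"audience": audience, "purpose": purpose, "tone": tone}
    blank = [name for name, value in required.items() if not value.strip()]
    if blank:
        raise ValueError(f"specify these before prompting: {', '.join(blank)}")
    lines = [task.strip()]
    lines += [f"{name.capitalize()}: {value.strip()}" for name, value in required.items()]
    return "\n".join(lines)


brief = writing_brief(
    task="Write a 150-word opening scene set in a lighthouse during a storm.",
    audience="adult readers of literary fiction",
    purpose="establish dread before the main character appears",
    tone="restrained and quiet",
)
print(brief)
```

Leaving any of the three blank raises an error, which mirrors the point above: an unspecified element is one the model will fill in with its default voice.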
AI music tools launched publicly in 2023–2024, including Suno AI and Udio, respond to a distinct vocabulary. The critical dimensions are: genre (lo-fi hip hop, orchestral film score, Delta blues), tempo and energy (BPM ranges or descriptors like "driving" or "languid"), instrumentation (acoustic guitar, synth bass, string quartet), production era (1970s analog warmth, 2010s EDM production), and emotional arc (builds from melancholic to triumphant).
Holly Herndon and Mat Dryhurst's documentation for Holly+ also introduced the concept of vocal character descriptors — words describing the emotional quality of a voice performance itself, not just the song: "breathy and intimate," "operatic and powerful," "fragmented and hesitant."
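The music dimensions above can be combined the same way you stack visual layers. The sketch below assumes a hypothetical helper, `music_prompt`, that produces a plain-text description; it does not reflect the input format of Suno AI, Udio, or Holly+, which you should check in each tool's own documentation.

```python
def music_prompt(
    genre: str,
    tempo: str,
    instrumentation: list[str],
    production_era: str,
    emotional_arc: str,
    vocal_character: str = "",
) -> str:
    """Join the music dimensions named in the text into one description."""
    parts = [genre, tempo, " and ".join(instrumentation), production_era, emotional_arc]
    if vocal_character:
        # Vocal character describes the performance itself, not the song.
        parts.append(f"{vocal_character} vocals")
    return ", ".join(parts)


track = music_prompt(
    genre="Delta blues",
    tempo="languid",
    instrumentation=["acoustic guitar", "string quartet"],
    production_era="1970s analog warmth",
    emotional_arc="builds from melancholic to triumphant",
    vocal_character="breathy and intimate",
)
print(track)
```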
Sora's technical report (February 2024) detailed a new prompting dimension that images don't require: temporal language — instructions about what happens over time. Camera movements (slow pan left, dolly zoom, handheld tracking), action sequences ("the figure walks toward the camera, then turns"), transitions ("cross-fade to dawn"), and duration all become critical prompt elements.
Runway's Gen-3 Alpha documentation (2024) added the concept of cinematographic references as the video equivalent of artist style modifiers: "in the visual language of Wong Kar-wai's In the Mood for Love" activates a specific palette, camera proximity, and temporal pacing that no list of descriptors could fully replicate.
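Temporal language is easier to manage when you plan it as a shot list and only then flatten it into a prompt. The sketch below is an assumption-laden illustration: the `Shot` class, the `video_prompt` function, and the "Shot N (Xs)" wording are hypothetical and are not the documented input format of Sora or Runway.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """Hypothetical description of one stretch of time in the clip."""
    camera: str           # e.g. "slow pan left", "handheld tracking"
    action: str           # what happens during the shot
    seconds: int          # intended duration
    transition: str = ""  # how this shot hands off to the next one


def video_prompt(shots: list[Shot]) -> tuple[str, int]:
    """Return the prompt text and the total planned duration in seconds."""
    lines = []
    for number, shot in enumerate(shots, start=1):
        line = f"Shot {number} ({shot.seconds}s): {shot.camera}; {shot.action}"
        if shot.transition:
            line += f"; then {shot.transition}"
        lines.append(line)
    total = sum(shot.seconds for shot in shots)
    return "\n".join(lines), total


text, total_seconds = video_prompt([
    Shot("handheld tracking", "the figure walks toward the camera, then turns", 6,
         transition="cross-fade to dawn"),
    Shot("slow pan left", "an empty street at first light", 4),
])
print(text)
print(total_seconds)  # 10
```

Adding up the durations before you prompt is a simple check that your plan fits the clip length your tool supports.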
The specific vocabulary changes across media — but the underlying principle never does. In every medium, the difference between a weak prompt and a strong one is the same: specificity of intention + reference to established aesthetic vocabulary + clear constraints. Learn the vocabulary of each medium you work in, and your prompting will transfer.
Choose one creative concept — a scene, a feeling, a story — and work with your coach to develop three versions of a prompt for it: one for a language model, one for an AI music tool, and one for video generation. Discover how the vocabulary changes across media while the underlying principles stay the same.
Complete at least 3 exchanges to finish this lab.