In early 2023, Matt Singh, a producer at indie studio Obsidian Games, publicly described using Midjourney to generate hundreds of environment concept thumbnails in a single afternoon, a task that previously required three weeks of junior artist time. The images were never shipped; they were creative fuel, narrowing the direction before a single hour of painting was spent. The industry took notice.
That same year, Roblox Corporation announced it was embedding generative AI directly into its Studio editor, letting creators describe terrain and objects in plain language. The tools were real, shipping, and reshaping what a one-person studio could produce.
Three tools dominate game-art AI workflows in 2024: Midjourney, Stable Diffusion (run locally through front ends such as Automatic1111 or ComfyUI), and Adobe Firefly. Each occupies a distinct niche.
Midjourney excels at mood and concept — its outputs are painterly, cinematic, and fast. Version 6, released December 2023, dramatically improved prompt coherence and text rendering. Studios like Ubisoft have acknowledged using it for early-stage concepting, though finished art still goes through human artists for IP-alignment and legal clarity.
Stable Diffusion is open-source and runs locally, making it the choice for studios concerned about IP ownership and data privacy. Custom fine-tuned models (LoRAs) allow teams to train the generator on their own art style so outputs match a franchise's visual language. Shedworks, the developer of the indie game Sable, used diffusion-based tools to explore palette variations before committing to their cel-shaded look.
Adobe Firefly launched in 2023 with a commercially safe training dataset, making it attractive for studios that need clean IP provenance — particularly those publishing on storefronts with strict content-origin policies.
AI image tools slot into four documented stages of game-art production:
1. Concepting & mood boards. Designers generate dozens of direction thumbnails in minutes. The Fortnite team at Epic has described using AI-generated reference packs to align art direction discussions before any polished work begins.
2. Texture generation. Stable Diffusion combined with ControlNet can generate seamless tileable textures from a text prompt or a rough sketch. The result is fed into Substance Painter or Unreal Engine's material editor as a starting point. For base layers, this can cut texture-creation time from hours to roughly twenty minutes.
3. Sprite and icon iteration. For mobile games with hundreds of inventory icons, studios generate initial drafts with image AI, then hand-correct for consistency. Pocket-sized studios with one artist can now ship icon sets that would have required an outsourced art team.
4. Environment silhouette blocking. Using ControlNet's depth or edge maps, designers can upload a grey-box screenshot of a level and receive photorealistic or stylised reference versions, helping level designers and art directors communicate visual targets before polished passes begin.
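The texture stage above depends on the generated image actually tiling without visible seams. The following is a minimal sketch of a seam check, assuming the texture has already been loaded as a grid of grayscale values from 0 to 255. The names `seam_error` and `looks_tileable` and the threshold value are hypothetical and not part of Stable Diffusion or ControlNet.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values, 0-255


def seam_error(img: Image) -> float:
    """Mean absolute difference across the wrap-around seams.

    Compares the left column with the right column and the top row with
    the bottom row: the pixel pairs that become neighbours when the
    texture is repeated.
    """
    height, width = len(img), len(img[0])
    horizontal = sum(abs(row[0] - row[-1]) for row in img)
    vertical = sum(abs(img[0][x] - img[-1][x]) for x in range(width))
    return (horizontal + vertical) / (height + width)


def looks_tileable(img: Image, threshold: float = 8.0) -> bool:
    """Return True when the seam error is under an assumed visibility threshold."""
    return seam_error(img) <= threshold


# A nearly uniform texture tiles cleanly; a left-to-right gradient does not.
flat = [[100, 102, 101], [99, 100, 100], [101, 100, 102]]
gradient = [[0, 128, 255], [0, 128, 255], [0, 128, 255]]

print(looks_tileable(flat), looks_tileable(gradient))
```

A check like this is only a first filter before the texture goes into the material editor. It compares edge pixels and says nothing about whether the pattern repeats in an obvious way.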
Activision's November 2023 job postings explicitly listed "experience with generative AI texture tools" as a preferred qualification for senior technical artists, making Activision the first major AAA publisher to state this expectation in public listings. This signals that AI texture workflows moved from experimental to expected within a single production cycle.
AI-generated art carries documented risks. In February 2023, the US Copyright Office ruled that AI-generated images without meaningful human creative input are not copyrightable — a ruling with direct implications for studios that ship AI art as final assets. Most legal teams now require human creative selection and modification as a documented step.
Consistency is a second limitation: current generators struggle to reproduce a specific character's face reliably across scenes. Franchises with strong character identity — like The Witcher or God of War — still depend entirely on human artists for character work. AI handles environments and props more reliably than characters.
The human artist's role has shifted rather than disappeared. Art directors now spend more time curating, directing, and refining AI outputs than producing from scratch. This is a real skill change, and studios are actively training existing staff in prompt engineering and AI tool integration.
LoRA (Low-Rank Adaptation): A fine-tuning technique that trains a small set of weights on top of a base model, teaching it a specific art style without retraining from scratch. Used to align diffusion outputs to a studio's visual identity.
ControlNet: An add-on network for Stable Diffusion that constrains generation using structural inputs such as depth maps, edge maps, and poses. This gives precise control over composition, which makes it practical for game-asset workflows.
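To make the LoRA entry concrete: the "small set of weights" is usually a pair of low-rank matrices B and A whose product is added to a frozen weight matrix W. The sketch below, in plain Python, compares how many weights are trained in each case and applies the update to a tiny matrix. The layer size of 768 and rank of 8 are illustrative assumptions, not the dimensions of any specific model.

```python
from typing import List

Matrix = List[List[float]]


def full_finetune_params(d_out: int, d_in: int) -> int:
    """Weights updated when the whole d_out x d_in matrix is retrained."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Weights trained by LoRA: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


def apply_lora(W: Matrix, B: Matrix, A: Matrix, scale: float = 1.0) -> Matrix:
    """Return the adapted weights W + scale * (B @ A); W itself is not modified."""
    rank = len(A)
    return [
        [
            W[i][j] + scale * sum(B[i][r] * A[r][j] for r in range(rank))
            for j in range(len(W[0]))
        ]
        for i in range(len(W))
    ]


# Illustrative layer size (an assumption, not a specific model's dimensions).
D_OUT, D_IN, RANK = 768, 768, 8
full = full_finetune_params(D_OUT, D_IN)  # 589824
small = lora_params(D_OUT, D_IN, RANK)    # 12288
print(f"LoRA trains {small} weights instead of {full} ({full // small}x fewer)")

# Tiny worked example of the update itself.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # 2 x 1
A = [[0.5, 0.0]]     # 1 x 2
adapted = apply_lora(W, B, A)
```

Because the base weights stay frozen, a studio can keep one base model and swap in a small style-specific adapter per project.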
You are an art director at a small indie studio. Your team of two artists is building a 2D action RPG set in a decaying underwater city. You have access to Midjourney and a locally-running Stable Diffusion install with ControlNet. You need to use AI tools strategically — for concepting and texture drafts — while keeping your artists focused on character work and final polish.
In this lab, discuss prompt strategies, workflow decisions, and tool choices with your AI assistant. Think through how you would actually use these tools on a real project.
When AI Dungeon launched in 2019, it was a curiosity — a GPT-2-powered text adventure that hallucinated freely and charmed players with its incoherence. By 2023, the conversation had matured: Inworld AI raised $50 million to build production-grade NPC dialogue engines used by studios including Niantic, and Nvidia shipped ACE (Avatar Cloud Engine) — a real-time LLM-powered dialogue system demonstrated live at Computex 2023 with a shopkeeper NPC named Jin who responded dynamically to player questions.
The shift was from novelty to infrastructure. Writers and narrative designers now work alongside tools that can generate, vary, and localise dialogue at a scale no human team could match.
Nvidia ACE (Avatar Cloud Engine), demonstrated publicly at Computex 2023, embeds a fine-tuned LLM into NPC characters. The Jin demonstration showed a shopkeeper capable of remembering prior player statements within a session, offering contextually relevant shop recommendations, and refusing ethically inappropriate requests, all in real time. The technology is licensed as a middleware SDK, not a game-specific feature.
Ubisoft's Ghostwriter tool, revealed in March 2023, is an internal AI system trained on the studio's own writing style guides. It generates first-draft "barks" — short ambient NPC lines like combat callouts or idle chatter — which human writers then edit, select, and approve. Ubisoft explicitly positioned it as a tool to free writers from repetitive first-draft work, not as a replacement for narrative staff.
Square Enix published a 2023 white paper exploring LLM use for branching dialogue trees, describing internal experiments where GPT-4 generated plausible branch variations from a single authored trunk line — a technique that could dramatically expand perceived narrative breadth without proportional writing cost.
LLMs generate plausible text, not authored text. The distinction matters enormously in narrative games. Disco Elysium's dialogue is distinctive precisely because every line was written with a specific psyche in mind. LLM outputs trend toward the median of their training data — competent, inoffensive, and tonally bland unless heavily constrained and guided by human writers.
Narrative designers working with LLM tools report that the real skill is prompt engineering and output curation, not raw generation. A well-constructed system prompt that encodes a character's speech patterns, knowledge limits, emotional state, and relationship to the player can produce usable drafts. Without that scaffolding, the output is generic.
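One way to keep that scaffolding consistent is to store the character sheet as data and build the system prompt from it. The sketch below assumes a hypothetical `CharacterSheet` structure with the four elements named above; the field names, the wording of the prompt, and the example character are illustrative and not tied to any particular LLM vendor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CharacterSheet:
    name: str
    speech_patterns: List[str]
    knowledge_limits: List[str]
    emotional_state: str
    relationship_to_player: str


def build_system_prompt(sheet: CharacterSheet) -> str:
    """Render a character sheet as a system prompt, one constraint per line."""
    lines = [
        f"You are {sheet.name}, a character in a video game. Stay in character.",
        "Speech patterns:",
        *[f"- {pattern}" for pattern in sheet.speech_patterns],
        "You have no knowledge of:",
        *[f"- {topic}" for topic in sheet.knowledge_limits],
        f"Current emotional state: {sheet.emotional_state}",
        f"Relationship to the player: {sheet.relationship_to_player}",
    ]
    return "\n".join(lines)


# Hypothetical example character.
smith = CharacterSheet(
    name="Brann",
    speech_patterns=["Clipped sentences.", "Dry humour.", "Never flatters."],
    knowledge_limits=["Events outside the village", "How magic works"],
    emotional_state="wary",
    relationship_to_player="stranger who has not yet earned trust",
)
prompt = build_system_prompt(smith)
print(prompt)
```

Keeping the sheet in a structured form means a writer can change one field, such as the emotional state, without rewriting the whole prompt by hand.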
Consistency across a long game is a documented failure point. LLMs lack persistent memory beyond their context window. An NPC that learned the player's name in Act 1 will not remember it in Act 3 without explicit memory-injection systems — an engineering challenge that tools like Inworld AI are specifically designed to address with session and long-term memory modules.
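A memory-injection system can be sketched very simply: facts the NPC has learned are stored outside the model and written back into the prompt on every request. The class and function names below are hypothetical, and this is not Inworld AI's API; it only illustrates the idea under the assumption that the prompt is assembled as plain text.

```python
from typing import Dict


class NpcMemory:
    """Long-term facts an NPC has learned, kept outside the LLM."""

    def __init__(self) -> None:
        self._facts: Dict[str, str] = {}

    def remember(self, key: str, value: str) -> None:
        self._facts[key] = value

    def as_prompt_block(self) -> str:
        """Format stored facts for injection; empty string if nothing is known."""
        if not self._facts:
            return ""
        lines = [f"- {key}: {value}" for key, value in sorted(self._facts.items())]
        return "Known facts about the player:\n" + "\n".join(lines)


def build_prompt(system_prompt: str, memory: NpcMemory, player_line: str) -> str:
    """Assemble the text sent to the model, skipping empty sections."""
    parts = [system_prompt, memory.as_prompt_block(), f"Player: {player_line}"]
    return "\n\n".join(part for part in parts if part)


memory = NpcMemory()
memory.remember("player_name", "Ayla")  # learned in Act 1

# Much later, in Act 3, the fact is re-injected even though the original
# conversation is long gone from the model's context.
act3_prompt = build_prompt("You are a village innkeeper.", memory, "Remember me?")
print(act3_prompt)
```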
Ubisoft's La Forge research team built Ghostwriter specifically to handle NPC barks — the hundreds of short, contextual lines ambient characters speak during gameplay. Ghostwriter generates multiple variations from a human writer's seed line, and writers choose, discard, or edit. The tool reportedly reduced bark production time by over 50% in internal tests, while keeping all final content under explicit human authorial approval. This is the documented model most studios now consider: AI generates volume, humans apply craft.
One area where LLMs provide clear, uncontested value is localisation drafting. A major AAA title may have 500,000 words of dialogue requiring translation into 12 languages. Human translators working from LLM-generated first drafts — rather than from scratch — can dramatically reduce cost and cycle time. CD Projekt Red and Electronic Arts have both acknowledged using AI assistance in localisation pipelines, with human translators reviewing and correcting all outputs before any text ships.
The same logic applies to dialogue volume expansion. A quest with five authored responses can be expanded to fifty variations using LLM generation plus human curation — giving players the experience of a more responsive world without a proportional increase in writing budget.
NPC Barks: Short, contextual ambient lines spoken by non-player characters during gameplay — combat callouts, idle chatter, environmental reactions. High-volume, low-individual-complexity content that AI tools handle well.
Context Window: The maximum amount of text an LLM can process in a single interaction. Characters or plot details introduced outside the context window are "forgotten" unless explicitly re-injected — a core limitation for long-form narrative AI.
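The context window limit can be illustrated with a small trimming routine: pinned facts are always kept, and the conversation history is filled in from the most recent line backwards until the budget runs out. This is a sketch under stated assumptions. `estimate_tokens` is a crude word count standing in for a real tokenizer, and the function names are hypothetical.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_to_window(pinned: List[str], history: List[str], budget: int) -> List[str]:
    """Keep all pinned lines, then as many of the newest history lines as fit."""
    used = sum(estimate_tokens(line) for line in pinned)
    kept: List[str] = []
    for line in reversed(history):
        cost = estimate_tokens(line)
        if used + cost > budget:
            break  # everything older than this point is "forgotten"
        kept.append(line)
        used += cost
    return pinned + list(reversed(kept))


pinned = ["The player's name is Ayla."]
history = [
    "Player: hello there",
    "NPC: what do you want",
    "Player: a sword please",
]
window = fit_to_window(pinned, history, budget=15)
print(window)
```

With a budget of 15, the oldest history line is dropped while the pinned fact survives, which is the re-injection behaviour the definition describes.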
You are a narrative designer on a fantasy RPG. You need to generate bark variations for a gruff blacksmith NPC who distrusts magic users, has a dry sense of humour, and speaks in clipped sentences. You also need to consider how to keep this character's voice consistent if an LLM powers their real-time dialogue.
Practice writing system prompts that encode character voice, and discuss with your assistant how to structure LLM dialogue systems for narrative consistency across a long game.
Hello Games' No Man's Sky, launched in 2016, used algorithmic procedural generation to create 18 quintillion planets, but every algorithm was hand-authored. By 2023, a new generation of tools had begun using machine-learned models trained on human-designed levels to generate content that felt more authored and less random. The distinction is significant: rule-based proc-gen produces variation within constraints; ML-driven generation learns the shape of good design.
In 2023, Airship Syndicate's Wayfinder used a hybrid system: human designers authored key encounter rooms, and a learned model filled connecting corridors and variation zones — cutting layout production time while preserving authored feel in critical spaces.
Procedural generation exists on a spectrum. At one end: rule-based systems — explicit algorithms that place tiles, enemies, or loot according to designer-written rules. Spelunky, Minecraft, and Dead Cells all use this approach. Outputs are varied and often surprising, but the possibility space is defined entirely by what designers explicitly coded.
At the other end: ML-driven generation, where a neural network is trained on a corpus of existing levels and learns to produce new ones that statistically resemble the training data. The network infers what "good level design" looks like from examples rather than explicit rules. This produces more naturalistic layouts but requires substantial training data and can reproduce biases or patterns from the training corpus that designers don't intend.
The middle ground — and the current practical norm in shipping games — is hybrid systems: ML or learned heuristics guide macro-scale layout decisions (room connectivity, biome transitions, difficulty pacing), while hand-authored modules fill the actual playable spaces. This gives designers control over quality and feel while using AI to handle combinatorial layout work.
PCG via LLMs (Generative AI for level scripting): In 2023, multiple research papers from industry teams demonstrated using GPT-4 to generate Unreal Engine Blueprint logic from natural language descriptions. A designer could describe "a room where the lights flicker when the player enters and an enemy spawns from the ceiling" and receive working Blueprint nodes. This is not yet widely shipped as a commercial tool, but Unreal Engine's own AI-assisted Blueprint features, introduced in beta in late 2023, move in exactly this direction.
Wave Function Collapse (WFC): WFC is not ML, but it is widely used and often confused with AI. It is a constraint-satisfaction algorithm that generates tilemaps by learning adjacency rules from a sample image. Used in games including Bad North and various indie roguelikes, it produces aesthetically coherent maps from minimal designer input. It demonstrates how algorithmic tools labelled "AI" can ship in production with a reliability that ML systems currently struggle to match.
Reinforcement Learning for playtesting: EA's SEED research lab has published work on training RL agents to play-test levels, identifying softlocks, impossible difficulty spikes, and navigation dead-ends faster than human QA teams. This is AI in level design's QA phase rather than generation, but it directly shapes what gets designed — designers receive automated feedback on whether a layout is traversable before any human plays it.
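The Wave Function Collapse entry above can be sketched in a few dozen lines. This is a simplified tiled variant, not a reference implementation: adjacency is treated as symmetric and identical in every direction, the sample is a small character grid instead of an image, and a contradiction triggers a restart. The tile letters (S for sea, C for coast, L for land) and all function names are illustrative assumptions.

```python
import random
from typing import Dict, Iterator, List, Optional, Set, Tuple

Grid = List[List[str]]


def learn_rules(sample: List[str]) -> Dict[str, Set[str]]:
    """Record which tiles sit next to each other in the sample."""
    allowed: Dict[str, Set[str]] = {}
    for y, row in enumerate(sample):
        for x, tile in enumerate(row):
            allowed.setdefault(tile, set())
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if ny < len(sample) and nx < len(sample[ny]):
                    other = sample[ny][nx]
                    allowed[tile].add(other)
                    allowed.setdefault(other, set()).add(tile)
    return allowed


def neighbours(x: int, y: int, w: int, h: int) -> Iterator[Tuple[int, int]]:
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < w and 0 <= ny < h:
            yield nx, ny


def collapse(w: int, h: int, rules: Dict[str, Set[str]],
             rng: random.Random) -> Optional[Grid]:
    """One attempt; returns None if some cell runs out of options."""
    options = {(x, y): set(rules) for y in range(h) for x in range(w)}
    while True:
        open_cells = [c for c, opts in options.items() if len(opts) > 1]
        if not open_cells:
            break
        # Collapse the most constrained cell to one randomly chosen tile.
        cell = min(open_cells, key=lambda c: (len(options[c]), c))
        options[cell] = {rng.choice(sorted(options[cell]))}
        # Propagate: neighbours keep only tiles compatible with this cell.
        stack = [cell]
        while stack:
            cx, cy = stack.pop()
            supported = set().union(*(rules[t] for t in options[(cx, cy)]))
            for n in neighbours(cx, cy, w, h):
                narrowed = options[n] & supported
                if not narrowed:
                    return None
                if narrowed != options[n]:
                    options[n] = narrowed
                    stack.append(n)
    return [[next(iter(options[(x, y)])) for x in range(w)] for y in range(h)]


def generate(w: int, h: int, rules: Dict[str, Set[str]], seed: int,
             max_attempts: int = 20) -> Grid:
    rng = random.Random(seed)
    for _ in range(max_attempts):
        grid = collapse(w, h, rules, rng)
        if grid is not None:
            return grid
    raise RuntimeError("no valid layout found")


rules = learn_rules(["SSCL", "SCCL", "CCLL"])  # sea never touches land
grid = generate(6, 4, rules, seed=7)
print("\n".join("".join(row) for row in grid))
```

Because every choice is checked against the learned adjacency rules, the output never places sea directly next to land, even though no rule saying so was written by hand.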
EA's SEED (Search for Extraordinary Experiences Division) published research in 2022–2023 demonstrating reinforcement learning agents that could complete obstacle courses, navigate procedurally generated levels, and identify stuck-points — running thousands of playthroughs in hours. This RL-driven QA approach has been integrated into internal tooling at EA, allowing level designers to receive automated traversability reports before any human QA session begins.
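The simplest form of a traversability report does not need reinforcement learning at all. The sketch below swaps in a plain breadth-first search over a tile grid to answer two questions: can the exit be reached, and which walkable cells can never be visited. It is not SEED's method, and the level encoding (`#` wall, `.` floor, `S` start, `E` exit) and function names are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


def find(level: List[str], symbol: str) -> Cell:
    """Return the (x, y) position of the first occurrence of symbol."""
    for y, row in enumerate(level):
        x = row.find(symbol)
        if x != -1:
            return x, y
    raise ValueError(f"symbol {symbol!r} not in level")


def reachable_cells(level: List[str]) -> Set[Cell]:
    """Breadth-first search from the start tile over non-wall cells."""
    start = find(level, "S")
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= ny < len(level) and 0 <= nx < len(level[ny])
            if inside and level[ny][nx] != "#" and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return seen


def traversability_report(level: List[str]) -> Dict[str, object]:
    seen = reachable_cells(level)
    walkable = {
        (x, y)
        for y, row in enumerate(level)
        for x, ch in enumerate(row)
        if ch != "#"
    }
    return {
        "exit_reachable": find(level, "E") in seen,
        "unreachable_cells": sorted(walkable - seen),
    }


good_level = ["#####", "#S..#", "#.#E#", "#####"]
sealed_level = ["#####", "#S#E#", "#.#.#", "#####"]

print(traversability_report(good_level))
print(traversability_report(sealed_level))
```

A learned agent adds value where movement involves jumps, timing, or combat. For flat grid connectivity, a search like this already catches sealed-off exits before a human plays the level.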
The central design tension in AI-driven level generation is the authored feel problem: players can often sense when a space was generated rather than designed. Generated dungeons in early roguelikes felt corridor-y and generic precisely because algorithms lacked the intentionality that human designers bring — the sense that each space was placed with a specific experience in mind.
Modern hybrid approaches address this by reserving ML generation for structural scaffolding and using human-authored modules for key experiential moments. The boss arena is always hand-designed. The connecting tunnels between it and the start room can be ML-generated. This preserves the emotional peaks while using AI to handle structural volume.
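The hybrid split described above can be sketched as a layout builder that keeps hand-authored rooms in their designed order and only generates the connectors between them. In this sketch a seeded random generator stands in for the ML model, and the room names, connector format, and function names are all hypothetical.

```python
import random
from typing import List

# Hand-authored rooms, in the order the designer wants them encountered.
AUTHORED_ROOMS = ["entrance_hall", "shrine_puzzle", "boss_arena"]


def generate_connector(rng: random.Random, min_len: int = 2, max_len: int = 5) -> str:
    """Stand-in for a generated corridor; a learned model would slot in here."""
    length = rng.randint(min_len, max_len)
    turns = rng.randint(0, length - 1)
    return f"corridor(len={length}, turns={turns})"


def build_run(authored: List[str], seed: int) -> List[str]:
    """Interleave authored rooms with generated connectors, preserving order."""
    rng = random.Random(seed)
    layout: List[str] = []
    for index, room in enumerate(authored):
        layout.append(room)
        if index < len(authored) - 1:
            layout.append(generate_connector(rng))
    return layout


layout = build_run(AUTHORED_ROOMS, seed=1)
print(" -> ".join(layout))
```

The design choice is that generation can never touch the authored rooms: it only fills the gaps, so the boss arena is always the hand-designed one and always comes last.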
Designer tools like Promethean AI — used by studios including Respawn Entertainment — take a different approach: the AI suggests asset placement within a human-designed space, learning from the designer's own prior decisions to recommend items, props, and decorations that match the space's established aesthetic. The designer approves or rejects each suggestion. This is AI as a creative collaborator rather than an autonomous generator.
Wave Function Collapse (WFC): A constraint-satisfaction algorithm that generates tilemaps by observing adjacency rules in a sample input and producing outputs that obey those rules. Fast, reproducible from a fixed random seed, and used in shipped games; often mislabelled as "AI."
Promethean AI: An AI design assistant tool that learns a designer's aesthetic preferences from their existing work and suggests asset placement, environment dressing, and prop combinations. Used by Respawn Entertainment and other AAA studios.
You are a level designer on a roguelite dungeon crawler. Your game needs 50+ unique dungeon layouts per run, but your team of two designers can only hand-author 20 key rooms (boss arenas, story beats, unique encounters). You need to design a hybrid system: hand-authored critical rooms plus AI/procedural generation for connecting spaces, ensuring the result still feels designed rather than random.
With your AI assistant, discuss system architecture, the authored-feel problem, and how to set constraints so generated spaces match your game's pacing and aesthetic.
In 2023, indie developer Thomas Brush (Pinstripe, Neversong) publicly documented his shift to an AI-assisted pipeline for his next project. His toolkit combined Midjourney for environment concept generation, ChatGPT for first-draft dialogue and NPC backstories, and a locally-run Stable Diffusion model fine-tuned on his own past art for texture iterations. The result: a one-person studio producing at the pace of a three-person team on his previous titles. The human still made every final decision. The AI handled the volume.
This is the model now accessible to any serious solo developer. The question is no longer whether to integrate AI tools — it is how to structure the integration so it accelerates production without compromising quality or creating legal exposure.
A practical concept art pipeline for a solo or small team follows three stages: AI generation → human refinement → asset extraction.
Stage 1 — Midjourney (or Stable Diffusion) for mood and direction. Use the generator to produce 20–50 thumbnails of environments, characters, or UI elements. The goal is not finished art — it is direction elimination. You are ruling out what your game does not look like. This process takes an afternoon rather than a week. Select 3–5 images that resonate with your creative vision.
Stage 2 — Photoshop refinement. Import selected images into Photoshop (or Affinity Photo, Krita, etc.) and paint over them. Fix anatomy, adjust colour palettes, add game-specific elements that the generator cannot know. This is where the human artist's judgment creates the actual style. The AI output is a starting point, not an endpoint. Studios that skip this step ship art that looks "AI-generated" — recognisable by players and lacking coherence.
Stage 3 — Asset extraction. From the refined paintings, extract the specific assets needed for the engine: isolated sprites, UI elements, environment tiles. Tools like Adobe's Remove Background or manual masking in Photoshop handle separation. The result enters the engine as a human-refined, AI-assisted asset — legally defensible and visually coherent.
Narrative content benefits from a similar structure: ChatGPT (or similar) for first drafts → human editing for voice and craft.
For NPC dialogue, quest descriptions, journal entries, and ambient world-building text, LLMs can generate a complete first draft in minutes. A writer who would spend three hours producing 500 words of NPC barks can review, cut, and rewrite an LLM draft to the same result in 45 minutes — if they approach it as an editor rather than a generator.
The critical practice is providing the LLM with a detailed character sheet or style guide before generating. A prompt that specifies "this character is a former military engineer who speaks in clipped sentences, distrusts magic, and always mentions practical solutions" produces far more usable output than an open prompt. The human writer's craft lives in the system prompt and the editing pass, not in typing every line from scratch.
For branching dialogue trees, use AI to generate variation branches off a human-authored trunk line. Author the key moments yourself; use AI to fill the combinatorial variation that makes the game feel responsive.
The AI-assisted level design workflow follows the same pattern: PCG draft → human polish.
Tools like Wave Function Collapse, ML-based generators, or even LLM-driven Blueprint scripting (as explored in Lesson 3) can produce a structural draft of a level — room connectivity, corridor layout, rough enemy placement. This draft is the scaffolding, not the building. A human level designer then plays through the draft, identifies the moments that work, and rebuilds the rest around them.
The time saving is in avoiding the blank-page problem. Starting from a generated draft that is 40% usable is faster than starting from nothing, even if that 40% still needs adjustment. The remaining 60% is rebuilt by a human who understands pacing, narrative context, and the specific experience goals of the game, things no current generator can internalise.
Every studio using AI-generated assets needs a documented IP policy. The February 2023 US Copyright Office ruling established that AI-generated images without meaningful human creative input are not copyrightable. This creates a practical risk: if your shipped assets are too close to raw AI output, you may not hold copyright — and competitors can use them freely.
The practical response is documented human modification. Keep source files showing your Photoshop paint-over layers. Maintain a record of which assets began as AI drafts and what human work was applied. This paper trail supports any copyright claim if challenged. It also satisfies the requirements of most commercial storefronts (Steam, Epic Games Store, Apple App Store) that have begun asking about AI content provenance in submission forms.
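The paper trail described above can be kept as a small structured log alongside the source files. The sketch below assumes a hypothetical `AssetRecord` format. The field names, file paths, and the review rule are illustrative and do not reflect any storefront's required schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AssetRecord:
    asset_path: str
    ai_tool: str  # empty string if the asset was made entirely by hand
    prompt: str
    human_edits: List[str] = field(default_factory=list)
    source_files: List[str] = field(default_factory=list)

    def log_edit(self, description: str, source_file: str) -> None:
        """Record one human modification and the file that evidences it."""
        self.human_edits.append(description)
        self.source_files.append(source_file)

    def needs_review(self) -> bool:
        """True when an AI-originated asset has no documented human work."""
        return bool(self.ai_tool) and not self.human_edits


def to_json(records: List[AssetRecord]) -> str:
    return json.dumps([asdict(record) for record in records], indent=2)


# Hypothetical example asset.
record = AssetRecord(
    asset_path="tiles/kelp_wall.png",
    ai_tool="Stable Diffusion",
    prompt="overgrown kelp on corroded metal wall, tileable",
)
flagged_before = record.needs_review()

record.log_edit("Repainted rust pattern and palette", "tiles/kelp_wall_paintover.psd")
flagged_after = record.needs_review()

print(to_json([record]))
```

Flagging assets that have an AI origin but no logged edits gives a quick list of what still needs a human pass before shipping.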
On attribution: current practice varies. Some studios disclose AI use in press materials voluntarily; others do not. If your game's marketing includes statements about hand-crafted art, be accurate. Player trust and media reputation are at stake, and several indie developers have faced significant backlash for misrepresenting AI-generated content as fully human-made.
The most useful framework for thinking about AI integration in a design workflow is the 80/20 rule: AI handles 80% of the generative volume work, and the human provides the 20% of judgment and refinement that makes the output good.
This framing resists two failure modes. The first is AI avoidance — treating all AI use as a creative compromise and manually generating everything from scratch at significant cost in time and budget. The second is AI abdication — shipping raw AI outputs without human curation, producing inconsistent, stylistically incoherent, and potentially legally vulnerable assets.
The 80% that AI handles well: initial generation, variation production, first drafts, structural scaffolding, research synthesis, and combinatorial exploration. The 20% that only humans handle well: final artistic judgment, brand and franchise consistency, narrative voice and authenticity, emotional calibration, legal and ethical decision-making, and the integration of an asset into a coherent whole experience.
Solo developers and small teams who internalise this division find that AI tools do not replace their creative role; they amplify their output. A two-person art team thinking with the 80/20 model can produce the asset volume of a five-person team, while reserving their human hours for the decisions that actually define the game's identity.
80/20 Rule (AI Workflow): A practical heuristic for AI tool integration — AI handles 80% of generative volume work (drafts, variations, scaffolding), while the human provides the 20% of judgment, refinement, and creative decision-making that determines final quality.
IP Provenance: Documentation of the origin and human modification history of creative assets. Required for copyright protection of AI-assisted work and increasingly requested by commercial platforms in content submission forms.
You are a solo developer planning a 2D action RPG. You have six months to build a vertical slice — a fully playable 20-minute demo demonstrating core art style, gameplay mechanics, and narrative tone. Your budget is $0 for outsourcing. Map out how you will use AI tools across your concept art, narrative, and level design pipelines while ensuring the final product has a coherent identity, defensible IP, and no obvious "AI-generated" aesthetic.
Use this lab to plan your specific workflow, identify which tasks AI handles and which require your direct judgment, and think through the attribution and legal layer.