Photography and AI · Module 4 · Lesson 1

How Generative Fill Actually Works

From diffusion models to masked regions — understanding the engine beneath the brush.

When Adobe demonstrated Generative Fill live at its MAX conference, the audience watched a photographer erase a tourist from a crowded shot of the Colosseum and replace him with contextually appropriate stone pavement in under eight seconds. The crowd gasped. But the mechanism behind that moment had been in development for years, rooted in a branch of machine learning called latent diffusion modeling that was developed for general image synthesis, not photo retouching.

Understanding why generative fill works so convincingly — and where it still fails — requires going one layer deeper than the magic show.

What Is Inpainting?

Inpainting is the process of reconstructing missing or masked regions of an image so that the result is visually coherent with the surrounding content. The term predates AI entirely: conservators have used it for centuries to describe restoring damaged paintings. In digital imaging the term entered the academic computer vision literature in 2000, when Bertalmío, Sapiro, Caselles and Ballester presented their landmark paper "Image Inpainting", which demonstrated a PDE-based approach for filling small gaps.

Early digital inpainting was purely algorithmic: the system sampled pixels from nearby regions and tiled or blended them inward. Photoshop's Content-Aware Fill, introduced in CS5 in 2010, used a patch-based algorithm (drawing from research by Barnes et al. on PatchMatch) that searched the image for visually similar texture regions and synthesized a fill. This worked well for simple backgrounds like grass, sky, and water but failed on structured or semantic content — faces, architecture, text, complex objects.
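The search at the heart of patch-based fill can be sketched in a few lines of Python. This is a deliberately naive, brute-force version of the nearest-patch search that PatchMatch was designed to speed up. It is not PatchMatch itself, which replaces the exhaustive search with random sampling and propagation, and it is not Adobe's implementation. The sketch assumes a tiny grayscale image stored as nested lists, and the function name `naive_patch_fill` is hypothetical.

```python
def naive_patch_fill(img, mask, radius=1):
    """Fill pixels where mask[y][x] is truthy by copying the center pixel of the
    best-matching fully known patch (sum of squared differences over the
    already-known neighbors). Brute force: fine for toy images only."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    known = [[not mask[y][x] for x in range(w)] for y in range(h)]
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]

    def inside(y, x):
        return 0 <= y < h and 0 <= x < w

    def known_count(p):
        y, x = p
        return sum(1 for dy, dx in offsets
                   if inside(y + dy, x + dx) and known[y + dy][x + dx])

    # Candidate sources: centers whose whole patch lies inside the image and is known.
    sources = [(y, x) for y in range(h) for x in range(w)
               if all(inside(y + dy, x + dx) and known[y + dy][x + dx]
                      for dy, dx in offsets)]
    if not sources:
        raise ValueError("no fully known source patch available")

    holes = {(y, x) for y in range(h) for x in range(w) if mask[y][x]}
    while holes:
        # "Onion peel" order: fill the hole pixel with the most known neighbors first.
        ty, tx = max(sorted(holes), key=known_count)
        best, best_cost = sources[0], float("inf")
        for sy, sx in sources:
            cost = 0.0
            for dy, dx in offsets:
                y, x = ty + dy, tx + dx
                if inside(y, x) and known[y][x]:
                    diff = out[y][x] - out[sy + dy][sx + dx]
                    cost += diff * diff
            if cost < best_cost:
                best, best_cost = (sy, sx), cost
        out[ty][tx] = out[best[0]][best[1]]
        known[ty][tx] = True
        holes.remove((ty, tx))
    return out


# Toy example: vertical stripes with a 2x2 hole punched in the middle.
stripes = [[10 if x % 2 == 0 else 0 for x in range(8)] for _ in range(8)]
hole = [[3 <= y <= 4 and 3 <= x <= 4 for x in range(8)] for y in range(8)]
damaged = [[-1 if hole[y][x] else stripes[y][x] for x in range(8)] for y in range(8)]
filled = naive_patch_fill(damaged, hole)
print(filled == stripes)
```

Because it only ever copies existing pixels, this kind of fill reproduces repeating texture well but has no notion of what a face or a window should look like, which matches the strengths and weaknesses described above.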

The shift to deep-learning-based inpainting gathered pace between 2016 and 2018. One milestone was NVIDIA's paper "Image Inpainting for Irregular Holes Using Partial Convolutions" (Liu et al., 2018), which showed that convolutional neural networks could fill arbitrarily shaped holes with semantically plausible content. But those early models still left telltale artifacts at mask boundaries and worked at limited resolution.
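The key idea of a partial convolution fits in one function. The sketch below follows the re-normalization rule described by Liu et al. (2018) for a single channel with a fixed kernel: only known pixels contribute, the sum is rescaled by the ratio of window size to known pixels, and any output position that saw at least one known pixel becomes known in the updated mask. Note that this mask uses 1 for known pixels, the opposite of the fill-mask convention used later in this lesson. There are no learned weights here, and the name `partial_conv2d` is illustrative.

```python
def partial_conv2d(x, valid, kernel, bias=0.0):
    """Single-channel partial convolution, after Liu et al. (2018).

    x:      2-D list of pixel values
    valid:  2-D list, 1 where the pixel is known, 0 inside the hole
    kernel: odd-sized square 2-D list of weights
    Returns (output, updated_valid_mask).
    """
    h, w = len(x), len(x[0])
    k = len(kernel)
    r = k // 2
    window = k * k  # "sum(1)" in the paper's notation: positions per window
    out = [[0.0] * w for _ in range(h)]
    new_valid = [[0] * w for _ in range(h)]
    for y in range(h):
        for xc in range(w):
            acc, n_known = 0.0, 0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - r, xc + j - r
                    # Positions outside the image count as holes (zero padding).
                    if 0 <= yy < h and 0 <= xx < w and valid[yy][xx]:
                        acc += kernel[i][j] * x[yy][xx]
                        n_known += 1
            if n_known:
                # Rescale so responses near the hole are not dimmed by missing inputs.
                out[y][xc] = acc * window / n_known + bias
                new_valid[y][xc] = 1
            # Otherwise the output stays 0 and the position remains a hole.
    return out, new_valid


image = [[5.0] * 5 for _ in range(5)]
valid = [[0 if 1 <= y <= 3 and 1 <= x <= 3 else 1 for x in range(5)] for y in range(5)]
mean_kernel = [[1 / 9] * 3 for _ in range(3)]
out, new_valid = partial_conv2d(image, valid, mean_kernel)
```

In this example the 3×3 hole shrinks to a single unknown pixel after one layer. Stacking such layers shrinks the hole step by step, which is how the network propagates information inward from the hole's border.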

Latent Diffusion: The Engine Inside Modern Tools

Modern generative fill tools are generally understood to be diffusion-based, and the best-documented design is the latent diffusion model (LDM) behind Stable Diffusion's inpainting pipeline. Adobe, OpenAI and Google have published far less about the models behind Firefly, DALL-E 3's edit mode and Magic Eraser, so treat the LDM description below as the reference design rather than a confirmed blueprint of every tool. The foundational paper, "High-Resolution Image Synthesis with Latent Diffusion Models" by Rombach et al. (CompVis group, 2022), introduced the core insight: instead of running the expensive diffusion process in pixel space, compress the image into a lower-dimensional latent space using a variational autoencoder (VAE), run diffusion there, and decode back to pixels.
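The arithmetic behind "compress, diffuse, decode" is worth seeing once. The sketch assumes the configuration of Stable Diffusion 1.x/2.x: 8× downsampling along each side and 4 latent channels. Other models use different values, and the function names are illustrative.

```python
def latent_shape(height, width, factor=8, channels=4):
    """(channels, height, width) of the latent for an SD 1.x/2.x-style VAE."""
    if height % factor or width % factor:
        raise ValueError("image sides must be multiples of the downsampling factor")
    return (channels, height // factor, width // factor)


def compression_ratio(height, width, factor=8, channels=4, image_channels=3):
    """How many image values map onto each latent value."""
    c, h, w = latent_shape(height, width, factor, channels)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(512, 512))         # (4, 64, 64)
print(compression_ratio(512, 512))    # 48.0
print(latent_shape(1024, 1024))       # (4, 128, 128)
```

A 512×512 RGB image therefore becomes a 4×64×64 latent with 48 times fewer values, and the denoising network works on a 64×64 grid instead of a 512×512 one.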

For inpainting, the model receives three inputs simultaneously: the original image (encoded to latent), a binary mask (1 = region to fill, 0 = region to preserve), and an optional text prompt. The masked region is set to noise in latent space, and the denoising U-Net iteratively removes that noise, conditioned on both the surrounding unmasked latents and the text embedding. The preserved region exerts structural and color pressure on what gets generated in the filled region, which is why good generative fill respects lighting, perspective, and texture continuity.
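For Stable Diffusion's dedicated inpainting checkpoints, these inputs are commonly combined into a single 9-channel tensor at each denoising step (this is how the Hugging Face diffusers inpainting pipeline feeds its U-Net): 4 channels of noisy latent, 1 channel of mask resized to latent resolution, and 4 channels encoding the image with the fill region blanked out. The text prompt is not part of this tensor; it enters through cross-attention. The sketch below only assembles the tensor. `fake_encode` is a hypothetical stand-in that averages 8×8 blocks to get the right shape, where a real pipeline uses the trained VAE, and the block-average mask resize is an assumption (pipelines may interpolate differently).

```python
import random

FACTOR = 8      # assumed VAE downsampling per side (Stable Diffusion 1.x/2.x)
LATENT_CH = 4   # assumed number of latent channels


def block_mean(plane, factor=FACTOR):
    """Average non-overlapping factor x factor blocks of a 2-D list."""
    h, w = len(plane) // factor, len(plane[0]) // factor
    return [[sum(plane[y * factor + i][x * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for x in range(w)]
            for y in range(h)]


def fake_encode(gray):
    """Hypothetical stand-in for the VAE encoder. It is NOT a real encoder;
    it only returns a LATENT_CH x H/8 x W/8 structure of plausible shape."""
    pooled = block_mean(gray)
    return [[row[:] for row in pooled] for _ in range(LATENT_CH)]


def inpainting_unet_input(gray, fill_mask, seed=0):
    """Assemble the 9-channel input of an SD-style inpainting U-Net for the
    first denoising step: 4 noisy-latent channels (pure noise at the start),
    1 mask channel, and 4 channels of the masked image's latent.
    fill_mask: 1 = region to fill, 0 = region to preserve."""
    rng = random.Random(seed)
    # Blank the fill region before encoding so only the context is encoded.
    masked = [[0.0 if m else v for v, m in zip(row, mrow)]
              for row, mrow in zip(gray, fill_mask)]
    masked_latent = fake_encode(masked)
    small_mask = block_mean(fill_mask)  # mask resized to latent resolution
    h, w = len(small_mask), len(small_mask[0])
    noisy_latent = [[[rng.gauss(0.0, 1.0) for _ in range(w)] for _ in range(h)]
                    for _ in range(LATENT_CH)]
    # Channel order: noisy latent, mask, masked-image latent.
    return noisy_latent + [small_mask] + masked_latent


gray = [[0.5] * 16 for _ in range(16)]
fill = [[1 if y < 8 and x < 8 else 0 for x in range(16)] for y in range(16)]
unet_in = inpainting_unet_input(gray, fill)
print(len(unet_in), len(unet_in[0]), len(unet_in[0][0]))  # 9 2 2
```

At later steps the first four channels hold the partially denoised latent rather than fresh noise, while the mask and masked-image channels stay fixed; that fixed context is the "pressure" the preserved region exerts on the fill.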

Adobe Firefly's implementation differs in one further respect: according to Adobe, the model was trained only on licensed Adobe Stock images, openly licensed content, and public domain material. The choice is widely read as a response to the litigation facing other AI image providers, and this training data decision shapes the aesthetic and capability range of Firefly's fills.

Technical Note — The Mask Matters

The shape, feathering, and placement of your selection mask directly influence output quality. Diffusion inpainting models use the mask boundary to infer what should grow inward. A hard, pixel-precise mask on a complex edge (hair, foliage, transparent glass) gives the model contradictory structural signals. Slightly feathering or expanding the mask by 2–4 pixels usually yields better edge blending, because the model needs a transition zone to interpolate between preserved and generated content.
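Expanding and feathering can both be expressed as a function of each pixel's distance from the original selection. The sketch below is a brute-force illustration on a small nested-list mask. Photoshop does not document its exact feather falloff, so the linear ramp here is an assumption, and `expand_and_feather` is a hypothetical name.

```python
import math


def expand_and_feather(mask, expand_px=3, feather_px=3):
    """Soft mask in [0, 1] from a binary selection (1 = selected).

    Pixels within expand_px of the selection get full opacity; opacity then
    falls off linearly to 0 over the next feather_px pixels (assumed ramp).
    """
    h, w = len(mask), len(mask[0])
    selected = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    soft = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Brute-force Euclidean distance to the nearest selected pixel.
            d = min((math.hypot(y - sy, x - sx) for sy, sx in selected),
                    default=math.inf)
            if d <= expand_px:
                soft[y][x] = 1.0
            elif feather_px > 0 and d < expand_px + feather_px:
                soft[y][x] = 1.0 - (d - expand_px) / feather_px
    return soft


row_mask = [[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
print(expand_and_feather(row_mask, expand_px=2, feather_px=4)[0])
# [1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0]
```

The fully opaque band grows by two pixels, and the partial values beyond it form the transition zone in which preserved and generated content are blended.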

Key Terms and Distinctions

Inpainting: Filling a masked interior region of an existing image with AI-generated content that is contextually consistent with surroundings.

Outpainting: Extending an image beyond its original canvas boundaries. The model generates new content outside the original frame, conditioned on the edges of the existing image.

Generative Fill: Adobe's branded implementation of text-prompt-guided inpainting inside Photoshop, powered by the Firefly model. First released as a public beta in May 2023.

Diffusion Guidance: The process by which a text prompt (encoded via CLIP or similar) steers the denoising trajectory toward a semantically matching output.

VAE (Variational Autoencoder): The encoder-decoder pair that compresses pixel images into latent representations and reconstructs them. In Stable Diffusion 1.x and 2.x, the VAE downsamples by a factor of 8 along each side, so a 512×512 image becomes a 64×64 latent with 4 channels.

Real-World Constraint

Generative fill excels at removing objects against consistent backgrounds, extending skies, and adding generic environmental elements. It struggles with highly specific subject identity (a particular person's face, a trademarked logo, a precise architectural detail), with complex transparency, and with regions where physical consistency matters critically — reflections in water, cast shadows from a removed object, or glass refractions. Knowing this shapes the decision of whether AI fill is the right tool or whether manual compositing will be required.

Lesson 1 Quiz


What is the primary technical architecture underlying Adobe Firefly's Generative Fill and most modern AI inpainting tools?

✓ Correct. Latent diffusion models, introduced by Rombach et al. (2022), compress images into latent space via a VAE, run denoising there, and decode back to pixels. This is the architecture of Stable Diffusion inpainting and the reference design for diffusion-based tools such as Firefly and DALL-E edits, whose exact internals are not fully published.

✗ Modern generative fill tools use latent diffusion models. GANs were dominant earlier but produced more artifacts; PatchMatch is the older Content-Aware Fill approach; partial convolutions were a 2018 step forward but are not the current generation.

Adobe deliberately trained Firefly only on licensed Adobe Stock, openly licensed, and public domain images. What was the primary reason for this decision?

✓ Correct. Adobe made this training data choice to make Firefly commercially safe for professional use, at a time of ongoing lawsuits against image AI companies (including Getty Images v. Stability AI).

✗ The decision was legal and commercial, not technical. Adobe needed Firefly to be defensible against copyright claims — lawsuits against Stability AI and others made the industry acutely aware of training data provenance.

Why does slightly feathering or expanding a selection mask by 2–4 pixels typically improve generative fill results at object edges?

✓ Correct. The mask boundary is where the model negotiates between what exists and what must be invented. A transition zone (soft edge) gives it structural information from the existing image to blend against rather than a hard binary cut.

✗ Feathering the mask creates a transition zone at the boundary — the diffusion process can interpolate smoothly between the unmasked content and the generated fill, rather than facing a hard binary edge that produces seam artifacts.

Lab 1 — Inpainting Mechanics

Discuss how latent diffusion inpainting works and how masks affect output quality.

Understanding the Diffusion Inpainting Pipeline

In this lab you'll talk through the technical underpinnings of generative fill with an AI tutor. Ask about how masks are processed, how the VAE works in context, why text prompts guide fill direction, and where the algorithm tends to fail. The goal is to move from surface-level familiarity to genuine conceptual understanding.

Try asking: "Walk me through exactly what happens step-by-step when I use generative fill to remove a person from a photo — from the moment I draw the mask to the final pixel output."


Photography and AI · Module 4 · Lesson 2

Masking Strategies and Selection Techniques

The quality of your fill is determined before the AI ever runs — it's determined by your mask.

In 2023, National Geographic publicly reaffirmed its longstanding ban on AI-generated or AI-altered content in its editorial photography, explicitly calling out generative fill as a tool that violates the documentary integrity of its images. The ban underscored something photographers often miss: every masking decision is also an editorial decision. What you choose to include, exclude, or replace is not a neutral technical choice — it is authorship. Developing masking fluency means understanding not just the how, but the when.

The Four Masking Modes in Photoshop

Photoshop's AI-backed selection tools have improved dramatically since the introduction of Select Subject (2018, powered by Adobe Sensei) and the dedicated Object Selection Tool (2019). In the context of generative fill, four modes are most relevant:

1. Object Selection Tool (Rectangular/Lasso refinement): Adobe Sensei analyzes the image and identifies distinct objects. You click or drag over a region, the model detects the object boundary, and it returns a pixel-level mask. Later versions added an Object Finder mode in which objects are highlighted as you hover the cursor over them, before any click. This works well for well-defined foreground objects against clean backgrounds.

2. Select Subject: A one-click operation that masks the dominant subject of the entire image. Useful for portraits and product shots, less reliable for group scenes or images with multiple competing subjects. Like Object Selection, it relies on Adobe Sensei segmentation.

3. Select and Mask Workspace with AI Refinement: Accessed via Select > Select and Mask (the workspace that replaced the older Refine Edge dialog), this mode allows manual correction with the Refine Edge Brush, which is designed for hair, fur, and foliage. It uses edge-aware refinement to detect fine strands and semi-transparent edges that solid selections miss.

4. Quick Mask + Manual Paint: The oldest method, fully manual. You paint a mask directly in Quick Mask mode using brushes. Total control, highest time cost. Remains indispensable for complex compositional edits where automated selection repeatedly misfires.

Mask Expansion, Feathering, and the Fill Boundary Problem

Once a selection is made, three parameters directly affect generative fill quality: expansion, feathering, and smoothing. The Select > Modify > Expand command (or a positive Shift Edge value in Select and Mask) grows the selection outward. This is critically important when filling in a removed object, because you want to include the shadow fringe and edge pixels that belong to the removed element. Leaving those edge pixels unmasked causes a ghosting halo in the final output.

Feathering adds a soft gradient to the mask edge — the selection transitions from 100% to 0% across a specified pixel radius. For most generative fills, a feather of 2–5 pixels is appropriate for mid-sized objects. Very large objects on complex backgrounds may benefit from 8–12 pixels of feather. Zero feathering produces a hard cut and almost always creates visible seaming.

Smoothing reduces the jaggedness of selection edges, which matters most when using lasso tools on curved surfaces. A smoothing value of 2–4 pixels helps prevent serrated edge artifacts. Photoshop applies your selection as the mask of the resulting Generative Layer, so a jagged selection edge carries straight through into a jagged seam, however smoothly the model rendered the fill itself.
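One simple way to smooth a binary selection is a majority filter: each pixel takes the value held by most of its neighborhood. This is a swapped-in textbook technique, not Photoshop's Smooth algorithm (which is not documented here), and `smooth_mask` is a hypothetical name.

```python
def smooth_mask(mask, radius=2):
    """Majority filter on a binary mask (1 = selected).

    A pixel is selected if more than half of the pixels in its
    (2*radius+1) x (2*radius+1) neighborhood, clipped at the image border,
    are selected.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = selected = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += 1
                        selected += mask[yy][xx]
            out[y][x] = 1 if 2 * selected > total else 0
    return out


# A straight-edged selection with a one-pixel spur sticking out at row 3.
jagged = [[1 if x <= 2 or (y == 3 and x == 3) else 0 for x in range(7)]
          for y in range(7)]
for row in smooth_mask(jagged, radius=1):
    print(row)
```

Small spurs and notches narrower than the filter radius disappear, while straight runs of the edge stay where they were.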

Critical Technique — Mask the Shadow Too

When removing a physical object from a scene, its shadow remains in the image after masking just the object itself. Generative fill cannot logically infer that a shadow belongs to a now-absent object — it will attempt to preserve and integrate it as meaningful scene content. Always extend your mask to include the object's cast shadow. If the shadow falls across a complex surface (carpet texture, pavement cracks, a second subject), mask it with Quick Mask mode for precision — Object Selection will rarely detect a shadow as belonging to the same masking group as the object.

Context Sensitivity: What the Model Uses to Fill

Generative fill models do not randomly invent content; they condition on a context window drawn from the pixels surrounding the masked region. In Stable Diffusion's inpainting pipeline (used by many third-party tools), this context is typically drawn from the image resized to 512×512 or 1024×1024, depending on model version. Adobe has published less about how Firefly assembles its context, so its behavior may differ.
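If the whole image really is resized to the model's working size, small regions in large photographs get very few pixels to work with. The sketch assumes an aspect-preserving resize so the longer side equals 512 or 1024 pixels, plus an 8× VAE downsampling. Real tools may instead crop around the mask, which changes these numbers, and the function names are hypothetical.

```python
def working_scale(width, height, model_side=1024):
    """Scale factor for an aspect-preserving resize that makes the longer
    side equal model_side (an assumption about the pipeline)."""
    return model_side / max(width, height)


def region_at_model_size(region_px, width, height, model_side=1024, vae_factor=8):
    """Return (pixels, latent cells) that a region of region_px pixels along
    one axis occupies once the image is resized for the model."""
    pixels = region_px * working_scale(width, height, model_side)
    return pixels, pixels / vae_factor


print(region_at_model_size(600, 6000, 4000))                  # about (102.4, 12.8)
print(region_at_model_size(600, 6000, 4000, model_side=512))  # about (51.2, 6.4)
```

Under those assumptions, a 600-pixel-wide object in a 6000×4000 photograph spans only about 102 pixels at a 1024-pixel working size, or roughly 13 latent cells, which is one plausible reason fills in very large images can come back softer than the surrounding detail.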

In practice, the spatial relationship between your mask and the surrounding content matters. If you mask the center of a plain blue sky, the model sees predominantly sky-blue context and will fill with contextually appropriate sky. If you mask a region that spans a horizon line (half sky, half water), the model must simultaneously satisfy two different surface types and may produce inconsistent results or visible fill seams. In such cases, splitting the mask into two separate fills, one above and one below the horizon, usually produces better results.
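In Photoshop you would simply make two selections, but in a scripted pipeline the split might look like the sketch below. It assumes a level horizon at a known row (picked by hand here); a tilted horizon would need a sloped split line. `split_at_horizon` is a hypothetical name.

```python
def split_at_horizon(mask, horizon_row):
    """Split a fill mask (1 = fill) at a level horizon.

    Rows above horizon_row go to `above`; that row and everything below
    go to `below`. The two masks are disjoint and together equal the input.
    """
    above = [[v if y < horizon_row else 0 for v in row] for y, row in enumerate(mask)]
    below = [[v if y >= horizon_row else 0 for v in row] for y, row in enumerate(mask)]
    return above, below


# A 4x4 block to fill that straddles a horizon at row 3.
mask = [[1 if 1 <= y <= 4 and 2 <= x <= 5 else 0 for x in range(8)] for y in range(6)]
above, below = split_at_horizon(mask, horizon_row=3)
print(sum(map(sum, above)), sum(map(sum, below)))  # 8 8
```

Each half is then filled separately, so each fill is conditioned mainly on a single surface type.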

Text prompts, when provided, add a third input dimension: semantic guidance. "Cobblestone road" tells the model what the fill region should depict, and it will attempt to reconcile that description with the structural context. Leaving the prompt blank instructs the model to generate purely from surrounding context — useful for object removal on consistent backgrounds, but risky on complex scenes where you want specific replacement content.

Workflow Note

Always work on a duplicate layer or a Smart Object before applying fills in Photoshop. Generative Fill in Photoshop 2024 automatically places its result on a new, non-destructive "Generative Layer", but older fill commands such as Content-Aware Fill and many third-party tools do not guarantee this protection. The fill is not easily undone once baked into a flattened layer, and you will often want to compare multiple fill variations; Photoshop's Generative Fill generates three variations by default for exactly this reason.

Lesson 2 Quiz


When removing a physical object from a scene using generative fill, what is the most common masking error photographers make?

✓ Correct. The shadow of a removed object remains in the scene and the model cannot infer it belonged to the absent subject — it will treat it as legitimate scene content. The shadow must be masked separately.

✗ The most common error is forgetting to mask the object's cast shadow. The model sees the shadow as valid scene content and tries to preserve it, producing an obviously incorrect fill where a shadow exists without any casting object.

Why does splitting a generative fill that spans a horizon line into two separate fills — one above, one below — typically produce better results?

✓ Correct. When a mask spans two distinct surfaces (sky and water, for example), the model must simultaneously satisfy two different texture and lighting regimes, which often results in inconsistent fill content or seaming artifacts at the boundary.

✗ Splitting the fill is about context consistency. A fill conditioned primarily on sky context generates sky-appropriate content. One conditioned on water context generates water-appropriate content. Together they look natural; forcing the model to do both simultaneously often fails.

National Geographic reaffirmed its ban on AI-altered editorial photography in 2023, specifically citing generative fill. What principle does this most directly reflect?

✓ Correct. National Geographic's position is that altering the content of a documentary photograph — even technically skillfully — violates the viewer's trust that the image represents reality. The capability of the tool is irrelevant; the epistemological claim of the image is what matters.

✗ National Geographic's concern is documentary integrity, not technical quality or legal ambiguity. Adding or removing elements from a news or nature photograph changes the factual claim the image makes to viewers — which is the core ethical issue.

Lab 2 — Masking Strategy Clinic

Describe real or hypothetical masking scenarios and get expert guidance on technique.

Diagnosing and Solving Mask Problems

Present specific masking challenges to the AI tutor and work through the optimal selection strategy. Describe the scene (subject, background complexity, edge type), what you've tried, and what went wrong. The tutor will walk through which Photoshop masking tool to use, what mask parameters to set, and how to sequence multiple fills when a single fill fails. Shadows, reflections, hair edges, and transparent objects are all fair game.

Try asking: "I need to remove a person standing in front of a busy brick wall. They have curly hair and their shadow falls across the wall behind them. What's my masking strategy step by step?"


Photography and AI · Module 4 · Lesson 3

Prompt Writing for Generative Fill

The text prompt is not a description — it's a directive. The difference matters enormously.

In April 2023, World Press Photo withdrew a prize from Spanish photographer Álvaro Moriño after an investigation found that AI generation — specifically inpainting-style content synthesis — had materially altered the image. The photograph of a flock of flamingos had areas of background sky that investigators concluded were AI-extended. The prompt that generated those pixels left no metadata trace. It was the invisible prompt — the one nobody sees — that undid the image's credibility.

This case established a precedent: in photojournalism, provenance of every pixel now matters. But in commercial and creative photography, the prompt is simply the most powerful creative control available to the artist. Knowing how to write it well is an essential skill.

How Prompts Guide the Diffusion Process

When you provide a text prompt to generative fill, it is encoded by a text encoder, typically CLIP (Contrastive Language-Image Pre-training), its open reimplementation OpenCLIP, or a language-model encoder such as T5, into a sequence of token embeddings. Those embeddings are passed as conditioning into the cross-attention layers of the denoising U-Net at every denoising step. The model uses them to steer the direction of denoising toward images that are semantically consistent with the prompt.
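Cross-attention itself is compact. In the sketch below, each latent position acts as a query, each prompt token supplies a key and a value, and the output for a position is a softmax-weighted mix of the token values. It is a simplification: real U-Nets apply learned projections to produce queries, keys and values and use several attention heads, all omitted here, and `cross_attention` is a hypothetical name.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_queries, text_keys, text_values):
    """Single-head scaled dot-product cross-attention.

    image_queries: one query vector per latent position
    text_keys, text_values: one key and one value vector per prompt token
    Returns (outputs, weights): per position, a weighted mix of token values
    and the attention weights over prompt tokens.
    """
    d = len(text_keys[0])
    outputs, weights = [], []
    for q in image_queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in text_keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[i] for wj, v in zip(w, text_values))
                        for i in range(len(text_values[0]))])
    return outputs, weights


# One latent position whose query aligns with the first of two prompt tokens.
outs, w = cross_attention([[1.0, 0.0]],
                          [[10.0, 0.0], [0.0, 10.0]],
                          [[1.0, 0.0], [0.0, 1.0]])
print(w[0])
```

The position attends almost entirely to the first token, so its output is dominated by that token's value; this is the mechanism through which a prompt word can influence particular regions of the fill.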

Critically, the prompt competes with the structural context from surrounding pixels. If you mask out a person in front of a fireplace and write "brick wall," the model will attempt to satisfy both the brick wall prompt and the spatial cues from the fireplace surround, mantelpiece, and ambient orange light. The result may be a brick wall that is unnaturally warm and irregularly shaped: the structural context wins on local spatial consistency, while the prompt wins on surface texture and subject matter. Understanding this competition is key to writing prompts that work with context, not against it.

Empty prompts (no text) tell the model to rely entirely on context. This is ideal for simple background removal (grass, sky, water, plain walls) and often worse for anything requiring specific replacement content. The choice between empty and text-prompted fill is itself a craft decision.

Prompt Anatomy for Fill Contexts

Effective fill prompts follow a different grammar than text-to-image generation prompts. In full image generation, you describe a complete scene. In fill, you describe only the region being generated — and that region must integrate with a surrounding scene it cannot change. This shifts prompt strategy significantly:

Surface-first language: Begin with the physical surface or material, not the mood. "Weathered concrete wall with horizontal crack lines" is more useful than "gritty urban backdrop" because the model needs to place a texture, not interpret an aesthetic.

Lighting continuity hints: If the surrounding image is lit from the left with warm afternoon sun, include "warm side lighting from the left" in your prompt. The model has no explicit knowledge of light direction; it can only infer it from pixel gradients in the surrounding context, and a prompt that reinforces this inference tends to produce more consistent shadow and highlight directionality in the fill.

Avoid over-specification on details that must match context: Don't specify "red brick" if the surrounding wall is orange-toned — the model will attempt to generate genuinely red bricks that clash with the existing color environment. Allow color to be controlled by context; use prompts to control structure and surface type.

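Negative guidance: Some tools accept negative as well as positive guidance. Stable Diffusion exposes negative prompts as an explicit parameter, so listing "people, text, signage" there steers the fill away from those common urban-scene artifacts. Typing exclusions into a single prompt field (for example "no people") is less reliable, because CLIP-style text encoders handle negation poorly and can even pull the named object into the fill; in Photoshop's Generative Fill, describing what should be there is usually the safer approach.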
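Stable Diffusion tools typically combine the prompt with classifier-free guidance, which is also where negative prompts act: the noise prediction for the prompt is compared with a baseline prediction (for an empty prompt, or for the negative prompt when one is given), and the difference is amplified. The sketch below shows only that combination step, on plain lists standing in for noise tensors. A scale of 7.5 is a common default in Stable Diffusion tools, and `guided_noise` is a hypothetical name.

```python
def guided_noise(eps_prompt, eps_base, scale=7.5):
    """Classifier-free guidance on one noise prediction.

    eps_prompt: prediction conditioned on the prompt
    eps_base:   prediction for an empty prompt, or for the negative prompt
                when one is supplied
    Returns eps_base + scale * (eps_prompt - eps_base), element-wise.
    """
    return [b + scale * (p - b) for p, b in zip(eps_prompt, eps_base)]


print(guided_noise([0.2, -0.1], [0.0, 0.0], scale=1.0))  # [0.2, -0.1]
print(guided_noise([0.2, -0.1], [0.0, 0.0]))             # approximately [1.5, -0.75]
```

A scale of 1 reproduces the prompted prediction unchanged; larger values follow the prompt (and move away from the negative prompt) more strongly, usually at some cost in variety and naturalness.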

Tested Prompt Patterns

Object removal on grass: Leave blank or use "grass lawn, natural daylight" — context usually sufficient.
Extending architectural background: "Continuation of [material], matching perspective, no new objects" — explicitly signals structural continuation.
Adding environmental elements: "Low morning fog drifting through pine trees, soft diffused light" — specific, physical, lighting-aware.
Removing distracting signage: "Bare painted wall, flat surface, slight ambient shadow" — tells model what IS there, not just what isn't.

Iterating Through Fill Variations

Adobe Photoshop's Generative Fill generates three variations by default per prompt, accessible via the Properties panel. Before accepting any fill, examine all three — they often differ significantly in texture, object placement, and boundary integration. The best strategy is to generate once, review all three, then generate again with a refined prompt if none are acceptable. Do not simply accept the first variation.

When variations are consistently unsatisfactory in the same way (always too dark, always placing an unwanted object near the edge, always producing a seam at the bottom), the problem is usually the mask, not the prompt. Expand, feather, or shift the mask boundaries and regenerate before changing the prompt text. Mask geometry and prompt text address different failure modes — diagnosing which is causing the problem is a key professional skill.
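The "same flaw in every variation" rule can be turned into a rough triage check. The helper below is a hypothetical rule of thumb, not an Adobe feature: it tests whether all variations are biased in the same direction relative to the luminance you expected, and whether the unmasked context around the fill shares that bias. Luminance values are assumed to lie between 0 and 1, and the 0.05 tolerance is arbitrary.

```python
def region_mean(image, mask):
    """Mean luminance of the pixels where mask is truthy."""
    vals = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if m]
    return sum(vals) / len(vals)


def diagnose_variations(original, variations, fill_mask, context_mask, target, tol=0.05):
    """Hypothetical triage helper.

    original:     the image before filling (2-D luminance list)
    variations:   candidate fills, same size as original
    fill_mask:    1 where content was generated
    context_mask: 1 for the unmasked pixels bordering the fill
    target:       the luminance you expected inside the fill
    """
    offsets = [region_mean(v, fill_mask) - target for v in variations]
    all_dark = all(o < -tol for o in offsets)
    all_bright = all(o > tol for o in offsets)
    if not (all_dark or all_bright):
        return "no shared bias: regenerate or refine the prompt"
    ctx = region_mean(original, context_mask) - target
    if (all_dark and ctx < -tol) or (all_bright and ctx > tol):
        return "shared bias matches the context: adjust the mask first"
    return "shared bias not explained by context: revisit the prompt"


fill = [[1 if 1 <= y <= 2 and 1 <= x <= 2 else 0 for x in range(4)] for y in range(4)]
context = [[1 - m for m in row] for row in fill]
dark_scene = [[0.2] * 4 for _ in range(4)]


def with_fill(base, value):
    """Copy of base with the fill region set to a single value."""
    return [[value if fill[y][x] else base[y][x] for x in range(4)] for y in range(4)]


variations = [with_fill(dark_scene, v) for v in (0.22, 0.25, 0.3)]
print(diagnose_variations(dark_scene, variations, fill, context, target=0.5))
```

Here every variation is too dark and so is the surrounding context, which points to the mask and its context rather than the prompt.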

Some practitioners keep a prompt log: a simple text document recording which prompt/mask combinations produced acceptable results for common fill scenarios (sky extensions, foliage gaps, interior wall removal, crowd thinning). Because generative fill samples stochastically, and Photoshop, unlike many Stable Diffusion tools, offers no random seed you could fix, you cannot reproduce an exact output, but you can reproduce the conditions that reliably produce acceptable outputs.
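A prompt log does not need special software. The sketch below stores one JSON object per line (the JSON Lines convention) and records the mask settings alongside the prompt, since both determine the outcome. The field names and helpers are hypothetical; adapt them to whatever you track.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FillLogEntry:
    scenario: str        # e.g. "sky extension", "crowd thinning"
    prompt: str          # "" records a context-only (empty-prompt) fill
    expand_px: int
    feather_px: int
    accepted: bool
    notes: str = ""


def to_json_line(entry):
    """Serialize one entry as a single JSON line."""
    return json.dumps(asdict(entry))


def append_entry(path, entry):
    """Append an entry to a .jsonl log file, creating the file if needed."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(to_json_line(entry) + "\n")


def accepted_setups(lines, scenario):
    """(prompt, expand_px, feather_px) for accepted fills of one scenario.
    `lines` can be an open log file or any iterable of JSON lines."""
    setups = []
    for line in lines:
        if not line.strip():
            continue
        e = json.loads(line)
        if e["scenario"] == scenario and e["accepted"]:
            setups.append((e["prompt"], e["expand_px"], e["feather_px"]))
    return setups


log_lines = [
    to_json_line(FillLogEntry("sky extension", "clear blue sky, light haze", 3, 4, True)),
    to_json_line(FillLogEntry("sky extension", "dramatic sky", 3, 0, False)),
    to_json_line(FillLogEntry("foliage gap", "", 2, 3, True)),
]
print(accepted_setups(log_lines, "sky extension"))
```

In daily use you would call `append_entry("fill_log.jsonl", ...)` after each session and pass the open file to `accepted_setups` when facing a familiar scenario.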

Metadata and Disclosure

Adobe Firefly embeds Content Credentials (C2PA standard) metadata into images when generative fill is used. This includes a machine-readable record that AI content was added and a cryptographic signature. Export to JPEG or PNG preserves this metadata by default in Photoshop 2024. Some publications (including the Associated Press, which published updated AI guidelines in August 2023) now require photographers to declare any AI-assisted alterations. Understanding that generative edits leave a traceable record changes the ethical calculus of how you use the tool.

Lesson 3 Quiz


When writing a prompt for generative fill to replace a removed person in front of an orange-toned brick wall, which approach is most likely to produce well-integrated results?

✓ Correct. Specifying color in the prompt when the surrounding environment already implies a particular color tone creates conflict. Prompts should guide surface structure and type; the model's contextual conditioning from surrounding pixels handles color harmonization more reliably.

✗ If you specify "red brick" when surrounding walls are orange-toned, the model will generate genuinely red bricks that clash with the scene. Use prompts to describe surface structure and let the surrounding pixel context control color matching.

The World Press Photo investigation into Álvaro Moriño's flamingo image in 2023 led to what specific precedent for photojournalism?

✓ Correct. The case established that even technically invisible AI modifications — content that blends seamlessly with surrounding pixels — constitute a material alteration of a documentary image and violate the standards of photojournalism competitions.

✗ The precedent was about provenance and documentary integrity: any AI-generated pixels in a photojournalistic image, whether visible or not, constitute a violation. The invisibility of the modification made it more troubling, not less.

When Photoshop's Generative Fill consistently produces fills that are too dark across all three variations, what is the most likely root cause to investigate first?

✓ Correct. Consistent tonal bias across multiple variations almost always indicates a mask/context issue, not a prompt issue. If the surrounding unmasked pixels are dark (shadow areas, underexposed regions, dark walls), they will tonally condition the fill toward darkness regardless of prompt content.

✗ When all variations share the same flaw, suspect the mask and context, not the prompt or server quality. Dark surrounding pixels create dark context conditioning — the model is working correctly given what it sees. Adjust the mask boundaries to include more correctly-exposed surrounding content.

Lab 3 — Prompt Engineering for Fill

Write and refine fill prompts for specific scenes with expert feedback.

Crafting Prompts That Work With Context

Describe specific fill scenarios to the AI tutor and workshop your prompt text together. The tutor will evaluate your drafts, explain why certain phrasings work or fail, suggest specific wording improvements, and help you build a repertoire of reliable prompt patterns for common photographic scenarios. Bring real problems: sky extensions, architecture removals, crowd thinning, product background cleanup.

Try asking: "I'm filling in a gap where I removed a chain-link fence from a garden scene. The surrounding area is lush green grass with some dappled afternoon sunlight. What prompt should I use?"


Photography and AI · Module 4 · Lesson 4

Advanced Applications: Outpainting, Sky Replacement, and Crowd Control

Moving beyond removal — using generative fill to build, extend, and transform the image world.

When Adobe launched automated Sky Replacement in Photoshop 2021 (version 22.0), it relied on machine-learning sky segmentation (Adobe Sensei) rather than generative AI: the sky was detected automatically, a replacement sky image was composited in, and lighting adjustments were applied to the foreground to simulate the color cast of the new sky. It was technically impressive but could look artificial: if the replacement sky's cloud formations were lit from the right and the foreground was lit from the left, no automatic correction could reconcile the contradiction.

By 2023, generative outpainting offered a different approach: rather than replacing a detected region with a library image, the model synthesized continuation of the existing sky — extending whatever real sky existed outward. The tool worked with the photograph's actual lighting conditions rather than against an imported image's incompatible lighting. For the first time, expanding a frame felt like photography rather than compositing.

Outpainting: Extending the Frame

Outpainting in Photoshop is accessed by expanding the canvas beyond the image boundary (Image > Canvas Size) and then selecting the empty area for generative fill. The model conditions on the edge pixels of the existing image and extends them outward. This can be used to change an image's aspect ratio (expanding a 3:2 image to 16:9 for widescreen use without cropping), to add visual breathing room around a tightly composed shot, or to recover a poorly framed image where a critical element is partially cut off at the edge.
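How much content outpainting must invent follows directly from the aspect ratios. The sketch below computes the smallest canvas with the target ratio that contains the original without cropping, using integer arithmetic so the pixel counts are exact. The function name is hypothetical.

```python
def expand_to_aspect(width, height, ratio_w, ratio_h):
    """Smallest canvas with aspect ratio_w:ratio_h that contains a width x height
    image without cropping. Returns (new_width, new_height, pixels_to_generate)."""
    if width * ratio_h < height * ratio_w:
        # Too narrow for the target ratio: widen the canvas (ceiling division).
        new_w, new_h = -(-height * ratio_w // ratio_h), height
    else:
        # Too wide, or already matching: make the canvas taller.
        new_w, new_h = width, -(-width * ratio_h // ratio_w)
    return new_w, new_h, new_w * new_h - width * height


print(expand_to_aspect(6000, 4000, 16, 9))   # (7112, 4000, 4448000)
print(expand_to_aspect(1920, 1080, 16, 9))   # (1920, 1080, 0)
```

For a 6000×4000 (3:2) frame taken to 16:9, the canvas grows to 7112×4000, so about 4.4 million new pixels, roughly 16% of the expanded canvas, must be generated.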

Key constraint: outpainting performs well when edges are consistent in texture and structure (plain sky, uniform wall, flat ground). It struggles when edges contain complex, partially visible elements: a partially cropped face at the image edge, a building whose structural geometry is cut mid-window, or a waterfall whose flow implies a continuation that the edge geometry cannot accommodate. In these cases, the model must invent the rest of a structure it can only partly see, and the result is often geometrically inconsistent.

The practical workflow for frame expansion involves multiple incremental outpainting passes rather than one large expansion. Expanding 15–20% per pass, then using that new content as context for the next pass, produces significantly better results than attempting a 100% expansion in a single step. The incremental approach ensures each new band of generated pixels is conditioned on the previous generation rather than only on the original image edge.
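The incremental approach can be planned as a simple schedule. The sketch assumes each pass grows the current canvas dimension by a fixed percentage of its current size (the 15–20% guideline could also be read as a share of the original size). Integer arithmetic keeps the numbers exact, and the function name is hypothetical.

```python
def outpaint_schedule(start, target, step_pct=15):
    """Canvas sizes along one axis for incremental outpainting passes.

    Each pass grows the current size by at most step_pct percent (and by at
    least one pixel, so the loop always terminates); the last pass stops
    exactly at target.
    """
    if start <= 0 or step_pct <= 0:
        raise ValueError("start and step_pct must be positive")
    sizes, current = [], start
    while current < target:
        grown = max(current + 1, current * (100 + step_pct) // 100)
        current = min(target, grown)
        sizes.append(current)
    return sizes


print(outpaint_schedule(4000, 8000))               # [4600, 5290, 6083, 6995, 8000]
print(outpaint_schedule(4000, 8000, step_pct=20))  # [4800, 5760, 6912, 8000]
```

Doubling a 4000-pixel side takes five passes at 15% per pass and four passes at 20%.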

Sky Replacement via Generative Fill

Sky replacement using generative fill differs from Photoshop's dedicated Sky Replacement tool in a fundamental way: instead of compositing a separate image, you mask the existing sky and prompt the model to generate a new one in its place. This means the generated sky is synthesized to match the color temperature, horizon color gradient, and structural boundary of the specific image — it emerges from the photograph's context rather than being imported into it.

Effective sky replacement via generative fill requires a precise sky mask. The boundary between sky and non-sky content is often the most complex edge in landscape photography: tree branches, power lines, rooftop details, and atmospheric haze all create semi-transparent transitions. Photoshop's Select Sky command (introduced in Photoshop 2021) uses a dedicated segmentation model for sky detection and is more reliable than general-purpose Object Selection for this task. After auto-selection, refinement with the Refine Edge Brush at tree canopy boundaries is nearly always necessary.

Prompt recommendations for sky fills: specify time of day, weather condition, and cloud type. "Golden hour, scattered cirrus clouds, deep blue zenith" produces dramatically different results than the generic "dramatic sky." Because the model generates the sky conditioned on the horizon of the existing image, the lighting you prompt for tends to be blended into the horizon gradient already present, which is why this approach is often more convincing than traditional compositing methods.

Technique — Crowd Thinning and Event Photography

One of the most commercially valuable applications of inpainting in editorial and event photography is crowd thinning — selectively masking individual people from a scene to create a cleaner, less cluttered image. The technique requires masking each person individually (not in a group mask) so the model can fill each gap with contextually appropriate background. Group masking of multiple people in a complex scene forces the model to invent too large a region at once, producing visible tiling artifacts. Process one person per fill operation, starting with those furthest from the camera (smallest in frame), and work toward the foreground — this ensures each subsequent fill has more realistic context from the previous fills already in place.
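The ordering rule is easy to automate if each person has their own mask. The sketch below uses mask area as a rough proxy for distance from the camera, which is an assumption (a seated child near the lens can be smaller than an adult farther away). It also flags masks that overlap, since overlapping per-person masks effectively become the group mask the technique warns against. Function names are hypothetical.

```python
from itertools import combinations


def removal_order(person_masks):
    """Labels ordered for one-at-a-time fills, smallest mask first.

    person_masks maps a label to the set of (y, x) pixels in that person's
    mask (shadow included). Ties keep their original order (sorted is stable).
    """
    return sorted(person_masks, key=lambda label: len(person_masks[label]))


def overlapping_pairs(person_masks):
    """Pairs of labels whose masks share pixels."""
    return [(a, b) for a, b in combinations(person_masks, 2)
            if person_masks[a] & person_masks[b]]


masks = {
    "near": {(y, x) for y in range(10) for x in range(6)},
    "far": {(y, x) for y in range(2) for x in range(20, 22)},
    "mid": {(y, x) for y in range(4) for x in range(5, 9)},
}
print(removal_order(masks))      # ['far', 'mid', 'near']
print(overlapping_pairs(masks))  # [('near', 'mid')]
```

Overlapping pairs are worth re-masking before filling, so that each fill still covers a single person.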

Ethical Boundaries in Commercial and Editorial Use

The Associated Press updated its AI policy in August 2023 to rule out using generative AI to add, alter, or remove elements within the frame of news photographs. The same policy permits AI tools for adjusting overall technical quality (noise reduction, sharpening, color correction) in ways that do not alter what the image depicts.

In commercial photography (advertising, product photography, real estate, stock), the constraints are far looser. Clients routinely commission generative fill to remove distracting elements, extend scenes to fit new aspect ratios, replace overcast skies with blue skies for real estate listings, and add seasonal elements to lifestyle imagery. The ethical question in commercial contexts is not documentary integrity (there is no documentary claim) but disclosure. Consumers who see a real estate listing where an AI-generated blue sky has replaced a real overcast sky are receiving implicitly misleading information about typical weather conditions. Several countries are developing advertising standards guidance on AI-altered commercial imagery.

Professional photographers navigating this landscape need to operate under different standards simultaneously depending on client type and use context. A photojournalist and a product photographer may use identical tools under completely different ethical frameworks — understanding which framework applies is as important as mastering the tools themselves.

Workflow Summary — The Complete Generative Fill Process

1. Diagnose whether fill or manual compositing is the right tool for this edit.
2. Duplicate the layer or otherwise make sure you're working non-destructively.
3. Create the selection using the most appropriate Photoshop tool for the edge type.
4. Expand the selection 2–4px, then apply appropriate feathering and smoothing.
5. Include cast shadows and reflections in the mask.
6. Write a context-aware prompt: surface-first, lighting-consistent, color-agnostic.
7. Generate three variations; evaluate all before accepting.
8. If all variations share the same flaw, modify the mask, not the prompt.
9. Document your prompt for future reference.
10. Check Content Credentials metadata and declare AI use per client or publication requirements.
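Step 4 can be sketched on a small mask: grow the selection by a fixed number of pixels, then soften its edge. The sketch assumes a square growth neighborhood and uses a box blur as a rough stand-in for a Gaussian feather. It illustrates the idea only and is not how Photoshop's Select > Modify > Expand or Feather commands are implemented. `expand` and `feather` are hypothetical helpers that operate on lists of floats (1.0 = selected).

```python
from typing import List

Mask = List[List[float]]  # 1.0 = selected (to be filled), 0.0 = keep


def expand(mask: Mask, radius: int) -> Mask:
    """Grow the selection by `radius` pixels using a square neighborhood."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] > 0.0:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1.0
    return out


def feather(mask: Mask, radius: int) -> Mask:
    """Soften edges with a box blur (a rough stand-in for a Gaussian feather)."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += mask[yy][xx]
                    count += 1
            out[y][x] = total / count  # average over the in-frame neighbors
    return out


size = 9
selection = [[0.0] * size for _ in range(size)]
selection[4][4] = 1.0                # a one-pixel selection in the middle
grown = expand(selection, radius=2)  # step 4: expand by 2 px
soft = feather(grown, radius=1)      # then feather the new edge
print(sum(map(sum, grown)))          # 25.0 (a 5 x 5 block)
```

The expanded block stays fully selected at its center while its corners drop to partial values after feathering. That gradient is the soft boundary the fill blends across.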

Lesson 4 Quiz

3 questions — free, untracked, retake anytime.

When using outpainting to significantly expand an image's canvas, why is performing multiple incremental passes (15–20% per step) better than a single large expansion?

✓ Correct. Incremental outpainting means each new generation is conditioned on both the original image edge AND all the previously generated content. A single large expansion is conditioned only on the narrow original edge, forcing the model to invent large regions with insufficient structural context.

✗ The incremental approach is about context quality. Each small pass generates content that then becomes context for the next pass — the model has progressively more surrounding information to work with. A single large pass is conditioned only on a thin strip of original pixels.

The Associated Press's 2023 AI policy permits AI use for noise reduction and sharpening but prohibits generative fill for content alteration. What distinguishes these two uses?

✓ Correct. The AP's distinction rests on whether the edit changes the factual claim of the image. Noise reduction makes existing content clearer — it doesn't add or remove depicted elements. Generative fill changes what is and isn't in the scene, which alters the photograph's documentary truth.

✗ The AP's principle is about the documentary claim of the image. Technical adjustments (noise, sharpness, color) reveal or clarify existing content. Generative fill adds, removes, or replaces depicted content — it changes what the image claims was present in front of the camera.

For crowd thinning in event photography using generative fill, why should you mask and fill people individually rather than masking the entire crowd at once?

✓ Correct. A large group mask removes much of the scene's structural context in one pass — the model must invent a large area from a thin perimeter of reference. Individual fills keep the generation region small and surrounded by real context, producing far more convincing results. Processing from background to foreground ensures each fill also has prior generated content as additional context.

✗ The issue is context area versus masked area ratio. When you mask a large crowd simultaneously, you remove large swaths of the scene that would normally provide context. The model has too little surrounding information relative to the region it must generate, resulting in visible tiling and structural inconsistencies.
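The ratio argument can be made concrete with a hypothetical heuristic: count the real pixels in a fixed-width band around a rectangular mask and divide by the pixels inside it. Every part of this setup is an illustrative choice rather than a Photoshop metric: the band width (100 px here), the rectangular shapes, and the assumption that the band lies fully in frame with no other masked people in it.

```python
def context_ratio(mask_w: int, mask_h: int, band: int) -> float:
    """Real pixels in a `band`-wide ring around a rectangular mask,
    divided by the pixels the model must invent inside the mask."""
    if mask_w <= 0 or mask_h <= 0 or band < 0:
        raise ValueError("mask sides must be positive and band non-negative")
    masked = mask_w * mask_h
    ring = (mask_w + 2 * band) * (mask_h + 2 * band) - masked
    return ring / masked


individual = context_ratio(100, 100, band=100)  # one person
group = context_ratio(600, 100, band=100)       # six people masked together
print(individual, group)  # 8.0 3.0
```

In this toy setup, masking six people as one block drops the ratio from 8.0 to 3.0. This is the imbalance between surrounding context and generated area that the explanation describes.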

Lab 4 — Advanced Applications Workshop

Plan outpainting, sky replacement, and crowd thinning workflows for real-world scenarios.

Planning Complex Generative Fill Workflows

Bring your most complex generative fill challenges to this lab. You might be planning a sky replacement for a real estate shoot, figuring out how to extend a portrait for a new aspect ratio, or working through the ethics of AI alteration for a specific client context. The AI tutor can walk you through step-by-step workflows, compare tool choices (Sky Replacement vs. generative fill, Photoshop vs. Lightroom Denoise), and discuss the ethical and disclosure considerations that apply to your specific use context.

Try asking: "I need to turn a 3:2 landscape photo into a 16:9 widescreen image for a hotel website banner. The original has a stone cliff on the left and open sea on the right. Walk me through the outpainting workflow."
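The incremental-pass advice from the lesson quiz can be turned into a quick plan for this kind of request. The sketch assumes a hypothetical 6000 × 4000 px original and grows only the width. It also reads "15–20% per step" as a percentage of the current canvas width, which is one interpretation rather than a rule stated in the lesson. `target_width` and `plan_passes` are illustrative helpers.

```python
from typing import List


def target_width(height: int, ratio_w: int, ratio_h: int) -> int:
    """Smallest whole-pixel width that reaches ratio_w:ratio_h at this height."""
    return -(-height * ratio_w // ratio_h)  # ceiling division on integers


def plan_passes(width: int, goal: int, max_step_pct: int = 20) -> List[int]:
    """Canvas widths after each outpainting pass.

    Each pass grows the canvas by at most max_step_pct percent of its
    current width (integer math avoids float rounding surprises).
    """
    if max_step_pct <= 0:
        raise ValueError("max_step_pct must be positive")
    widths: List[int] = []
    while width < goal:
        step = max(1, width * max_step_pct // 100)
        width = min(goal, width + step)
        widths.append(width)
    return widths


# Hypothetical 3:2 original, 6000 x 4000 px, extended to 16:9.
goal = target_width(4000, 16, 9)
print(goal, plan_passes(6000, goal))  # 7112 [7112]
print(plan_passes(2000, 4000))        # [2400, 2880, 3456, 4000]
```

Here the 16:9 target needs only about 18.5% more width, so it fits in a single pass. Doubling the width at 20% per pass would take four.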


Module 4 Test

15 questions · 80% to pass · Covers all four lessons

1. Latent diffusion models compress images before running the diffusion process. What is the spatial compression ratio used by the VAE in the original Stable Diffusion architecture?

✓ Correct. The VAE in the original Stable Diffusion architecture uses an 8× spatial compression ratio — a 512×512 pixel image becomes a 64×64 latent representation.

✗ The original Stable Diffusion VAE compresses at 8× spatial ratio — a 512×512 image becomes a 64×64 latent tensor. This efficiency is the core innovation of LDMs over pixel-space diffusion.
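The arithmetic behind the 8× figure is short enough to check directly. The defaults below assume the original Stable Diffusion v1 VAE, with 8× downsampling per side and a 4-channel latent. Each 8 × 8 block of pixels becomes one latent position, so there are 64× fewer positions. Because 3 RGB values become 4 latent values per position, the overall reduction in stored numbers is 48×.

```python
from typing import Tuple


def latent_shape(height: int, width: int, factor: int = 8, channels: int = 4) -> Tuple[int, int, int]:
    """(channels, height, width) of the latent for an image of the given size."""
    if height % factor or width % factor:
        raise ValueError("image sides must be multiples of the compression factor")
    return channels, height // factor, width // factor


def element_ratio(height: int, width: int, rgb_channels: int = 3,
                  factor: int = 8, channels: int = 4) -> float:
    """How many pixel values there are per latent value."""
    c, h, w = latent_shape(height, width, factor, channels)
    return (height * width * rgb_channels) / (c * h * w)


print(latent_shape(512, 512))   # (4, 64, 64)
print(element_ratio(512, 512))  # 48.0
```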

2. Adobe Photoshop's Content-Aware Fill, introduced in CS5 (2010), used which underlying algorithm?

✓ Correct. Content-Aware Fill used PatchMatch (Barnes et al.) — an approximate nearest-neighbor algorithm that finds visually similar texture patches elsewhere in the image and synthesizes a fill by blending them.

✗ Photoshop's original Content-Aware Fill (2010) used PatchMatch — a patch-based algorithm that samples similar texture regions from elsewhere in the image. Partial convolutions came in 2018; latent diffusion in 2022.

3. In the generative fill workflow, which three inputs does a latent diffusion inpainting model receive simultaneously?

✓ Correct. The inpainting model receives: the full image encoded into latent space, a binary mask indicating which regions to regenerate, and an optional text prompt for semantic guidance. These three inputs together determine the fill output.

✗ Inpainting models take three primary inputs: the full image in latent space, a binary mask (1 = fill, 0 = preserve), and an optional text prompt. The text prompt provides semantic steering while the masked latents and surrounding context provide structural conditioning.
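Because the model works at 1/8 resolution, the pixel mask has to be brought down to the latent grid before it can guide generation. This sketch uses the 1 = fill, 0 = preserve convention from the answer above. It assumes conservative max pooling, meaning a latent cell is regenerated if any pixel in its 8 × 8 block is masked. Real pipelines may resize the mask differently, for example by interpolation. `mask_to_latent` is a hypothetical helper.

```python
from typing import List

Mask = List[List[int]]  # 1 = fill (regenerate), 0 = preserve


def mask_to_latent(mask: Mask, factor: int = 8) -> Mask:
    """Downsample a pixel mask to the latent grid by max pooling.

    A latent cell is marked for regeneration if any pixel in its
    factor x factor block is marked.
    """
    h, w = len(mask), len(mask[0])
    if h % factor or w % factor:
        raise ValueError("mask sides must be multiples of the compression factor")
    latent: Mask = []
    for by in range(0, h, factor):
        row: List[int] = []
        for bx in range(0, w, factor):
            hit = any(
                mask[y][x]
                for y in range(by, by + factor)
                for x in range(bx, bx + factor)
            )
            row.append(1 if hit else 0)
        latent.append(row)
    return latent


pixels = [[0] * 16 for _ in range(16)]
pixels[9][2] = 1  # a single masked pixel in the lower-left 8 x 8 block
print(mask_to_latent(pixels))  # [[0, 0], [1, 0]]
```

Under this assumption, a one-pixel mask still causes a whole 8 × 8 pixel area to be regenerated in latent space.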

4. Photoshop's "Hover" selection mode for the Object Selection Tool — where objects highlight as you move your cursor without clicking — was added in which year?

✓ Correct. The hover-to-highlight object detection capability (Object Finder) was introduced in 2021 with Photoshop 23.0, as part of Photoshop's ongoing AI-powered selection improvements built on Adobe Sensei.

✗ The hover mode (objects highlight on mouseover without clicking) was added in 2021 with Photoshop 23.0. The Object Selection Tool itself debuted in 2019, and Select Subject has been available since 2018.

5. When a generative fill spanning a horizon line consistently produces a visible seam at the boundary between sky and water, what is the recommended corrective approach?

✓ Correct. Splitting the fill means each operation is conditioned on a single consistent surface type. The sky fill is conditioned on sky context; the water fill on water context. This avoids the conflict that occurs when a single fill must satisfy two incompatible surface textures simultaneously.

✗ Prompting "seamless" or adding feather doesn't solve the fundamental issue: a single fill spanning two different surface types receives contradictory contextual conditioning. Split the fill at the horizon — one for sky, one for water — so each fill is conditioned on a single consistent surface type.
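Splitting the fill means cutting one mask into two at the horizon and running two separate fills. This sketch assumes a level horizon at a known row index; a tilted horizon would need a per-pixel line test instead. `split_at_horizon` is a hypothetical helper, and each of the two returned masks would drive its own fill.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = fill, 0 = preserve


def split_at_horizon(mask: Mask, horizon_row: int) -> Tuple[Mask, Mask]:
    """Split one fill mask into (sky_mask, water_mask) at a level horizon.

    Rows above horizon_row go to the sky fill; rows at or below it go to
    the water fill. Each output keeps the original mask values in its half.
    """
    if not 0 <= horizon_row <= len(mask):
        raise ValueError("horizon_row must lie within the mask")
    sky = [row[:] if y < horizon_row else [0] * len(row) for y, row in enumerate(mask)]
    water = [[0] * len(row) if y < horizon_row else row[:] for y, row in enumerate(mask)]
    return sky, water


spanning = [[1, 1], [1, 1], [1, 1], [1, 1]]  # a fill that crosses the horizon
sky_mask, water_mask = split_at_horizon(spanning, horizon_row=2)
print(sky_mask)    # [[1, 1], [1, 1], [0, 0], [0, 0]]
print(water_mask)  # [[0, 0], [0, 0], [1, 1], [1, 1]]
```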

6. NVIDIA's 2018 paper "Image Inpainting for Irregular Holes Using Partial Convolutions" represented which specific advance over prior methods?

✓ Correct. Partial convolutions allowed deep learning inpainting to handle irregular mask shapes, not just rectangular holes. Earlier deep learning approaches were mostly designed and evaluated around rectangular holes. The partial convolution architecture treats masked and unmasked regions differently in each convolution operation.

✗ The key advance in Liu et al. (2018) was handling arbitrarily shaped (irregular) masks with convolutional networks. Prior deep learning inpainting mostly focused on rectangular holes. Partial convolutions modify the convolution operation to treat masked and unmasked pixels differently, which enables irregular hole filling.
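The partial convolution rule from Liu et al. (2018) fits in a few lines for one channel. It convolves only over known pixels, rescales by (window size ÷ number of known pixels), and marks the output as known if the window contained any known pixel. The sketch below assumes stride 1, no padding, and a fixed kernel in place of learned weights. Note that its mask uses 1 = known and 0 = hole, the opposite of the fill-mask convention used elsewhere in this module. `partial_conv2d` is an illustrative name, not a library function.

```python
from typing import List, Tuple

Grid = List[List[float]]
Mask = List[List[int]]


def partial_conv2d(x: Grid, known: Mask, kernel: Grid, bias: float = 0.0) -> Tuple[Grid, Mask]:
    """Single-channel partial convolution, stride 1, no padding.

    known[i][j] is 1 for a valid pixel and 0 for a hole. Returns the output
    feature map and the updated mask.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(x), len(x[0])
    window = kh * kw
    out: Grid = []
    new_known: Mask = []
    for i in range(h - kh + 1):
        out_row: List[float] = []
        mask_row: List[int] = []
        for j in range(w - kw + 1):
            acc, count = 0.0, 0
            for a in range(kh):
                for b in range(kw):
                    if known[i + a][j + b]:  # holes contribute nothing
                        acc += kernel[a][b] * x[i + a][j + b]
                        count += 1
            if count:
                # Rescale so a window with few valid pixels is not darkened.
                out_row.append(acc * window / count + bias)
                mask_row.append(1)  # the hole shrinks: this output is now valid
            else:
                out_row.append(0.0)
                mask_row.append(0)  # window was entirely hole
        out.append(out_row)
        new_known.append(mask_row)
    return out, new_known


ones = [[1.0] * 3 for _ in range(3)]
hole_in_middle = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
print(partial_conv2d(ones, hole_in_middle, ones))  # ([[9.0]], [[1]])
```

Thanks to the rescaling, the window with one missing pixel gives the same output as a fully known window of ones. The hole does not drag the response toward zero, which is the artifact that ordinary convolutions produce at mask boundaries.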

7. What standard does Adobe use to embed machine-readable provenance metadata into images processed with Firefly generative fill?

✓ Correct. Adobe embeds Content Credentials following the C2PA (Coalition for Content Provenance and Authenticity) standard. This includes a cryptographically signed manifest indicating AI content was added, enabling downstream verification.

✗ Adobe uses the C2PA (Coalition for Content Provenance and Authenticity) standard, branded as Content Credentials, to embed provenance metadata into Firefly-processed images. This creates a cryptographically verifiable record of AI involvement.

8. In the context of writing prompts for generative fill, what does "surface-first language" mean?

✓ Correct. Surface-first language means starting your prompt with the physical material ("weathered concrete," "grass lawn," "polished marble") rather than a mood ("gritty," "natural," "elegant"). The model needs to place a texture, not interpret an aesthetic.

✗ Surface-first language means describing the physical material or texture that should fill the region before any mood, aesthetic, or style qualifiers. "Weathered brick wall with horizontal crack lines" is surface-first; "gritty urban backdrop" is mood-first and gives the model less precise structural guidance.

9. The World Press Photo investigation that withdrew Álvaro Moriño's prize in 2023 examined a photograph of what subject?

✓ Correct. The image that prompted the investigation was a photograph of flamingos. Investigators concluded that background sky regions had been AI-synthesized in a way that materially altered the image's content.

✗ The award-winning image was of flamingos. The investigation concluded that background sky regions contained AI-generated content — a finding that led to the prize withdrawal and established an important precedent for documentary photography standards.

10. For crowd thinning in event photography, you should process individuals in what order?

✓ Correct. Processing background subjects first means that when you fill a foreground subject's mask, the background behind them has already been correctly filled — providing accurate context for the foreground fill operation.

✗ Background first is the correct order. Filling background figures first means that when you process foreground figures (which may overlap the people and scenery behind them), the background content is already realistically filled and available as context for the foreground fills.

11. Adobe's Sky Replacement tool (introduced Photoshop 2021/v22.0) and generative fill sky replacement differ primarily in that:

✓ Correct. Sky Replacement composites a separate pre-made sky image onto the photograph (adjusting some color/light effects). Generative fill synthesizes a new sky conditioned on the photograph's own edge pixels and horizon gradient — the generated sky emerges from the image's context rather than being imported.

✗ The fundamental difference is source: Sky Replacement imports an external sky image and composites it in. Generative fill synthesizes a new sky conditioned on the existing image's horizon line, color temperature, and atmospheric context. This tends to make it more consistent with the specific photograph's lighting conditions.

12. Which Photoshop selection tool is specifically recommended for sky masking due to a dedicated segmentation model trained for sky detection?

✓ Correct. Photoshop's Select Sky command (introduced with the Sky Replacement feature in Photoshop 2021, version 22.0) uses a dedicated model trained specifically on sky-to-non-sky boundaries. It generally outperforms the general-purpose Object Selection Tool for this task.

✗ The Select Sky command uses a model specifically trained on the complex, semi-transparent boundary between sky and non-sky content — trees, rooflines, power lines. For sky-specific masking, it outperforms the general-purpose Object Selection Tool, which wasn't trained for this type of boundary.

13. If you provide no text prompt at all when using Adobe Generative Fill, what does the model rely on to determine what to generate?

✓ Correct. An empty prompt removes semantic text guidance entirely, leaving the model to condition on structural and tonal information from surrounding pixels only. This works well for simple, consistent backgrounds but poorly for complex replacements requiring specific subject matter.

✗ No prompt means no text conditioning. The model fills the region based entirely on what it infers from the surrounding pixel context — textures, colors, structural patterns at the mask boundary. This is often the best choice for removing objects from simple, consistent backgrounds like plain sky or grass.

14. National Geographic's policy on AI-altered photography is best described as:

✓ Correct. National Geographic has a blanket ban on AI-generated or AI-altered content in its editorial photography, which it reaffirmed publicly in 2023 specifically calling out generative fill as incompatible with its documentary standards.

✗ National Geographic's policy is a complete ban — no AI-generated or AI-altered content in editorial photography, regardless of medium, authorship, or disclosure. The policy was specifically reaffirmed in 2023 in response to the proliferation of generative fill tools.

15. When all three Photoshop Generative Fill variations consistently produce fills with the same tonal or structural flaw, the correct diagnostic response is to:

✓ Correct. Consistent flaws across multiple variations indicate systematic conditioning bias from the mask and context, not stochastic variation in generation. The mask determines what context the model sees — changing it changes the conditioning. Changing only the prompt cannot fix structural context problems.

✗ When all variations share the same flaw, that flaw is systematic — it's coming from the context the mask geometry is feeding to the model, not from random variation in generation. Prompt changes address semantic content; mask changes address the structural and tonal context the model works from. Consistent flaws require mask diagnosis.
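The diagnostic rule reduces to a check across the variations you generated. `diagnose` is a hypothetical helper: you judge by eye whether each variation shows the flaw, and the function only encodes the decision described above. Treating a flaw that appears in only some variations as random variation is an assumption that follows from that reasoning rather than a stated Photoshop rule.

```python
from typing import List


def diagnose(flaw_seen: List[bool]) -> str:
    """flaw_seen[i] is True if variation i shows the flaw you are checking."""
    if not flaw_seen:
        raise ValueError("evaluate at least one variation")
    if all(flaw_seen):
        # Same flaw every time: it comes from the conditioning, so revise the mask.
        return "systematic: modify the mask"
    if any(flaw_seen):
        # Flaw only sometimes: random variation, so keep a clean variation or regenerate.
        return "stochastic: pick a clean variation or regenerate"
    return "no flaw: accept the best variation"


print(diagnose([True, True, True]))    # systematic: modify the mask
print(diagnose([True, False, False]))  # stochastic: pick a clean variation or regenerate
```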