Photography and AI · Module 3 · Lesson 1

How AI Upscaling Actually Works

From nearest-neighbor guessing to neural networks that hallucinate believable detail — the physics and math behind making small images large.

In 2017, researchers at Google Brain published a paper called "Pixel Recursive Super Resolution" that contained a quietly startling demonstration. They took a tiny 8×8 pixel portrait of a face — essentially a smear of color — and asked their neural network to produce a plausible 32×32 reconstruction. The network had never seen the original 32×32 image. What it produced was not a blur. It was a face: nose, eyes, lip line, skin texture — all synthesized from statistical patterns learned across millions of training images. The original photograph's subject did not necessarily look like the reconstruction. The AI had not recovered lost data. It had invented plausible data. That distinction — recovery versus invention — defines the entire modern field of AI-driven upscaling.

The Fundamental Problem with Enlarging Images

A digital photograph is a fixed grid of pixels. Each pixel stores red, green, and blue intensity values. When you double an image's dimensions, you need four times as many pixels — but the original file only gave you one quarter of that information. Something must fill the gap. Classical approaches to this problem are called interpolation algorithms, and they operate by estimating unknown pixel values from known neighbors.

Nearest-neighbor interpolation simply duplicates the closest known pixel, which is fast but produces the blocky "pixelation" effect visible when you zoom too far into a low-resolution image. Bilinear interpolation takes a distance-weighted average of the four surrounding pixels, producing smoother but often blurry transitions. Bicubic interpolation (the default in Photoshop's legacy enlarge function) samples 16 surrounding pixels using a weighted curve, which preserves edge sharpness better but still cannot add genuine detail that wasn't in the original capture.
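
To make the difference between the first two methods concrete, here is a minimal sketch in plain Python. It treats an image as a small grid of grey values; the function names and the pixel-centre alignment convention are assumptions made for illustration, not the behavior of any particular editor.

```python
def upscale_nearest(img, factor):
    """Upscale a 2D grid of grey values by copying the closest source pixel."""
    h, w = len(img), len(img[0])
    return [[img[y // factor][x // factor] for x in range(w * factor)]
            for y in range(h * factor)]


def upscale_bilinear(img, factor):
    """Upscale by blending the four source pixels around each output position."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h * factor):
        # Map the output pixel centre back into source coordinates, clamped
        # so that border pixels do not read outside the image.
        sy = min(max((y + 0.5) / factor - 0.5, 0.0), h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(w * factor):
            sx = min(max((x + 0.5) / factor - 0.5, 0.0), w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            # Blend horizontally on the two rows, then blend those vertically.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


tiny = [[0, 100],
        [100, 200]]
blocky = upscale_nearest(tiny, 2)   # first row: [0, 0, 100, 100]
smooth = upscale_bilinear(tiny, 2)  # first row: [0.0, 25.0, 75.0, 100.0]
```

Both outputs stay inside the 0 to 200 range of the input. Neither function can produce a value that is not a copy or a blend of pixels it was given, which is the limitation the rest of this lesson turns on.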

All classical methods share one limitation: they work only with the pixels they have. They are mathematical estimates computed within a closed system. They cannot know that a blurry patch near someone's eye is probably an eyelash, or that a grey horizontal smear at the bottom of a building is likely a window ledge. That knowledge requires visual understanding, and visual understanding is precisely what deep learning models provide.

Convolutional Neural Networks and Super-Resolution

The modern era of AI upscaling began in 2014 when Chao Dong and colleagues at CUHK published SRCNN (Super-Resolution Convolutional Neural Network), the first end-to-end deep learning approach to single-image super-resolution. SRCNN was trained by taking high-resolution images, downsampling them to create low-resolution versions, and then training the network to reconstruct the originals. After processing millions of such pairs, the network learned to recognize patterns associated with edges, textures, and structures — and to synthesize plausible high-frequency detail when that detail had been removed.
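
The pair-building step described above can be sketched in a few lines. This is only an illustration: the block-averaging downsampler is a simple stand-in (an assumption) for the smoother resampling filters used in practice, and `box_downsample` and `make_training_pair` are hypothetical names.

```python
def box_downsample(img, factor):
    """Average each factor x factor block into one pixel (a simple stand-in
    for the resampling filter a real training pipeline would use)."""
    h, w = len(img) // factor, len(img[0]) // factor
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [img[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def make_training_pair(high_res, factor):
    """Return (low_res_input, high_res_target) for supervised SR training."""
    return box_downsample(high_res, factor), high_res


high_res = [[10, 20, 30, 40],
            [10, 20, 30, 40],
            [50, 60, 70, 80],
            [50, 60, 70, 80]]
low_res, target = make_training_pair(high_res, 2)
# low_res is [[15.0, 35.0], [55.0, 75.0]]; target is the untouched original.
```

The network only ever sees `low_res` as input and is scored on how closely its output matches `target`, so whatever detail it learns to add comes from regularities across many such pairs.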

The key breakthrough came with Generative Adversarial Networks (GANs) applied to super-resolution. In 2017, Ledig et al. introduced SRGAN, which pitted a generator network (trying to produce convincing upscaled images) against a discriminator network (trying to detect which images were real). The adversarial training loop pushed the generator to produce images that looked photographically real at the perceptual level, not merely images that scored well on pixel-by-pixel accuracy metrics. This is why SRGAN outputs often look sharper and more detailed than bicubic enlargements — even when the added detail is technically fabricated.
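
As a rough sketch of how the generator's objective combines the two pressures, the function below adds a content term to a small adversarial term. The feature vectors, the discriminator probabilities, and the 0.001 weighting are illustrative assumptions, and `generator_loss` is a hypothetical name rather than code from the SRGAN authors.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def generator_loss(fake_features, real_features, disc_prob_real, adv_weight=1e-3):
    """Content loss plus a weighted adversarial term.

    disc_prob_real is the discriminator's estimated probability (0 < p <= 1)
    that the generated image is a real photograph.
    """
    content = mse(fake_features, real_features)
    # The adversarial term shrinks toward zero as the discriminator is fooled.
    adversarial = -math.log(disc_prob_real)
    return content + adv_weight * adversarial


real = [0.25, 0.45]
fake = [0.20, 0.50]   # same content error in both cases below
convincing = generator_loss(fake, real, disc_prob_real=0.9)
unconvincing = generator_loss(fake, real, disc_prob_real=0.1)
```

With the content error held fixed, the output that fools the discriminator receives the lower loss, which is the mechanism that pushes the generator toward images that look real rather than images that merely average well.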

Subsequent architectures, including ESRGAN (Enhanced SRGAN, 2018), Real-ESRGAN (2021), and Topaz Labs' proprietary models, refined this approach further. Real-ESRGAN in particular trains on images degraded by a simulated pipeline designed to mimic real-world damage, rather than by simple downsampling alone, so that it can handle JPEG compression artifacts, noise, and lens blur simultaneously with upscaling.

What "4× Upscaling" Actually Means

When a tool advertises "4× AI upscaling," it means the output image has linear dimensions four times larger than the input — meaning 16 times as many pixels in total. A 1000×750 photograph becomes a 4000×3000 image. The AI must synthesize 15 out of every 16 pixels in the final image from statistical inference, not from recorded light data.
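
The arithmetic is easy to check for any size and scale factor. The helper below is a small illustrative sketch; `upscale_stats` is a hypothetical name.

```python
def upscale_stats(width, height, factor):
    """Return output size, pixel-count multiplier, and synthesized fraction."""
    out_w, out_h = width * factor, height * factor
    original_pixels = width * height
    output_pixels = out_w * out_h
    # Share of output pixels that has no one-to-one counterpart in the source.
    synthesized_fraction = 1 - original_pixels / output_pixels
    return out_w, out_h, output_pixels // original_pixels, synthesized_fraction


print(upscale_stats(1000, 750, 4))  # (4000, 3000, 16, 0.9375)
print(upscale_stats(1000, 750, 2))  # (2000, 1500, 4, 0.75)
```

A 4× upscale therefore leaves about 94% of the output to inference, while a 2× upscale leaves 75%.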

This has significant implications for photographic truth. The texture of skin in an AI-upscaled portrait is not the texture of that person's skin — it is the texture that the network's training data says skin at that scale probably looks like. For most commercial and editorial uses, this distinction is inconsequential. For forensic analysis, scientific imaging, or legal evidence, it is critical. Courts in several jurisdictions have begun requiring disclosure when submitted photographs have been processed by AI upscaling tools, precisely because the evidentiary chain of custody is broken once synthetic pixels are introduced.

Key Distinction

Classical interpolation estimates from existing data. AI upscaling synthesizes from learned patterns. The first is math. The second is inference. Understanding which process was applied to an image matters for how that image can be used and trusted.

Key Terms

Super-Resolution (SR): The class of techniques that increase image resolution beyond the original capture, recovering or synthesizing high-frequency detail.

GAN: Generative Adversarial Network. Two neural networks trained in opposition; the generator creates images while the discriminator judges their realism, improving both.

Bicubic Interpolation: Classical upscaling using a 4×4 pixel neighborhood and a smooth weighting curve; produces soft enlargements without genuine detail synthesis.

Perceptual Loss: A training objective that measures image similarity using features extracted by a pre-trained network rather than raw pixel differences, enabling more visually realistic outputs.

Lesson 1 Quiz

What is the fundamental limitation shared by all classical interpolation methods (nearest-neighbor, bilinear, bicubic)?

✓ Correct. Classical methods extrapolate within the closed system of existing pixels. They cannot infer that a blurry patch is probably an eyelash or that a gradient is part of a building ledge — that requires learned visual understanding.

✗ The key constraint is about data, not hardware or file format. All classical methods are bounded by the pixel information that already exists in the file — they cannot synthesize genuinely new detail from training knowledge.

SRGAN introduced a major advance over earlier CNN-based super-resolution. What was it?

✓ Correct. SRGAN's generator-discriminator loop trained the network to produce images that looked photographically convincing at a perceptual level, not merely images that minimized mean squared pixel error. This produced sharper, more textured outputs.

✗ SRCNN (2014) first applied convolutional networks to SR. SRGAN's specific advance was the adversarial training objective — pitting generator against discriminator — which optimized for perceptual realism rather than pixel-level accuracy.

In a 4× AI upscale, approximately what fraction of the final image's pixels are synthesized by the AI rather than recorded by the camera?

✓ Correct. A 4× linear upscale increases total pixel count by 16×. The original image provides only 1 in 16 of those pixels — the remaining 15 out of 16 must be inferred or synthesized by the model.

✗ Think about it geometrically: 4× linear dimensions means 4×4 = 16× total pixels. The original image provides 1/16 of them, so the AI synthesizes 15/16 — roughly 94% of the final image's pixel values.

Lab 1: Understanding Upscaling Algorithms

Explore the mechanics of interpolation vs. AI synthesis with your AI assistant.

What's Happening Under the Hood

In this lab you'll interrogate the AI assistant about the technical mechanics of image upscaling — from classical interpolation math to GAN training dynamics. Push into specifics: ask about loss functions, training data requirements, failure modes, and why perceptual metrics matter more than PSNR for photographic work.

Try asking: "Why does minimizing mean squared error produce blurry upscales even when it gets the pixel values technically 'right'? And how does perceptual loss fix this?"

Photography and AI · Module 3 · Lesson 2

AI Photo Restoration: Faces, Damage, and the Ethics of Reconstruction

Deep Nostalgia's animated portraits, the colorization of wartime photography, and what it means to "restore" an image that no longer exists in its original form.

In February 2021, MyHeritage launched Deep Nostalgia, a feature powered by D-ID's video synthesis technology. Within days of its launch, millions of users were animating photographs of deceased relatives, making century-old portraits blink, turn their heads, and smile. The tool processed images of historical figures including Anne Frank, Abraham Lincoln, and Nikola Tesla. Holocaust survivors' children watched their parents' wartime photographs move for the first time. The response was simultaneously described as profoundly moving and deeply unsettling. Critics noted that the animated faces were statistical composites: the movement patterns were borrowed from a library of living faces performing scripted gestures. None of the generated motion reflected anything the photographed person had actually done. The line between restoration and fabrication had effectively dissolved.

Around the same time, the British newspaper The Guardian used AI colorization tools to publish colorized versions of World War I photographs alongside their original greyscale counterparts. The colorization was performed by algorithms trained on millions of color photographs; the colors assigned to uniforms, mud, skin, and sky were plausible generalizations, not historically verified hues. Several historians objected that colorization produces a false intimacy, making distant historical events feel contemporary in ways that subtly distort how viewers interpret them.

The Architecture of Face Restoration

Face restoration is a specialized branch of image super-resolution that takes advantage of the strong prior knowledge we have about human faces. Because faces share consistent geometric structure — eyes above nose above mouth, bilateral symmetry, predictable proportions — AI models can be trained on face-specific datasets and achieve dramatically better results on portrait photographs than general-purpose upscalers achieve on arbitrary images.

The dominant approach, developed in tools like GFPGAN (Generative Facial Prior GAN) released by Tencent's ARC Lab in 2021, works by extracting a "facial prior" from a pre-trained face generation model (specifically StyleGAN2, trained on FFHQ — 70,000 high-quality human face photographs). When a degraded face image is fed into GFPGAN, the model uses the StyleGAN2 prior as a reference — essentially asking: "given this blurry, noisy face, which latent code in StyleGAN's learned face space does it most likely correspond to?" The corresponding high-fidelity face from that latent space is then blended with the input to produce the restoration.

The practical result is remarkable: photographs with severe JPEG compression, motion blur, or significant degradation can be sharpened into portraits with convincing skin texture, visible pores, and clear iris detail. But the identity fidelity is limited by the prior — the restored face reflects the distribution of faces in FFHQ, which skews toward contemporary subjects photographed in studio conditions. Restoring a daguerreotype of a person born in 1840 will produce a face shaped partly by 21st-century face norms embedded in the training data.

Photo Colorization: Science or Interpretation?

Automatic colorization of black-and-white photographs uses models trained to predict per-pixel color values (in the Lab color space) from greyscale luminance data. The pioneering work by Zhang et al. at UC Berkeley in 2016 — "Colorful Image Colorization" — framed colorization as a classification problem rather than a regression problem, predicting a probability distribution over possible colors rather than a single "best" color. This produced more vibrant, less muddy results because the model embraced color uncertainty rather than averaging it away.
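
A toy example shows why averaging produces mud. Suppose that, for one grey level, the training data contains two very different chroma values. The sketch below compares the mean (what a squared-error regression converges to) with the most popular quantized bin. The sample values and bin width are invented for illustration, and the published method predicts a full distribution and decodes it more carefully than this simple "pick the top bin" rule.

```python
import math
from collections import Counter

BIN = 10  # width of each chroma bin; an illustrative choice


def regress(samples):
    """Mean (a, b) chroma: the answer that minimizes squared error."""
    n = len(samples)
    return (sum(a for a, _ in samples) / n, sum(b for _, b in samples) / n)


def classify(samples):
    """Centre of the most frequently observed quantized (a, b) bin."""
    bins = Counter((a // BIN, b // BIN) for a, b in samples)
    (bin_a, bin_b), _ = bins.most_common(1)[0]
    return (bin_a * BIN + BIN / 2, bin_b * BIN + BIN / 2)


def chroma(ab):
    """Distance from neutral grey in the a/b plane (higher is more saturated)."""
    return math.hypot(*ab)


# Hypothetical training observations for one ambiguous grey value:
# five warm-colored objects and four cool-colored ones.
samples = [(50, 40)] * 5 + [(-50, -40)] * 4

averaged = regress(samples)    # close to (5.6, 4.4): nearly grey
committed = classify(samples)  # (55.0, 45.0): a saturated, committed color
```

The averaged answer sits close to neutral grey even though no training example was grey, while the classified answer keeps the saturation of a color that was actually observed.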

Modern colorization tools including DeOldify (open-source, used by MyHeritage and numerous news organizations) use the NoGAN training approach developed by Jason Antic in collaboration with Jeremy Howard of fast.ai. NoGAN pretrains the generator and the critic separately and then runs only a brief adversarial phase, which reduces the flicker and color instability that conventional GAN training tends to introduce, especially across video frames. The results are often photorealistic enough that colorized historical photographs published without disclosure are frequently mistaken for original color photography.

This creates a documentation problem. A colorized photograph of a soldier's uniform in a particular shade of olive green carries implicit information about the actual color of that uniform — but the color was inferred by a model trained largely on post-war photographs. No historian has verified it against surviving fabric samples or period accounts. The visual authority of photography — its indexical claim to have recorded reality — is extended to colors that were never photographically recorded at all.

Ethical Frameworks for Restoration Work

Professional archival organizations including the Society of American Archivists and the International Council of Museums (ICOM) have both issued guidance documents in recent years addressing AI-enhanced reproduction of historical materials. The consensus position is built around three principles: transparency (all AI processing must be disclosed and documented in metadata), reversibility (original unprocessed files must be preserved separately), and proportionality (the degree of AI intervention should be the minimum necessary to achieve legitimate conservation goals).

Photography-specific ethics boards, including those governing photojournalism at Reuters, AP, and World Press Photo, draw a sharp line between restoration (removing damage that obscures what was originally captured) and enhancement (adding detail that was never captured). Under these standards, removing dust and scratches from a scanned negative is permissible; using GFPGAN to sharpen a slightly out-of-focus portrait is not.

Historical Record

When AP ran its first AI-enhanced archival photographs in 2023, it published a parallel "before" version alongside every processed image, with explicit metadata tagging. This dual-publication standard has since been adopted by several major news archives as a baseline disclosure practice.

Lesson 2 Quiz

GFPGAN achieves strong face restoration results by leveraging which specific external resource?

✓ Correct. GFPGAN uses a pre-trained StyleGAN2 face generation model as a "facial prior." It asks which high-quality face in StyleGAN's learned space best matches the degraded input, then blends that reference with the input to produce the restoration.

✗ GFPGAN uses a learned generative prior — specifically the face latent space from StyleGAN2, trained on FFHQ (70,000 real face photographs). This gives it a strong statistical model of what human faces should look like at high resolution.

Why did UC Berkeley's 2016 colorization research frame colorization as a classification problem rather than regression?

✓ Correct. When color is ambiguous (e.g., is a car red or blue?), a regression model averages both possibilities and produces a muddy grey-brown. A classification approach embraces the uncertainty, picks a committed color, and produces more vibrant, believable results.

✗ The key insight was about handling ambiguity. A grass patch could be any of several greens; a car could be many colors. Regression averages these possibilities and produces desaturated mud. Classification commits to a color bin, yielding vivid, plausible colorizations.

Under the ethical standards used by major photojournalism organizations (Reuters, AP, World Press Photo), which of these operations would be considered impermissible AI enhancement?

✓ Correct. The ethical line falls between removing damage (permissible) and adding detail that was never captured (impermissible). An out-of-focus portrait was never in focus — using AI to create sharpness that never existed in the original capture is fabrication, not restoration.

✗ Photojournalism standards distinguish between removing artifacts of the archival/scanning process (permissible) and synthesizing detail that was never captured (impermissible). A slightly out-of-focus image was always out of focus — AI sharpening invents detail that never existed.

Lab 2: Restoration Ethics and Practice

Interrogate the boundaries between legitimate restoration and problematic fabrication.

Drawing the Line

This lab focuses on the ethical and practical decision-making involved in AI photo restoration. You'll work through real-world scenarios with the AI assistant: when is it appropriate to use face restoration tools? How should AI-processed archival images be disclosed? What are the risks of colorizing historical photographs without historical verification?

Try asking: "I'm digitizing a collection of 1940s family photographs for a museum exhibition. Some faces are blurry. Walk me through the ethical decision-making process for whether and how to apply AI face restoration."

Photography and AI · Module 3 · Lesson 3

Tools of the Trade: Real-ESRGAN, Topaz, and the Upscaling Ecosystem

A practical survey of the leading AI upscaling and restoration tools — their architectures, strengths, failure modes, and the workflow decisions that separate professional results from amateur artifacts.

In 2021, the visual effects studio Corridor Crew, whose YouTube channel documenting VFX work reaches millions of subscribers, published a breakdown of their use of AI upscaling in production. They had used Topaz Video AI to restore and upscale archival broadcast footage originally shot on 480i standard-definition video to 4K for inclusion in high-budget documentary productions. The results, shown in a side-by-side comparison, were striking enough that the video went viral within professional cinematography circles. What was notable was not just the quality of the output, but the disclaimer Corridor Crew attached: the upscaled footage, they noted, could not be treated as equivalent to 4K originals. The AI had interpolated and synthesized from degraded source material, and any fine detail in the output was statistically plausible rather than factually recorded.

Real-ESRGAN: Training on Realistic Degradation

Real-ESRGAN (released by Xintao Wang et al. in 2021) solved a problem that plagued earlier super-resolution models: they were trained on images degraded in one simple, predictable way (typically clean photographs that had been downsampled or artificially blurred) and then deployed on real-world images degraded by cameras, scanners, printing, and re-photography. The distribution mismatch between training data and deployment data caused artifacts: halos around edges, over-smoothed textures, and synthetic-looking "AI sharpening" effects visible to experienced eyes.

Real-ESRGAN introduced a high-order degradation model: a pipeline that simulates realistic combinations of blur, noise, rescaling, and JPEG compression, and applies that sequence more than once with randomly sampled parameters. Although the training data is still synthetic, it more closely matches the varied and unpredictable degradations found in real-world photographs. The model trained on this data generalized far better to diverse input types.
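
The idea of stacking degradations can be sketched on a single scanline of pixel values. Everything here is a simplified assumption for illustration: the three-tap blur, the nearest-neighbor resize, the Gaussian noise, and the quantization step that stands in for JPEG loss are toy versions, and the parameter ranges are arbitrary rather than taken from Real-ESRGAN.

```python
import random


def blur(row, strength):
    """Blend each value with its two neighbours; strength is in [0, 1]."""
    out = []
    for i, v in enumerate(row):
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, len(row) - 1)]
        out.append((1 - strength) * v + strength * (left + right) / 2)
    return out


def resize(row, new_len):
    """Nearest-neighbour resample to new_len samples."""
    return [row[min(int(i * len(row) / new_len), len(row) - 1)]
            for i in range(new_len)]


def add_noise(row, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [v + rng.gauss(0, sigma) for v in row]


def quantize(row, step):
    """Crude stand-in for JPEG loss: snap to multiples of step, clamp to 0..255."""
    return [min(255, max(0, round(v / step) * step)) for v in row]


def degrade_once(row, rng):
    """One pass of blur, rescale, noise, and compression with random settings."""
    row = blur(row, rng.uniform(0.2, 0.8))
    row = resize(row, max(2, int(len(row) * rng.uniform(0.5, 1.0))))
    row = add_noise(row, rng.uniform(1, 10), rng)
    return quantize(row, rng.choice([4, 8, 16]))


def high_order_degrade(row, out_len, order=2, seed=0):
    """Apply the degradation pass `order` times, then resize to out_len."""
    rng = random.Random(seed)
    for _ in range(order):
        row = degrade_once(row, rng)
    return resize(row, out_len)


clean = [0] * 8 + [255] * 8 + [0] * 8 + [255] * 8  # 32-sample scanline
low_quality = high_order_degrade(clean, out_len=8)
```

Changing `seed` yields a differently damaged version of the same clean scanline, which is how a single clean image can supply many distinct training inputs.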

Real-ESRGAN is open-source (BSD 3-Clause licensed) and forms the backbone of many consumer-facing upscaling applications. It runs efficiently on consumer GPU hardware and processes a typical 12-megapixel photograph in 2× mode in under 30 seconds on a mid-range gaming card.

Topaz Photo AI and Video AI

Topaz Labs has built arguably the most commercially successful ecosystem of AI-powered photography tools, including Topaz Photo AI (combining upscaling, sharpening, and denoising) and Topaz Video AI (super-resolution and frame interpolation for video). Unlike open-source alternatives, Topaz trains proprietary models on curated datasets and offers multiple models optimized for specific input types: a separate model for faces, one for general photographic subjects, one for subjects with fine linear detail (text, feathers, fabric weave), and specialized models for noise reduction at specific ISO values.

This specialization produces noticeably better results on matched input types. A portrait processed with Topaz's face-optimized model will typically outperform Real-ESRGAN on that specific use case, while Real-ESRGAN may perform comparably or better on landscape or architectural photography where the face-specific training of Topaz's model is irrelevant.

Topaz Video AI introduced Proteus, a model with adjustable parameters for detail recovery, sharpening, and noise reduction, allowing professionals to dial in exactly how much AI synthesis they want applied, rather than accepting a one-size-fits-all output. This parametric control has made Topaz the preferred tool for high-end VFX and restoration workflows.

Adobe, Lightroom, and Camera Raw Integration

In 2021, Adobe integrated AI super-resolution directly into Camera Raw and Lightroom through its Super Resolution feature, which uses a proprietary model based on research from Adobe Research's computational photography team. The feature operates on RAW files before demosaicing, giving it access to the full sensor data rather than a processed JPEG derivative. Adobe's approach doubles each linear dimension (a 2× upscale, or four times the pixel count) and writes the result as a DNG file that can be further processed in the full Lightroom workflow.

Adobe's model is specifically calibrated to avoid over-sharpening — the characteristic "AI look" where textures appear unnaturally crisp and edges have a subtle glow. Whether this conservative approach produces better or worse results than more aggressive models like Topaz depends entirely on the use case: for photojournalism and documentary photography, Adobe's subtler output is often preferred. For commercial product photography or fine art printing, photographers frequently choose more aggressive tools.

Common Failure Modes and How to Identify Them

All AI upscaling tools produce characteristic failure modes that experienced photographers learn to recognize and work around:

Hallucinated texture: The model generates plausible-looking detail in areas of low information. Smooth surfaces like blank walls, clear sky, or water may develop subtle artificial grain patterns. This is usually harmless but visible under pixel-peeping scrutiny.

Face averaging: Face-restoration models regress toward the statistical center of their training distribution. Distinctive features — unusual nose shapes, asymmetric facial features, non-standard proportions — may be subtly normalized toward a more "average" face. Elderly faces are particularly susceptible; their wrinkles and age spots are statistically uncommon in most training datasets.

Text corruption: Small text in photographs — signs, labels, captions — is frequently garbled by general-purpose upscalers that interpret letterforms as texture rather than structured information. Specialized text-preservation models exist but are less commonly integrated into mainstream tools.

Edge halos: High-contrast edges may develop a faint bright halo as the model attempts to sharpen them. This is an artifact of the adversarial training process and can be partially mitigated by reducing the "sharpness" or "recover" parameters where available.

Workflow Best Practice

Professional retouchers recommend always working non-destructively: keep the original file untouched, apply upscaling to a copy, and compare the result at 100% zoom in multiple areas before accepting the output. Particular scrutiny should go to fine linear detail (hair, eyelashes, text), smooth gradients, and any area with distinctive texture that should not look generic.

Lesson 3 Quiz

What specific problem did Real-ESRGAN's high-order degradation model solve that affected earlier super-resolution models?

✓ Correct. Training on clean→blurry synthetic pairs and deploying on real-world images with printer damage, scanner noise, re-photography blur, and JPEG recompression caused distribution mismatch artifacts. Real-ESRGAN's complex degradation pipeline closed this gap.

✗ The core problem was distribution mismatch: synthetic training degradations didn't match real-world degradation patterns. When earlier models saw images degraded by printing, scanning, re-photography, or multiple JPEG compressions, the mismatch caused halos, over-smoothing, and artifacts.

What is the key advantage of Adobe's Super Resolution feature processing RAW files before demosaicing?

✓ Correct. RAW files before demosaicing contain the raw sensor measurements — no JPEG compression artifacts, no white balance baking, no sharpening. The AI has maximum information to work from, producing cleaner, more accurate upscales.

✗ The advantage is informational: before demosaicing, the model sees the raw sensor data — every photon measurement the sensor recorded, without any lossy compression or interpolation applied. This gives it more genuine signal to work from than a processed JPEG.

Why are elderly faces particularly susceptible to the "face averaging" failure mode in AI face restoration?

✓ Correct. AI models regress toward the statistical center of their training data. Most face datasets (including FFHQ) are heavily weighted toward younger adult faces in controlled conditions. Aged facial features are rare in this distribution, so the model normalizes them away.

✗ The issue is statistical representation: training datasets for face restoration are disproportionately made up of young-to-middle-aged faces. When the model encounters aged features it hasn't seen often, it treats them as noise or degradation and smooths them toward the more common face types it was trained on.

Lab 3: Choosing the Right Upscaling Tool

Navigate real-world tool selection decisions with your AI assistant.

Tool Selection in Practice

Different upscaling scenarios call for different tools and settings. In this lab you'll work through specific use cases with the AI assistant: a wildlife photographer wants to enlarge a cropped bird image for a magazine spread; a documentary filmmaker needs to upscale 1980s VHS footage; a wedding photographer wants to upscale low-light ceremony shots taken at high ISO. Explore which tools fit which jobs — and why.

Try asking: "Compare Real-ESRGAN and Topaz Photo AI for upscaling a wildlife photograph where fine feather detail matters. Which model architecture is likely better and why?"

Photography and AI · Module 3 · Lesson 4

Evaluation Metrics, Professional Standards, and What Comes Next

PSNR, SSIM, LPIPS, and why the numbers don't tell the whole story — plus the emerging professional standards that will define how AI-upscaled images are used, disclosed, and trusted.

In a 2022 study published in ACM Transactions on Graphics, researchers at MIT and Adobe Research ran a controlled experiment: they showed 200 experienced photographers pairs of images — one bicubic upscale and one AI upscale from the same source — and asked which looked like a better-quality photograph. The AI upscale won overwhelmingly in perceived quality ratings. Then they asked a different question: which image was more accurate to the original? On this metric, the bicubic upscales often scored higher. The images that looked better were not always the images that were more faithful. This finding formalized a tension that practitioners had observed informally for years: AI upscaling optimizes for perceptual convincingness, not photographic truth.

The Metrics Landscape

PSNR (Peak Signal-to-Noise Ratio) is the oldest and most widely used image quality metric. It expresses, in decibels, the ratio between the square of the maximum possible pixel value and the mean squared error between two images. Higher PSNR indicates a closer pixel-by-pixel match to the reference. However, PSNR correlates poorly with human perception of image quality: a slightly blurred image may score very high PSNR because its pixels are close to the reference in value, even though it looks worse to a human viewer.
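
The mismatch between PSNR and perception shows up even in a tiny example. The sketch below computes PSNR for 8-bit values and compares two imperfect versions of a finely textured scanline; the pixel values are invented for illustration.

```python
import math


def psnr(reference, candidate, max_value=255):
    """Peak signal-to-noise ratio in decibels for two equal-length signals."""
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


reference = [100, 200] * 8   # fine alternating texture
flat_blur = [150] * 16       # texture averaged away entirely
shifted = [200, 100] * 8     # the same texture, misaligned by one pixel

blur_score = psnr(reference, flat_blur)   # about 14.2 dB
shift_score = psnr(reference, shifted)    # about 8.1 dB
```

The featureless grey version scores roughly 6 dB higher than the version that preserves the texture but places it one pixel off, which is the opposite of how most viewers would rank them.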

SSIM (Structural Similarity Index Measure), developed by Wang et al. in 2004, improved on PSNR by measuring luminance, contrast, and structural similarity separately and combining them. SSIM correlates better with human quality judgments than PSNR, but still struggles with high-frequency detail: two images with identical structural similarity scores may look very different in texture areas.
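
For intuition, the SSIM formula can be evaluated over a whole signal at once. This is a simplified sketch: practical implementations compute the statistic in small sliding windows and average the results, and `ssim_global` is a hypothetical name. The constants follow the commonly used defaults of 0.01 and 0.03 times the dynamic range, squared.

```python
def ssim_global(x, y, max_value=255):
    """Single-window SSIM over two equal-length signals (simplified)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that keep the ratio stable when means or variances are tiny.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


texture = [100, 200] * 8
flat = [150] * 16

same = ssim_global(texture, texture)   # close to 1.0
blurred = ssim_global(texture, flat)   # close to 0.02
```

Because the flat signal has the right mean but none of the contrast or structure, SSIM scores it near zero, whereas a purely pixel-based error measure treats it as a moderate mismatch.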

LPIPS (Learned Perceptual Image Patch Similarity), introduced by Zhang et al. at Berkeley in 2018, measures image similarity using features extracted from a deep neural network (specifically, intermediate layers of AlexNet or VGG). Because these features capture perceptual content rather than raw pixel values, LPIPS correlates much more strongly with human quality assessments — particularly in the texture and detail regions where PSNR and SSIM fail. Ironically, SRGAN-based models often score worse on PSNR than bicubic interpolation (because they synthesize detail that doesn't match the reference pixel-by-pixel) while scoring much better on LPIPS (because that synthesized detail looks perceptually plausible).

Emerging Professional and Legal Standards

In 2023, the World Press Photo Foundation updated its contest rules to require explicit disclosure of any AI processing applied to submitted photographs, including AI upscaling and noise reduction. Photographs with AI-synthesized content that was not present in the original capture are disqualified. Applying denoising to remove ISO grain is permitted; using an AI model that synthesizes new grain patterns to make an image look more textured is not.

The C2PA (Coalition for Content Provenance and Authenticity) — a standards body including Adobe, Microsoft, Intel, BBC, and Reuters — has developed the Content Credentials specification, an embedded metadata standard that records all processing operations applied to an image, including which AI models were used, when, and with what parameters. Lightroom, Photoshop, and several camera manufacturers (including Leica and Nikon) have begun implementing C2PA Content Credentials in production products. The goal is to create an unbroken chain of provenance from capture to final publication.
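
To illustrate the kind of information a provenance record carries, here is a minimal sketch of a sidecar log built with Python's standard library. It is not the Content Credentials format and involves no cryptographic signing; the field names, the tool and model names, and the `processing_record` helper are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex digest that identifies one exact version of a file."""
    return hashlib.sha256(data).hexdigest()


def processing_record(original: bytes, processed: bytes, steps):
    """Build a sidecar record linking a processed image to its source."""
    return {
        "original_sha256": sha256_hex(original),
        "processed_sha256": sha256_hex(processed),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "steps": list(steps),
    }


record = processing_record(
    b"raw scan bytes",        # placeholder for the untouched original file
    b"upscaled image bytes",  # placeholder for the processed copy
    [{"tool": "example-upscaler", "model": "example-model-v1",
      "operation": "upscale", "scale": 4, "ai_synthesis": True}],
)
sidecar_json = json.dumps(record, indent=2)
```

Even this simple record lets a later reader confirm which original a processed file came from and see that an AI synthesis step was applied. A real Content Credentials manifest additionally binds such claims to the file with signatures so they cannot be silently altered.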

In legal contexts, the UK's Intellectual Property Office issued guidance in 2023 clarifying that AI-upscaled photographs may have different copyright status than their source images — specifically when the AI model was trained on copyrighted images without license. This remains an area of active litigation internationally, with cases involving AI training datasets, output ownership, and the copyright status of AI-synthesized content proceeding through courts in the US, EU, and UK simultaneously.

The Frontier: Diffusion Models and Reference-Based Upscaling

The next generation of upscaling tools moves beyond GAN-based approaches toward diffusion model architectures. Models like StableSR (2023) and SeeSR (2024) use diffusion processes — the same underlying technology as Stable Diffusion and Midjourney — to iteratively denoise a randomly initialized high-resolution image conditioned on the low-resolution input. This produces outputs with extraordinary textural richness and realism. The tradeoff is that diffusion-based upscalers are even more generative than GAN-based models: they may produce beautiful high-resolution images that diverge significantly from what any hypothetical "true" high-resolution version would have shown.

Reference-based super-resolution takes a different approach: instead of relying solely on a general learned prior, the model is also given a reference photograph of the same subject taken at different conditions (different angle, different lighting, but the same subject). The model uses both the degraded input and the reference to produce a more faithful restoration. This approach is particularly promising for archival work where multiple photographs of the same subject exist at different quality levels.

The Core Tension

Every advance in AI upscaling and restoration increases the plausibility of synthesized content and decreases the visual markers that previously distinguished AI-processed images from originals. The field is moving faster than the standards bodies that govern its use. Photographers working professionally today need to understand not just how these tools work, but what obligations — ethical, legal, and professional — attach to using them.

Key Terms

PSNR: Peak Signal-to-Noise Ratio, a pixel-level accuracy metric that correlates poorly with human visual quality perception.

LPIPS: Learned Perceptual Image Patch Similarity, a deep-feature-based similarity metric that correlates strongly with human quality assessments.

C2PA: Coalition for Content Provenance and Authenticity, the standards body developing embedded metadata specifications for AI image processing disclosure.

Diffusion Upscaling: Super-resolution using diffusion model architectures; produces highly realistic outputs but with greater generative divergence from the source than GAN-based methods.
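
The PSNR entry above can be made concrete with a short sketch. PSNR is defined as 10 · log10(MAX² / MSE), where MSE is the mean squared pixel error against a reference and MAX is the largest possible pixel value. The function name `psnr`, the 8-bit peak of 255, and the synthetic images below are assumptions chosen for illustration; they are not taken from any tool discussed in this lesson. The example builds a textured "ground truth", a smooth estimate with no texture, and an estimate with equally strong but differently placed texture.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)  # mean squared pixel error
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


rng = np.random.default_rng(0)

# Hypothetical ground truth: mid-gray with fine random texture
reference = np.clip(128.0 + rng.normal(0.0, 20.0, size=(64, 64)), 0, 255)

# Smooth estimate: correct average tone, no texture at all
smooth = np.full((64, 64), 128.0)

# "Invented" estimate: texture of the same strength, different placement
invented = np.clip(128.0 + rng.normal(0.0, 20.0, size=(64, 64)), 0, 255)

print(f"smooth:   {psnr(reference, smooth):.1f} dB")
print(f"invented: {psnr(reference, invented):.1f} dB")
```

The smooth image scores higher because its error is only the missing texture, while the invented image is penalized both for the texture it lacks and for the texture it adds in the wrong places. That is the same mechanism by which a blurry bicubic enlargement can outscore a sharper-looking GAN output on PSNR.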

Lesson 4 Quiz

3 questions — free, untracked, retake anytime.

Why do GAN-based upscalers like SRGAN often score lower on PSNR than bicubic interpolation, even when they look better to human viewers?

✓ Correct. PSNR measures raw pixel-level fidelity to a reference. SRGAN invents convincing texture that wasn't in the reference — so its pixels differ from the reference more than a blurry bicubic output does, even though it looks better. This is why LPIPS, not PSNR, is the preferred metric for GAN-based super-resolution.

✗ Think about what PSNR measures: squared pixel error vs. a reference. SRGAN adds texture that looks real but isn't in the reference — so its pixel values diverge more from the reference than a blurry bicubic that stays "safely" near the reference by being averaged and smooth.

What is the primary purpose of the C2PA Content Credentials standard in the context of AI image processing?

✓ Correct. C2PA Content Credentials embed a cryptographically signed record of all processing steps — capture device, software operations, AI models used, timestamps — creating an auditable chain from original capture to published image.

✗ C2PA is a provenance and transparency standard, not a restriction or compression tool. It creates embedded metadata that documents what was done to an image, by which tools, and when — so that publishers, editors, and viewers can understand the full processing history.

How does reference-based super-resolution differ from standard single-image super-resolution?

✓ Correct. Reference-based SR provides the model with actual high-quality visual evidence about the specific subject — not just statistical priors about what subjects generally look like. This is especially powerful in archival contexts where multiple photographs of the same person or place exist at different quality levels.

✗ Reference-based SR augments the usual single-image input with an actual photograph of the same subject taken under different conditions. Instead of relying entirely on what the training data says faces/buildings/etc. generally look like, the model can draw on real visual evidence about this specific subject.

Lab 4: Metrics, Standards, and Professional Decisions

Apply evaluation frameworks and professional standards to real upscaling decisions.

Evaluating Quality and Navigating Obligations

In this final lab you'll work through the evaluation and disclosure decisions that professionals face when deploying AI upscaling. Explore how to interpret PSNR vs. LPIPS scores in practice, how C2PA workflows operate, what the World Press Photo disclosure requirements actually demand, and how to advise clients on AI upscaling when legal or archival obligations are involved.

Try asking: "I'm a photojournalist and my editor wants me to use AI upscaling to enlarge underexposed protest photographs for front-page use. Walk me through my ethical and disclosure obligations under current professional standards."

Module 3 Test

15 questions · 80% to pass · covers all four lessons

1. Which classical interpolation method samples a 4×4 neighborhood of 16 pixels using a weighted curve to estimate new pixel values?

✓ Correct. Bicubic interpolation uses 16 surrounding pixels (a 4×4 grid) and a smooth cubic weighting function, producing smoother gradients and better edge retention than bilinear's 4-pixel average.

✗ Bicubic interpolation is the 16-pixel, cubic-weighted method. Nearest-neighbor duplicates one pixel; bilinear averages four; Lanczos uses a sinc-based kernel over a larger neighborhood.

2. Google Brain's 2017 "Pixel Recursive Super Resolution" paper demonstrated that when a neural network reconstructs a face from an 8×8 pixel source, the output is best described as:

✓ Correct. The network synthesized a plausible face; it did not recover the subject's actual appearance. This is the core distinction between AI recovery and AI invention that the paper demonstrated.

✗ The Google Brain paper's key point was that the network invented a plausible face — it had no way to recover the original subject's appearance from 64 pixels. The output reflected training data statistics, not photographic truth.

3. SRCNN, the first end-to-end deep learning super-resolution model, was published in which year?

✓ Correct. SRCNN was published in 2014 by Chao Dong et al. at CUHK, marking the beginning of deep learning approaches to single-image super-resolution.

✗ SRCNN (Super-Resolution Convolutional Neural Network) was published in 2014 by Chao Dong and colleagues at the Chinese University of Hong Kong — the first end-to-end trained deep learning approach to SR.

4. MyHeritage's Deep Nostalgia feature, which animated historical portrait photographs, was powered by technology from which company?

✓ Correct. D-ID provided the video synthesis technology that MyHeritage used for Deep Nostalgia, borrowing motion patterns from a library of living face recordings to animate static photographs.

✗ Deep Nostalgia was powered by D-ID's video synthesis technology, not by Topaz, Adobe, or DeepMind. D-ID specializes in AI-driven talking head and face animation technology.

5. The FFHQ dataset used to train StyleGAN2 (and leveraged by GFPGAN) contains approximately how many high-quality face photographs?

✓ Correct. The Flickr-Faces-HQ (FFHQ) dataset contains 70,000 high-quality face images at 1024×1024 resolution, collected from Flickr and processed for alignment and consistency.

✗ FFHQ (Flickr-Faces-HQ) contains 70,000 face images at 1024×1024. This dataset was created by NVIDIA to train StyleGAN and has become the standard benchmark for face generation and restoration research.

6. DeOldify, the open-source colorization tool used by several news organizations, employs which training innovation to improve temporal consistency?

✓ Correct. DeOldify's NoGAN training approach, developed by Jason Antic in collaboration with Jeremy Howard, pretrains the generator and the critic separately and then runs only a brief period of adversarial training. This keeps most of the realism of GAN training while suppressing the glitches and flicker that make frame-by-frame colorization unstable.

✗ DeOldify uses a NoGAN approach: the generator and critic are pretrained separately, followed by a short burst of adversarial training. This makes color assignments more stable across similar regions, which matters when the same object appears in different lighting conditions across frames.

7. Real-ESRGAN improved on earlier super-resolution models by addressing training/deployment distribution mismatch. What was the specific mechanism?

✓ Correct. Real-ESRGAN's degradation pipeline applies random combinations of multiple degradation types in random orders — mimicking the complex, unpredictable degradations real photographs accumulate through printing, scanning, re-photography, and re-compression.

✗ Real-ESRGAN's key innovation was its degradation model: instead of simple blur→downscale, it applied randomized combinations of multiple degradation types (blur, noise, JPEG compression, rescaling) in random orders, producing training data that matched real-world degradations far better.

8. Topaz Photo AI's Proteus model differentiates itself from standard upscaling models primarily through:

✓ Correct. Proteus's parametric control system lets professionals decide exactly how aggressively the AI should intervene, rather than accepting a fixed output. This is particularly valued in VFX and high-end restoration work where precise control over synthesis is essential.

✗ Proteus's defining feature is its adjustable parameters — recover, detail, sharpness, and noise reduction sliders that let users precisely control how much AI synthesis is applied, rather than accepting a one-size-fits-all output.

9. LPIPS was introduced in 2018 by Zhang et al. at Berkeley. What makes it a better quality metric than PSNR for evaluating GAN-based super-resolution?

✓ Correct. LPIPS extracts features from intermediate layers of a pre-trained network (AlexNet or VGG). These features capture perceptual content — texture, structure, semantic meaning — rather than raw numerical pixel differences, producing quality scores that match human judgments much better.

✗ LPIPS's advantage is that it measures image similarity in feature space — using a neural network's internal representations of visual content — rather than raw pixel differences. This correlates much more closely with how humans perceive image quality, especially in textured areas.

10. Under the 2023 World Press Photo contest rules, which processing operation would result in a photograph's disqualification?

✓ Correct. Adding AI-synthesized grain texture creates content that was not present in the original capture — this crosses the line into fabrication under World Press Photo's rules, even if the effect looks subtle or "natural."

✗ World Press Photo's rules prohibit adding AI-synthesized content not present in the original capture. Synthetic grain texture is fabricated — it didn't exist in what the sensor recorded. The other options either preserve or minimally adjust captured content.

11. A major ethical concern raised by historians regarding The Guardian's colorization of World War I photographs was:

✓ Correct. Historians objected that colorization creates false intimacy — making distant events feel contemporary — while implying color accuracy that the algorithm cannot provide. The colors were plausible statistical guesses, not verified historical facts.

✗ The historians' objection was about epistemic authority: colorized photographs carry implicit claims of color accuracy that the algorithm cannot support. Colors assigned by a neural network trained on modern photographs are statistically reasonable guesses, not verified historical hues.

12. The C2PA Content Credentials specification uses what technology to ensure the processing record embedded in image metadata cannot be tampered with undetected?

✓ Correct. C2PA Content Credentials are cryptographically signed. Any modification to the image or its metadata after signing invalidates the signature, making tampering detectable. This is the foundation of its provenance guarantee.

✗ C2PA uses cryptographic signatures (not blockchain or watermarks) to secure the provenance record. The signature is embedded with the metadata; if either the image or the metadata is altered, the signature verification fails, signaling that the record was tampered with.

13. How do diffusion model-based upscalers like StableSR differ fundamentally from GAN-based models like Real-ESRGAN in their generation process?

✓ Correct. Diffusion upscalers start from noise and iteratively refine it toward a high-resolution image conditioned on the LR input. This multi-step stochastic process produces extraordinarily rich textures but also introduces more generative variability and potential divergence from the source.

✗ Diffusion upscalers work iteratively: they start with random noise at the target resolution and progressively denoise it, at each step conditioning on the low-resolution input. This is fundamentally different from GAN architectures that map input to output in a single forward pass.

14. Which three principles form the consensus ethical framework for AI-enhanced archival images, as advocated by the Society of American Archivists and ICOM?

✓ Correct. Transparency, reversibility, and proportionality are the three pillars: disclose all AI processing in metadata, always preserve the original unprocessed file, and apply only as much AI intervention as the conservation goal genuinely requires.

✗ The SAA/ICOM framework centers on three principles: transparency (all AI processing documented in metadata), reversibility (original unprocessed files preserved separately), and proportionality (AI intervention limited to the minimum necessary).

15. A 2022 ACM Transactions on Graphics study found that experienced photographers often preferred AI upscales over bicubic enlargements on perceptual quality ratings, but bicubic upscales sometimes scored higher on accuracy ratings. What does this reveal about AI upscaling?

✓ Correct. This is the core tension of the field: AI upscaling is designed and trained to produce images that look good to human viewers, not images that accurately represent what was captured. For most uses this is fine; for forensic, scientific, or evidentiary contexts, it matters enormously.

✗ The study reveals a fundamental tension: AI upscaling optimizes for looking convincingly real (perceptual quality) rather than being accurate to the source (fidelity). These goals diverge, and which matters depends entirely on how the image will be used.