In 2017, researchers at Google Brain published a paper called "Pixel Recursive Super Resolution" that contained a quietly startling demonstration. They took a tiny 8×8 pixel portrait of a face, essentially a smear of color, and asked their neural network to produce a plausible 32×32 reconstruction. The network had never seen the original 32×32 image. What it produced was not a blur. It was a face: nose, eyes, lip line, skin texture, all synthesized from statistical patterns learned from a large collection of training images. The reconstruction did not necessarily look like the person in the original photograph. The AI had not recovered lost data; it had invented plausible data. That distinction between recovery and invention defines the entire modern field of AI-driven upscaling.
A digital photograph is a fixed grid of pixels, each storing red, green, and blue intensity values. When you double an image's width and height, you need four times as many pixels, but the original file supplies only a quarter of them. Something must fill the gap. Classical approaches to this problem are called interpolation algorithms, and they estimate unknown pixel values from known neighbors.
Nearest-neighbor interpolation simply copies the closest known pixel, which is fast but produces the blocky "pixelation" visible when you zoom too far into a JPEG. Bilinear interpolation takes a distance-weighted average of the four surrounding pixels, producing smoother but often blurry transitions. Bicubic interpolation (the default in Photoshop's legacy enlarge function) weights a 4×4 neighborhood of 16 pixels with a cubic curve, which preserves edge sharpness better but still cannot add genuine detail that wasn't in the original capture.
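To make the classical methods concrete, here is a minimal pure-Python sketch of nearest-neighbor and bilinear upscaling for a single-channel image stored as a list of rows. It assumes an integer scale factor and maps each output pixel's center back into source coordinates, which is one common convention among several; bicubic extends the same idea to a 4×4 neighborhood with cubic weights. Every bilinear output is a weighted average of existing pixels, so no value outside the original range can appear.

```python
def nearest_neighbor(img, scale):
    """Upscale a grayscale image (list of rows) by an integer factor by
    copying the closest source pixel."""
    h, w = len(img), len(img[0])
    return [[img[y // scale][x // scale] for x in range(w * scale)]
            for y in range(h * scale)]


def bilinear(img, scale):
    """Upscale by an integer factor using a distance-weighted average of
    the four nearest source pixels."""
    h, w = len(img), len(img[0])

    def source_coord(i, size):
        # Map an output pixel centre back into source coordinates,
        # clamped so edge pixels reuse the border values.
        s = min(max((i + 0.5) / scale - 0.5, 0.0), size - 1)
        i0 = int(s)
        i1 = min(i0 + 1, size - 1)
        return i0, i1, s - i0

    out = []
    for y in range(h * scale):
        y0, y1, fy = source_coord(y, h)
        row = []
        for x in range(w * scale):
            x0, x1, fx = source_coord(x, w)
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


src = [[0, 100],
       [100, 200]]
print(nearest_neighbor(src, 2)[0])  # [0, 0, 100, 100]
print(bilinear(src, 2)[0])          # [0.0, 25.0, 75.0, 100.0]
```

Nearest-neighbor only ever repeats values, which is what makes it blocky; bilinear invents in-between values, which is what makes it smooth but soft.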
All classical methods share one limitation: they work only with the pixels they have. They are mathematical estimates within a closed system. They cannot know that a blurry patch near someone's eye is probably an eyelash, or that a grey horizontal smear at the bottom of a building is likely a window ledge. That knowledge requires something like visual understanding, and learned visual knowledge is precisely what deep learning models provide.
The modern era of AI upscaling began in 2014, when Chao Dong and colleagues at the Chinese University of Hong Kong (CUHK) published SRCNN (Super-Resolution Convolutional Neural Network), the first end-to-end deep learning approach to single-image super-resolution. SRCNN was trained by taking high-resolution images, downsampling them to create low-resolution versions, and then training the network to reconstruct the originals. After processing millions of such pairs, the network learned to recognize patterns associated with edges, textures, and structures, and to synthesize plausible high-frequency detail where that detail had been removed.
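The SRCNN paper describes the network as three convolutional stages (patch extraction, non-linear mapping, and reconstruction). These stages are applied to a low-resolution image Y that has first been upscaled to the target size with bicubic interpolation, and the network is trained to minimize mean squared error against the original high-resolution image X. In the paper's base configuration the filters are 9×9, 1×1, and 5×5, with 64 and 32 feature maps in the first two layers. The notation below follows that description, with * denoting convolution and Θ the set of all weights W and biases B:

$$
\begin{aligned}
F_1(\mathbf{Y}) &= \max\left(0,\; W_1 * \mathbf{Y} + B_1\right) \\
F_2(\mathbf{Y}) &= \max\left(0,\; W_2 * F_1(\mathbf{Y}) + B_2\right) \\
F(\mathbf{Y}) &= W_3 * F_2(\mathbf{Y}) + B_3 \\
L(\Theta) &= \frac{1}{n} \sum_{i=1}^{n} \left\lVert F(\mathbf{Y}_i; \Theta) - \mathbf{X}_i \right\rVert^2
\end{aligned}
$$

Because the loss is pure pixel-wise MSE, SRCNN is optimized for exactly the kind of accuracy that PSNR rewards, which tends to favor smooth, conservative detail.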
The key breakthrough came when Generative Adversarial Networks (GANs) were applied to super-resolution. In 2017, Ledig et al. introduced SRGAN, which pitted a generator network (trying to produce convincing upscaled images) against a discriminator network (trying to tell generated images from real ones). This adversarial training loop pushed the generator toward images that look photographically real at the perceptual level, not merely images that score well on pixel-by-pixel accuracy metrics. That is why SRGAN outputs often look sharper and more detailed than bicubic enlargements, even when the added detail is technically fabricated.
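SRGAN's "perceptual loss" makes the contrast with pixel-wise training explicit. As described by Ledig et al., the generator G is trained on a content loss (here the VGG loss: a squared distance between feature maps φ of a pretrained VGG network, taken from the generated and real high-resolution images rather than from raw pixels) plus a small weighted adversarial term that rewards fooling the discriminator D:

$$
\begin{aligned}
l^{SR} &= l^{SR}_{VGG/i,j} + 10^{-3}\, l^{SR}_{Gen} \\
l^{SR}_{VGG/i,j} &= \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}\big(G_{\theta_G}(I^{LR})\big)_{x,y} \right)^2 \\
l^{SR}_{Gen} &= \sum_{n=1}^{N} -\log D_{\theta_D}\big(G_{\theta_G}(I^{LR}_n)\big)
\end{aligned}
$$

Neither term demands an exact pixel match, which is how fabricated but convincing detail ends up being rewarded.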
Subsequent architectures, including ESRGAN (Enhanced SRGAN, 2018), Real-ESRGAN (2021), and Topaz Labs' proprietary models, refined this approach further. Real-ESRGAN in particular replaced simple bicubic downsampling with a much more realistic simulation of real-world degradation during training, so that one model could handle JPEG compression artifacts, noise, and lens blur along with upscaling.
When a tool advertises "4× AI upscaling," it means the output image has linear dimensions four times larger than the input, and therefore 16 times as many pixels in total. A 1000×750 photograph becomes a 4000×3000 image. The AI must synthesize 15 out of every 16 pixels in the final image from statistical inference, not from recorded light data.
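The arithmetic is simple but worth making explicit, because the synthesized share grows with the square of the scale factor s, as (s² − 1)/s². A small sketch (the helper name `upscale_budget` is ours, purely illustrative):

```python
from fractions import Fraction


def upscale_budget(width, height, scale):
    """Output size, output pixel count, and the fraction of output pixels
    beyond the number of pixels actually recorded in the source."""
    out_w, out_h = width * scale, height * scale
    recorded = width * height
    total = out_w * out_h
    return (out_w, out_h), total, Fraction(total - recorded, total)


size, total, synthesized = upscale_budget(1000, 750, 4)
print(size, total, synthesized)  # (4000, 3000) 12000000 15/16
```

At 2× the same calculation gives 3/4; at 8× it gives 63/64.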
This has significant implications for photographic truth. The skin texture in an AI-upscaled portrait is not the texture of that person's skin; it is the texture that the network's training data suggests skin at that scale probably has. For most commercial and editorial uses, this distinction is inconsequential. For forensic analysis, scientific imaging, or legal evidence, it is critical. Courts in several jurisdictions have begun requiring disclosure when submitted photographs have been processed by AI upscaling tools, precisely because synthetic pixels break the link between the image and what the camera actually recorded.
Classical interpolation estimates from existing data. AI upscaling synthesizes from learned patterns. The first is math. The second is inference. Understanding which process was applied to an image matters for how that image can be used and trusted.
In this lab you'll interrogate the AI assistant about the technical mechanics of image upscaling, from classical interpolation math to GAN training dynamics. Push into specifics: ask about loss functions, training data requirements, failure modes, and why perceptual metrics matter more than PSNR for photographic work.
In 2021, MyHeritage launched Deep Nostalgia, a feature powered by D-ID's video synthesis technology. Within days of its launch, millions of users were animating photographs of deceased relatives, making century-old portraits blink, turn their heads, and smile. The tool was also used on images of historical figures, including Anne Frank, Abraham Lincoln, and Nikola Tesla. Holocaust survivors' children watched their parents' wartime photographs move for the first time. The response was described as both profoundly moving and deeply unsettling. Critics noted that the animated faces were statistical composites: the movement patterns were borrowed from a library of living faces performing scripted gestures. None of the generated motion reflected anything the photographed person had actually done. The line between restoration and fabrication had effectively dissolved.
That same year, the British newspaper The Guardian used AI colorization tools to publish colorized versions of World War I photographs alongside their original greyscale counterparts. The colorization was performed by algorithms trained on millions of color photographs; the colors assigned to uniforms, mud, skin, and sky were plausible generalizations, not historically verified hues. Several historians objected that colorization produces a false intimacy, making distant historical events feel contemporary in ways that subtly distort how viewers interpret them.
Face restoration is a specialized branch of image super-resolution that exploits strong prior knowledge about human faces. Because faces share a consistent geometric structure (eyes above nose above mouth, bilateral symmetry, predictable proportions), AI models can be trained on face-specific datasets and achieve dramatically better results on portrait photographs than general-purpose upscalers achieve on arbitrary images.
The dominant approach, exemplified by GFPGAN (Generative Facial Prior GAN), released by Tencent's ARC Lab in 2021, draws on a "facial prior" stored in a pre-trained face generation model: StyleGAN2, trained on FFHQ, a dataset of 70,000 high-quality human face photographs. When a degraded face image is fed into GFPGAN, an encoder maps it to latent codes in StyleGAN2's learned face space, in effect asking: "given this blurry, noisy face, which high-quality face does it most likely correspond to?" The pretrained generator then produces detailed facial features from those codes, and spatial feature transform layers fuse them with features extracted from the input to produce the restoration.
The practical result is remarkable: photographs with severe JPEG compression, motion blur, or other heavy degradation can be sharpened into portraits with convincing skin texture, visible pores, and clear iris detail. But identity fidelity is limited by the prior: the restored face reflects the distribution of faces in FFHQ, which skews toward contemporary subjects in modern photographs. Restoring a daguerreotype of a person born in 1840 will produce a face shaped partly by 21st-century face norms embedded in the training data.
Automatic colorization of black-and-white photographs uses models trained to predict color from greyscale luminance. Working in the Lab color space, the model takes the lightness channel (L) as input and predicts the two chroma channels (a and b) for every pixel. The pioneering work by Zhang et al. at UC Berkeley in 2016, "Colorful Image Colorization," framed colorization as a classification problem rather than a regression problem: for each pixel it predicts a probability distribution over 313 quantized color bins rather than a single "best" color. This produced more vibrant, less muddy results because the model embraced color uncertainty rather than averaging it away.
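A per-pixel distribution still has to be turned into one color. Zhang et al. use what they call an "annealed mean": sharpen the distribution with a temperature T (0.38 in the paper), then take its mean over the bin centers. T = 1 gives the plain mean, which produces desaturated compromises, and T approaching 0 gives the single most likely bin, which can be spatially inconsistent. The sketch below applies that rule to one pixel; the three bins and their probabilities are made-up toy values, not the paper's 313-bin grid.

```python
import math


def annealed_mean(probs, bin_centers, T=0.38):
    """Collapse a per-pixel distribution over quantized (a, b) color bins
    into a single (a, b) value. T=1 gives the plain mean; T -> 0 approaches
    the mode (most likely bin)."""
    # f_T(z) is proportional to exp(log(z) / T); work in log space so that
    # very small temperatures do not underflow every weight to zero.
    logs = [math.log(p) / T if p > 0 else -math.inf for p in probs]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    a = sum(w * c[0] for w, c in zip(weights, bin_centers)) / total
    b = sum(w * c[1] for w, c in zip(weights, bin_centers)) / total
    return a, b


# Hypothetical toy pixel: neutral grey, orange, and green candidate bins.
centers = [(0, 0), (40, 60), (-40, 40)]
probs = [0.2, 0.5, 0.3]
print(annealed_mean(probs, centers, T=1.0))  # plain mean, about (8, 42)
print(annealed_mean(probs, centers))         # between the mean and the mode (40, 60)
```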
Modern colorization tools including DeOldify (open-source, used by MyHeritage and numerous news organizations) use a "NoGAN" training approach developed by Jason Antic with Jeremy Howard of fast.ai. NoGAN pretrains the generator and the critic separately with conventional methods and then applies only a brief period of adversarial training, which reduced the glitches and flicker that had made GAN colorization unstable on video. The results are often photorealistic enough that colorized historical photographs published without disclosure are frequently mistaken for original color photography.
This creates a documentation problem. A colorized photograph of a soldier's uniform in a particular shade of olive green carries implicit information about the actual color of that uniform, but the color was inferred by a model trained largely on post-war photographs. No historian has verified it against surviving fabric samples or period accounts. The visual authority of photography, its indexical claim to have recorded reality, is extended to colors that were never photographically recorded at all.
Professional archival organizations, including the Society of American Archivists and the International Council of Museums (ICOM), have issued guidance documents in recent years addressing AI-enhanced reproduction of historical materials. The consensus position is built around three principles: transparency (all AI processing must be disclosed and documented in metadata), reversibility (original unprocessed files must be preserved separately), and proportionality (the degree of AI intervention should be the minimum necessary to achieve legitimate conservation goals).
Photojournalism ethics standards, including those at Reuters, AP, and World Press Photo, draw a sharp line between restoration (removing damage that obscures what was originally captured) and enhancement (adding detail that was never captured). Under these standards, removing dust and scratches from a scanned negative is permissible; using GFPGAN to sharpen a slightly out-of-focus portrait is not.
When AP ran its first AI-enhanced archival photographs in 2023, it published a parallel "before" version alongside every processed image, with explicit metadata tagging. This dual-publication standard has since been adopted by several major news archives as a baseline disclosure practice.
This lab focuses on the ethical and practical decision-making involved in AI photo restoration. You'll work through real-world scenarios with the AI assistant: when is it appropriate to use face restoration tools? How should AI-processed archival images be disclosed? What are the risks of colorizing historical photographs without historical verification?
In 2021, the visual effects studio Corridor Crew, whose YouTube channel documenting VFX work reaches over 14 million subscribers, published a breakdown of its use of AI upscaling in production. The team had used Topaz Video Enhance AI (since renamed Topaz Video AI) to restore and upscale archival broadcast footage, originally shot on 480i standard-definition video, to 4K for inclusion in high-budget documentary productions. The side-by-side comparison was striking enough that the video went viral in professional cinematography circles. What was notable was not just the quality of the output but the disclaimer Corridor Crew attached: the upscaled footage, they noted, could not be treated as equivalent to 4K originals. The AI had interpolated and synthesized from degraded source material, and any fine detail in the output was statistically plausible rather than factually recorded.
Real-ESRGAN (released by Xintao Wang et al. in 2021) addressed a problem that plagued earlier super-resolution models. Those models were trained on synthetically degraded images (clean photographs that had been artificially blurred or compressed in simple, predictable ways) and then deployed on real-world images degraded by cameras, scanners, printing, and re-photography. The mismatch between training data and deployment data caused artifacts: halos around edges, over-smoothed textures, and synthetic-looking "AI sharpening" effects visible to experienced eyes.
Real-ESRGAN introduced a high-order degradation model: a pipeline that simulates realistic combinations of blur, resizing, noise, and JPEG compression, running the whole sequence twice with randomly sampled parameters and adding sinc filtering to mimic ringing and overshoot artifacts. This more closely matches the varied, compounded degradations found in real-world photographs that have been resized and re-compressed more than once. The model trained on this data generalized far better to diverse input types.
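The structure of a second-order degradation can be sketched in a few lines. This pure-Python toy is an illustration under heavy simplifying assumptions, not Real-ESRGAN's code. Box blur, 2× average-pool downscaling, Gaussian noise, and coarse quantization stand in for the paper's richer blur kernels, random resize modes and scales, noise types, real JPEG compression, and sinc filters. What it keeps is the shape of the idea: one degradation stage with randomly drawn strengths, applied twice, turning a clean high-resolution image into a low-resolution training input.

```python
import random


def box_blur(img, radius):
    """Average each pixel with its neighbours within `radius` (a stand-in
    for the blur kernels used in the real pipeline)."""
    if radius == 0:
        return [row[:] for row in img]
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def downscale_2x(img):
    """Halve width and height by averaging 2x2 blocks."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]


def add_noise(img, sigma, rng):
    """Additive Gaussian noise with standard deviation `sigma`."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]


def quantize(img, step):
    """Crude stand-in for lossy compression: snap values to a coarse grid
    and clamp to the 8-bit range."""
    return [[min(255, max(0, round(v / step) * step)) for v in row] for row in img]


def degrade_once(img, rng):
    """One stage: blur -> downscale -> noise -> compression, random strengths."""
    img = box_blur(img, rng.choice([0, 1, 2]))
    img = downscale_2x(img)
    img = add_noise(img, rng.uniform(1.0, 10.0), rng)
    return quantize(img, rng.choice([4, 8, 16]))


def second_order_degradation(hr, seed=0):
    """Apply the stage twice, producing an input 4x smaller than `hr`."""
    rng = random.Random(seed)
    img = hr
    for _ in range(2):
        img = degrade_once(img, rng)
    return img


hr = [[(x * 16 + y * 8) % 256 for x in range(16)] for y in range(16)]
lr = second_order_degradation(hr, seed=42)
print(len(lr), len(lr[0]))  # 4 4
```

Each (lr, hr) pair then becomes one training example for a 4× model. Drawing new random strengths for every image and every stage is what exposes the network to compounded, unpredictable damage.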
Real-ESRGAN is open-source (BSD 3-Clause licensed) and forms the backbone of many consumer-facing upscaling applications. It runs efficiently on consumer GPU hardware and can process a typical 12-megapixel photograph in 2× mode in under 30 seconds on a mid-range gaming card.
Topaz Labs has built arguably the most commercially successful ecosystem of AI-powered photography tools, including Topaz Photo AI (combining upscaling, sharpening, and denoising) and Topaz Video AI (super-resolution and frame interpolation for video). Unlike open-source alternatives, Topaz trains proprietary models on curated datasets and offers multiple models optimized for specific input types: a separate model for faces, one for general photographic subjects, one for subjects with fine linear detail (text, feathers, fabric weave), and specialized models for noise reduction at specific ISO values.
This specialization produces noticeably better results on matched input types. A portrait processed with Topaz's face-optimized model will typically outperform Real-ESRGAN on that specific use case, while Real-ESRGAN may perform comparably or better on landscape or architectural photography where the face-specific training of Topaz's model is irrelevant.
Topaz Video AI introduced Proteus, a model with adjustable parameters for recovering detail, sharpness, and noise reduction, among others, allowing professionals to dial in how much AI synthesis is applied rather than accepting one-size-fits-all output. This parametric control has made Topaz a preferred tool in many high-end VFX and restoration workflows.
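Topaz does not publish how Proteus's parameters work internally. As a generic, hypothetical illustration of what "dialing in" synthesis can mean, the sketch below mixes a conventional upscale with an AI upscale of the same size using a single strength value. This is one simple approach a tool could take, not a description of Topaz's implementation; the function name `blend_strength` is ours.

```python
def blend_strength(conventional, ai_output, strength):
    """Per-pixel linear mix of two same-sized upscales (lists of rows).
    strength=0 keeps only the conventional result, 1 keeps only the AI one."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return [[(1 - strength) * c + strength * a for c, a in zip(c_row, a_row)]
            for c_row, a_row in zip(conventional, ai_output)]


print(blend_strength([[0, 100]], [[100, 200]], 0.25))  # [[25.0, 125.0]]
```

A linear blend is easy to reason about but also dilutes genuine improvements such as artifact removal; dedicated per-parameter controls avoid that tradeoff.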
In 2021, Adobe integrated AI super-resolution into Camera Raw and Lightroom through its Super Resolution feature, which uses a proprietary model based on work from Adobe Research. For raw files, the feature works from the sensor data itself, handling demosaicing and upscaling together rather than starting from a processed JPEG derivative. Super Resolution doubles the linear dimensions (four times the pixel count) and writes the result as a new DNG file that can be further processed in the full Lightroom workflow.
Adobe's model is deliberately calibrated to avoid over-sharpening, the characteristic "AI look" in which textures appear unnaturally crisp and edges carry a subtle glow. Whether this conservative approach produces better or worse results than more aggressive tools such as Topaz depends on the use case. For photojournalism and documentary photography, Adobe's subtler output is often preferred; for commercial product photography or fine-art printing, photographers frequently choose more aggressive tools.
All AI upscaling tools produce characteristic failure modes that experienced photographers learn to recognize and work around:
Hallucinated texture: The model generates plausible-looking detail in areas of low information. Smooth surfaces like blank walls, clear sky, or water may develop subtle artificial grain patterns. This is usually harmless but visible under pixel-peeping scrutiny.
Face averaging: Face-restoration models regress toward the statistical center of their training distribution. Distinctive features, such as unusual nose shapes, facial asymmetry, or non-standard proportions, may be subtly normalized toward a more "average" face. Elderly faces are particularly susceptible, because deep wrinkles and age spots are underrepresented in most training datasets.
Text corruption: Small text in photographs (signs, labels, captions) is frequently garbled by general-purpose upscalers, which interpret letterforms as texture rather than as structured information. Specialized text-preservation models exist but are less commonly integrated into mainstream tools.
Edge halos: High-contrast edges may develop a faint bright halo as the model attempts to sharpen them. This is an artifact of the adversarial training process and can be partially mitigated by reducing the "sharpness" or "recover" parameters where available.
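One rough way to quantify edge halos, offered here as a hypothetical heuristic rather than an industry-standard metric, is to sample an intensity profile across a high-contrast edge. Then measure how far the upscaled result overshoots the brightest and darkest values in the source profile. Bicubic resampling can itself overshoot slightly (ringing), so comparing against a conventional upscale of the same edge is more informative than reading the number in isolation. The profiles below are invented for illustration.

```python
def halo_overshoot(source_profile, upscaled_profile):
    """Return (brightening, darkening): how far an upscaled 1D profile across
    an edge rises above the source maximum and dips below the source minimum."""
    lo, hi = min(source_profile), max(source_profile)
    brightening = max(0, max(upscaled_profile) - hi)
    darkening = max(0, lo - min(upscaled_profile))
    return brightening, darkening


edge = [20, 20, 200, 200]
smooth = [20, 20, 20, 65, 155, 200, 200, 200]     # stays within the source range
sharpened = [20, 20, 12, 60, 170, 215, 200, 200]  # bright and dark rims
print(halo_overshoot(edge, smooth))     # (0, 0)
print(halo_overshoot(edge, sharpened))  # (15, 8)
```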
Professional retouchers recommend always working non-destructively: keep the original file untouched, apply upscaling to a copy, and compare the result at 100% zoom in multiple areas before accepting the output. Particular scrutiny should go to fine linear detail (hair, eyelashes, text), smooth gradients, and any area with distinctive texture that should not look generic.
Different upscaling scenarios call for different tools and settings. In this lab you'll work through specific use cases with the AI assistant: a wildlife photographer wants to enlarge a cropped bird image for a magazine spread; a documentary filmmaker needs to upscale 1980s VHS footage; a wedding photographer wants to upscale low-light ceremony shots taken at high ISO. Explore which tools fit which jobs, and why.
In a 2022 study published in ACM Transactions on Graphics, researchers at MIT and Adobe Research ran a controlled experiment. They showed 200 experienced photographers pairs of images (one bicubic upscale and one AI upscale from the same source) and asked which looked like the better-quality photograph. The AI upscale won overwhelmingly in perceived quality ratings. Then they asked a different question: which image was more accurate to the original? On this metric, the bicubic upscales often scored higher. The images that looked better were not always the images that were more faithful. This finding formalized a tension that practitioners had observed informally for years: AI upscaling optimizes for perceptual convincingness, not photographic truth.
PSNR (Peak Signal-to-Noise Ratio) is the oldest and most widely used image quality metric. It is the ratio of the squared maximum possible pixel value to the mean squared pixel error between two images, expressed on a logarithmic decibel scale. Higher PSNR indicates a closer pixel-by-pixel match to the reference. However, PSNR correlates poorly with human perception of image quality: a slightly blurred image may score a high PSNR because its pixels are close in value to the reference, even though it looks worse to a human viewer.
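Written out, PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value (255 for 8-bit images) and MSE is the mean squared difference between corresponding pixels. A minimal sketch for single-channel images stored as lists of rows; the tiny test images are made up:

```python
import math


def psnr(reference, test, max_value=255.0):
    """PSNR in decibels between two equal-sized grayscale images."""
    sq_errors = [(r - t) ** 2
                 for r_row, t_row in zip(reference, test)
                 for r, t in zip(r_row, t_row)]
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return math.inf  # identical images: no error at all
    return 10 * math.log10(max_value ** 2 / mse)


ref = [[0, 255], [255, 0]]
noisy = [[10, 245], [245, 10]]  # every pixel off by 10, so MSE = 100
print(round(psnr(ref, noisy), 2))  # 28.13
```

Because only the squared error enters the formula, any two outputs with the same MSE get the same score, whether the error is invisible blur or a fabricated texture.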
SSIM (Structural Similarity Index Measure), developed by Wang et al. in 2004, improved on PSNR by measuring luminance, contrast, and structural similarity separately and combining them. SSIM correlates better with human quality judgments than PSNR, but still struggles with high-frequency detail: two images with identical structural similarity scores may look very different in texture areas.
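SSIM combines its three comparisons in one formula: SSIM(x, y) = ((2μxμy + C1)(2σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), where μ are means, σ² variances, σxy the covariance, and C1 = (0.01·MAX)² and C2 = (0.03·MAX)² are small stabilizing constants. The standard method evaluates this in sliding local windows (Gaussian-weighted 11×11 in the original paper) and averages the results. To keep the sketch short, it computes a single global value over the whole image, which is a simplification.

```python
def ssim_global(x, y, max_value=255.0):
    """Single-window SSIM between two equal-sized grayscale images
    (lists of rows), using population statistics over all pixels."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((a - mx) ** 2 for a in xs) / n
    var_y = sum((b - my) ** 2 for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    c1 = (0.01 * max_value) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilizes the contrast/structure term
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (var_x + var_y + c2))


img = [[0, 50], [100, 150]]
inverted = [[150, 100], [50, 0]]
print(ssim_global(img, img))       # close to 1 for identical images
print(ssim_global(img, inverted))  # below 0: the structure is reversed
```

Note that the inverted image has exactly the same mean and variance as the original; only the covariance (structure) term reveals the difference, which PSNR would report merely as a large pixel error.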
LPIPS (Learned Perceptual Image Patch Similarity), introduced by Zhang et al. at Berkeley in 2018, measures image similarity using features extracted from a deep neural network (specifically, intermediate layers of AlexNet or VGG). Because these features capture perceptual content rather than raw pixel values, LPIPS correlates much more strongly with human quality assessments, particularly in the texture and detail regions where PSNR and SSIM fail. Unlike PSNR and SSIM, LPIPS is a distance, so lower scores mean more similar images. Ironically, SRGAN-based models often score worse on PSNR than bicubic interpolation (because they synthesize detail that doesn't match the reference pixel-by-pixel) while scoring much better on LPIPS (because that synthesized detail looks perceptually plausible).
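Zhang et al. define the LPIPS distance between an image x and a reference x₀ roughly as follows. Extract activations from several layers l of the network, and normalize each activation vector to unit length along the channel dimension (written ŷ^l). Then scale the channels by learned weights w_l and average the squared differences over the spatial positions (h, w) of each layer:

$$
d(x, x_0) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w} \left\lVert w_l \odot \left( \hat{y}^{l}_{hw} - \hat{y}^{l}_{0,hw} \right) \right\rVert_2^2
$$

The weights w_l are fit to human perceptual similarity judgments, which is what makes the metric "learned"; a distance of 0 means the two images are identical in this feature space.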
In 2023, the World Press Photo Foundation updated its contest rules to require explicit disclosure of any AI processing applied to submitted photographs, including AI upscaling and noise reduction. Photographs with AI-synthesized content that was not present in the original capture are disqualified. Applying denoising to remove ISO grain is permitted; using an AI model that synthesizes new grain patterns to make an image look more textured is not.
The C2PA (Coalition for Content Provenance and Authenticity), a standards body including Adobe, Microsoft, Intel, BBC, and Reuters, has developed the Content Credentials specification. It defines cryptographically signed provenance metadata, embedded in the file, that records the processing operations applied to an image, including which AI tools were used, when, and (where the editing software reports them) with what parameters. Lightroom, Photoshop, and several camera manufacturers (including Leica and Nikon) have begun implementing C2PA Content Credentials in production products. The goal is to create an unbroken, verifiable chain of provenance from capture to final publication.
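The core idea of a provenance chain can be shown with a toy hash-chained log, in which each processing step is bound by a cryptographic hash to the image it produced and to the step before it. This is a hypothetical illustration only. Real Content Credentials are signed manifests defined by the C2PA specification, verified against certificates, and embedded in the file itself, none of which this sketch attempts. The step names and the "example-upscaler" tool name are invented for the example.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def append_step(log, image_bytes, action, tool, params):
    """Append a record tied to the image bytes this step produced and to the
    previous record (a toy hash chain, NOT the C2PA manifest format)."""
    record = {
        "action": action,
        "tool": tool,
        "params": params,
        "image_hash": sha256_hex(image_bytes),
        "prev_record_hash": log[-1]["record_hash"] if log else None,
    }
    record["record_hash"] = sha256_hex(json.dumps(record, sort_keys=True).encode())
    log.append(record)
    return log


def verify(log, final_image_bytes):
    """True only if every link is intact and the last record matches the image."""
    prev = None
    for rec in log:
        body = {k: v for k, v in rec.items() if k != "record_hash"}
        if body["prev_record_hash"] != prev:
            return False
        if sha256_hex(json.dumps(body, sort_keys=True).encode()) != rec["record_hash"]:
            return False
        prev = rec["record_hash"]
    return bool(log) and log[-1]["image_hash"] == sha256_hex(final_image_bytes)


log = []
append_step(log, b"raw sensor data", "capture", "camera", {})
append_step(log, b"upscaled pixels", "upscale", "example-upscaler", {"scale": 4})
print(verify(log, b"upscaled pixels"))   # True
print(verify(log, b"different pixels"))  # False
```

Editing any earlier record, or swapping in different final pixels, breaks verification; without digital signatures, however, nothing stops someone from rebuilding the whole chain, which is why the real specification signs its manifests.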
In legal contexts, the UK's Intellectual Property Office issued guidance in 2023 clarifying that AI-upscaled photographs may have a different copyright status from their source images, specifically when the AI model was trained on copyrighted images without a license. This remains an area of active litigation internationally, with cases involving AI training datasets, output ownership, and the copyright status of AI-synthesized content proceeding through courts in the US, EU, and UK simultaneously.
The next generation of upscaling tools moves beyond GAN-based approaches toward diffusion model architectures. Models such as StableSR (2023) and SeeSR (2024) use diffusion processes, the same family of technology behind image generators like Stable Diffusion and Midjourney. Starting from random noise, they iteratively denoise toward a high-resolution image while conditioning each step on the low-resolution input. This produces outputs with extraordinary textural richness and realism. The tradeoff is that diffusion-based upscalers are even more generative than GAN-based models: they may produce beautiful high-resolution images that diverge significantly from what any hypothetical "true" high-resolution version would have shown.
Reference-based super-resolution takes a different approach: instead of relying solely on a general learned prior, the model is also given a reference photograph of the same subject taken under different conditions (a different angle or different lighting). The model uses both the degraded input and the reference to produce a more faithful restoration. This approach is particularly promising for archival work, where multiple photographs of the same subject often exist at different quality levels.
Every advance in AI upscaling and restoration increases the plausibility of synthesized content and erodes the visual markers that previously distinguished AI-processed images from originals. The field is moving faster than the standards bodies that govern its use. Photographers working professionally today need to understand not just how these tools work, but what obligations (ethical, legal, and professional) attach to using them.
In this final lab you'll work through the evaluation and disclosure decisions that professionals face when deploying AI upscaling. Explore how to interpret PSNR vs. LPIPS scores in practice, how C2PA workflows operate, what the World Press Photo disclosure requirements actually demand, and how to advise clients on AI upscaling when legal or archival obligations are involved.