For 180 years a photograph served as evidence. In court, in journalism, in family albums, in history books: if there was a photograph, something had happened, and the photograph was a reasonable approximation of it.
That contract is now void. Anyone can generate a photorealistic image of any event, person, or place from a text prompt. News wire services have had to build new authentication pipelines. Courts are grappling with image evidence in ways they haven't since photomontage in the 1920s. Families are discovering forged photos of deceased relatives in their feeds.
This course is about photography in the age of generative AI, for the photographer, the consumer, and the citizen. It covers how AI generates images, how to detect AI-generated images, how professional photographers are using (and resisting) AI tools, the new landscape of copyright and consent, and what photographic truth will mean in a decade when any image can be generated.
If you finish every module, here's who you become:
At the Google hardware event on October 4, 2017, the company unveiled the Pixel 2 and, quietly, showcased a software capability called HDR+ that would change how the industry understood cameras. The phone's 12-megapixel sensor was, on paper, unremarkable. Yet reviewers at DxOMark awarded it the highest camera score any smartphone had ever received at that time. The secret was not glass or silicon. It was a machine-learning pipeline that fused up to fifteen rapid-fire exposures, aligned them at the sub-pixel level, and merged them using a noise model trained on millions of image pairs. No hardware trick could match it. Photography had crossed a threshold: the lens was no longer the limiting factor; the algorithm was the product.
Traditional photography is an optical and chemical (or optical and electronic) process. Light passes through a lens, strikes a sensor, and a relatively direct encoding of that light becomes your image. The photographer controls aperture, shutter speed, and ISO to manage exposure. Computational photography breaks that one-to-one relationship. The final image is no longer a single recording of a moment; it is the output of a software pipeline that may combine dozens of frames, consult a scene-understanding model, and apply tone-mapping decisions trained on human aesthetic preferences.
Google's HDR+ pipeline, first described in a 2016 SIGGRAPH Asia paper by Hasinoff et al., captures a burst of short-exposure frames in a ring buffer even before you press the shutter. When you tap the button, the system selects the best alignment anchor, applies per-pixel motion estimation to handle camera shake and subject motion, merges frames using a frequency-domain technique that suppresses noise while preserving detail, and then applies a learned tone curve. The result has both the highlight recovery of traditional HDR and the low-noise character of a much larger sensor.
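The merge step can be illustrated with a much simpler stand-in. The sketch below is not the HDR+ algorithm: it swaps the frequency-domain merge for a plain per-pixel average, assumes the frames are already aligned, and uses a hypothetical `reject_threshold` to leave out samples that disagree with the reference frame (a crude proxy for motion handling). All function and parameter names are illustrative.

```python
import random

def simulate_burst(scene, frame_count, noise_sigma, seed=0):
    """Return noisy copies of `scene`, a flat list of pixel values."""
    rng = random.Random(seed)
    return [[value + rng.gauss(0, noise_sigma) for value in scene]
            for _ in range(frame_count)]

def merge_burst(frames, reference_index=0, reject_threshold=30.0):
    """Average aligned frames per pixel, skipping outliers relative to the reference."""
    reference = frames[reference_index]
    merged = []
    for i, ref_value in enumerate(reference):
        # A sample far from the reference is treated as motion or misalignment
        # and left out. The reference sample itself always passes this check.
        samples = [frame[i] for frame in frames
                   if abs(frame[i] - ref_value) <= reject_threshold]
        merged.append(sum(samples) / len(samples))
    return merged

def rms_error(image, truth):
    """Root-mean-square difference between an image and the true scene."""
    return (sum((a - b) ** 2 for a, b in zip(image, truth)) / len(truth)) ** 0.5

scene = [100.0] * 500                      # a flat grey patch
burst = simulate_burst(scene, frame_count=8, noise_sigma=5.0)
merged = merge_burst(burst)

single_error = rms_error(burst[0], scene)
merged_error = rms_error(merged, scene)
print(f"single frame RMS error: {single_error:.2f}")
print(f"merged burst RMS error: {merged_error:.2f}")
```

Averaging N frames with independent noise reduces the noise by roughly the square root of N, which is why a burst of short exposures can approach the noise level of a single, much longer exposure.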
Apple's approach, introduced with the A12 Bionic chip in the iPhone XS (2018), added a dedicated Neural Engine, a set of matrix-multiplication accelerators optimized for running neural-network inference at low power. This let iOS run semantic segmentation in real time: the camera could identify sky, skin, foliage, and architecture separately and apply different processing to each zone. What photographers once spent hours doing in Lightroom (selectively adjusting luminance by region) the phone now did in milliseconds, invisibly.
The most dramatic public demonstration of AI-in-camera came with Google's Night Sight feature, released for Pixel phones in November 2018. The underlying research, published by Liba et al. at Google, described a system that could produce hand-held, low-light photographs indistinguishable in quality from long-exposure tripod shots. Night Sight extended the HDR+ burst pipeline with a motion-metering step that estimated how much subject and camera motion existed in the scene, then dynamically chose the number of frames to merge and the per-frame exposure length to maximize signal without introducing motion blur.
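The trade-off behind motion metering can be sketched as a toy rule: cap each frame's exposure so that the estimated motion stays under a blur limit, then take as many frames as needed to reach a target total exposure. This is only an illustration of the idea, not the published method; `plan_burst`, the 2-pixel blur limit, the 1-second per-frame cap, and the 15-frame cap are all assumptions.

```python
import math

def plan_burst(motion_px_per_s, target_total_exposure_s,
               max_blur_px=2.0, max_frame_exposure_s=1.0, max_frames=15):
    """Pick a frame count and per-frame exposure from a motion estimate.

    All limits here are hypothetical defaults chosen for illustration.
    """
    if motion_px_per_s <= 0:
        # Static scene on a steady camera: use the longest allowed exposure.
        frame_exposure_s = max_frame_exposure_s
    else:
        # Longest exposure that keeps blur under max_blur_px.
        frame_exposure_s = min(max_frame_exposure_s,
                               max_blur_px / motion_px_per_s)
    wanted_frames = math.ceil(target_total_exposure_s / frame_exposure_s)
    frame_count = min(max_frames, max(1, wanted_frames))
    return frame_count, frame_exposure_s

still_plan = plan_burst(motion_px_per_s=0, target_total_exposure_s=2.0)
shaky_plan = plan_burst(motion_px_per_s=20, target_total_exposure_s=2.0)
print("steady scene:", still_plan)
print("moving scene:", shaky_plan)
```

With little motion the rule prefers a few long frames; with more motion it prefers many short ones and accepts less total light once the frame cap is reached.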
The training data for the machine-learning components was collected by capturing paired images of the same scene: one taken with a tripod and a long exposure (the "ground truth"), the other taken hand-held with a burst of short exposures (the input). A convolutional neural network learned the mapping from noisy input bursts to clean, well-exposed output. By the time the feature shipped, it had been trained on tens of thousands of such pairs across indoor, outdoor, candlelight, and streetlight conditions.
Samsung's rival Bright Night mode and later Expert RAW application followed a similar philosophy. Huawei's P20 Pro, which topped DxOMark rankings for much of 2018, used a triple-camera system with a dedicated monochrome sensor specifically to feed more luminance data into its AI merge step. The monochrome sensor captures roughly three times more light than a Bayer color sensor, and the AI used that extra signal to reduce noise on the color output. Each company had a different architecture, but all shared the same insight: intelligence about the scene produces better images than brute-force optics.
Understanding that your phone or mirrorless camera is running an AI pipeline, not just recording light, changes how you evaluate and interpret your images. A blurry foreground object that the AI misclassified, an artificially smooth skin tone from over-aggressive noise reduction, or a sky that looks "too perfect" are all artifacts of the computational layer. Knowing this pipeline exists lets you decide when to trust it, when to shoot RAW to bypass it, and when the aesthetic choices it makes align with your creative intent.
In this lab you will interrogate the AI about the computational photography pipeline. Ask about HDR+, Night Sight, semantic segmentation, or any part of the lesson. Try to understand the trade-offs: when does computational processing help, and when does it hurt creative intent?
Complete at least 3 exchanges to mark the lab done.
When Sony announced the Alpha 9 II in September 2019, the press release buried what turned out to be the most significant line: the camera's autofocus system could now recognize and track human eyes using a deep-learning model running on-chip at up to 60 frames per second. Sony called it Real-time Eye AF. Within months, wildlife photographers discovered the same system, extended in the Alpha 7R IV firmware, could lock onto the eyes of birds in flight, a task that had previously required either a dedicated tracking assistant or years of practised manual focus technique. The mechanical problem of focusing had become a semantic problem of understanding what mattered in a frame.
Autofocus technology has evolved through three recognizable generations. Contrast-detection AF, used in early digital compacts, works by moving the lens until the contrast in a target region is maximized. It is reliable but slow, because it must hunt through a focus range to find the peak. Phase-detection AF (PDAF), which dominated DSLRs and later migrated onto sensors as on-sensor PDAF, measures the phase difference between two halves of the lens pupil, essentially computing the direction and magnitude of defocus in a single pass. It is fast but requires dedicated masked sensor pixels and performs less well in low contrast or low light.
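The "hunting" behavior of contrast detection is easy to see in a small simulation. The sketch below assumes a one-dimensional scene, models defocus as a box blur whose radius grows with distance from a hypothetical `TRUE_FOCUS` lens position, and sweeps every lens position rather than hill-climbing as a real camera would. The contrast metric (sum of squared neighbor differences) is one common choice, used here purely for illustration.

```python
def box_blur(signal, radius):
    """Blur a 1-D signal with a moving average of the given radius."""
    if radius == 0:
        return list(signal)
    blurred = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        blurred.append(sum(window) / len(window))
    return blurred

def contrast(signal):
    """Sum of squared differences between neighboring samples."""
    return sum((b - a) ** 2 for a, b in zip(signal, signal[1:]))

def contrast_detect_focus(render_at, positions):
    """Try each lens position and keep the one with the highest contrast."""
    best_position, best_score, evaluations = None, -1.0, 0
    for position in positions:
        score = contrast(render_at(position))
        evaluations += 1
        if score > best_score:
            best_position, best_score = position, score
    return best_position, evaluations

TRUE_FOCUS = 7                                  # hypothetical in-focus position
scene = [0, 0, 0, 0, 255, 255, 255, 255] * 8    # hard-edged stripes

def render(position):
    """Simulated sensor readout: blur grows with distance from focus."""
    return box_blur(scene, abs(position - TRUE_FOCUS))

best, evaluations = contrast_detect_focus(render, range(15))
print(f"best lens position: {best} after {evaluations} contrast measurements")
```

The search needs one contrast measurement per lens position tried, which is the cost that phase detection avoids by estimating the direction and size of the defocus from a single reading.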
The third generation adds a subject-recognition layer on top of phase detection. Instead of simply asking "where is the sharpest point?", the camera asks "where is the subject?" and then asks the PDAF system to focus there. Subject recognition is a neural-network classification and detection task. For human subjects, networks trained on portrait datasets learn to locate faces, then refine to eye landmarks. The eye is preferred over the face because the eye provides the sharpest perceptual anchor for a portrait viewer, a principle photographers have applied manually for decades.
Canon's Dual Pixel AF, introduced in the EOS 70D in 2013, gave every active pixel two photodiodes, effectively turning the entire sensor into a phase-detection array rather than relying on a sparse grid of dedicated PDAF pixels. This dramatically improved autofocus coverage across the frame. When Canon added its Deep Learning subject-recognition layer in the EOS R3 (2021), the system could detect and track human subjects, animals, and vehicles, switching categories automatically based on scene content.
Running a convolutional neural network for object detection at 60 fps on a camera body requires substantial on-chip compute. Sony's BIONZ XR processor, introduced in the Alpha 1 (2021), uses a dedicated AI processing unit separate from the main imaging pipeline. At 50 megapixels and 30 fps, the system must run inference on a full-resolution frame every 33 milliseconds while simultaneously writing compressed data to two CFexpress cards. The power budget is tight: camera bodies cannot use active cooling, so the AI processing unit must achieve its throughput within a fixed thermal envelope.
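The timing pressure follows from simple arithmetic. A rough back-of-envelope sketch, using only the 50-megapixel and 30 fps figures quoted above (the variable names are illustrative):

```python
megapixels = 50
frames_per_second = 30

# Time available to process each frame before the next one arrives.
frame_budget_ms = 1000 / frames_per_second

# Raw pixel rate coming off the sensor at full resolution.
pixels_per_second = megapixels * 1_000_000 * frames_per_second

print(f"per-frame budget: {frame_budget_ms:.1f} ms")
print(f"sensor throughput: {pixels_per_second / 1e9:.1f} gigapixels per second")
```

Everything the camera does per frame, including subject detection, has to fit inside that window of roughly 33 milliseconds.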
The training data challenge for animal and bird eye detection is formidable. Sony partnered with wildlife photographers to accumulate labeled datasets of bird eyes in flight across hundreds of species. The difficulty is that a bird's eye, in flight against a complex background, may occupy fewer than 40 pixels on a full-frame sensor. The network must be robust to partial occlusion, motion blur, extreme angles, and species variation. Nikon's subject-recognition AF, introduced in the Z9 (November 2021), similarly required multi-year dataset collection before it could reliably track birds, insects, aircraft, and trains as distinct categories.
Fujifilm took a different approach with its X-H2S (2022): rather than training a single monolithic detection network, it uses a cascade of lightweight classifiers that first decide the subject category, then route to a specialist network optimized for that category. This reduces average inference latency while maintaining accuracy, a technique borrowed from mobile-device computer-vision literature.
AI autofocus systems make creative assumptions that photographers sometimes need to override. An eye-detection system will always prioritize the nearest eye, but a portrait photographer might want the far eye sharp to convey psychological depth. A tracking system trained on the largest moving object in the frame may latch onto a bystander rather than the intended subject when the scene is crowded. The camera is applying a statistical best-guess about human photographic intent, derived from its training data, which may not match the specific artistic intent of the photographer holding it.
Understanding these assumptions allows photographers to work with or against them deliberately. Knowing that Sony's Real-time Tracking uses color, luminance, and detected face data simultaneously means that blocking the face momentarily (during a blink or a turn) can cause the tracker to fall back on color/luminance matching, which may shift to the wrong subject. Advanced users learn to pre-register the subject by half-pressing before the critical moment, filling the camera's tracking memory with enough data to maintain the lock through a brief face occlusion.
Every AI autofocus system embeds aesthetic assumptions from its training data. Learning to recognize when the camera's AI agrees with your creative intent, and when it doesn't, is one of the core skills of working with modern capture tools. The camera is not neutral: it has been trained to make a particular kind of photograph.
Use this lab to dig into autofocus AI: how different camera manufacturers implement subject detection, what training data decisions shape behavior, and how photographers can work with or around the system's assumptions.
Complete at least 3 exchanges to mark the lab done.
The Huawei P30 Pro, announced at a Paris event in March 2019, shipped with a scene-recognition system that could classify the content of the viewfinder into one of nineteen categories (including food, sunset, night portrait, text document, and pet) and automatically apply a distinct processing pipeline for each. Reviewers at The Verge and DPReview noted that pointing the camera at a plate of food would trigger an immediate shift in saturation and contrast visible in the live preview. This was not a filter. It was the imaging pipeline itself reconfiguring in real time. The camera had an opinion about what a food photograph should look like before the photographer did.
Scene recognition in cameras uses multi-label image classification: a convolutional neural network trained to assign one or more scene-category labels to a live preview frame. The network runs on a downsampled version of the scene (typically 224×224 pixels or similar) to reduce computation. Its output is a probability distribution over category labels. When a category probability crosses a threshold, the imaging pipeline switches to that category's preset: a set of tone-curve parameters, noise-reduction aggressiveness, sharpening radius, saturation multipliers, and sometimes HDR merge count.
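The threshold-and-preset logic described above might look something like the following sketch. The preset names, parameter values, and the 0.6 threshold are invented for illustration and do not come from any manufacturer's pipeline; the classifier itself is assumed to exist and is represented only by its output probabilities.

```python
# Hypothetical presets; the parameter values are made up for illustration.
PRESETS = {
    "default": {"saturation": 1.00, "sharpen_radius": 1.0, "hdr_frames": 3},
    "food":    {"saturation": 1.25, "sharpen_radius": 1.2, "hdr_frames": 3},
    "sunset":  {"saturation": 1.15, "sharpen_radius": 0.8, "hdr_frames": 5},
}

def select_preset(probabilities, threshold=0.6):
    """Return the preset name for the most likely category, if confident enough."""
    label, probability = max(probabilities.items(), key=lambda item: item[1])
    if probability >= threshold and label in PRESETS:
        return label
    # No category is confident enough (or it has no preset): stay neutral.
    return "default"

confident = select_preset({"food": 0.82, "indoor": 0.40, "sunset": 0.03})
ambiguous = select_preset({"food": 0.45, "indoor": 0.40, "sunset": 0.03})
print("confident frame ->", confident, PRESETS[confident])
print("ambiguous frame ->", ambiguous, PRESETS[ambiguous])
```

A scene whose top probability hovers near the threshold can flip between presets from one frame to the next, which is one way a hard cutoff produces inconsistent processing between otherwise similar shots.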
Samsung's Scene Optimizer, introduced in the Galaxy S10 (2019), extended this to 30 categories and, critically, would boost the saturation of sky regions independently of the rest of the frame. Photography critics noted that the feature produced skies that appeared artificially vivid compared to how the scene looked to the naked eye. Samsung defended this as "rendering the scene as humans perceive it at their best", an explicit acknowledgment that the camera was applying an aesthetic preference derived from its training data about what people rated as attractive sky photographs.
The training data question here is significant. Scene-recognition networks are typically trained on large image databases rated by human evaluators. If those evaluators systematically prefer vivid, saturated colors in outdoor scenes, as many crowdsourced aesthetic-rating studies suggest, the network will learn to apply processing that produces those characteristics. The camera's defaults encode the average aesthetic preference of its training evaluators, which may not match the individual photographer using the device.
The visibility of AI aesthetic decisions became a public controversy in 2019 when it emerged that several Android phones were applying scene-recognition enhancement to photographs in ways that were not clearly disclosed. A July 2019 report by Android Authority documented that the Pixel 3, the OnePlus 7 Pro, and several Samsung Galaxy models all applied sharpening, saturation boosts, or texture enhancement to food photographs even when the user believed they were shooting in a neutral "photo" mode. The processing was not applied to the RAW file (which remained unaltered) but was deeply baked into the JPEG pipeline, making it effectively invisible unless the photographer specifically compared JPEGs against RAW exports.
This created a genuine transparency problem. Professional food photographers working on assignment discovered that their clients' brand colors could shift noticeably between the phone's JPEG output and what the art director had approved based on studio reference images. The fix was to shoot in manual-override or RAW mode, but this required understanding that the AI was intervening in the first place.
Google subsequently added disclosure language to the Camera app's "Top Shot" and processing notifications. Apple introduced a "Photographic Styles" system in iOS 15 (2021) that makes aesthetic preferences explicit and user-configurable rather than hidden, allowing photographers to set a preferred tone and contrast rendering that the AI would then maintain across all scene categories, rather than overriding their preference per scene.
Scene recognition is not confined to smartphones. Olympus (now OM System) introduced an AI-based "Intelligent Auto" mode in the OM-D E-M1 Mark III (2020) that classifies scenes and selects not just processing parameters but shooting parameters: aperture, shutter speed, ISO, and even whether to engage focus bracketing for macro scenes. The system uses on-sensor detection to identify faces, landscapes, close-up subjects, and backlit subjects, and automatically adjusts exposure compensation to protect highlight detail or lift shadow detail per category.
Nikon's Scene Recognition system, which has appeared in bodies since the D7000 era but expanded significantly with the Z-series, cross-references scene classification with a database of millions of images to evaluate likely exposure errors. When the classifier detects a backlit portrait, the system knows from its training data that metering systems typically underexpose the subject in these situations, and it biases the exposure accordingly. This is effectively a learned correction for well-known camera limitations: the AI has studied the failure modes of the system it is part of.
Scene-recognition AI produces its most visible artifacts when the classification is wrong or when its aesthetic preferences conflict with yours. A scene partially matching both "food" and "indoor" categories may get inconsistent processing between shots. Understanding that your camera is running a classifier, and learning to recognize its categories' visual signatures, lets you anticipate and control the output rather than being surprised by it.
In this lab, explore how camera AI scene recognition works, what aesthetic preferences are baked into it, and when those preferences help versus hinder professional photographers. Think about the transparency problem and creative control.
Complete at least 3 exchanges to mark the lab done.
Samsung's Galaxy S23 Ultra launched in February 2023 with a 200-megapixel sensor, the HP2 manufactured by Samsung Semiconductor. In practice, the camera defaulted to shooting 12-megapixel photographs. The reason: the imaging pipeline used a technique called pixel binning, combining 16 sensor pixels into one output pixel, improving low-light performance by aggregating light from a larger effective area. When users switched to the full 200-megapixel mode, Samsung's AI-powered detail-enhancement layer could then operate on the full pixel data, synthesizing sub-pixel detail using a super-resolution neural network trained to reconstruct high-frequency texture from the sensor's raw output. The number on the spec sheet had become a canvas for AI, not a direct measurement of captured detail.
Sensor manufacturers face a fundamental tension between resolution and sensitivity. Smaller pixels capture less light, producing noisier images in low light. Larger pixels capture more light but yield fewer of them per unit sensor area, reducing resolution. Pixel binning resolves this tension dynamically: a sensor with very small, densely packed pixels defaults to combining (binning) groups of adjacent pixels into a single output pixel, producing an effective resolution lower than the sensor's physical pixel count but with the sensitivity of the larger combined area. In good light, those bins can be broken apart and each pixel read individually for maximum resolution.
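Binning itself is a simple operation. The sketch below averages square groups of pixels in a small grayscale image stored as nested lists; a factor of 4 merges a 4×4 group of 16 pixels into one. It is a simplified illustration that ignores the color filter array a real sensor has to account for, and `bin_pixels` is a hypothetical helper name.

```python
def bin_pixels(image, factor):
    """Average each factor-by-factor block of a 2-D image into one output pixel."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be multiples of the bin factor")
    binned = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        binned.append(row)
    return binned

# A 4x4 test image with pixel values 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]

print(bin_pixels(image, 2))   # 2x2 output, each value averages 4 pixels
print(bin_pixels(image, 4))   # 1x1 output, averaging all 16 pixels
```

Each output pixel combines the signal from every pixel in its block, so random noise partly averages out while the pixel count drops by the square of the bin factor.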
The AI layer enters when the camera must transition between these modes or when it needs to produce a clean high-resolution output from the full pixel array. At 200 megapixels, a smartphone sensor produces enormous amounts of noise. Super-resolution neural networks, adapted from research by teams at Google, Adobe, and academic labs, learn to predict what sharp, clean, full-resolution detail should look like, given a noisy high-resolution input and a clean but lower-resolution reference (the binned version of the same capture). The result is a sharpened, denoised 200-megapixel image where some of the apparent detail is synthesized rather than directly recorded.
Google's Super Res Zoom, first described in a 2018 Google AI Blog post and paper by Wronski et al., used a different approach to AI resolution enhancement. Rather than relying on optical zoom, the system exploits the natural sub-pixel motion between frames in a hand-held burst. Each frame in a multi-frame capture is shifted by a fraction of a pixel relative to the others due to hand tremor. A super-resolution algorithm can use these sub-pixel offsets to reconstruct spatial frequencies above the sensor's native Nyquist limit. The result is a zoom image with detail that exceeds what the optical system alone could resolve, without any additional optical hardware.
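The core idea, that frames offset by a fraction of a pixel carry complementary samples, can be shown in one dimension. This is a minimal shift-and-add sketch, not the method from the paper: it assumes the offsets are known exactly, that they land on a grid twice as fine as the sensor's, and that each pixel samples a single point rather than integrating light over an area. The names `capture` and `shift_and_add` are illustrative.

```python
def capture(scene, offset, step=2):
    """Simulate a low-resolution frame: every `step`-th sample, starting at `offset`."""
    return scene[offset::step]

def shift_and_add(frames, offsets, step, length):
    """Place each frame's samples on a finer grid and average any overlaps."""
    sums = [0.0] * length
    counts = [0] * length
    for frame, offset in zip(frames, offsets):
        for i, value in enumerate(frame):
            position = offset + i * step
            sums[position] += value
            counts[position] += 1
    # Grid positions that no frame sampled are left as None.
    return [s / c if c else None for s, c in zip(sums, counts)]

scene = [10, 20, 30, 40, 50, 60, 70, 80]    # "true" detail on the fine grid
frame_a = capture(scene, offset=0)           # sees 10, 30, 50, 70
frame_b = capture(scene, offset=1)           # shifted by half a coarse pixel

one_frame = shift_and_add([frame_a], [0], step=2, length=len(scene))
two_frames = shift_and_add([frame_a, frame_b], [0, 1], step=2, length=len(scene))
print("one frame: ", one_frame)
print("two frames:", two_frames)
```

With a single frame, half of the fine grid is empty and would have to be interpolated or guessed. With the second, shifted frame, those positions are filled by real measurements, which is the sense in which hand tremor supplies extra resolution.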
This technique was used to power the Pixel 3's 2× "lossless zoom" and later extended to higher magnifications in the Pixel 4 and beyond. The 2020 Pixel 5 offered 7× zoom without a dedicated telephoto lens, using Super Res Zoom to fill in detail that the single 12-megapixel wide-angle sensor could not physically resolve at that magnification. Independent image-quality analyses by sites including DPReview and PetaPixel confirmed that while the results were impressive for the hardware, close examination revealed that fine textures such as hair, fabric weave, and text at extreme zoom showed reconstruction artifacts: patterns that were plausible but not necessarily what was actually in the scene.
This last point is philosophically important. Traditional zoom photography records what was there. AI super-resolution generates a prediction of what was probably there. The prediction is usually correct β the AI has seen millions of examples of hair, fabric, and text and has learned their statistical patterns. But when the scene contains something genuinely unusual or when detail falls between known categories, the network may hallucinate texture that looks correct but is not accurate. For documentary photography, this is a material concern.
Super-resolution AI is not limited to smartphones. Olympus (OM System) introduced a sensor-shift high-resolution mode in the OM-D E-M5 Mark II (2015) that physically moves the sensor in half-pixel increments across eight positions, capturing eight frames and stacking them into a composite with twice the linear resolution. In the OM-1 (2022), this process was combined with AI processing that corrects for subject motion between frames, previously a fatal flaw for the technique. The AI detects which pixels are affected by subject motion in each frame and excludes them from the stack at those positions, filling in from neighboring frames instead.
Phase One's IQ4 150MP digital back, used in professional medium-format studio photography, incorporates AI noise reduction and sharpening algorithms that operate on the raw sensor data before demosaicing. The company's Capture One software suite includes a dedicated AI masking layer that can detect subjects, sky, and background regions from a single image with accuracy comparable to manual selection, dramatically reducing post-processing time on high-volume shoots. For commercial photographers, this represents a substantial workflow change: the intelligence of the capture and processing pipeline now extends deep into what was once purely a manual curation and retouching domain.
When a super-resolution or AI-merge system synthesizes detail it cannot directly observe, it is making a learned prediction about the scene. For most consumer photography, this prediction is visually convincing. For documentary, forensic, scientific, or legal photography, synthesized detail that was not directly recorded by the sensor is a significant accuracy concern. Knowing when to prioritize recording fidelity over visual quality is a professional judgment the photographer must make; the camera cannot make it for you.
Use this lab to go deeper on AI super-resolution: the technology, its limits, and when its output is trustworthy versus potentially fabricated. Consider professional use cases where fidelity matters more than appearance.
Complete at least 3 exchanges to mark the lab done.