In September 2022, an artist named Greg Rutkowski discovered something unsettling. His distinctive fantasy painting style — developed over years of professional work — had become one of the most-used prompt keywords on the AI image generator Stable Diffusion. Users were typing "in the style of Greg Rutkowski" to generate thousands of images that mimicked his aesthetic. He had never been asked. He had never agreed. His name had become a dial on a machine he did not build, feeding on art he had created.
The AI had been trained on a dataset called LAION-5B — 5.85 billion image-text pairs scraped from the open web. Rutkowski's paintings were in there. So were the works of hundreds of thousands of other living artists, photographers, and illustrators. None of them were asked.
To understand the consent problem, you first need to understand how AI art and writing generators are built. These systems, called generative AI or foundation models, are trained by feeding them enormous collections of existing human-made content. The AI is not designed to store that content like a filing cabinet. Instead, it mostly learns statistical patterns: which shapes tend to appear near which colors, which words tend to follow which other words, what "dramatic lighting" or "impressionist brushwork" looks like across thousands of examples.
The training sets for major systems are staggering in size. OpenAI's GPT-3 was trained on data that included Common Crawl (a massive ongoing scrape of the web), WebText2, books, and Wikipedia; OpenAI has not disclosed GPT-4's training data in comparable detail. Stability AI's Stable Diffusion was trained on images drawn from LAION-5B. Midjourney has not fully disclosed its training data. Google's Imagen was trained on an internal dataset plus the public LAION-400M dataset. Adobe's Firefly has taken a different approach: it was trained only on Adobe Stock images, licensed content, and public-domain works, specifically to avoid this controversy.
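To make "learning statistical patterns" a little more concrete, here is a deliberately tiny Python sketch. It counts which word follows which in three made-up sentences and then predicts the most common next word. The sentences and the function names (count_next_words, predict_next) are invented for illustration; real generative models use neural networks trained on billions of examples, not simple word-pair counts, so treat this only as a toy version of the idea.

```python
from collections import Counter, defaultdict


def count_next_words(sentences):
    """Build a table: for each word, how often each other word comes right after it."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with the word that follows it.
        for current_word, next_word in zip(words, words[1:]):
            follows[current_word][next_word] += 1
    return follows


def predict_next(follows, word):
    """Return the word seen most often after `word`, or None if it was never followed by anything."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# A made-up "training set" of three sentences (hypothetical example data).
training_text = [
    "the castle glows in dramatic lighting",
    "the dragon glows in dramatic lighting",
    "the forest glows in soft lighting",
]

table = count_next_words(training_text)
print(predict_next(table, "in"))  # prints: dramatic ("dramatic" followed "in" twice, "soft" once)
print(dict(table["in"]))          # prints: {'dramatic': 2, 'soft': 1}
```

The table keeps counts of word pairs rather than the sentences themselves, which is a rough analogy for learning patterns instead of filing away copies. Real models are far more complex and can sometimes reproduce pieces of their training data closely, as the Getty watermark example later in this module suggests.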
The core question that courts, artists, and lawmakers are now wrestling with: is scraping publicly visible content and training an AI on it the same as copying it? And if it is copying — does copyright law protect those original creators?
Human artists have always learned by studying other artists' work. An art student might spend months copying old masters in a museum. A novelist might read hundreds of books before finding their voice. This kind of influence and learning is considered completely normal and legal.
So why is AI training different? Critics have offered several reasons:
1. Scale. No human could study 5 billion images. A single training run processes billions of works, each one nudging the model's internal settings, and the sheer volume of ingested work is unlike anything in human creative history.
2. Commercial use. When a student copies a painting for practice, they don't usually sell it. When a company trains an AI on your work and then charges subscribers to generate images in your style, money is being made — money that flows to the company, not to you.
3. Market substitution. Critics argue AI-generated images in an artist's style can replace commissions that would otherwise go to the original artist. If someone can get a "Greg Rutkowski-style" image for $0.01 from a generator, they may not hire Greg Rutkowski.
As of 2023, the name "Greg Rutkowski" appeared in more than 93,000 prompts on Lexica.art, an AI art search engine. By comparison, artist Frida Kahlo — who died in 1954 — appeared in about 22,000. Living artists' styles were being replicated at industrial scale.
Some AI companies responded to artist complaints with opt-out mechanisms. The startup Spawning launched a tool called Have I Been Trained? that lets artists search for their images in the LAION-5B dataset and flag them for opt-out, and Stability AI agreed to honor those requests when training future models. Similarly, OpenAI and others announced opt-out processes for future training runs.
Artists largely rejected this framing. Their argument: the default should be opt-in, not opt-out. You shouldn't have to discover your work was used, hunt down the right form, and submit a removal request — the company should have asked you first. The burden of consent shouldn't fall on the creator after the fact.
This opt-in vs. opt-out debate is now central to proposed legislation in the US, UK, and European Union, and it maps directly onto a broader question about whose interests AI development has historically prioritized.
If someone used photos of your artwork from your Instagram to train an AI — without asking — would you want to opt out after the fact, or would you want to have been asked first? Why does the order matter?
You're going to argue both sides of the consent debate with an AI discussion partner. Start by picking a side: either defend the AI companies' position (opt-out is fine; this is how the web already works) or defend the artists' position (opt-in is the only ethical approach). Then see if you can be convinced to switch sides.
This lab is designed for younger learners — the AI will explain any legal terms that come up and keep the conversation grounded in real examples.
On January 17, 2023, Getty Images — one of the world's largest stock photo agencies — filed a lawsuit in the UK against Stability AI, the company behind Stable Diffusion. Getty alleged that Stability AI had scraped more than 12 million photographs from Getty's website without permission, along with their associated metadata and captions, to train Stable Diffusion.
The evidence was striking: Stable Diffusion was generating images that included corrupted versions of the Getty Images watermark — the distinctive "Getty Images" text that appears on licensed photos. This suggested the AI had learned from watermarked images directly, making the infringement claim highly visible. Getty filed a parallel lawsuit in US federal court in February 2023.
Stability AI denied the claims, and the lawsuits remain ongoing as of this writing. The UK case, in particular, could set a precedent for how UK courts treat AI training data.
When an AI generates an image with a smeared, corrupted Getty watermark, it provides strong evidence that the training data included Getty's watermarked photos. It's a bit like a student who copies from a textbook and accidentally includes the publisher's logo on their own test paper. It's hard to deny you saw the source.
Also in January 2023, three visual artists — Sarah Andersen, Kelly McKernan, and Karla Ortiz — filed a class-action lawsuit in US federal court against Stability AI, Midjourney, and DeviantArt. Their claim: these companies had committed copyright infringement on a "massive scale" by training their AI systems on billions of images scraped from the internet without consent or compensation.
The lawsuit argued that AI-generated images are essentially "derivative works" — legally, something created from someone else's copyrighted material. If that's true, then generating an image "in the style of" a living artist might violate their copyright.
In October 2023, US District Judge William Orrick allowed part of the lawsuit to proceed: Andersen's direct copyright infringement claim against Stability AI survived, while most other claims, including those against Midjourney and DeviantArt, were dismissed at that stage with leave to re-file with more specific allegations. The case continues.
Writers got involved too. In September 2023, a group of prominent authors, including John Grisham, Jonathan Franzen, George R.R. Martin, Jodi Picoult, and Elin Hilderbrand, filed a class-action lawsuit against OpenAI, alleging that their books had been used without permission to train ChatGPT and GPT-4.
The Authors Guild, which coordinated the action, argued that OpenAI had copied entire published books, likely including pirated copies from online "shadow libraries." (A separate shadow-library dataset, Books3, containing roughly 196,000 pirated books, was used to train other companies' models, including Meta's LLaMA.) OpenAI's position was that training constitutes "fair use," a legal doctrine that allows limited use of copyrighted material without permission under certain conditions.
In June and July 2023, authors Paul Tremblay and Mona Awad sued OpenAI, and comedian Sarah Silverman joined other authors in two separate suits, one against OpenAI and one against Meta. These were among the first cases specifically targeting large language model training rather than image generation.
Fair use is a legal doctrine in US copyright law that allows someone to use copyrighted material without permission in certain situations — commentary, criticism, parody, education, and research are common examples. Whether AI training qualifies as fair use is the central legal question in most of these lawsuits. Courts weigh four factors: the purpose of the use, the nature of the original work, how much was taken, and the effect on the market for the original.
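January 2023, Getty Images v. Stability AI (UK): alleges scraping of 12+ million watermarked photographs without a license.
January 2023, Andersen, McKernan, Ortiz v. Stability AI, Midjourney, DeviantArt (US): class action by visual artists.
February 2023, Getty Images v. Stability AI (US): parallel proceeding in Delaware federal court.
September 2023, Authors Guild v. OpenAI: John Grisham, George R.R. Martin, and others allege book ingestion without consent.
October 2023, ruling in Andersen: direct infringement claims against Stability AI survive; some claims dismissed with leave to re-file.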
As of this module's publication, none of these cases have reached final verdicts. They are being closely watched by the entire creative and tech industry because the outcomes will shape the legal rules for AI training for years to come, in the US and, as influential examples, in other countries.
You're going to apply the four-factor "fair use" test to AI training data. US copyright law asks courts to weigh four things: (1) the purpose of the use, (2) the nature of the original work, (3) how much was taken, and (4) the effect on the market. Present your analysis factor by factor and the AI will help you refine your reasoning — like a law professor's office hours, but less intimidating.
This lab is designed so younger learners can engage — you don't need any prior legal knowledge. The AI will guide you through each factor.
The US Copyright Office (USCO) has been clear since at least 2023: purely AI-generated content cannot be copyrighted. Copyright in the US requires human authorship. A machine is not a human, and it cannot hold a copyright.
This position was reinforced in a pivotal February 2023 decision. The USCO reviewed a graphic novel called Zarya of the Dawn, created by Kristina Kashtanova. She had written the text and arranged the story — but used Midjourney to generate the images. The USCO's ruling: Kashtanova could copyright the text and the selection and arrangement of pages, but not the individual AI-generated images themselves. They were generated by a machine, not authored by a human, so they received no copyright protection.
Kristina Kashtanova owned: the words she wrote + how she arranged the pages + the overall story structure she designed.
Kristina did NOT own: any of the individual images, because she typed prompts and Midjourney produced images — the creative decisions in each image were made by the AI, not by her.
An even starker case: computer scientist Stephen Thaler created an AI system he calls the "Creativity Machine" and asked it to generate a painting called "A Recent Entrance to Paradise." He then tried to register the copyright — listing the AI as the author and himself as the owner (under a work-for-hire theory).
The USCO refused. Thaler sued. In August 2023, a federal judge upheld the USCO's decision: "Human authorship is a bedrock requirement of copyright." An AI cannot be an author. With no copyright attached, the painting is effectively in the public domain, owned by no one.
Thaler has appealed. The case is ongoing, but as of this writing, every court to consider the question has agreed: AI alone cannot generate copyrightable work.
So what if a human is involved? The courts are working this out too. The USCO's current guidance suggests that copyright protection scales with human creative input:
Stronger claim: A human selects images from hundreds of AI outputs, arranges them into a specific sequence, adds original text, and makes intentional design decisions. The human creative choices are substantial and specific.
Weaker claim: A human types a short prompt ("a cat sitting on the moon, photorealistic") and accepts the first image the AI generates. The AI made all the visual decisions. The human's contribution was minimal.
This creates a new kind of creative strategy question: if you want to own what you make with AI, you need to be substantially involved in the creative decisions. The more you direct, curate, arrange, edit, and shape the work, the stronger your copyright claim. The more you simply prompt and accept, the less protection you have.
This has real consequences for professional creators. A graphic designer who uses AI tools as part of a complex creative process may be in a very different legal position than someone who sells AI-generated prints directly from prompts.
The UK and EU are taking different approaches. The UK Copyright, Designs and Patents Act 1988 actually has a provision for "computer-generated works," giving copyright to the person who arranged for the work to be generated. This might protect AI-generated content in the UK where US law wouldn't. In the EU, the AI Act does not directly decide whether AI outputs can be copyrighted; that question is left to copyright law, which generally requires a work to be its author's "own intellectual creation," pointing toward human authorship. Copyright law is national: the same work might be protectable in one country and public domain in another.
If you are a young creator using AI tools — whether for illustration, writing, music, or other creative work — here is what the current legal landscape means practically:
1. Document your creative process. If you want to claim copyright, keep records of the decisions you made, the iterations you directed, the edits you applied. The more your creative fingerprint is visible, the stronger your claim.
2. Understand that AI outputs alone may be free to copy. If someone takes a purely AI-generated image you made and uses it commercially, you may have no legal recourse under current US law.
3. The law is moving. What's true today may not be true in five years. Several proposed bills in the US Congress would change these rules. Pay attention — your generation will live with whatever gets decided now.
Describe a creative project you've done (or might do) that uses AI tools — a graphic novel, a music video, an illustrated story, a website design, a poem collection. The AI will ask you questions about your process and help you figure out which parts are likely copyrightable and which parts might not be — based on the real legal framework from Lesson 3.
This is especially useful for young creators who are starting to use AI tools professionally or for school projects. Understanding what you do and don't own matters.
The European Union passed the EU AI Act in 2024, the world's first comprehensive legal framework for artificial intelligence. Among its provisions: providers of "general-purpose AI" models (like GPT-4 or Stable Diffusion) must publish a summary of the content used to train those models.
This transparency requirement is significant. Currently, most AI companies treat their training data as proprietary: they don't tell you what was in it. The EU Act requires disclosure, at least in summary form, which could allow artists and authors to find out if their work was used.
The Act also requires that AI-generated content be clearly labeled as such when it could be mistaken for human-made work — a separate but related issue for the creative industries. Companies must comply with EU AI Act provisions starting in stages from 2024 through 2027.
In the US, multiple bills are competing in Congress. Two notable ones:
The NO FAKES Act (2023) — "Nurture Originals, Foster Art, and Keep Entertainment Safe." This bill targets AI-generated likenesses: if an AI is used to generate a realistic fake of a specific person without their consent — their face, their voice — the person would have a legal right to sue. This is aimed at deepfakes but also at AI-generated music that mimics specific artists' voices.
The TRAIN Act (2024), "Transparency and Responsibility for Artificial Intelligence Networks." This bill would give copyright holders a legal process (a court subpoena) to find out whether their works were used to train an AI model, pursuing transparency like the EU requirement through a narrower, request-based mechanism. It was introduced by Senator Peter Welch but has not yet passed.
Neither bill had become law as of this module's writing. The US legislative process moves slowly, and the AI industry lobbies heavily. However, the direction of policy debate is clear: transparency and consent are becoming baseline expectations.
Some AI companies, anticipating regulation, have begun making deals with content creators rather than waiting for laws to force them:
Getty Images + Nvidia (2023): Getty partnered with Nvidia to build a generative AI tool trained only on Getty's own licensed image library, with contributing photographers compensated for the use of their work. This is close to the "opt-in, paid" model that artist advocates have been calling for.
Shutterstock + OpenAI (2023): Shutterstock licensed its content to OpenAI for training, and in exchange created a contributor fund to pay photographers and illustrators whose work was used. The fund distributes payments based on a formula — not everyone agrees it's adequate, but it's a step toward compensation.
The Associated Press + OpenAI (2023): The AP licensed its archive of news text to OpenAI in exchange for access to OpenAI technology. A model that works for both parties — though critics note the AP's archive is enormous and the terms haven't been fully disclosed.
These deals represent a possible future: a licensing ecosystem where AI companies pay for training data rather than scraping it. But they cover a tiny fraction of the creative work that has already been ingested into existing models.
The music streaming disputes of the 2010s offer a useful parallel. When Spotify launched, many musicians argued they were being paid fractions of a penny per stream while Spotify built a billion-dollar company on their work. After years of lawsuits, lobbying, and negotiation, the Music Modernization Act of 2018 created new licensing rules and better royalty structures. It was messy and imperfect, but it happened. Many observers think AI and creative work will follow a similar arc — conflict, negotiation, and eventually some kind of structured licensing framework.
The legal landscape is unresolved, but here are practical realities for creators entering this field:
1. Watermark or document your original work. If your art is ever used to train an AI without permission, having dated documentation of your original creative process strengthens any future claim.
2. Read platform terms carefully. Some platforms — including some AI art tools — include clauses in their terms of service claiming rights to use your outputs or inputs for training. Understand what you're agreeing to before you upload.
3. The "style" question remains legally unsettled. Copying an artist's specific, original work without permission can be infringement. But "style" itself has historically not been copyrightable in the US. AI generators that mimic style without reproducing specific images may be operating in a legal gray zone, but that doesn't mean the ethics are settled.
4. Advocacy matters. The organizations pushing for creator rights — the Authors Guild, the Graphic Artists Guild, the American Society of Media Photographers — are active in shaping policy. If you care about this, pay attention to them.
As of this module's writing, courts have not issued final rulings in the major AI copyright cases. The US Copyright Office is conducting a formal study and may issue guidance that changes practice. Congress could pass legislation. The EU AI Act requirements will create new pressures globally. This area is moving fast — check current news for updates beyond this module's publication.
You've seen the real debates, the lawsuits, the legislation. Now it's your turn: if you were advising Congress or the EU on AI copyright rules, what would you propose? The AI will push back on your ideas, point out tradeoffs you might not have considered, and help you think through the consequences of different policy approaches.
There's no single right answer here — this is genuinely unsettled territory. Your goal is to think rigorously, not to arrive at a predetermined conclusion. This lab is designed so younger learners can engage fully — bring your instincts and let the conversation sharpen them.