Module 5 · Lesson 1

Where Did the Training Data Come From?

AI image and text generators learned from millions of creative works — but who gave permission?

When a machine studies your art without asking, does that feel fair — and does the law agree?

🎓

A Note for Younger Learners

This module deals with real legal disputes, concepts about ownership, and ongoing debates that even adults argue about. We'll break everything down step by step, with no law degree needed. If a word is new, look for the definition box nearby. And remember: the goal isn't to memorize the law, it's to understand why it matters to you as a creator.

In September 2022, an artist named Greg Rutkowski discovered something unsettling. His distinctive fantasy painting style — developed over years of professional work — had become one of the most-used prompt keywords on the AI image generator Stable Diffusion. Users were typing "in the style of Greg Rutkowski" to generate thousands of images that mimicked his aesthetic. He had never been asked. He had never agreed. His name had become a dial on a machine he did not build, feeding on art he had created.

The AI had been trained on a dataset called LAION-5B — 5.85 billion image-text pairs scraped from the open web. Rutkowski's paintings were in there. So were the works of hundreds of thousands of other living artists, photographers, and illustrators. None of them were asked.

How Training Data Actually Works

To understand the consent problem, you first need to understand how AI art and writing generators are built. These systems — called generative AI or foundation models — are trained by feeding them enormous collections of existing human-made content. The AI doesn't memorize it like a filing cabinet. Instead, it learns statistical patterns: which shapes tend to appear near which colors, which words tend to follow which other words, what "dramatic lighting" or "impressionist brushwork" looks like across thousands of examples.
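To make "learning statistical patterns" more concrete, here is a tiny sketch in Python. It is not how real generators are built (they use large neural networks, not simple counts), and the three sentences are made up for illustration. It only shows the basic idea of learning which words tend to follow which other words.

```python
from collections import Counter, defaultdict

# A tiny, made-up "training set". Real systems use billions of examples.
sentences = [
    "the dragon flew over the castle",
    "the dragon guarded the gold",
    "the dragon slept near the castle",
]

# For each word, count which words appear right after it.
next_word_counts = defaultdict(Counter)
for sentence in sentences:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1


def most_likely_next(word):
    """Return the word seen most often after `word`, or None if it was never seen."""
    counts = next_word_counts.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(dict(next_word_counts["the"]))  # how often each word followed "the"
print(most_likely_next("the"))        # the most common follower of "the"
```

Notice that the program never stores a whole sentence as its "answer". It keeps counts, and those counts only exist because someone's sentences were fed in. That is the heart of the consent question.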

The training sets for major systems are staggering in size. OpenAI's GPT-3 was trained on data that included Common Crawl (a massive ongoing scrape of the web), WebText2, books, and Wikipedia; for GPT-4, OpenAI has not disclosed the training data in detail. Stability AI's Stable Diffusion used LAION-5B. Midjourney has not fully disclosed its training data. Google's Imagen and Adobe's Firefly have taken different approaches: according to Adobe, Firefly was trained only on Adobe Stock images, licensed content, and public-domain works, specifically to avoid this controversy.

The core question that courts, artists, and lawmakers are now wrestling with: is scraping publicly visible content and training an AI on it the same as copying it? And if it is copying — does copyright law protect those original creators?

Training Data: The collection of existing content (images, text, code, audio) that an AI model is shown during training so it can learn patterns.

LAION-5B: A dataset of 5.85 billion image-text pairs scraped from the internet, used to train Stable Diffusion and other AI image generators. Built by a German nonprofit.

Scraping: Automatically downloading large amounts of content from websites using software, rather than visiting them manually.

The Scale Makes It Different

Human artists have always learned by studying other artists' work. An art student might spend months copying old masters in a museum. A novelist might read hundreds of books before finding their voice. This kind of influence and learning is considered completely normal and legal.

So why is AI training different? Several reasons have been argued:

1. Scale. No human could study 5 billion images. The AI doesn't just glance at your work; it processes it over and over across many rounds of training. The sheer volume of ingested work is unlike anything in human creative history.

2. Commercial use. When a student copies a painting for practice, they don't usually sell it. When a company trains an AI on your work and then charges subscribers to generate images in your style, money is being made — money that flows to the company, not to you.

3. Market substitution. Critics argue AI-generated images in an artist's style can replace commissions that would otherwise go to the original artist. If someone can get a "Greg Rutkowski-style" image for $0.01 from a generator, they may not hire Greg Rutkowski.
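To get a feel for the scale in reason 1, here is a quick back-of-the-envelope calculation in Python. The image count is the LAION-5B figure from this lesson. The viewing speed of one image per second, with no sleep and no breaks, is an assumption chosen only to make the math easy.

```python
# Figure from this lesson: LAION-5B holds about 5.85 billion image-text pairs.
IMAGES_IN_LAION_5B = 5_850_000_000

# Assumption for illustration: a person looks at one image every second, nonstop.
SECONDS_PER_IMAGE = 1

seconds_per_year = 60 * 60 * 24 * 365
years_needed = IMAGES_IN_LAION_5B * SECONDS_PER_IMAGE / seconds_per_year

print(f"Years of nonstop looking: {years_needed:.1f}")
```

Under these assumptions the answer comes out to roughly 185 years, which is more than two full human lifetimes spent doing nothing else.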

📌 Real Stat

As of 2023, the name "Greg Rutkowski" appeared in more than 93,000 prompts on Lexica.art, an AI art search engine. By comparison, artist Frida Kahlo — who died in 1954 — appeared in about 22,000. Living artists' styles were being replicated at industrial scale.

The "Opt-Out" Question

Some AI companies responded to artist complaints by supporting opt-out mechanisms. A group called Spawning built a tool called Have I Been Trained? that lets artists search for their images in the LAION dataset and request removal, and Stability AI agreed to honor those requests in future versions of Stable Diffusion. Similarly, OpenAI and others promised opt-out processes for future training runs.

Artists largely rejected this framing. Their argument: the default should be opt-in, not opt-out. You shouldn't have to discover your work was used, hunt down the right form, and submit a removal request — the company should have asked you first. The burden of consent shouldn't fall on the creator after the fact.
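The difference between the two defaults is easy to see in a small Python sketch. Everything here is hypothetical: the artwork titles, the `Artwork` class, and the two helper functions are invented for illustration and do not describe any company's real system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Artwork:
    title: str
    # True = the artist said yes, False = the artist said no,
    # None = the artist was never asked (or never answered).
    artist_answer: Optional[bool] = None


def allowed_under_opt_out(work: Artwork) -> bool:
    """Opt-out default: the work is used unless the artist has said no."""
    return work.artist_answer is not False


def allowed_under_opt_in(work: Artwork) -> bool:
    """Opt-in default: the work is used only if the artist has said yes."""
    return work.artist_answer is True


works = [
    Artwork("Dragon at Dawn", artist_answer=True),
    Artwork("City in Rain", artist_answer=False),
    Artwork("Forest Spirit"),   # never asked
    Artwork("Portrait Study"),  # never asked
]

opt_out_set = [w.title for w in works if allowed_under_opt_out(w)]
opt_in_set = [w.title for w in works if allowed_under_opt_in(w)]

print("Used under opt-out:", opt_out_set)
print("Used under opt-in: ", opt_in_set)
```

The only works treated differently are the ones whose artists were never asked. Under opt-out, silence counts as yes. Under opt-in, silence counts as no. That single default is what the whole debate is about.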

This opt-in vs. opt-out debate is now central to proposed legislation in the US, UK, and European Union, and it maps directly onto a broader question about whose interests AI development has historically prioritized.

💡 Think About It

If someone used photos of your artwork from your Instagram to train an AI — without asking — would you want to opt out after the fact, or would you want to have been asked first? Why does the order matter?

Module 5 · Lesson 1

Check Your Understanding

Three questions on training data and consent.

💬

Tip for Younger Learners

There are no trick questions here. If you get one wrong, read the explanation. That's where a lot of the learning happens.

1. What was LAION-5B, and why is it significant to the AI copyright debate?

✓ Correct. LAION-5B was scraped from the web without individual artist permission, making it the foundation of a major consent dispute. Stable Diffusion was trained on it.

Not quite. LAION-5B was a massive dataset scraped from the web — no licensing, no payment, no permission. That's exactly why it became controversial.

2. Why do many artists argue that "opt-out" systems offered by AI companies are insufficient?

✓ Exactly right. The consent argument centers on who bears the burden: requiring artists to hunt down uses of their work and opt out puts the burden on creators, not companies.

The real issue is about default settings. Opt-out means you're already in — you have to find out and remove yourself. Artists argue the ask should come first.

3. Which major AI art tool took a different approach by training ONLY on licensed and public-domain content to avoid copyright issues?

✓ Right. Adobe Firefly was specifically trained on Adobe Stock, licensed content, and public-domain works — a deliberate choice to differentiate it from competitors on the consent issue.

Adobe Firefly is the answer. Adobe made a deliberate business and ethical decision to train only on content it had rights to use, making it an outlier in the industry.

Module 5 · Lab 1

The Consent Conversation

Explore the opt-in vs. opt-out debate with an AI discussion partner.

Your Mission

You're going to argue both sides of the consent debate with an AI discussion partner. Start by picking a side: either defend the AI companies' position (opt-out is fine, this is how the web already works) or defend the artists' position (opt-in is the only ethical approach). Then see if you can be convinced to switch sides.

This lab is designed for younger learners — the AI will explain any legal terms that come up and keep the conversation grounded in real examples.

💬 Try starting with: "I think opt-out systems are actually fair — here's why…" OR "I think companies should have to ask permission first — here's my argument."

AI Discussion Partner

Consent & Training Data

Hey! Welcome to Lab 1. We're talking about one of the hottest debates in AI right now: should AI companies have to ask permission before training on an artist's work — or is it enough to let artists opt out afterward?

Pick a side and give me your best argument. I'll push back, play devil's advocate, and help you think it through. Don't worry if you're not sure — that's what the conversation is for. 🎨

Module 5 · Lesson 2

The Lawsuits Begin

Artists, Getty Images, and authors fight back — in federal court.

What happens when the legal system has to decide whether AI training is copyright infringement?

⚖️

Legal Terms Ahead

We'll be talking about real lawsuits in this lesson. You don't need to know how courts work; we'll explain each legal idea as it comes up. The main thing to know: these cases haven't all been decided yet, and the outcomes will shape how AI is built for decades.

The Getty Images Lawsuit (January 2023)

On January 17, 2023, Getty Images — one of the world's largest stock photo agencies — filed a lawsuit in the UK against Stability AI, the company behind Stable Diffusion. Getty alleged that Stability AI had scraped more than 12 million photographs from Getty's website without permission, along with their associated metadata and captions, to train Stable Diffusion.

The evidence was striking: Stable Diffusion was generating images that included corrupted versions of the Getty Images watermark — the distinctive "Getty Images" text that appears on licensed photos. This suggested the AI had learned from watermarked images directly, making the infringement claim highly visible. Getty filed a parallel lawsuit in US federal court in February 2023.

Stability AI denied the claims, but the lawsuits remain ongoing as of this writing. The UK case, in particular, could shape how courts in the UK and beyond treat AI training data.

🔍 Why the Watermark Matters

When an AI generates an image with a smeared, corrupted Getty watermark, it provides strong evidence that the training data included Getty's watermarked photos. It's a bit like a student who copies from a textbook and accidentally includes the publisher's logo on their own test paper. It's hard to deny you saw the source.

The Artists' Class Action (January 2023)

Also in January 2023, three visual artists — Sarah Andersen, Kelly McKernan, and Karla Ortiz — filed a class-action lawsuit in US federal court against Stability AI, Midjourney, and DeviantArt. Their claim: these companies had committed copyright infringement on a "massive scale" by training their AI systems on billions of images scraped from the internet without consent or compensation.

The lawsuit argued that AI-generated images are essentially "derivative works" — legally, something created from someone else's copyrighted material. If that's true, then generating an image "in the style of" a living artist might violate their copyright.

In October 2023, US District Judge William Orrick allowed parts of the lawsuit to proceed against Stability AI — specifically the direct infringement claims — while dismissing some other claims. The case continues. Notably, the judge dismissed the DMCA claims against Midjourney at that stage, finding the connections too indirect, but left room to re-file with more specific evidence.

Class Action: A lawsuit where one person or a small group sues on behalf of a much larger group of people who all have the same complaint. It's a way to bring thousands of individual claims together at once.

Derivative Work: In copyright law, a new work based on an existing copyrighted work. Making a derivative work requires permission from the original copyright holder, unless an exception like "fair use" applies.

The Authors' Lawsuit Against OpenAI (2023)

Writers got involved too. In September 2023, a group of prominent authors, including John Grisham, Jonathan Franzen, George R.R. Martin, Jodi Picoult, and Elin Hilderbrand, filed a class-action lawsuit against OpenAI, alleging that their books had been used without permission to train ChatGPT and GPT-4.

The Authors Guild, which coordinated the action, argued that OpenAI had ingested entire published books from sources including "shadow libraries" like Books3, a dataset that included over 196,000 books scraped from piracy sites. OpenAI's position was that training constitutes "fair use" — a legal doctrine that allows limited use of copyrighted material without permission under certain conditions.

Around the same time, authors Mona Awad and Paul Tremblay filed their own lawsuit against OpenAI, and comedian Sarah Silverman joined suits against both OpenAI and Meta. These were among the first cases specifically targeting large language model training rather than image generation.

📚 What Is "Fair Use"?

Fair use is a legal doctrine in US copyright law that allows someone to use copyrighted material without permission in certain situations — commentary, criticism, parody, education, and research are common examples. Whether AI training qualifies as fair use is the central legal question in most of these lawsuits. Courts weigh four factors: the purpose of the use, the nature of the original work, how much was taken, and the effect on the market for the original.

A Timeline of Key Legal Events

Jan 2023: Getty Images sues Stability AI (UK)

Alleges scraping of 12+ million watermarked photographs without license.

Jan 2023: Artists' class action filed (US)

Andersen, McKernan, Ortiz v. Stability AI, Midjourney, DeviantArt.

Feb 2023: Getty adds US federal lawsuit

Parallel proceeding in Delaware federal court.

Sep 2023: Authors Guild v. OpenAI

John Grisham, George R.R. Martin, and others allege book ingestion without consent.

Oct 2023: Judge allows artists' case to partly proceed

Direct infringement claims against Stability AI survive; some claims dismissed with leave to re-file.

As of this module's publication, none of these cases has reached a final verdict. They are being closely watched by the entire creative and tech industry because the outcomes will shape the legal rules for AI training for years to come, directly in the US and by example in other countries.

Module 5 · Lesson 2

Check Your Understanding

Three questions on the lawsuits and legal concepts.

1. What was the key piece of visual evidence in Getty Images' lawsuit against Stability AI?

✓ Correct. The corrupted watermarks were powerful evidence because they showed the AI had learned from watermarked source images — it absorbed the watermark as part of the visual pattern.

The watermarks weren't perfect — they were smeared and corrupted. But that's actually stronger evidence: the AI had learned from watermarked photos and partially reproduced the mark.

2. What is a "class action" lawsuit, and why did artists and authors use this legal strategy?

✓ Right. Class actions make sense here because potentially millions of artists had their work scraped. No single artist could afford to sue alone for their portion of the harm — but together, the case has scale.

A class action bundles many similar claims together. It was the right tool because potentially millions of artists were in the same situation — too many to each file individually.

3. What is the central legal question in the OpenAI authors' lawsuit?

✓ Exactly. The fair use question is the crux of most AI training lawsuits. If training = fair use, AI companies win. If training = infringement, the entire industry's data practices are in question.

The core legal question is fair use: does using someone's book to train an AI count as infringement, or is it protected as a transformative use? That's what the courts have to decide.

Module 5 · Lab 2

The Fair Use Judge

Step into the role of a judge and reason through a real legal framework.

Your Mission

You're going to apply the four-factor "fair use" test to AI training data. US copyright law asks courts to weigh four things: (1) the purpose of the use, (2) the nature of the original work, (3) how much was taken, and (4) the effect on the market. Present your analysis factor by factor and the AI will help you refine your reasoning — like a law professor's office hours, but less intimidating.

This lab is designed so younger learners can engage — you don't need any prior legal knowledge. The AI will guide you through each factor.

💬 Start with: "Let's analyze Factor 1 — the purpose of AI training. Is it commercial, transformative, or educational?" OR dive straight in with your own take on whether AI training is fair use.

AI Legal Tutor

Fair Use Analysis

Welcome to Lab 2! You're going to think like a judge today. ⚖️

US copyright law has a "fair use" doctrine with four factors courts use to decide if using someone's work without permission is okay. We'll apply those factors to AI training data together.

The four factors are:
1. Purpose of use (commercial? educational? transformative?)
2. Nature of the original work (creative or factual?)
3. Amount taken (a little? all of it?)
4. Effect on the market (does it hurt the original creator's income?)

Pick a factor to start, or just give me your gut feeling on whether AI training is fair use and we'll build from there!

Module 5 · Lesson 3

Who Owns What the AI Makes?

When a machine generates an image or story, can anyone copyright it?

If you type a prompt and AI produces an image — is that your creation, the AI's, the company's, or nobody's?

🖼️

New Territory for Everyone

Copyright law was written for human creators. Nobody planned for a world where a machine could generate artwork in seconds. The rules are being rewritten right now, which means the decisions being made today will directly affect how your generation creates and owns things.

The US Copyright Office's Position

The US Copyright Office (USCO) has been clear since at least 2023: purely AI-generated content cannot be copyrighted. Copyright in the US requires human authorship. A machine is not a human, and it cannot hold a copyright.

This position was reinforced in a pivotal February 2023 decision. The USCO reviewed a graphic novel called Zarya of the Dawn, created by Kristina Kashtanova. She had written the text and arranged the story — but used Midjourney to generate the images. The USCO's ruling: Kashtanova could copyright the text and the selection and arrangement of pages, but not the individual AI-generated images themselves. They were generated by a machine, not authored by a human, so they received no copyright protection.

📄 The Zarya Ruling in Plain English

Kristina Kashtanova owned: the words she wrote + how she arranged the pages + the overall story structure she designed.

Kristina did NOT own: any of the individual images, because she typed prompts and Midjourney produced images — the creative decisions in each image were made by the AI, not by her.

The Stephen Thaler Case

An even starker case: computer scientist Stephen Thaler created an AI system he calls the "Creativity Machine" and asked it to generate a painting called "A Recent Entrance to Paradise." He then tried to register the copyright — listing the AI as the author and himself as the owner (under a work-for-hire theory).

The USCO refused. Thaler sued. In August 2023, a federal judge upheld the USCO's decision: "Human authorship is a bedrock requirement of copyright." An AI cannot be an author. The painting entered the public domain — owned by no one.

Thaler has appealed. The case is ongoing, but as of this writing, every court to consider the question has agreed: AI alone cannot generate copyrightable work.

Public Domain: Works in the public domain have no copyright protection, so anyone can use, copy, modify, or sell them freely. Works enter the public domain when copyright expires, or (in this case) when a court finds no copyright ever existed.

Human Authorship: The legal requirement in US copyright law that a work be created by a human being to receive copyright protection. This is why courts have ruled AI-only output cannot be copyrighted.

The "Human in the Loop" Question

So what if a human is involved? The courts are working this out too. The USCO's current guidance suggests that copyright protection scales with human creative input:

Likely Protectable

A human selects images from hundreds of AI outputs, arranges them into a specific sequence, adds original text, and makes intentional design decisions. The human creative choices are substantial and specific.

Likely Not Protectable

A human types a short prompt ("a cat sitting on the moon, photorealistic") and accepts the first image the AI generates. The AI made all the visual decisions. The human's contribution was minimal.

This creates a new kind of creative strategy question: if you want to own what you make with AI, you need to be substantially involved in the creative decisions. The more you direct, curate, arrange, edit, and shape the work, the more your copyright claim strengthens. The more you simply prompt and accept, the less protection you have.

This has real consequences for professional creators. A graphic designer who uses AI tools as part of a complex creative process may be in a very different legal position than someone who sells AI-generated prints directly from prompts.

🌍 Other Countries, Other Rules

The UK and EU are taking different approaches. The UK Copyright, Designs and Patents Act 1988 actually has a provision for "computer-generated works" — giving copyright to the person who arranged for the work to be generated. This might protect AI-generated content in the UK where US law wouldn't. The EU is still developing its approach under the AI Act. Copyright law is national — the same work might be protectable in one country and public domain in another.

What This Means for You

If you are a young creator using AI tools — whether for illustration, writing, music, or other creative work — here is what the current legal landscape means practically:

1. Document your creative process. If you want to claim copyright, keep records of the decisions you made, the iterations you directed, the edits you applied. The more your creative fingerprint is visible, the stronger your claim.

2. Understand that AI outputs alone may be free to copy. If someone takes a purely AI-generated image you made and uses it commercially, you may have no legal recourse under current US law.

3. The law is moving. What's true today may not be true in five years. Several proposed bills in the US Congress would change these rules. Pay attention — your generation will live with whatever gets decided now.
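For tip 1, one simple way to keep records is a dated process log. The Python sketch below is only an illustration: the `make_process_record` function and the sample notes are hypothetical, and the short byte strings stand in for the contents of your real image or text files. A log you make yourself is not legal proof on its own, but it gives you an organized record of what you decided and when.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_process_record(file_bytes: bytes, note: str) -> dict:
    """Build one log entry: a fingerprint of the file, your note, and the time."""
    return {
        # SHA-256 gives a fingerprint that changes if the file changes at all.
        "sha256": hashlib.sha256(file_bytes).hexdigest(),
        "note": note,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Stand-in file contents. In practice you would read your own files,
# for example with open("page1.png", "rb").read().
process_log = [
    make_process_record(b"sketch v1", "Hand-drew the layout for page 1"),
    make_process_record(b"sketch v2", "Repainted the AI background and changed the colors"),
]

print(json.dumps(process_log, indent=2))
```

Each entry ties a specific version of your work to a specific creative decision you made, which is the kind of human input the guidance in this lesson focuses on.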

Module 5 · Lesson 3

Check Your Understanding

Three questions on AI output ownership and copyright.

1. In the Zarya of the Dawn case, what specifically could Kristina Kashtanova copyright?

✓ Correct. The USCO drew a clear line: her human creative choices (writing, arrangement) were protectable; the AI-generated images were not, because the visual creative decisions were made by Midjourney, not by her.

The USCO split the registration: human-authored text and arrangement = copyrightable. AI-generated images = not copyrightable. It's about where the human creative decisions were made.

2. What did the federal court rule in Stephen Thaler's case about his AI-generated painting?

✓ Right. "Human authorship is a bedrock requirement of copyright" — the judge's exact words. A painting made entirely by an AI machine has no copyright holder and enters the public domain.

The court was clear: building the AI is not the same as authoring the art. Copyright requires human creative authorship in the specific work. The painting has no copyright and belongs to everyone.

3. According to the USCO's guidance, how can someone strengthen their copyright claim when using AI creative tools?

✓ Exactly right. Copyright protection scales with human creative input. The more deliberate, specific, and extensive your creative choices, the stronger your copyright claim over the result.

It's about creative involvement, not cost or credits. The more human creative decisions you make — in selection, direction, arrangement, editing — the stronger your copyright claim becomes.

Module 5 · Lab 3

The Copyright Scorecard

Describe a creative project and find out how much of it you might actually own.

Your Mission

Describe a creative project you've done (or might do) that uses AI tools — a graphic novel, a music video, an illustrated story, a website design, a poem collection. The AI will ask you questions about your process and help you figure out which parts are likely copyrightable and which parts might not be — based on the real legal framework from Lesson 3.

This is especially useful for young creators who are starting to use AI tools professionally or for school projects. Understanding what you do and don't own matters.

💬 Try: "I made a 10-page illustrated short story where I wrote the text and then used Midjourney to create images based on my descriptions. How much of it do I own?" OR describe your own project.

AI Copyright Advisor

Ownership Analysis

Hi! I'm your Copyright Scorecard assistant. 📋

Describe a creative project you've made or want to make that uses AI tools. Tell me as much as you can about your process — how much did you decide vs. how much did the AI decide?

I'll ask follow-up questions and then give you a breakdown: what you likely own, what you probably don't, and how you could adjust your process to strengthen your ownership claim.

What's your project?

Module 5 · Lesson 4

New Rules for a New World

How lawmakers, platforms, and the creative industry are responding — and what's still unresolved.

The law is playing catch-up with technology. What rules are being written, and who gets to write them?

🌐

This Is Your Future

The decisions being made right now, in Congress, in courts, and in company boardrooms, will shape the creative economy you enter. Understanding these debates isn't just academic. It's preparation for the world you'll work in.

The EU AI Act and Transparency Requirements

The European Union passed the EU AI Act in 2024, the world's first comprehensive legal framework for artificial intelligence. Among its provisions: companies that provide "general-purpose AI" systems (like GPT-4 or Stable Diffusion) must publish a summary of the data used to train them.

This transparency requirement is significant. Currently, most AI companies treat their training data as proprietary — they don't tell you what was in it. The EU Act would force disclosure, at least in summary form, which would allow artists and authors to find out if their work was used.

The Act also requires that AI-generated content be clearly labeled as such when it could be mistaken for human-made work — a separate but related issue for the creative industries. Companies must comply with EU AI Act provisions starting in stages from 2024 through 2027.

EU AI Act: A comprehensive 2024 regulation from the European Union that classifies AI systems by risk level and sets rules for transparency, safety, and accountability. It's the world's first major AI-specific law.

Proposed US Legislation: The NO FAKES Act and TRAIN Act

In the US, multiple bills are competing in Congress. Two notable ones:

The NO FAKES Act (2023): "Nurture Originals, Foster Art, and Keep Entertainment Safe." This bill targets AI-generated likenesses. If an AI is used to generate a realistic fake of a specific person's face or voice without their consent, that person would have a legal right to sue. It is aimed at deepfakes but also at AI-generated music that mimics specific artists' voices.

The TRAIN Act (2024): "Transparency and Responsibility for Artificial Intelligence Networks." This bill would give copyright holders a legal process to find out whether their works were used to train an AI model, a transparency goal similar to the EU requirement. It was introduced by Senator Peter Welch but has not yet passed.

Neither bill had become law as of this module's writing. The US legislative process moves slowly, and the AI industry lobbies heavily. However, the direction of policy debate is clear: transparency and consent are becoming baseline expectations.

Industry Self-Regulation: Deals and Opt-In Models

Some AI companies, anticipating regulation, have begun making deals with content creators rather than waiting for laws to force them:

Getty Images + Nvidia (2023): Getty licensed its image library to Nvidia for training AI models — a commercial deal where Getty gets paid and Nvidia gets legal training data. This is the "opt-in, paid" model that artist advocates have been calling for.

Shutterstock + OpenAI (2023): Shutterstock licensed its content to OpenAI for training, and in exchange created a contributor fund to pay photographers and illustrators whose work was used. The fund distributes payments based on a formula — not everyone agrees it's adequate, but it's a step toward compensation.

The Associated Press + OpenAI (2023): The AP licensed its archive of news text to OpenAI in exchange for access to OpenAI technology. Supporters see it as a model that works for both parties, though critics note that the AP's archive is enormous and the terms haven't been fully disclosed.

These deals represent a possible future: a licensing ecosystem where AI companies pay for training data rather than scraping it. But they cover a tiny fraction of the creative work that has already been ingested into existing models.

💡 The Music Industry Comparison

The music streaming disputes of the 2010s offer a useful parallel. When Spotify launched, many musicians argued they were being paid fractions of a penny per stream while Spotify built a billion-dollar company on their work. After years of lawsuits, lobbying, and negotiation, the Music Modernization Act of 2018 created new licensing rules and better royalty structures. It was messy and imperfect, but it happened. Many observers think AI and creative work will follow a similar arc — conflict, negotiation, and eventually some kind of structured licensing framework.

What Young Creators Should Know Right Now

The legal landscape is unresolved, but here are practical realities for creators entering this field:

1. Watermark or document your original work. If your art is ever used to train an AI without permission, having dated documentation of your original creative process strengthens any future claim.

2. Read platform terms carefully. Some platforms — including some AI art tools — include clauses in their terms of service claiming rights to use your outputs or inputs for training. Understand what you're agreeing to before you upload.

3. The "style" question remains legally unsettled. Copying an artist's specific, original work is infringement. But "style" itself has historically not been copyrightable in the US. AI generators that mimic style without reproducing specific images may be operating in a legal gray zone — but that doesn't mean it's ethically settled.

4. Advocacy matters. The organizations pushing for creator rights — the Authors Guild, the Graphic Artists Guild, the American Society of Media Photographers — are active in shaping policy. If you care about this, pay attention to them.

🔮 Still Unresolved

As of this module's writing, courts have not issued final rulings in the major AI copyright cases. The US Copyright Office is conducting a formal study and may issue guidance that changes practice. Congress could pass legislation. The EU AI Act requirements will create new pressures globally. This area is moving fast — check current news for updates beyond this module's publication.

Module 5 · Lesson 4

Check Your Understanding

Three questions on policy, industry deals, and what's ahead.

1. What major transparency requirement does the EU AI Act impose on general-purpose AI companies?

✓ Correct. The EU AI Act requires disclosure of training data summaries — a transparency measure that would allow artists to find out if their work was ingested. It doesn't require individual consent, but it's a significant first step.

The EU AI Act requires a training data summary — not full source code, not mandatory consent, not a specific royalty formula. Transparency first, then accountability can follow.

2. What did the Shutterstock–OpenAI licensing deal introduce as a model for the industry?

✓ Right. The contributor fund model — where platforms that license content to AI companies share some compensation back to the original creators — is seen as a possible template for broader industry practice.

Shutterstock created a contributor fund to compensate creators whose work was licensed to OpenAI for training. It's not perfect, but it's an example of the "pay the creators" model in action.

3. Why is "style" considered a legal gray zone in AI copyright disputes?

✓ Exactly. Style has never been copyrightable in the US — you can paint "like Picasso" and it's legal. But AI systems that mimic a living artist's specific style at scale, using their actual work as training data, blur the line in a way the law hasn't fully addressed.

Style itself is traditionally not protected by US copyright. But when AI learns a style by training directly on someone's work, and then generates near-replicas of their aesthetic for profit, the ethical and legal line gets much blurrier.

Module 5 · Lab 4

Design the Rules

If you were writing the law, what would AI copyright rules look like?

Your Mission

You've seen the real debates, the lawsuits, the legislation. Now it's your turn: if you were advising Congress or the EU on AI copyright rules, what would you propose? The AI will push back on your ideas, point out tradeoffs you might not have considered, and help you think through the consequences of different policy approaches.

There's no single right answer here — this is genuinely unsettled territory. Your goal is to think rigorously, not to arrive at a predetermined conclusion. This lab is designed so younger learners can engage fully — bring your instincts and let the conversation sharpen them.

💬 Start with: "I think AI companies should have to get permission before using any copyrighted work for training. Here's how I'd set it up…" OR "I think the current system is fine because…" OR "I'd create a compulsory licensing system like music streaming. Here's why…"

AI Policy Workshop

Lawmaking Simulation

Welcome to the policy design lab! 🏛️

You're the advisor. The question on the table: what rules should govern AI training data and copyright?

You've seen the real debates — artists vs. companies, opt-in vs. opt-out, fair use vs. infringement, EU vs. US approaches. Now propose your framework.

I'll challenge your reasoning, bring up tradeoffs, and play devil's advocate. Strong policy thinking means considering who benefits, who gets hurt, what can be enforced, and what the unintended consequences might be.

What's your proposal?

Module 5

Module Test

15 questions covering all four lessons. Score 80% or higher (at least 12 of 15 correct) to pass.

📝

Test Tips for Younger Learners Read each question carefully before choosing. If you're unsure, eliminate the answers that are clearly wrong first. Remember: "still being decided by courts" is often a correct answer here, because it is.

1. What is LAION-5B?

✓ Correct.

LAION-5B was a massive dataset scraped from the web without permission — the foundation of Stable Diffusion and the center of major copyright disputes.

2. Why did artist Greg Rutkowski's situation become a symbol of the AI consent problem?

✓ Correct. Thousands of users were generating images "in the style of Greg Rutkowski" without him ever agreeing to it — making him a real-world example of the consent problem at scale.

Rutkowski's style was being mimicked in thousands of AI-generated images — his name became a prompt keyword. He hadn't agreed, hadn't been asked, and hadn't been paid.

3. What evidence most powerfully supported Getty Images' lawsuit against Stability AI?

✓ Right. The smeared watermarks were a "smoking gun" — evidence the AI had learned from watermarked photos, showing Getty's copyrighted images were in the training data.

The corrupted watermarks were the key evidence — they showed the AI had learned from watermarked Getty images, making it hard to deny the training data included Getty's copyrighted photos.

4. Which prominent authors filed a class-action lawsuit against OpenAI in 2023?

✓ Correct. The Authors Guild coordinated the action, with Grisham, Franzen, Martin, Picoult, and Hilderbrand among the named plaintiffs. They alleged their books were used without consent to train GPT-4.

The Authors Guild lawsuit included Grisham, Franzen, Martin, Picoult, Hilderbrand and others. They alleged OpenAI used their books — including from piracy sources — to train ChatGPT and GPT-4.

5. What does US copyright law require for a work to receive copyright protection?

✓ Correct. "Human authorship is a bedrock requirement of copyright" — the exact language used in the Thaler ruling. Registration is optional; human authorship is not.

Human authorship is the key requirement. You don't have to register (though it helps in lawsuits). But the creator must be human — that's why AI-only outputs currently cannot be copyrighted.

6. In Zarya of the Dawn, what part of the graphic novel DID receive copyright protection?

✓ Right. Kashtanova's written text and her creative choices in arrangement were protected. The AI images were not, because the visual creative decisions were made by Midjourney.

The human elements were protected: the text Kashtanova wrote and the arrangement decisions she made. The AI images themselves were not, because they lacked the human creative authorship that copyright requires.

7. Which AI art platform deliberately trained ONLY on licensed and public-domain content?

✓ Correct. Adobe specifically used Adobe Stock, licensed content, and public-domain works to train Firefly — a deliberate choice to avoid the consent controversy other platforms faced.

Adobe Firefly was trained on Adobe Stock and licensed content. Adobe made a specific business decision to avoid scraping unconsented work, positioning Firefly as "safe for commercial use."

8. What did the Shutterstock contributor fund represent as a policy model?

✓ Right. The contributor fund is a "voluntary opt-in commercial licensing" model — the platform gets paid for the deal, and some of that flows back to creators. It's imperfect but shows the direction.

Shutterstock licensed its content to OpenAI commercially, then created a contributor fund to distribute some of that revenue to photographers and illustrators whose work was included. It's one model for creator compensation.

9. What is "fair use" and why is it central to AI copyright disputes?

✓ Correct. Fair use is the primary legal defense AI companies raise against infringement claims. Whether training qualifies is literally what the courts are deciding right now.

Fair use allows limited use of copyrighted material without permission under certain conditions. Key factors include the purpose of the use (including whether it is commercial), how much was taken, and the effect on the market for the original. AI companies claim training is fair use. Courts haven't fully decided.

10. What major transparency requirement does the EU AI Act impose on AI companies?

✓ Right. Training data summaries must be published — enabling creators to find out if their work was used. A first step toward accountability, even if not full transparency.

The EU AI Act requires training data summaries from general-purpose AI developers. Not full disclosure, not royalties — but a summary that creates at least some accountability and allows creators to investigate.

11. What does it mean for a work to enter the "public domain"?

✓ Correct. Public domain means no copyright: the work is fully free for anyone to use. In the Thaler case, the AI-generated artwork was left without copyright protection because it had no human author, so it is effectively in the public domain.

Public domain = no copyright = anyone can use it freely for any purpose. It's not "owned by the public" in any managed sense — it just means there's no restriction on use.

12. The NO FAKES Act (2023) specifically targets what type of AI-related harm?

✓ Correct. NO FAKES targets deepfakes and AI voice cloning: realistic digital replicas of specific people created without consent. As a proposed bill, it would give those people a right to sue.

The proposed NO FAKES Act targets AI-generated fake likenesses (your face, your voice) created without your permission. It's aimed at deepfakes and AI voice cloning, and would give individuals legal recourse.

13. Why do artists argue that "opt-out" systems are unfair, even when they technically work?

✓ Right. The consent argument is about who bears the burden. Opt-out means you're already in — you have to discover it, find the form, and actively remove yourself. Artists say consent should come first, not cleanup.

The philosophical issue: opt-out assumes you're in by default. Artists argue the ethical standard should be opt-in — companies should ask permission before including someone's work, not after.

14. How does the music streaming royalty dispute of the 2010s relate to AI and creative work today?

✓ Correct. The streaming wars followed a predictable arc — conflict, lobbying, negotiation, new law. Many observers believe AI creative rights disputes will follow the same path, eventually producing licensing structures.

Streaming was messy — artists argued they were paid almost nothing while companies profited. After years of fights, better licensing structures emerged. It's a useful historical parallel for where AI might be heading.

15. A young creator uses an AI tool to generate 200 images, personally selects 12 of them, arranges them into a specific narrative sequence, writes original captions for each, and edits the final layout extensively. Under current US copyright guidance, how is this work likely treated?

✓ Correct. This follows the US Copyright Office's Zarya of the Dawn decision: human creative decisions (selection, arrangement, writing, design) are protectable; the AI images themselves may not be. Strong human involvement strengthens the claim on the overall work.

Following the Zarya decision: the human work (captions, selection, arrangement, layout) is protectable. The individual AI images are harder to protect. The more deliberate and specific the human decisions, the stronger the copyright claim on the whole.