In September 2016, Sony's research lab CSL released a song called "Daddy's Car", a cheerful Beatles-style pop track that could easily pass for an unreleased demo from the 1960s. It had jangly guitars, layered harmonies, and a breezy verse-chorus structure. What it didn't have was a human composer. The melody and chords had been generated by a program called Flow Machines, trained on 13,000 lead sheets (short scores that show just a song's melody, chords, and lyrics) from 20th-century Western pop music. A human arranger, Benoît Carré, wrote the lyrics and polished the production, but the melodic and harmonic core came from the AI. Music Twitter argued about it for weeks.
This module is written for students roughly ages 11 to 16 who love music and are curious about technology but don't necessarily read music or play an instrument. You don't need to know what a "chord progression" is to understand why AI music matters — though you will understand chord progressions by the end of Lesson 2. Every technical term is explained the first time it appears, and labs are designed to be exploratory rather than technical.
AI music is one of the fastest-moving areas in all of AI. In 2020, only a handful of research labs could generate recognizable songs. By 2023, free tools let anyone type a sentence and receive a full produced track in seconds. Understanding how these tools work — and what their limits are — is genuinely useful whether you want to be a musician, a game designer, a filmmaker, or just someone who understands the world you live in.
AI music tools don't actually "understand" music the way a musician does. They learn statistical patterns — which notes tend to follow which other notes, which rhythms appear in which genres — and use those patterns to generate new combinations. This makes them incredibly fast and surprisingly good, but it also means they can make bizarre errors that no human musician would ever make.
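One way to picture "which notes tend to follow which other notes" is a tiny program that records the followers of each note in a single tune and then picks from that record to make a new one. This is only a sketch under simple assumptions: real tools learn from enormous music collections and use far more complex models, and the names `learn_transitions` and `generate` are made up for illustration.

```python
import random
from collections import defaultdict

# A toy "training set": the first two lines of "Twinkle Twinkle Little Star" in C major.
TRAINING_MELODY = ["C", "C", "G", "G", "A", "A", "G",
                   "F", "F", "E", "E", "D", "D", "C"]


def learn_transitions(melody):
    """Record, for each note, every note that came right after it."""
    followers = defaultdict(list)
    for current_note, next_note in zip(melody, melody[1:]):
        followers[current_note].append(next_note)
    return dict(followers)


def generate(followers, start, length, seed=None):
    """Build a new melody by repeatedly picking a note that once followed the last one."""
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        options = followers.get(melody[-1])
        if not options:  # dead end: this note never had a follower in training
            break
        melody.append(rng.choice(options))
    return melody


followers = learn_transitions(TRAINING_MELODY)
new_melody = generate(followers, start="C", length=12, seed=1)
print(" ".join(new_melody))
```

Every step in the output is a note pair that really occurs in the original tune, yet the melody as a whole can wander in ways the tune never does. The program has no idea where the song is "supposed" to go, which is a small-scale version of the bizarre errors described above.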
1. Composition — AI generates new melodies, harmonies, and song structures. Tools like Suno and Udio do this from a text prompt. Google's MusicLM (2023) generates short audio clips matching a text description.
2. Performance / Style Transfer — AI learns the performance style of a real artist and applies it to new audio. This is how "voice cloning" works — training a model on recordings of a person's voice so it can generate new speech or singing that sounds like them. This technology raises serious ethical questions explored in Lesson 4.
3. Analysis and Recommendation — AI analyzes music to identify patterns, genres, moods, and instrumentation. Spotify's recommendation engine uses this constantly. When Spotify decides you might like a new artist, that's an AI that has analyzed the audio characteristics of thousands of songs you've played.
Spotify processes more than 100,000 new tracks uploaded to its platform every single day. No human listens to all of them. AI tools scan each track automatically for genre, tempo, key, mood, and "audio features" — then use that data to decide which listeners might enjoy it.
AI music tools are not "stealing" music the way copying a CD is stealing — but they do raise genuinely hard questions about whether training on copyrighted recordings without permission is fair. These legal questions were not settled as of 2024 and courts in the US and UK were actively considering cases involving AI training data.
AI music is also not "the end of musicians." Every time a new technology changed music, from recorded audio (1877) to synthesizers (1960s), digital audio workstations (1990s), and Auto-Tune (1997), people worried that musicians would become irrelevant. Each time, the number of people making music actually increased. AI is likely to follow the same pattern, though it will absolutely change what musicians spend their time doing.
You've just learned that AI music tools work by learning patterns from training data. Now let's go deeper. Ask your AI guide anything about how these tools work, what they're good at, and where they fall short. There are no wrong questions — this is an exploration lab.
Try to have at least 3 back-and-forth exchanges. The lab completes automatically once you reach that threshold.
When OpenAI released MuseNet in 2019, the research team gave it a single challenge: continue a piece of music. Feed it the first 30 seconds of a Mozart piano sonata, and MuseNet would generate the next 4 minutes. Feed it the opening bars of a Chopin nocturne, and it would continue in Chopin's style. Feed it a country guitar riff and ask it to blend in Beethoven, and it would actually try. The model had been trained on hundreds of thousands of MIDI files spanning classical, jazz, country, pop, and folk — and it had learned that these styles had different note-choice patterns, different rhythmic feels, and different structural shapes. It didn't understand any of this the way a musician does. But it had seen enough examples that it could fake it convincingly, at least for a few minutes at a time.
To understand what AI can and can't do with music, you need to understand the three basic elements every piece of music has. Don't worry — this is simpler than it sounds.
Melody is the part you hum. It's a sequence of individual notes that forms the recognizable "tune" of a song. When you hear the opening of "Twinkle Twinkle Little Star," what you're hearing is melody.
Harmony is what happens when multiple notes play at the same time. When a guitar player strums a chord, that's harmony — multiple notes ringing together. Harmony gives music its emotional color: major chords tend to sound bright or happy, minor chords tend to sound darker or sadder.
Rhythm is the pattern of beats over time — when notes happen, how long they last, and where the accents fall. A waltz has a very different rhythmic pattern (ONE-two-three, ONE-two-three) from a march (ONE-two, ONE-two) or hip-hop (heavy beats on 1 and 3, syncopation everywhere).
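The difference between a major chord and a minor chord, mentioned in the harmony description above, comes down to one note moving by one step. The sketch below builds both kinds of chord by counting semitones (the smallest step between two neighboring notes on a piano) up from a starting note. The names `NOTE_NAMES`, `CHORD_SHAPES`, and `build_chord` are invented for this example, and for simplicity it only spells notes with sharps.

```python
# The 12 notes in one octave, each one semitone above the last.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# How many semitones above the starting note each chord note sits.
CHORD_SHAPES = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),  # only the middle note differs: one semitone lower
}


def build_chord(root, quality):
    """Return the three note names of a major or minor chord on `root`."""
    start = NOTE_NAMES.index(root)
    # "% 12" wraps around to the start of the list when we run past B.
    return [NOTE_NAMES[(start + step) % 12] for step in CHORD_SHAPES[quality]]


print(build_chord("C", "major"))  # ['C', 'E', 'G']
print(build_chord("C", "minor"))  # ['C', 'D#', 'G']  (D# is the same key as E-flat)
```

Lowering that middle note by a single semitone is what turns the bright sound into the darker one.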
AI handles rhythm and melody reasonably well because they involve patterns that repeat in predictable ways. Harmony is trickier because the "right" chord often depends on context and emotional intent. And structure — the big-picture shape of a song — is where AI still struggles most. Many AI songs feel like they're going somewhere but never actually arrive.
The technology behind most modern AI music tools is called a Transformer — the same architecture that powers ChatGPT and Google Translate. A Transformer is very good at learning what tends to come next in a sequence. For language, the sequence is words. For music, the sequence is notes, chords, or audio samples.
The key feature of a Transformer is something called attention. Instead of just looking at the note immediately before the current one, the model can "attend" to notes from much earlier in the piece. This is why AI music can maintain a consistent key (musical key = the set of notes a song uses) across a long piece — the model "remembers" what key was established at the start and keeps pulling it back in.
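To make "attention" a little more concrete, here is a toy version of the scoring step. Each earlier note gets a short list of numbers (a "fingerprint"), the model compares its current question against every fingerprint, and the results become weights that add up to 1. The fingerprints below are invented by hand purely for illustration. A real Transformer learns them during training, uses hundreds of numbers per note, and then uses the weights to blend information from the earlier notes, which this sketch leaves out.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that add up to 1."""
    biggest = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights: score each key against the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical fingerprints: first number = "fits the key of C major",
# second number = "sounds outside the key". Listed oldest note first.
earlier_notes = ["C", "E", "F#", "G"]
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]

# Hypothetical question the model is asking: "which notes tell me the key?"
query = [2.0, 0.0]

weights = attention_weights(query, keys)
for note, weight in zip(earlier_notes, weights):
    print(f"{note}: {weight:.2f}")
```

Notice that the very first note (C) gets exactly as much weight as the most recent one (G). Position in the sequence does not limit what the model can look at, which is the point of the paragraph above.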
Google's Music Transformer (2018), developed by Cheng-Zhi Anna Huang and colleagues at the Magenta project, was one of the first models to demonstrate that Transformers could generate piano music with long-range structure (repeating a theme introduced earlier, building tension and releasing it) in a way that felt more like real composition than previous AI music systems.
By 2024, tools like Suno (launched in late 2023) and Udio (launched April 2024) had made AI music generation accessible to everyone. You type a text prompt, such as "dreamy indie folk song about late summer, female vocals, fingerpicked acoustic guitar", and within about 30 seconds you have a fully produced track with vocals.
These systems work in multiple stages: First, a language model interprets your text prompt and converts it into a rich musical description. Then a music generation model (often using a technique called diffusion — the same technology behind image generators like DALL-E) generates audio that matches that description. The training data for these systems likely includes millions of hours of music, though neither Suno nor Udio has publicly disclosed exactly what they trained on — a fact at the center of an ongoing lawsuit filed by major record labels in June 2024.
Suno.com offers a free tier as of 2024. Try typing two very different prompts and compare the results: "sad piano ballad, slow tempo, single instrument" vs. "chaotic math rock, 180 BPM, odd time signatures." Notice how the AI interprets genre and mood differently. What does it get right? What sounds off?
Genre imitation — AI is very good at producing something that sounds like a specific genre. "Lo-fi hip hop study beats" is essentially an AI specialty at this point.
Variation generation — Given a melody, AI can rapidly produce dozens of harmonic arrangements or stylistic variations. Film composers and game audio designers use this to quickly explore possibilities.
Filling gaps — Tools like Adobe's Project Music GenAI Control (announced 2024) let you generate music that exactly fits a specified duration — useful for video creators who need a 43-second background track.
Intentional structure — Great songs are built around deliberate choices: a chorus that hits harder because of a specific dynamic change, a bridge that introduces harmonic tension before resolution. AI generates plausible next moments without a plan for the whole.
Lyrics that mean something — AI lyrics are often grammatically correct and topically relevant but emotionally hollow. They rhyme and scan but rarely say anything surprising or true. This is because the model is predicting likely word sequences, not trying to communicate an experience.
Cultural and emotional specificity — A really great blues song isn't just about the notes; it's about what the blues means historically and emotionally. AI can mimic the surface features of blues without any access to what the genre actually expresses.
Now that you know about melody, harmony, and rhythm, dig into how AI handles each one. Ask your AI guide to explain what AI does with a specific element, or challenge it with a harder question: why do some AI songs feel structurally empty even if each moment sounds OK?
When the Writers Guild of America went on strike in May 2023, one of their central demands was protection from AI replacing their work. The Screen Actors Guild followed in July. At the same time, several film and television productions quietly began using AI-generated music for temp tracks — placeholder scores used during editing before the final music is recorded. Some productions found the temp tracks so passable they didn't bother replacing them. This wasn't because AI scored better than human composers. It was because AI was free and instant, and the cost of hiring a composer for a streaming show that might be cancelled after one season was becoming hard for studios to justify. Scores of working film composers began losing regular work in 2023.
Film music has a specific job: it has to reinforce what's happening on screen without distracting from it. When a character is in danger, the music raises tension. When two characters fall in love, the music softens. This emotional specificity is something human composers are trained to deliver and AI still struggles with — because AI doesn't watch the film, it doesn't understand the character, and it doesn't know what moment needs to hit hardest.
However, for background music (a scene set in a coffee shop, ambient sound or generic tension in a hallway) AI is genuinely competitive. Tools like AIVA (founded 2016, used by advertising agencies and content creators) and Soundraw (2020) let directors generate genre-appropriate background music in seconds at little or no cost. As of 2024, AIVA had been used in over 300,000 creative projects.
Video games have a music problem that AI is genuinely well-suited to solve. A player might spend 40 hours in the same in-game environment — a forest, a city, a dungeon. If the background music loops every 3 minutes, players hear the same track 800 times. Human composers can't write 40 hours of non-repetitive ambient music economically.
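The "800 times" figure is simple arithmetic, and it is easy to check or to rerun with your own numbers. The function name `times_heard` is just a label chosen for this example.

```python
def times_heard(hours_played, loop_minutes):
    """How many times a looping track plays during a given number of hours."""
    minutes_played = hours_played * 60
    return minutes_played / loop_minutes


print(times_heard(40, 3))  # 800.0
```

Try a 100-hour game with a 2-minute loop to see how quickly the repetition adds up.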
Procedural music generation, meaning music that the AI generates in real time based on what's happening in the game, is an active area of development. Dynamedia's AI system and academic projects at the Georgia Institute of Technology have demonstrated systems that shift musical style, intensity, and instrumentation based on game events: the music gets more urgent as enemies approach and relaxes as they're defeated. The 2023 game Hi-Fi Rush (developed by Tango Gameworks and published by Bethesda) was notable for syncing its gameplay and animations to the beat of the music, which hinted at the direction game audio is heading.
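The idea of music reacting to game events can be sketched with a few rules. The example below is not how any of the systems named above actually work. It is a minimal illustration with invented names (`MusicSettings`, `choose_music`) and invented numbers, showing one way a game could turn "how much danger is the player in?" into tempo and instrument choices.

```python
from dataclasses import dataclass


@dataclass
class MusicSettings:
    tempo_bpm: int      # beats per minute
    intensity: float    # 0.0 = calm, 1.0 = frantic
    instruments: list   # which layers are currently playing


def choose_music(enemies_nearby, player_health):
    """Pick music settings from the game state (player_health runs from 0.0 to 1.0)."""
    # Danger rises with each nearby enemy and with lost health, capped at 1.0.
    danger = min(1.0, enemies_nearby * 0.2 + (1.0 - player_health) * 0.5)

    tempo = round(80 + 80 * danger)  # 80 BPM when calm, up to 160 BPM in a fight

    instruments = ["pads"]           # a soft background layer always plays
    if danger > 0.3:
        instruments.append("drums")
    if danger > 0.7:
        instruments.append("brass")

    return MusicSettings(tempo, danger, instruments)


print(choose_music(enemies_nearby=0, player_health=1.0))
print(choose_music(enemies_nearby=2, player_health=1.0))
print(choose_music(enemies_nearby=5, player_health=0.5))
```

A real system would go further and generate the notes themselves, but the core loop is the same: read the game state, then adjust the music to match.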
The original Super Mario Bros. theme (composed by Koji Kondo in 1985) was designed specifically to loop seamlessly because players would hear it for hours. Every note was chosen to avoid fatigue. Modern AI procedural music systems try to solve this problem differently — by never repeating the same sequence twice.
Spotify's recommendation system, best known through its "Discover Weekly" playlists, is one of the most impactful AI music systems ever built, and most people don't think of it as AI music at all. Launched in 2015, Discover Weekly analyzes each user's listening history and the audio features of millions of songs to generate personalized playlists. By 2016, Spotify reported that 40 million users had listened to Discover Weekly playlists, with a song-save rate of roughly 25%, meaning one in four songs landed well enough that users saved it to their libraries.
The deeper implication is about discovery: AI recommendation systems now determine which new artists get heard and which don't. A song that the algorithm reads as having the right audio features for a given listener mood gets served up; one that doesn't match known patterns gets buried. Some critics argue this is making music more homogeneous — pushing artists to produce tracks that the algorithm prefers.
A 2023 survey by the Musicians' Union (UK) found that 52% of professional session musicians reported losing work to AI tools or to clients using AI-generated music instead of hiring live musicians. Among composers who write for advertising, a market that AI can serve cheaply, the figure was 68%. These numbers are from one year into widespread consumer AI music tools. The trend is accelerating.
The musicians adapting best to the AI era fall into two groups. The first group uses AI tools as collaborators — generating quick starting points that they then develop, reshape, and make personal. Producer Holly Herndon (who has a PhD from Stanford's Center for Computer Research in Music and Acoustics) has been doing this since 2019, using AI voice models trained on her own voice to create choral music that sounds like many versions of herself singing simultaneously. Her 2019 album PROTO was widely reviewed as genuinely innovative rather than just technically interesting.
The second group competes on irreplaceable humanity — the things AI cannot do. Live performance, improvisation, cultural authenticity, personal storytelling, the physical and social experience of music being made by people in a room. Jazz, blues, folk, and world music traditions that depend heavily on cultural specificity and live energy are arguably more protected from AI replacement than commercial pop.
The music industry has had "ghost producers" (professionals who make music credited to a famous name) for decades. AI doesn't invent this problem, but it scales it dramatically. If anyone can generate a full professional-sounding track in 30 seconds, the line between "I made this" and "I had this made" becomes very blurry. In 2023, several AI-generated songs were uploaded to Spotify and other streaming platforms and presented as real artist releases, some accumulating millions of streams before being removed. At the time, the platforms' systems for detecting AI-generated content were inadequate.
You've learned about real impacts on film composers, game audio designers, and working musicians. Now think about it from different angles. Who gets hurt most? Who benefits? What should musicians do to adapt? Is any of this fair? These are genuinely open questions — explore them here.
In April 2023, a track called "Heart on My Sleeve" went viral on TikTok and YouTube. It sounded like Drake and The Weeknd collaborating on a melancholy R&B song. The production was polished, and many listeners could not tell the vocals from the real artists. The track had been made by a producer working under the pseudonym ghostwriter977, using AI voice cloning technology trained on publicly available recordings of both artists. Neither Drake nor The Weeknd had consented or been compensated. Within days the track had millions of streams, and it was then pulled from the major platforms at the request of Universal Music Group, which represents both artists. The person who made it has never been publicly identified. The Recording Industry Association of America called it a pivotal moment that demonstrated the music industry was "wholly unprepared" for AI voice cloning at scale.
Voice cloning means training an AI on recordings of a specific person's voice until the model can generate new audio that sounds like that person saying or singing anything. The technology has legitimate uses — restoring the voice of a person who has lost the ability to speak, dubbing films into new languages with the original actor's voice, creating consistent narration for audiobooks. It also has deeply problematic uses, as "Heart on My Sleeve" demonstrated.
The key technical fact: voice cloning requires relatively little data. Early systems in 2018 needed hours of recordings. By 2023, systems like ElevenLabs could clone a voice convincingly from as little as three minutes of audio. This means that anyone with a microphone and a public profile — musicians, podcasters, actors, politicians — is potentially vulnerable to having their voice used without consent.
Copyright law protects specific creative works — a song, a recording, a lyric. It does not protect a style. You can legally make music that sounds like Elvis; you cannot legally use Elvis's actual recordings without permission. This framework was developed when copying required significant effort. AI changes the effort calculus dramatically.
The central legal question in 2024 was whether training an AI on copyrighted music constitutes copyright infringement. In June 2024, Universal Music Group, Sony Music, and Warner Music Group filed lawsuits against Suno and Udio, alleging that these companies had trained their AI systems on copyrighted recordings without permission or compensation. The record labels claimed this violated copyright law; Suno and Udio argued their use was covered by "fair use" doctrine (the legal provision that allows limited use of copyrighted material for purposes like education or research). These cases were unresolved as of late 2024 and were considered likely to shape AI music law for decades.
"Fair use" in US copyright law lets you use copyrighted material without permission in certain cases — for commentary, criticism, education, or parody. The question for AI training is: if a company feeds millions of songs into a computer to teach it music, is that "using" those songs in a way that requires permission? Courts will ultimately answer this, but it's not a simple question.
If you type a prompt into Suno and a song comes out, do you own that song? The answer varies by jurisdiction and is changing rapidly. As of 2024:
In the United States, the Copyright Office ruled in several cases that AI-generated work without "sufficient human authorship" is not eligible for copyright protection. This means a song generated entirely by AI — even if you typed the prompt — may be in the public domain and anyone could use it. However, if a human significantly shaped, arranged, or modified the AI output, that human contribution may be copyrightable.
In the UK, the law is different: computer-generated works can be protected by copyright for up to 50 years, with the copyright belonging to the person who arranged for the work to be created — meaning the person who typed the prompt might own it.
Most AI music platforms' terms of service claim some rights over generated content, and these vary significantly between platforms. Suno's 2024 terms of service gave users broad rights to commercial use on paid tiers, while claiming a license to use your prompts and outputs to improve their models.
Several major artists have spoken publicly about AI use of their music and voice. Paul McCartney used AI in 2023 to isolate John Lennon's voice from a low-quality demo to complete the final Beatles song "Now and Then", a use most people viewed as respectful because it was done with the approval of Lennon's family. Grimes announced in 2023 that she would split royalties with anyone who used AI to generate music in her voice, a genuinely unusual stance. Billie Eilish, Nicki Minaj, Katy Perry, and more than 200 artists in total signed an open letter in April 2024 calling on AI companies to stop "devaluing" human artistry and using artists' work without consent or compensation.
The ethical line that most people across the industry seem to agree on: using an artist's voice or likeness without their consent is wrong, regardless of the legal outcome. The harder question is what to do about it technologically and legally.
Future 1 — Negotiated Licensing: AI companies pay into a collective licensing fund (similar to how radio stations pay licensing fees) that distributes royalties to artists whose music was used for training. This is the model the music industry is pushing for and what organizations like ASCAP and BMI are advocating.
Future 2 — Open Source AI Music: AI music tools become fully open source and freely available to anyone. Music creation becomes completely democratized — anyone can make professional-sounding music. Commercial music becomes harder to monetize, and the music economy shifts toward live performance, brand deals, and direct fan support (like Patreon).
Future 3 — AI as Instrument: AI music tools are treated legally and culturally the same way synthesizers and drum machines are — as instruments that musicians use to make music. The musician is still the artist; the AI is just a very sophisticated tool. This requires society to decide that operating AI tools is itself a creative skill worth recognizing.
If you're a student who makes music: the skills that will matter most in an AI-saturated music world are taste (knowing what's good), cultural literacy (understanding what music means and where it comes from), live performance ability, and the human capacity to write lyrics from genuine experience. None of those can be generated. Learn the tools — but don't let the tools replace what only you can bring.
This lab is about your opinions. There are real ethical tensions in AI music — consent, ownership, economic fairness, creative credit — that don't have clean answers. Your AI guide will help you think through different angles without pushing you toward any single conclusion. Disagree with it. Push back. See where the argument goes.