Millions of copyrighted songs, including chart-topping hits, were used to train AI music generators, and there are now searchable databases to prove it. The Atlantic, in an investigation by staff writer Alex Reisner, published four catalogs documenting which music fed these models:
- The largest contains roughly 12 million tracks
- The second holds about 9 million
- Two smaller sets clock in around 100,000 each
These aren’t obscure SoundCloud demos. Taylor Swift is in there. Bad Bunny is in there. The catalog of modern popular music, scraped and swallowed whole.
Suno, one of the most prominent AI music generators, acknowledged in court filings that it trained on “tens of millions” of recordings and later admitted that unlicensed copyrighted material was among them, according to Heavy Lifting.
The legal picture that emerges is striking. Sony, UMG, and Warner have filed lawsuits against Suno and Udio seeking up to $150,000 per song in statutory damages. A parallel book-industry case framed mass scraping as piracy rather than simple copyright infringement and reached an initial $1.5 billion settlement figure, according to Engadget. Meanwhile, the U.S. Copyright Office stated in January 2025 that AI-generated music generally cannot be copyrighted unless it reflects sufficient human authorship. In other words, these tools can potentially infringe existing works while producing outputs that carry no protection of their own.
The AI companies call it fair use. They argue models learn abstract patterns, not specific songs. Labels call it piracy with a pitch deck. Courts are still deciding who’s right.
“Trained on copyrighted recordings without permission” — that’s how label plaintiffs have characterized the practice in filings, as summarized in industry commentary.
Researchers at the University of Tennessee developed HarmonyCloak, a tool that adds inaudible perturbations to recordings, making songs effectively unlearnable by AI models while sounding identical to human ears. It is a rare artist-controlled option in a landscape where most protections remain theoretical.
What Comes Next
The scrape-everything era may be ending as labels, lawmakers, and researchers build workable alternatives.
The fight is already shifting from courtrooms to contracts. Universal Music Group and Warner Music Group have reportedly struck deals with Udio and Suno respectively, moving toward licensed AI music models that compensate rightsholders. Tennessee passed a law protecting musical artists’ voices from unauthorized AI cloning. Streaming platforms are deploying AI-detection tools to flag and limit generative imitations, though results have been mixed, according to Engadget: AI-generated copycats continue to slip through and monetize.
Whether you’re an indie artist wondering if your EP got scraped, or someone who generated a birthday jingle on Suno last week, The Atlantic’s databases aren’t just journalism. They’re evidence. The Napster debate is back, this time wearing a licensing agreement and filing for fair use, and it just got a searchable answer. The fight over training data is among the most consequential controversies of the AI era, and it is reshaping how creators think about ownership.