Major Book Publishers File Class Action Against Meta Over Llama AI Training

Key Points
- Five major publishers and author Scott Turow sue Meta over Llama AI training.
- Complaint alleges copying from pirate sites like LibGen, Anna’s Archive, Sci‑Hub.
- Llama reportedly reproduces textbook passages verbatim.
- Plaintiffs seek damages, an injunction, and a full list of copyrighted works used.
- Meta defends its actions as fair use and vows to fight the lawsuit.
- An earlier ruling favored Meta but did not endorse its training methods.
- The case could influence future AI data‑use policies and copyright law.
Five leading book publishers—Macmillan, McGraw Hill, Elsevier, Hachette and Cengage—along with author Scott Turow have sued Meta, alleging the company copied copyrighted books and journal articles to train its Llama AI models. The lawsuit claims Meta harvested material from notorious pirate sites and the Common Crawl dataset, then fed it into Llama, which can reproduce text verbatim. Plaintiffs seek damages, an injunction to stop the training, and a full inventory of the works used. Meta says it will fight the case aggressively, maintaining that AI training can fall under fair use.
Five of the world’s biggest book publishers (Macmillan, McGraw Hill, Elsevier, Hachette and Cengage) joined forces with bestselling author Scott Turow to file a class‑action lawsuit against Meta Platforms. The complaint accuses the company of “one of the most massive infringements of copyrighted materials in history” for using the plaintiffs’ books and journal articles without permission to train its Llama family of artificial‑intelligence models.
Publishers allege massive copyright infringement
The suit says Meta deliberately scraped content from “notorious pirate sites” such as Library Genesis, Anna’s Archive, Sci‑Hub and Sci‑Mag, and also drew on the Common Crawl dataset, which plaintiffs argue is riddled with unauthorized copies of their works. In the plaintiffs’ view, feeding this material into Llama makes Meta’s training process a direct violation of copyright law.
According to the filing, Llama can reproduce large blocks of text almost word‑for‑word. The complaint cites an example where the model, when prompted with two sentences from Cengage’s best‑selling textbook *Calculus: Early Transcendentals* (9th ed.), continued the passage verbatim, effectively recreating the copyrighted material.
As remedies, the publishers seek a court order that would force Meta to halt the disputed training activities and to provide a comprehensive list of every book, journal article and other copyrighted work used in Llama’s development. They also demand monetary damages for the alleged infringement.
Meta spokesperson Dave Arnold defended the company’s practices, casting AI training as legitimate innovation protected by fair use. “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use,” Arnold said in an emailed statement. “We will fight this lawsuit aggressively.”
The case arrives amid a growing wave of litigation targeting AI developers. Earlier this year, a federal judge ruled in Meta’s favor in a separate copyright suit, though he cautioned that the decision “does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful.” In a parallel matter, Anthropic settled a class‑action claim for $1.5 billion after being accused of training its models on pirated books.
The outcome could set a precedent for how AI companies handle copyrighted data. If the court sides with the publishers, Meta may be compelled to overhaul its data‑gathering practices, potentially reshaping how AI models are trained across the industry. For now, the lawsuit adds another high‑profile chapter to the ongoing debate over the balance between technological advancement and intellectual‑property rights.