Encyclopedia Britannica and Merriam-Webster Sue OpenAI Over Alleged Copyright Infringement

Key Points
- Encyclopedia Britannica and Merriam-Webster sued OpenAI for alleged copyright infringement.
- The lawsuit claims OpenAI scraped nearly 100,000 Britannica articles for AI training.
- Britannica alleges ChatGPT substitutes for its content, reducing website traffic and revenue.
- The complaint also cites trademark violations and improper use of Britannica content in ChatGPT's retrieval-augmented generation (RAG) workflow.
- Britannica warns that AI hallucinations threaten public access to reliable information.
- Legal precedent is unclear; in a prior Anthropic case, a judge found AI training transformative but the company's book downloads illegal, which led to a $1.5 billion settlement.
- The case adds to a growing wave of publisher lawsuits against AI firms.
Encyclopedia Britannica and its subsidiary Merriam-Webster have filed a lawsuit against OpenAI, accusing the company of massive copyright infringement for scraping nearly 100,000 of their online articles to train its large language models. The complaint alleges that ChatGPT reproduces Britannica content, reduces web traffic and revenue, and violates trademark law. The case joins a growing wave of legal actions by publishers against AI firms, highlighting unresolved questions about the legality of using copyrighted material for AI training. A prior case involving Anthropic produced mixed rulings, underscoring how unsettled the law on AI use of copyrighted content remains.
Background of Publisher Lawsuits
AI companies are widely reported to train their models on web articles without compensating creators or obtaining permission. Major publishers such as The New York Times, the Chicago Tribune, and the Toronto Star have already taken legal action over the practice.
Britannica and Merriam‑Webster Lawsuit
Encyclopedia Britannica and its subsidiary Merriam‑Webster have now joined these publishers by filing their own lawsuit against OpenAI. The complaint alleges that OpenAI committed “massive copyright infringement” by scraping and using nearly 100,000 of Britannica’s online articles to train its large language models without permission.
Britannica claims that ChatGPT generates responses that substitute for its content, thereby reducing web traffic and potential revenue for the publisher. If users can ask ChatGPT a question and receive an answer based on Britannica’s articles, they have less incentive to visit the website directly.
The lawsuit also targets OpenAI’s use of Britannica content in ChatGPT’s Retrieval‑Augmented Generation (RAG) workflow, a process in which the AI scans the web for updated information when answering questions. Britannica alleges that ChatGPT reproduces its content, in full or in part, in those answers. The complaint also alleges that OpenAI is violating trademark law.
Concerns About Hallucinations and Public Access
Britannica further argues that ChatGPT hallucinates information and then falsely attributes it to the publisher. According to the complaint, these hallucinations jeopardize “the public’s continued access to high‑quality and trustworthy online information.”
Legal Precedent and Prior Cases
There is currently no strong legal precedent establishing whether training an AI on copyrighted content constitutes copyright infringement. The law around this issue remains murky.
In a recent case involving Anthropic, a federal judge ruled that using copyrighted content as training data was transformative enough to be legal. However, the same judge found that Anthropic had illegally downloaded millions of books, and the company subsequently agreed to a $1.5 billion settlement with affected writers.
Implications for the AI Industry
The outcome of these cases is expected to shape how AI companies can legally use web content in the future. Lawmakers and courts will need to address the balance between fostering AI innovation and protecting the rights of content creators and publishers.