Britannica Sues OpenAI - 100,000 Copied Articles Alleged

Encyclopedia Britannica and Merriam-Webster filed a copyright lawsuit against OpenAI on March 13, 2026, alleging that ChatGPT was trained on nearly 100,000 of their copyrighted articles - and that OpenAI refused to pay for a license even after Britannica asked.

"OpenAI used its online articles and encyclopedia and dictionary entries to teach its flagship chatbot ChatGPT to respond to human prompts and 'cannibalized' Britannica's web traffic with AI-generated summaries of its content."
Encyclopedia Britannica complaint, 1:2026cv02097 (SDNY)

The case adds two of the most recognizable names in reference publishing to a growing pile of litigation surrounding AI training data. The law firm bringing the claim is Susman Godfrey, the same firm representing the New York Times and a coalition of newspaper publishers in the existing AI copyright multidistrict litigation.

TL;DR

Case filed March 13 in the Southern District of New York (1:2026cv02097)
Britannica alleges ~100,000 encyclopedia articles and dictionary entries were scraped and used to train ChatGPT
OpenAI allegedly reproduced Merriam-Webster's definition of "plagiarize" verbatim when asked
A trademark claim (Lanham Act) targets hallucinated content falsely attributed to Britannica
Britannica says it approached OpenAI about licensing in November 2024 - OpenAI "never seriously pursued" it
OpenAI's response: fair use

What Britannica Is Claiming

The complaint lays out three stages of alleged infringement. First, OpenAI scraped Britannica's website to build training datasets. Second, that scraped content was fed into the model during training. Third, ChatGPT creates outputs that include "verbatim or near-verbatim reproductions, summaries, or abridgements" of the protected material when users query it.

The suit also includes a retrieval-augmented generation (RAG) allegation, arguing that OpenAI doesn't just use Britannica's content at training time but retrieves it again at inference time when answering queries. If that claim holds up in court, it would substantially broaden the infringement theory beyond what some earlier publisher suits have argued.
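To make the training-versus-inference distinction concrete, here is a minimal sketch of a retrieval step in Python. It is an illustration only, not a description of OpenAI's actual pipeline: the corpus text, the document IDs, and the `retrieve` and `build_prompt` helpers are all hypothetical, and the word-overlap scoring is a toy stand-in for a real search index.

```python
import re
from collections import Counter

# Hypothetical stand-in corpus; the passages are invented for illustration.
CORPUS = {
    "doc-tides": "Tides are the regular rise and fall of sea level caused by the Moon and the Sun.",
    "doc-volcano": "A volcano is a vent in the crust of a planet through which lava and gases erupt.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def retrieve(query, corpus, k=1):
    """Return up to k document IDs ranked by shared-word count (toy scoring)."""
    query_words = Counter(tokenize(query))
    scored = []
    for doc_id, text in corpus.items():
        doc_words = Counter(tokenize(text))
        overlap = sum((query_words & doc_words).values())
        scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for overlap, doc_id in scored[:k] if overlap > 0]

def build_prompt(query, corpus, k=1):
    """Copy the retrieved passages into the prompt that would be sent to a model."""
    context = "\n".join(corpus[doc_id] for doc_id in retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("What causes tides?", CORPUS))
```

The point of the sketch is that the retrieved passage is copied into the prompt at query time, separately from anything the model absorbed during training. That second, per-query use is what the RAG allegation targets.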

The Encyclopaedia Britannica, in print and digital form. The suit alleges OpenAI scraped nearly 100,000 articles from the online edition. Source: commons.wikimedia.org

The trademark count is the more novel element. Britannica argues that when ChatGPT hallucinates inaccurate information and attributes it to Britannica, it violates the Lanham Act by misleading users about the source. A 250-year-old brand built on accuracy is claiming real reputational damage from being associated with AI-produced errors it didn't write.

The complaint includes a pointed exhibit: ChatGPT reproduced Merriam-Webster's definition of the word "plagiarize" nearly verbatim when prompted. A chatbot plagiarized the definition of plagiarism. Researchers at Stanford already showed that models can reproduce copyrighted books word for word at minimal cost - a verbatim dictionary definition is a far lower bar.
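One rough way to put a number on "verbatim or near-verbatim" is a word-level similarity ratio. The sketch below uses Python's standard `difflib.SequenceMatcher`. It is not the method used in the complaint, the two sample strings are invented placeholders rather than Merriam-Webster's text or real ChatGPT output, and `overlap_ratio` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

# Invented placeholder strings, not real dictionary text or model output.
SOURCE = "to take the words of another and present them as your own"
OUTPUT = "to take the words of another person and present them as one's own"

def overlap_ratio(reference, candidate):
    """Word-level similarity from 0.0 (nothing shared) to 1.0 (identical)."""
    ref_words = reference.lower().split()
    cand_words = candidate.lower().split()
    # autojunk=False keeps common words from being discarded on long inputs.
    return SequenceMatcher(None, ref_words, cand_words, autojunk=False).ratio()

if __name__ == "__main__":
    print(f"identical text: {overlap_ratio(SOURCE, SOURCE):.2f}")
    print(f"near-verbatim:  {overlap_ratio(SOURCE, OUTPUT):.2f}")
```

A score of 1.0 means the word sequences match exactly. Scores just below that indicate small insertions or substitutions, which is the pattern a "near-verbatim" claim describes. Where a court would draw the line is a legal question, not something a similarity score can settle.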

The Licensing Breakdown

This case follows a now-familiar pattern: publisher reaches out to AI company about licensing, AI company stalls, publisher sues.

Britannica states it contacted OpenAI about licensing in November 2024. OpenAI "never seriously pursued" the opportunity, according to the complaint, even as it signed deals with other publishers. News Corp reached an agreement with Meta worth up to $50 million per year in March 2026, showing that licensing is commercially viable when AI companies choose to engage.

Stakeholder	Impact	Timeline
Britannica / Merriam-Webster	Potential licensing revenue or damages award; reputational harm from misattributed AI hallucinations	Case pending; trial years away
OpenAI	Added legal cost, potential licensing liability across reference content; precedent risk if RAG claim succeeds	MDL transfer likely within months
Other AI companies	Broader training-data exposure if courts reject fair use for reference content; licensing cost increases	Depends on NYT MDL outcome (no ruling before summer 2026)
Publishers still unlicensed	Precedent case to watch; litigation playbook is now well-established	Ongoing
Consumers	No direct impact now; potential chilling effect on AI-generated reference summaries if OpenAI loses	Long-term

Who Benefits, Who Pays

Companies That Already Paid

News Corp's Meta deal, worth up to $50 million per year, suggests a market rate for major content licensing. Axel Springer, AP, and several other publishers have signed similar agreements. Their executives were right to negotiate before suing: they got cash, guaranteed distribution, and preserved the relationship.

Britannica gambled on negotiation and lost. It now has to litigate.

OpenAI's Exposure

OpenAI's stated defense is fair use. The company's spokesperson said models "are trained on publicly available data and grounded in fair use." That argument hasn't yet been tested in court against reference publishers specifically.

The NYT multidistrict litigation - consolidating more than 14 copyright suits - is the bellwether. No fair use ruling is expected before summer 2026. Legal analysts expect the Britannica case to be transferred into that MDL and stayed pending its outcome, which pushes any meaningful resolution to 2027 at the earliest.

If the fair use argument fails, OpenAI's exposure isn't limited to Britannica. The company would face a wave of retroactive licensing demands from every publisher that can demonstrate its content was used in training.

OpenAI CEO Sam Altman, pictured at TechCrunch Disrupt SF 2019, has defended training on publicly available data as fair use. Courts have not yet ruled on the argument. Source: commons.wikimedia.org (CC BY 2.0, TechCrunch)

The RAG Angle

The training-data copyright debate is already complex. The RAG allegation adds another layer. If a court finds that retrieving and surfacing copyrighted content during inference - not just at training time - constitutes infringement, the legal exposure for AI companies expands dramatically. Every query to an AI system that returns reference material could become a billable event.

That's a stretch under current copyright doctrine, but the claim is being made by competent litigators with a concrete exhibit. Courts have surprised AI companies before.

What Happens Next

The fair use argument hasn't yet failed in court. But it hasn't yet been tested at scale against publishers with 250 years of brand equity and a verbatim-reproduction exhibit.

The Britannica case will almost certainly be transferred to the NYT MDL. Once there, it's likely to be stayed until the lead cases produce a ruling on fair use. That could mean a three-to-five-year legal process before Britannica sees any money.

In the short term, the suit adds pressure on OpenAI's licensing negotiations with any other reference publisher that's still on the fence. The cost of not reaching a deal is now litigation with Susman Godfrey, which doesn't come cheap to defend against.

The total count of AI copyright lawsuits in the US now stands at roughly 91. Britannica's case stands out not because it'll produce a fast ruling, but because it shows that even legacy institutions with deep content libraries are choosing litigation over waiting for voluntary licensing deals that aren't coming.

Merriam-Webster has been defining the English language since 1828. OpenAI allegedly trained ChatGPT on much of what it wrote. The two have now met in court, and the dictionary company gets to argue that the AI company copied the word for copying.

Sources: TechCrunch, Engadget, The Next Web, TechStartups