Encyclopedia Britannica and Merriam-Webster have just filed a major copyright lawsuit against OpenAI, alleging the AI giant trained GPT-4 on their premium content without permission and that the model now spits out near-verbatim copies on demand. Filed Friday, the suit claims OpenAI systematically scraped and memorized copyrighted encyclopedia entries and dictionary definitions, adding fresh fuel to the already blazing debate over AI training practices and intellectual property rights.
OpenAI is facing another heavyweight copyright battle. Encyclopedia Britannica and dictionary publisher Merriam-Webster filed a lawsuit Friday alleging the AI company systematically copied their copyrighted content to train ChatGPT, then generated responses that reproduce their work without permission or attribution, Reuters first reported.
The allegations are pretty damning. According to the complaint filed in federal court, GPT-4 has essentially memorized huge chunks of Britannica's encyclopedia entries. "GPT-4 itself has 'memorized' much of Britannica's copyrighted content and will output near-verbatim copies of significant portions on demand," the lawsuit states. "The memorized examples are unauthorized copies that [OpenAI] used to train their models, including GPT-4."
This isn't just about a few snippets here and there. The publishers claim OpenAI engaged in a pattern of wholesale copying, scraping their carefully curated reference materials to build out the knowledge base that makes ChatGPT seem so authoritative. When users ask ChatGPT for information, the suit alleges, they often get responses that closely mirror, sometimes word for word, content that Britannica and Merriam-Webster spent resources creating and fact-checking.
The timing couldn't be more pointed. OpenAI has been on a signing spree lately, cutting licensing deals with major publishers like News Corp and the Associated Press to legitimize its training data sources. But those agreements came after the company had already trained its models on massive amounts of web content—including, the lawsuit alleges, premium reference materials that were never meant to be free for commercial AI training.












