Two of the world's most trusted reference publishers just dropped a legal bomb on OpenAI. Encyclopedia Britannica and Merriam-Webster filed a lawsuit alleging the AI giant infringed their copyright in nearly 100,000 articles by scraping the content to train large language models. The suit marks another major escalation in the battle over AI training data, coming as OpenAI faces mounting legal pressure from publishers, artists, and creators who say the company built its empire on stolen intellectual property.
Encyclopedia Britannica and Merriam-Webster are taking OpenAI to court over what they claim is wholesale theft of their editorial work. The publishers allege OpenAI scraped close to 100,000 articles from their databases to train the company's large language models without permission, licensing deals, or compensation.
The lawsuit lands at a particularly sensitive moment for OpenAI. The company is already fighting multiple copyright battles with The New York Times, authors including John Grisham and George R.R. Martin, and visual artists. Each case puts further pressure on OpenAI's defense that training AI models on publicly available content constitutes fair use under copyright law.
What makes this case different is the nature of the plaintiffs. Encyclopedia Britannica has been publishing authoritative reference content since 1768. Merriam-Webster has defined the English language for generations. These aren't content mills or news aggregators. They're institutions that have spent centuries building reputations on accuracy, editorial standards, and meticulous fact-checking.
That carefully curated content apparently ended up in OpenAI's training datasets anyway. The publishers claim their articles, definitions, and educational materials were ingested to help ChatGPT and other models generate human-like text. In essence, the suit alleges that OpenAI took the reference works students and researchers have relied on for decades and fed them into its AI without asking.












