
Encyclopaedia Britannica and its subsidiary Merriam-Webster have filed a lawsuit against artificial intelligence company OpenAI, accusing it of using copyrighted reference materials to train AI systems without permission. The case adds to a growing number of legal disputes between technology developers and publishers as courts increasingly confront questions about how copyrighted content can be used in building generative AI models.
The lawsuit was filed in a federal court in Manhattan and alleges that OpenAI copied tens of thousands of articles and dictionary entries from Britannica’s platforms to train its language models. According to the complaint, those models can generate responses that closely resemble the publisher’s original content, raising concerns about unauthorised use of intellectual property and the potential commercial impact on Britannica’s digital business.
Britannica argues that the alleged use of its materials could divert online traffic away from its platforms and weaken the value of its subscription-based reference services. The publisher claims that by incorporating its content into AI training datasets, OpenAI is benefiting, without paying compensation or securing a licence, from material that required significant editorial investment.
The complaint also raises concerns about attribution and brand representation. Britannica states that AI-generated responses sometimes reference the publisher in ways that may mislead users into believing the company authorised or endorsed the information produced by the AI system. The lawsuit therefore includes claims of trademark misuse and false attribution alongside copyright infringement.
The dispute reflects a broader legal and economic debate about how generative artificial intelligence systems should be trained. Technology companies typically rely on large datasets collected from across the internet to develop AI models capable of generating human-like responses. Publishers and content creators, however, increasingly argue that this process involves the unauthorised use of copyrighted material.
Britannica is seeking financial damages and a court order preventing OpenAI from using its content in future AI training. The outcome of the case could have far-reaching implications for the technology sector, particularly for companies developing large language models that rely on vast quantities of digital information.
As generative AI continues to expand across industries, courts are expected to play a central role in determining how copyright law applies to the data used to train advanced artificial intelligence systems.