In a secret, secure room somewhere in the United States, a computer with no internet connection holds the ChatGPT source code as if it were locked in a vault. In that room, in that undisclosed spot on U.S. territory, lawyers for one of the most important newspapers on the planet, the New York Times (NYT), are working to determine how OpenAI used the paper's content to train the most famous AI model in the world.
The inspection is taking place under strict security measures, prompted by the lawsuit that the NYT group, other media outlets, and several authors filed against OpenAI for using millions of articles without paying for them. To access the room, lawyers must present official identification and may not bring in phones, USB drives, or other electronic devices. They are provided with an offline computer and word-processing software. After each session, their notes can be transferred to another computer and are deleted from the original one.
The lawsuit, led by the Susman Godfrey law firm, seeks to set a legal precedent for training artificial intelligence models. According to NYT lawyers, OpenAI used millions of articles without offering any compensation, and now that content allows the chatbot to replicate entire articles upon user request.
Napster case
The lawsuits filed against OpenAI seek a share of the economic value generated by Sam Altman's company, currently valued at $157 billion. To that end, Susman Godfrey's (SG) lawyers have compared the case to that of Napster, the music-sharing platform of the early 2000s. The NYT's legal team argues that OpenAI's conduct is worse than Napster's, because Altman's venture is not a university project but a Microsoft-backed company that seeks to generate profits.
Justin Nelson, SG's lawyer, alleges that OpenAI has infringed copyright in two ways: first, by using the articles to train its models, and second, by allowing ChatGPT to reproduce entire NYT articles on demand without users paying a subscription to the media outlet.
On the other hand, OpenAI's lawyers assert that using these materials is protected under the "fair use" doctrine, arguing that verbatim reproductions of NYT articles are "highly unusual" and do not represent typical chatbot use.
Thus, the central issue in this dispute is how language models learn from copyrighted content. NYT lawyers are investigating whether ChatGPT's training process legally constitutes "copying" and whether the model's learning sufficiently transforms the original content to disassociate it from its source. The court's decisions in this case could set a precedent for regulating AI model training in the United States, especially since Congress, unlike the European Union, has not yet legislated in this area.
Implications
The Napster comparison is important because, although Napster was eventually shut down, its legacy spurred the music industry to adopt the streaming model, which now dominates music, movies, and video games. However, the NYT's lawyers argue that OpenAI, as a sophisticated Microsoft-backed company, has taken a far more calculated approach than Napster did: it is an entity that deliberately seeks to exploit protected content for commercial purposes.
This case could reach the U.S. Supreme Court and set an important precedent in the industry. The main question is whether OpenAI's use of protected content can be considered "transformative" and, therefore, protected under the fair use doctrine or whether it is simply copying and profiting from the work of others.