Five Key Questions Answered in the Copyright Battle Between The New York Times and OpenAI
The gloves are off. The New York Times, one of the most respected publications in the United States with a global footprint, has sued ChatGPT maker OpenAI and its partner Microsoft, accusing them of multiple copyright infringements.
The Times becomes the first major publication to open a legal front against ChatGPT, which has taken the world by storm since its debut late last year – and has also raised existential questions about the future of jobs and certain industries, including the media.
The outcome of the case filed in a US court could affect how established publications and AI tools can coexist at a time when media around the world are experiencing a revenue squeeze in the digital age.
What does the lawsuit say?
The Times claims that:
• Independent journalism is “increasingly rare and valuable,” and The Times has given the world “deeply reported, expert, independent journalism” over the years.
• The “illegal use of The Times’ work to create competing AI products threatens The Times’ ability to provide that service.”
• Using others’ valuable intellectual property without paying “has been very profitable for defendants.”
• The U.S. “Constitution and copyright law recognize the critical importance of giving authors exclusive rights to their works,” but defendants “have refused to recognize that protection.”
• The law does not allow the kind of systematic and competitive infringement committed by the defendants.
What is at the heart of the controversy?
The main bone of contention in this heated debate is the original content created by news publishers (in this case, The Times) and the alleged infringement of its copyright when that content is distributed free of charge via AI chatbots (ChatGPT).
The most critical question at this point is: how exactly has ChatGPT threatened The Times’ journalism? The answer lies in how AI chatbots, which are examples of so-called large language models (LLMs), acquire their knowledge.
By now, we know the amazing things ChatGPT can do – help us write and edit, create a story from scratch, do math and code, and even translate text from one language to another. To be able to do all that, bots like ChatGPT need a lot of training involving huge amounts of data.
Critics say AI makers have not been very transparent about where that data comes from, but the basic mechanics are known: the chatbots process the data through a complex neural network. Transformers, for example, are a popular neural network architecture that can read volumes of text and find patterns in it; with enough training, such models grow more capable and can engage in human-like conversation.
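To make that concrete, here is a deliberately tiny, self-contained sketch of the underlying statistical principle – a simple bigram word model, not a transformer, and nothing like OpenAI’s actual training pipeline. The sample sentence and all names are invented for illustration; the point is only to show how counting patterns in training text turns into the ability to generate new text.

```python
# A toy illustration of the idea behind language-model training: record
# which word tends to follow which in the training text, then "generate"
# by repeatedly sampling a likely next word. Real LLMs use transformer
# networks with billions of parameters, but the principle -- learning
# what comes next from volumes of text -- is the same.
import random
from collections import Counter, defaultdict

# Invented sample text, standing in for a huge training corpus.
training_text = (
    "the avalanche buried the road and the rescue teams searched "
    "the mountain while the storm raged over the mountain"
)

# "Training": count how often each word follows each other word.
words = training_text.split()
next_word_counts = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    next_word_counts[current][nxt] += 1

def generate(start: str, length: int = 8) -> str:
    """Generate text by repeatedly sampling a likely next word."""
    output = [start]
    for _ in range(length):
        counts = next_word_counts.get(output[-1])
        if not counts:  # no known continuation
            break
        choices, weights = zip(*counts.items())
        output.append(random.choices(choices, weights=weights)[0])
    return " ".join(output)

print(generate("the"))  # e.g. "the rescue teams searched the mountain while ..."
```

Scale that principle up to billions of parameters and vast swathes of text, and the copyright stakes become apparent: the model’s abilities are inseparable from the text it was trained on.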
The Times’ lawsuit objects to the free use of its data (read: its journalism) to train ChatGPT. It alleges: “Defendants’ generative artificial intelligence (‘GenAI’) tools are based on large language models (‘LLMs’) built by copying and using millions of The Times’ copyrighted news articles, in-depth studies, opinion pieces, reviews, guides and more.”
The Times website is behind a paywall – users must pay subscription fees to access its online offering. The fear is that if the same content is available for free via AI chatbots, The Times will lose subscribers, and in turn revenue. If this becomes a wider pattern, the consequences for the already struggling media industry could be huge.
What does OpenAI say?
Equally significant is that OpenAI has not contested the lawsuit head-on; in fact, the company has tried to settle the case out of court.
The truth is that AI models cannot sustain themselves or improve without the data they are trained on, and this is where the copyright issue comes into play.
For its part, OpenAI emphasizes that ChatGPT was developed using:
• Information publicly available on the Internet
• Information they license from third parties
• Information provided by their users or trainers
OpenAI says on its website that ChatGPT does not copy or store training data in a database, adding:
“…we only use publicly available information that is freely and openly available on the Internet – for example, we do not look for information behind a paywall or on the ‘dark web’. We use filters and remove information that we don’t want our models to learn from or output, such as hate speech, adult content, sites that primarily collect personal information, and spam. We then use the data to train the models.”
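OpenAI has not published how those filters work, but as a purely hypothetical sketch of the kind of pre-training screening the passage describes, the snippet below drops crawled documents that match simple exclusion rules before the rest enters a training set. Every field, term and URL here is an invented placeholder, not OpenAI’s actual pipeline.

```python
# Hypothetical pre-training data filter: keep only documents that are not
# paywalled and contain no blocked content. Real-world filters would use
# trained classifiers, not keyword lists; this is illustration only.
from dataclasses import dataclass

@dataclass
class Document:
    url: str
    text: str
    paywalled: bool  # assumed metadata supplied by the crawler

BLOCKED_TERMS = {"spam-keyword", "hate-term"}  # stand-ins for real detectors

def keep(doc: Document) -> bool:
    """Return True if a crawled document may enter the training set."""
    if doc.paywalled:  # "we do not look for information behind a paywall"
        return False
    if any(term in doc.text.lower() for term in BLOCKED_TERMS):
        return False   # unwanted content is filtered out
    return True

corpus = [
    Document("https://example.com/public-essay", "An openly available essay.", False),
    Document("https://example.com/premium-story", "Subscriber-only reporting.", True),
]
training_set = [doc for doc in corpus if keep(doc)]
print([doc.url for doc in training_set])  # only the public essay survives
```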
What is the counterargument?
The Times’ lawsuit cites several conversations with ChatGPT to counter this. In one of these, it shows – using a screenshot – how ChatGPT allegedly quoted parts of “Snow Fall: The Avalanche at Tunnel Creek”, The Times’ Pulitzer Prize-winning 2012 article. The Times says the output was “created in response to a prompt” from a user who complained that he could not access the article behind the paywall.
The Times also cited Microsoft’s Bing search index, “which crawls and categorizes The Times’ online content to produce responses that contain verbatim excerpts and detailed summaries of Times articles that are significantly longer and more detailed than traditional search engines.” Microsoft is a major supporter of OpenAI, having invested significantly in the AI startup.
The Times has sought damages and demanded that the defendants be ordered to stop using its content. “This action seeks to hold them liable for billions of dollars in statutory and actual damages owed by them for the illegal copying and use of The Times’ uniquely valuable works,” The Times says, without specifying the amount.
The Times alleges that the defendants sought to take advantage of its “huge investment in its journalism by using it to build substitute products without permission or payment” and that “Microsoft’s deployment of Times-trained LLMs across its product portfolio helped boost its market value by a trillion dollars in the past year alone.”
Microsoft has yet to respond to the lawsuit. OpenAI says it respects “the rights of content creators and owners and is committed to working with them to ensure they benefit from A.I. technology and new revenue models.”
Have there been similar cases?
The Times lawsuit is an important episode in a rapidly evolving story in which new ethical and operational questions about the use of artificial intelligence tools arise frequently.
In September, Game of Thrones author George R.R. Martin and other well-known authors sued OpenAI for copyright infringement. The following month, Universal Music Group and other music publishers sued artificial intelligence company Anthropic for distributing what they said were copyrighted lyrics.
At the other end of the spectrum is the Associated Press news agency, which has struck a deal with OpenAI to license part of the AP news archive to the AI company.
And then there’s Axel Springer, publisher of Business Insider and Politico, which has partnered with OpenAI. The German media group is paid to allow its articles to be summarized in ChatGPT’s responses.
Good or bad, AI’s impact on journalism and the media industry is unmistakable—and change is inevitable.
“It’s easy to dismiss legal filings as an inevitable sign of a tech boom – if there’s hype and money, lawyers will follow. But there are genuinely interesting questions at play here – about the nature of intellectual property and the pros and cons of going full speed into a new technology before anyone knows the rules of the road. Yes, generative AI now seems inevitable. These battles can shape how we use it and how it affects business and culture,” an article on the Vox website notes.