Eight newspapers sue OpenAI, Microsoft for copyright infringement

NPR | By Bobby Allyn

Published April 30, 2024 at 1:13 PM MDT

Eight daily newspapers sued OpenAI and Microsoft on Tuesday, alleging that the maker of ChatGPT copied their work without permission or payment.

Eight daily newspapers, including The New York Daily News and The Chicago Tribune, sued OpenAI and Microsoft on Tuesday, seeking to stop the tech companies from using copyrighted articles to train artificial intelligence chatbots.

The copyright infringement suit claims the maker of ChatGPT copied stories from the newspapers "with impunity," never seeking permission or payment for the use of millions of articles that were used by the popular chatbot to respond to questions.

ChatGPT, which relies on vast amounts of data scraped from the internet, has become a direct competitor to the newspapers at a time when the news industry has been pummeled by sinking advertising and subscription revenue, the lawsuit argues.

In addition, according to the suit, ChatGPT at times falsely attributes reporting to the newspapers in the answers it generates, tarnishing the reputation of the news outlets.

For instance, the suit cites an example in which ChatGPT states that The Chicago Tribune has recommended an infant lounger the paper never endorsed. In fact, the product the chatbot mentions had been linked to infant deaths and recalled.

In another example noted by the suit, ChatGPT was asked if smoking cures asthma, and the chatbot fabricated that The Denver Post published research indicating that smoking can be a cure for asthma. That assertion is obviously false, and the paper never published such research.

"This issue is not just a business problem for a handful of newspapers or the newspaper industry at large," lawyers for the newspapers wrote in the suit. "It is a critical issue for civil life in America."

In a statement, OpenAI said it takes "great care" to support news organizations in building its products.

"Along with our news partners, we see immense potential for AI tools like ChatGPT to deepen publishers' relationships with readers and enhance the news experience," said a spokesperson for OpenAI.

Microsoft, OpenAI's biggest financial backer, did not respond to a request for comment.

Attorneys for the newspapers, which are all owned by the New York investment fund Alden Global Capital, are asking for unspecified monetary damages and for the practice of using their copyrighted work to end.

The suit additionally asks for the destruction of any AI models OpenAI uses that incorporate works published by the newspapers — something AI experts have said would be nearly impossible to accomplish without completely rebuilding its models, an incredibly arduous and costly endeavor.

Re-training an AI model could cost "on the order of a hundred million dollars for earlier models, and a billion or even multiple billions for future models," said Gary Marcus, a professor of psychology at New York University and the author of the forthcoming book Taming Silicon Valley: How We Can Make Sure That AI Works for Us.

Marcus said trying to filter out copyrighted material from a dataset can be complicated: even if there were a master list of all the URLs that had to be removed, sites like Reddit often include versions of copyrighted stories in posts, meaning there is "no guarantee that you won't have scraped copies from elsewhere on the internet."

Newspapers' suit fuels legal showdown over AI and copyright law

It is the latest legal headache for OpenAI, which was hit with a similar copyright infringement lawsuit from The New York Times last year.

Together, the legal challenges are set to be a high-stakes court battle pitting one of the world's leading AI companies against news publishers, duking it out over an area of law that experts say is unresolved and murky.

To the newspaper publishers, the case appears straightforward: OpenAI, they claim, stole their copyrighted material.

"This lawsuit is about how Microsoft and OpenAI are not entitled to use copyrighted newspaper content to build their new trillion-dollar enterprises, without paying for that content," according to the suit.

Yet OpenAI has long claimed that its so-called "large language models" hoover up vast amounts of data from all corners of the internet under what is known as the "fair use" doctrine.

Under that legal theory, copyrighted works can be used without permission if certain criteria are met, such as when the original is substantially changed or when the new work does not compete with it.

The "fair use" doctrine allows researchers, teachers, critics and others to rely on copyrighted works without permission and payment.

Yet legal scholars have said it is far from certain that the law is on the side of AI companies, and it will likely take years of court battles and a long appeals process to determine whether leading technology firms like OpenAI have violated the law.

Other publishers have chosen to take a more conciliatory path with the company. The Associated Press, German publisher Axel Springer, which owns Politico and Business Insider, and, recently, The Financial Times, have all struck licensing agreements with OpenAI to be paid for use of copyrighted material.