ChatGPT maker OpenAI faces class action lawsuit over data used to train AI

SAN FRANCISCO — A California-based law firm has filed a class-action lawsuit against OpenAI, alleging that the artificial intelligence company behind the popular chatbot ChatGPT massively violated the copyrights and privacy of countless people when it used data collected from the internet to train its technology.

The lawsuit seeks to test a novel legal theory: that OpenAI violated the rights of millions of internet users when it used their social media comments, blog posts, Wikipedia articles and family recipes. Clarkson, the law firm bringing the suit, has previously filed large class-action lawsuits on issues ranging from data breaches to false advertising.

The firm’s managing partner, Ryan Clarkson, said it wants to represent “the real people whose information was stolen and exploited commercially to develop this very powerful technology.”

The lawsuit was filed Wednesday morning in federal court for the Northern District of California. A spokeswoman for OpenAI did not respond to a request for comment.

The lawsuit gets to the heart of a major unresolved problem with the proliferation of “generative” AI tools such as chatbots and image generators. The technology works by ingesting billions of words from the open internet and learning to draw inferences between them. After consuming enough data, the resulting “large language model” can predict what to say in response to a prompt, giving it the ability to write poetry, hold complex conversations and pass professional exams. But the humans who wrote those billions of words never consented to companies like OpenAI using them for their own profit.


“All of this information is being used at scale, even though it was never intended to be used by large language models,” Clarkson said. He said he would like the court to set some guardrails on how AI algorithms are trained and on how people are compensated when their data is used.

The firm already has a group of plaintiffs and is actively seeking more.

It is still unclear whether it is legal to use data taken from the public internet to train tools that could prove highly profitable for their developers. Some AI developers argue that using data from the internet should be considered “fair use,” a copyright law concept that creates an exception when the material is changed in a “transformative” way.

The fair use question is “an open issue that will be fought in court in the coming months and years,” said Katherine Gardner, an intellectual property attorney at Gunderson Dettmer, a law firm that primarily represents tech start-ups. Artists and other creative professionals who can show that their copyrighted work was used to train AI models may have an argument against the companies using those models, but people who have simply posted or commented on websites are unlikely to win damages, she said.

“When you put content on a social media site or any other site, you typically give the site a very broad license that allows them to use the content in any way,” Gardner said. “It would be very difficult for ordinary end-users to claim that they are entitled to any form of payment or compensation for using their data as part of training.”

The lawsuit also joins a growing list of legal challenges against companies hoping to build and profit from AI technology. In November, OpenAI and Microsoft were hit with a class-action lawsuit over how computer code from GitHub, the Microsoft-owned online coding platform, was used to train AI tools. In February, Getty Images sued the smaller AI start-up Stability AI, alleging that it illegally used Getty’s photos to train its image-generating bot. And earlier this month, a radio host in Georgia sued OpenAI for defamation, alleging that ChatGPT generated text that falsely accused him of fraud.

OpenAI is not the only company using large amounts of data collected from the open internet to train AI models. Google, Facebook, Microsoft and a growing number of other companies are all doing the same. But Clarkson said he decided to go after OpenAI because, when it captured the public’s imagination with ChatGPT last year, it spurred its larger rivals to push out their own AI.

“They are the companies that started this AI arms race,” he said. “They are the natural first target.”

OpenAI has not disclosed what data went into its latest model, GPT-4, but previous versions of the technology have been shown to have digested Wikipedia pages, news articles and social media comments. Chatbots from Google and other companies use similar data sets.

Regulators are discussing new legislation that would require companies to be more transparent about what data they feed into AI. Gardner, the intellectual property attorney, said the lawsuit could also lead judges to force companies like OpenAI to hand over information about what data they used.

Some companies are trying to stop AI companies from scraping their data. In April, the music company Universal Music Group asked Apple and Spotify to block scrapers, according to the Financial Times. The social media site Reddit shut off access to its data stream, citing how Big Tech companies had for years collected the comments and conversations on its site. Twitter owner Elon Musk threatened to sue Microsoft for using Twitter data it had obtained from the company to train its AI. Musk is starting his own AI company.

The new class-action lawsuit against OpenAI goes further in its allegations. It argues that the company is not transparent enough with people who sign up to use its tools that the data they enter into its models may be used to train new products the company will make money from, such as plug-in tools that let other companies use OpenAI’s technology. It also alleges that OpenAI does not do enough to keep children under 13 from using its tools, an accusation that other tech companies, including Facebook and YouTube, have faced for years.
