OpenAI’s legal troubles mount with lawsuit over AI training on novels

OpenAI Inc. has been hit with another copyright class action lawsuit alleging that the company’s wildly popular artificial intelligence chatbot ChatGPT was trained on books without the authors’ permission.

A complaint filed Wednesday in San Francisco federal court said ChatGPT’s machine learning training datasets were derived from books and other texts “copied by OpenAI without consent, without credit, and without charge.”

OpenAI and other generative AI companies have faced a flurry of intellectual property and privacy lawsuits in recent months as Congress and government regulators look for ways to rein in the burgeoning industry.

This week, OpenAI was sued in a broad class action lawsuit alleging that the machine learning models behind ChatGPT and the text-to-image generator DALL-E illegally collected personal information from the internet in violation of various state and federal privacy laws. The company was hit with another copyright lawsuit last fall, alleging that its AI coding assistant, Copilot, copied open-source software without proper copyright notices.

Courts have yet to determine whether using copyrighted material to train a generative AI model constitutes copyright infringement.

Wednesday’s lawsuit, filed in the United States District Court for the Northern District of California by the same law firm that filed the Copilot case, was brought by science fiction and horror author Paul Tremblay and novelist Mona Awad.

They say that ChatGPT’s ability to provide generally accurate summaries of their books led them to believe that their works were “copied by OpenAI and incorporated without permission by the underlying OpenAI language model.”

The complaint cites OpenAI’s 2020 paper introducing GPT-3, which states that 15% of the training dataset comes from “two internet-based book corpora.” The authors claim that one of these book datasets contains more than 290,000 titles drawn from “shadow libraries” such as Library Genesis and Sci-Hub. These libraries use torrent systems to illegally publish thousands of copyrighted works.

“These grossly illegal shadow libraries have long been of interest to the AI training community,” the complaint states.

The lawsuit also said ChatGPT stripped books of their copyright notices in violation of the Digital Millennium Copyright Act.

OpenAI did not immediately respond to a request for comment.

Joseph Saveri Law Firm LLP is representing the authors.

The case is Tremblay v. OpenAI Inc., N.D. Cal., No. 3:23-cv-03223, filed June 28, 2023.
