Authors sue OpenAI, alleging mass copyright infringement of novels – The Hollywood Reporter


Another lawsuit has been filed against OpenAI over its alleged illegal collection of information from the web to train its artificial intelligence chatbot, this time by authors who claim that ChatGPT infringes the copyrights on their novels.

A proposed class action lawsuit, filed in San Francisco federal court on Wednesday, alleges that OpenAI “relied on the mass harvesting” of copyrighted works “without consent, credit or compensation.” The suit seeks a court order declaring that the company infringed the authors’ copyrighted works when it illegally downloaded copies of their novels to train its AI system, and that ChatGPT’s responses themselves constitute infringement.

Generative AI companies face legal challenges over the materials used to train their AI systems as courts wrestle with whether the practice qualifies as fair use. In addition to a lawsuit filed Wednesday targeting the automated copying of personal data from hundreds of millions of people, OpenAI faces a proposed class action alleging that the billions of lines of computer code its AI technology analyzes to generate its own code amount to piracy. Getty Images has also sued the company behind AI art generator Stable Diffusion for copyright infringement.

As proof of copyright infringement, the authors’ lawsuit points to ChatGPT generating summaries of their novels when prompted. They argue that this is “only possible if ChatGPT is trained on the plaintiff’s work.”

And because AI systems cannot function without the information extracted from the material, the lawsuit argues, the software programs known as large language models that power ChatGPT are themselves infringing derivative works, created without the plaintiffs’ permission and in violation of their exclusive rights under copyright law. A derivative work is a work based on an existing copyrighted work.

The authors take issue with what they describe as OpenAI’s illegal downloading of hundreds of thousands of books to train its AI systems. In June 2018, the company revealed that it had fed the first iteration of its large language model, GPT-1, a collection of over 7,000 novels from BookCorpus, a dataset assembled by a team of AI researchers.

“They copied the books from a website called Smashwords.com, which hosts unpublished novels that readers can read for free,” the complaint states. “However, most of those novels are copyrighted. They were copied into the BookCorpus dataset without consent, credit or compensation to the authors.”

Subsequent versions of OpenAI’s large language models were also trained on a large number of copyrighted works, according to the complaint. In its 2020 paper introducing GPT-3, the company revealed that 15% of the training dataset came from “two internet-based book corpora,” simply called “Books1” and “Books2.” It is not clear what works are included in these datasets, but the authors claim that they were drawn from “notorious shadow library websites” such as Library Genesis, Z-Library, Sci-Hub, and Bibliotik.

“These grossly illegal shadow libraries have long been of interest to the AI training community,” the complaint states. As an example, it cites “Books3,” an AI training dataset published in December 2020 by EleutherAI, which it says reproduces shadow library material and contains about 200,000 books. The authors’ attorney, Joseph Saveri, is also representing programmers in class action lawsuits against OpenAI and Microsoft.

OpenAI does not disclose information about the sources of its datasets. The company said last year that it had considered “both the competitive environment and the safety implications of large-scale models like GPT-4.”

The lawsuit, which seeks to represent a nationwide class of hundreds of thousands of American authors, was brought by Paul Tremblay and Mona Awad. Tremblay wrote the novel The Cabin at the End of the World, which M. Night Shyamalan adapted as Knock at the Cabin. The complaint alleges direct copyright infringement, vicarious copyright infringement, violation of the Digital Millennium Copyright Act, unjust enrichment and negligence.

OpenAI and Microsoft, which owns a stake in the AI company, did not respond to requests for comment.

At a May hearing before the House Judiciary Subcommittee on Courts, Intellectual Property and the Internet examining the intersection of AI and copyright law, key Hollywood figures argued in favor of legislation to stop the rampant use of allegedly illegally collected art to train AI systems. “The rapid adoption of generative AI systems is seen as an existential threat to the livelihoods and continuation of our creative professions unless we take immediate legal and economic steps to address these emerging problems,” said Ashley Irwin, president of the Society of Composers and Lyricists, who attended the hearing. “Prioritizing policy and regulation is essential to protecting the intellectual property and copyrights of creators and preserving America’s diverse and dynamic cultural landscape.”

Irwin stressed that, in addition to providing appropriate credit, AI companies should be required to obtain creators’ consent before using their works to train AI programs and to compensate them at fair, reasonable market rates for any new work that is subsequently created.
