AI litigation explained: Who can be sued?

Many feel it's time for AI companies to pay for the free data lunch that made their generative systems so big and powerful.

Recently, a number of lawsuits have been filed in the United States and Europe demanding compensation from AI companies. Plaintiffs include writers, artists, and major media organizations who have consistently expressed concerns about AI stealing their works and producing mediocre derivative works.

An open letter from the Authors Guild, signed by more than 8,500 authors including Margaret Atwood, Dan Brown and Jodi Picoult, urges the technology companies responsible for generative AI applications such as ChatGPT and Bard to stop using their works without permission or compensation. The authors want companies to pay for the data collected for training, which they describe as the “food” of AI systems: endless meals with no bill.

Authors have also expressed concern that generative AI could threaten their profession by flooding the market with machine-written content based on their work. This has been an issue in recent months as Amazon took action against AI authors spamming bestseller lists with their generated works.

Prior to the release of the Authors Guild letter, two North American authors, Mona Awad and Paul Tremblay, filed a lawsuit against OpenAI alleging copyright infringement. The lawsuit argues that because ChatGPT produces accurate summaries of the authors' works, it must have been trained on those works. They're not alone. Author and comedian Sarah Silverman is also suing OpenAI and Meta for illegally copying her memoir, The Bedwetter, without permission. However, due to the way generative AI works, this argument may not hold up in court.

Individual writers and artists are not the only plaintiffs. In December 2023, The New York Times became the first major American news publication to sue OpenAI for using copyrighted material in AI development.

What is generative AI?

Generative AI is the technology that powers ChatGPT and Bard. Text-based generative AI uses algorithms to predict the next likely word in a text and generates that text in response to prompts from the user. ChatGPT knows what to generate because it was trained on a large corpus of publicly available data from the internet. It learns patterns during training and matches those patterns to the user's prompts.
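To make "predicting the next likely word" concrete, here is a minimal sketch in Python. It is a toy bigram counter, not how ChatGPT or Bard actually work: real systems use large neural networks trained on far more data. The function names and the sample sentence are illustrative assumptions, not taken from any AI company's code.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word in the training text, which words follow it."""
    follow_counts = defaultdict(Counter)
    words = corpus.lower().split()
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next_word(model, word):
    """Return the word seen most often after `word`, or None if `word` is unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(model, prompt, max_words=5):
    """Extend the prompt one predicted word at a time."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        next_word = predict_next_word(model, words[-1])
        if next_word is None:
            break  # the model never saw this word during training
        words.append(next_word)
    return " ".join(words)


# Hypothetical training text, chosen only for illustration.
training_text = "the cat sat on the mat and the cat sat on the sofa"
model = train_bigram_model(training_text)
print(generate(model, "the", max_words=3))
```

Even this toy version shows why the training data matters to the lawsuits: the model can only produce word sequences that reflect patterns in the text it was trained on.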

Generative AI is typically a black box AI system. This means that no one, not even the programmers, understands the exact steps the machine takes from input to output. When input comes in, magic happens and output comes out.

All machine learning and generative AI tools use some kind of existing work.

Why do people file lawsuits?

People are suing AI companies over copyright. ChatGPT is trained on data from the internet, but without the permission of the data's creators. For example, GPT-3 was trained on sources such as Wikipedia and Reddit. However, conversations about copyrighted works, or parts of those works, may be present in the training material, which may give the large language model enough context to summarize them accurately.

On a larger scale, people are suing because AI is a black box and it's impossible to know how it works at a detailed level. The fear is that people will use AI to avoid responsibility for their decisions and what the AI produces.

Matthew Butterick, a lawyer who has handled several of the lawsuits, argued on his blog that if AI companies are allowed to sell AI systems that are essentially black boxes, those systems could become the ultimate ends-justify-the-means devices. He warned that people will soon delegate decisions to AI systems not because they perform better, but because they can get away with things that people cannot.

What kind of AI lawsuits are being filed?

Numerous lawsuits have been filed against generative AI companies regarding copyright and abuse. Here are some of the companies that have been sued.

GitHub, Microsoft, OpenAI

A class action lawsuit has been filed against these companies in connection with GitHub's Copilot tool. This tool generates code predictively based on what the programmer has already written. The plaintiffs allege that Copilot copies and republishes code from GitHub without complying with the requirements of the open source licenses that govern it, such as providing attribution. The complaint also includes allegations that GitHub mishandled personal data and information, as well as claims of fraud. The complaint was filed in November 2022. Microsoft and GitHub have repeatedly asked for the lawsuit to be dismissed.

Stability AI, Midjourney, DeviantArt

The complaint against these AI image generation providers was filed in January 2023. The plaintiffs alleged that the systems directly infringed their copyrights by training on works they created and by producing unauthorized derivative works. The complaint also takes issue with the fact that the tools can be used to generate works in an artist's style. The judge in the case, William Orrick, said he intended to dismiss the case.

Stability AI

In January 2023, Getty Images issued a complaint against Stability AI in the UK for allegedly copying and processing millions of images and associated metadata owned by Getty. Getty filed another lawsuit against Stability AI a few days later in the U.S. District Court for the District of Delaware. It raised a number of copyright and trademark-related claims, pointing to “bizarre or grotesque” generated images that contained Getty Images' watermarks, thus damaging Getty's reputation.

OpenAI

Authors Paul Tremblay and Mona Awad are suing OpenAI for violating their copyright. Butterick is one of the attorneys representing the authors. The complaint estimates that more than 300,000 books were copied into OpenAI's training data. The lawsuit, filed in June 2023, seeks an unspecified amount of money.

OpenAI and Microsoft

The New York Times is suing OpenAI and Microsoft for copyright infringement. The lawsuit, filed in December 2023, claims that millions of New York Times articles were used to train and develop OpenAI's chatbots and other technology, which now compete with the news organization as a trusted source of information. The lawsuit also claims that OpenAI's language models mimic the Times' style and recite its content verbatim. The Times is the first major American news organization to accuse OpenAI and Microsoft of copyright infringement. The Times approached the two companies earlier in 2023 to discuss copyright issues, but no agreement was reached.

Eight other newspapers filed suit against OpenAI and Microsoft on April 30, 2024, accusing them of taking millions of copyrighted news articles to train their AI. The newspapers bringing the lawsuit are the New York Daily News, Chicago Tribune, Denver Post, Mercury News, Orange County Register, St. Paul Pioneer Press, Orlando Sentinel, and South Florida Sun Sentinel.

Meta and OpenAI

Sarah Silverman's lawsuit against Meta and OpenAI alleges copyright infringement, stating that ChatGPT and Large Language Model Meta AI (Llama) were trained on illegally obtained datasets containing her copyrighted works. The complaint alleges that the books were obtained from shadow libraries such as Library Genesis, Z-Library, and Bibliotik, where books can be torrented. Torrenting is a common method of downloading files without proper legal permission. Specifically, Meta's language model Llama was trained on a dataset called the Pile, which uses data from Bibliotik, according to a paper from EleutherAI, which assembled the Pile. The lawsuit was filed in July 2023.

Google

A class action lawsuit has been filed against Google for alleged misuse of personal information and copyright infringement. The data specified in the lawsuit includes photos from dating sites, Spotify playlists, TikTok videos, and books used to train Bard. The lawsuit, filed in July 2023, says Google could owe at least $5 billion. The plaintiffs chose to remain anonymous.

This is not the first such copyright lawsuit against a major technology company. In 2005, the Authors Guild sued Google for creating digital copies of millions of books and making parts of them available to the public. In 2015, an appeals court sided with Google, holding that the use was transformative and did not provide a replacement for the books on the market.

What questions will these cases answer?

The above cases are important in answering the following questions:

  • Is a license needed to train models on copyrighted material? Generative AI systems create copies of training materials as part of the training process. Does that interim copy require a license, or is it fair use?
  • Does generative AI output infringe the copyright of the material used to train the model? Output infringes copyright if it constitutes a derivative work or infringes the right to reproduce the training data. Courts will need to determine whether the similarities between the output and the training data stem from protected or unprotected material. Who is responsible if AI infringes copyright?
  • Does generative AI violate restrictions on the removal, alteration, or falsification of copyright management information? The Digital Millennium Copyright Act restricts the removal or alteration of copyright management information, such as watermarks. This is exemplified in the Stability AI case, where watermarks reproduced on works produced by Stable Diffusion are alleged to constitute false copyright management information.
  • Does producing work in someone's style infringe on that person's rights? This falls under the right of publicity, which varies by state. It restricts the use of a person's likeness, name, image, voice, or signature for commercial purposes.
  • How do open source licenses apply to training AI models and distributing the resulting output? Plaintiffs in the Copilot case argued that republishing Copilot training materials without attribution and failing to make Copilot itself open source violate the terms of the open source licenses.

As litigation continues to take shape and answers emerge, companies involved in generative AI tools should take note of guidance on the intersection of AI and intellectual property and consider whether risk mitigation strategies are needed.


