A federal judge in San Francisco has determined that training AI models on copyrighted works without specific permission is not a violation of copyright law.
US District Judge William Alsup said that Anthropic, an AI company, can assert a “fair use” defense against copyright claims over its training of Claude AI models on copyrighted books. However, the judge also determined that exactly how those books were obtained mattered.
Alsup supported Anthropic's claim that buying millions of books and digitizing them for use in AI training is “fair use.” The judge said it was not acceptable, however, that Anthropic had also downloaded millions of pirated copies from the internet and maintained a digital library of those pirated copies.
The judge ordered a separate trial on Anthropic's storage of those pirated books. The judge has also not yet ruled on whether to grant the case class action status, which could dramatically increase the financial risk to Anthropic if it is found to have infringed authors' rights.
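In finding that it was “fair use” for Anthropic to train its AI models on books written by the authors who sued the company (Andrea Bartz, Charles Graeber and Kirk Wallace Johnson), Alsup addressed a central question: can copyrighted data be used to train AI models without the consent of the owner?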
Dozens of AI-and-copyright lawsuits have been filed over the past three years, most of which hinge on the concept of fair use. This is a doctrine that allows copyrighted material to be used without permission when its use is sufficiently transformative.
The Alsup ruling may set a precedent for these other copyright cases. Many of these judgments are also likely to be appealed, which means it could take years before the law on AI and copyright in the US is settled.
According to the judge's decision, Anthropic's use of the books to train Claude was “very transformative” and constituted “fair use under section 107 of the Copyright Act.” Anthropic told the court that its AI training was not only permissible but also consistent with the spirit of US copyright law. The company said it copied the books to study the plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create new technology.
While training AI models on copyrighted data may be considered fair use, Alsup ruled that Anthropic's separate act of building and storing a searchable repository of pirated books is not. Alsup said the fact that Anthropic later purchased copies of books it had previously taken from the internet does not absolve it of liability for the theft, but could affect the extent of statutory damages.
The judge also took a dim view of Anthropic's admission that it had resorted to downloading pirated books to save time and money in building its AI models. Alsup wrote that his order doubts any accused infringer could meet its burden of explaining why it was necessary to download source copies from pirate sites when it could have purchased or otherwise lawfully accessed them.
The “transformative” nature of AI output is important, but it is not the only thing that matters when it comes to fair use. There are three other factors to consider: what kind of work it is (creative works gain more protection than factual ones), how much of the work is used (the less, the better), and whether the new use will damage the market for the original.
For example, there is ongoing litigation against Meta and OpenAI brought by comedian Sarah Silverman and two other authors, who filed copyright infringement lawsuits in 2023 claiming that pirated versions of their works were used without permission to train AI language models. The defendants recently argued that the use falls under the doctrine of fair use because AI systems “study” works to “learn” and create new, transformative content.
Federal District Judge Vince Chhabria noted that even if this is true, AI systems are dramatically changing, and arguably even wiping out, the market for the authors' work. However, he also took issue with the plaintiffs, saying their lawyers had not provided sufficient evidence of potential market impact.
Alsup's decision differed significantly from Chhabria's view on this point. Alsup said it is undoubtedly true that Claude could lead to increased competition for the authors' works, but that this kind of competitive or creative displacement is not the kind that concerns copyright law. The purpose of copyright is to encourage the creation of new works rather than to protect authors from competition, he said, and he likened the objections to Claude to the fear that teaching schoolchildren to write well might also lead to an explosion of competing books.
Alsup also noted that Anthropic had built “guardrails” into Claude to prevent it from generating output that directly plagiarized the books it was trained on.
Neither Anthropic nor the plaintiffs' lawyers immediately responded to requests for comment on Alsup's decision.