A federal judge has sided with Anthropic in a key copyright ruling, declaring that artificial intelligence developers can train models on published books without the authors' consent.
The decision, filed Monday in the U.S. District Court for the Northern District of California, sets a precedent that training AI systems on copyrighted works constitutes fair use. Although there is no guarantee that other courts will follow it, Judge William Alsup's decision is the first among dozens of ongoing copyright cases to answer questions about fair use in the context of generative AI.
This is a question that creatives across industries have been raising in the years since generative AI tools exploded into the mainstream, making it easy to produce art with models trained on copyrighted works without the knowledge or permission of the human creators.
Since 2023, AI companies have been hit with copyright lawsuits from media companies, music labels and authors. Artists have signed multiple open letters urging government officials and AI developers to restrict the misuse of copyrighted works. In recent years, some companies have reached licensing deals with AI developers governing the use of artists' works.
Alsup was ruling on a lawsuit filed in August by three authors: Andrea Bartz, Charles Graeber and Kirk Wallace Johnson.
“The copies used to train specific LLMs were justified as a fair use,” Alsup wrote in the ruling. “Every factor other than the nature of the copyrighted work favors this outcome. The technology at issue is among the most transformative many of us will see in our lifetimes.”
His decision found that Anthropic's use of the authors' books to train its models, including versions of its flagship AI model Claude, was transformative enough to qualify as fair use.
Fair use, as defined by copyright law, takes four factors into account: the purpose of the use; the nature of the copyrighted work (creative works receive greater protection than factual works); how much of the work was used; and whether the use undermines the market value of the original work.
“We are pleased that the court recognized that using works to train LLMs is transformative,” Anthropic said in a statement, citing the ruling. “Consistent with copyright's purpose of enabling creativity and promoting scientific progress, Anthropic's LLMs trained on works not to race ahead and replicate or replace them, but to create something different.”
Bartz and Johnson did not immediately respond to requests for comment. Graeber declined to comment.
Alsup noted that all of the authors' works contain “expressive elements” and therefore receive stronger copyright protection.
He added that making digital copies of purchased books is fair use, but that downloading pirated copies for free is not.
Setting aside the millions of pirated copies, Alsup wrote, it was “especially reasonable” to copy entire works to train an AI model, because the model does not reproduce those copies for public access and does “not displace demand” for the original books.
His ruling stated that AI developers can legally train AI models on copyrighted works without permission, but that they must acquire those works through legitimate means that do not involve piracy or other forms of theft.
Despite siding with AI companies on fair use, Alsup wrote that Anthropic will still face trial over the pirated copies it used to build a large central library of books for training its AI.
“That Anthropic later purchased a copy of a book it had previously stolen from the internet does not absolve it of liability for the theft,” Alsup wrote.