A federal court in California has issued a significant decision in a copyright lawsuit over the training of generative artificial intelligence models. In a ruling issued this week, Judge William Alsup of the U.S. District Court for the Northern District of California sided, in part, with Anthropic.
Short background: Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson filed a copyright infringement lawsuit against Anthropic in August 2024, claiming that the company built its generative AI product, Claude, by “stealing hundreds of thousands of copyrighted books,” and that rather than obtaining permission and paying a fair price for the creations it exploited, Anthropic pirated them. In response, Anthropic argued that its use of the copyrighted books, whether downloaded from pirate sources or scanned from purchased print copies, is protected under the doctrine of fair use.
A closely watched victory
In his June 23 opinion, Judge Alsup considered the Claude maker's use of millions of copyrighted books, some scanned from legally purchased print copies and others downloaded from known pirate sites, and the distinct legal questions each practice raises under copyright law. The court ultimately drew a firm line between using legally obtained materials to train AI models (which it held to be fair use) and stockpiling pirated content under the guise of innovation.
In ruling on Anthropic's motion for summary judgment, the court split its analysis into two core uses: (1) the use of books to train language models, and (2) the broader creation of a permanent internal library, much of it built from pirated copies. These uses received very different legal treatment.
> Training LLMs = fair use: The court sided with Anthropic on the question of whether training Claude on copies of the plaintiffs' books was lawful. Citing the U.S. Supreme Court's decision in Google v. Oracle, among other transformative-use precedents, Judge Alsup found that using copyrighted books to teach an AI to respond to new prompts with original text output is “quintessentially transformative.”
The court emphasized that the plaintiffs did not allege that Claude's outputs directly copied their works; in other words, Claude was not regurgitating their text back to users. Instead, the court likened the training process to how humans read and internalize style, themes, and writing structures. “Like any reader aspiring to be a writer,” Alsup wrote, Anthropic's LLMs “trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different.”
> Scanning purchased books = fair use: The court also found that Anthropic's digitization of millions of purchased print books was a permissible fair use. The print copies were legally obtained, the conversion merely served internal storage and searchability, and no additional copies were distributed externally.
This kind of format shift, in which the print copies were destroyed and replaced with digital copies, was found to be consistent with past rulings on media conversion, particularly Sony v. Universal and Authors Guild v. Google.
> Pirated copies and indefinite retention = not transformative: It was the decision to build a library from millions of pirated copies downloaded from illegal sources that landed Anthropic in legal trouble. Only some of these books were later used for training, yet the company retained them all indefinitely. Anthropic's internal communications revealed that executives preferred copyright infringement because it avoided the “legal/practical/business slog” of licensing books, a rationale that “cannot be squared” with copyright law, the court said.
Importantly, Judge Alsup rejected the notion that a downstream transformative use (such as LLM training) could sanitize upstream infringement. “Pirating copies to build a research library without paying for it was its own use, and not a transformative one,” the order held.
Anthropic's claim that all of the copies served a higher transformative purpose was, in the court's view, legally inadequate: “There is no carve-out,” the court wrote, “from the Copyright Act for AI companies.”
The bigger picture
This ruling is one of the earliest substantive decisions in a wave of high-profile copyright cases against AI developers. It offers early judicial validation of the argument that training AI models on legally obtained materials can qualify as fair use, but it also delivers a clear warning that provenance matters: AI companies that obtain training data through illegal channels remain liable, however transformative their goals.
As content owners across publishing, fashion, and entertainment increasingly confront AI companies over the use of their intellectual property, the decision underscores the importance of where training materials are sourced from, not just how they are used.
For now, the court has allowed the authors' claims relating to the pirated copies to proceed, while handing Anthropic a favorable ruling on its transformative-use defense.
The case is Bartz v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal.).