Judge Allows AI Training on Published Books in Landmark Case
In a landmark ruling, Judge William Alsup has sided with AI firm Anthropic, finding that it was legal for the company to train its AI models on published books without seeking permission from the authors. This is the first time a court has backed the argument that using copyrighted materials to train large language models (LLMs) can qualify as fair use, a doctrine that can shield AI companies from liability.
The ruling is a blow to authors, artists, and publishers, who have filed a wave of lawsuits against tech giants including OpenAI, Meta, Midjourney, and Google. Although other judges are not bound to follow Judge Alsup’s opinion, it lays groundwork that could lead courts to favor tech companies over creatives in future cases.
The outcome of such lawsuits hinges on how judges interpret the fair use doctrine. This is a notoriously complex part of copyright law that has not been updated since 1976, before the internet existed and long before generative AI training sets emerged.
Fair use analysis weighs the purpose of the use (e.g., education or parody), whether the material is being used for commercial gain, and how much the derivative work transforms the original.
Companies such as Meta have made similar fair use arguments to justify training AI on copyrighted content. Until this ruling, however, it was unclear how courts would respond.
In this case, Bartz v. Anthropic, the plaintiffs also raised concerns over how Anthropic obtained and stored their works. The company set out to compile a "central library" housing "all the books in the world" and to keep it forever. Millions of these copyrighted books, however, were downloaded for free from pirate websites. While the judge deemed Anthropic's use of these materials for training to be fair use, a separate trial will examine the pirated copies in the "central library."
Judge Alsup made clear in his ruling that the separate trial will cover the pirated books that ended up in Anthropic's central library and the resulting damages. He wrote, “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for theft but it may affect the extent of statutory damages.”