Authors accuse Meta and Microsoft of using copyrighted works without permission to train AI models

According to a Reuters report, several American authors have filed a collective lawsuit in a New York federal court, accusing Meta, Microsoft, and Bloomberg News of using their copyrighted works to train artificial intelligence models without consent.

The lawsuit contends that Meta and Microsoft used the controversial ‘Books3’ database to train their AI models. Bloomberg News is accused of using the same database for similar purposes. The AI research group EleutherAI, meanwhile, is accused of supplying the Books3 database, among others, to technology companies for training.

The Books3 database is reported to draw its content from ‘Shadow Libraries’, which often host academic texts and novels obtained in possible breach of copyright law. Large ‘Shadow Libraries’ include Library Genesis, Z-Library, and Sci-Hub, which distribute content mainly in a decentralized and anonymous manner.

The authors allege that the large language model ‘Llama 2’, which Meta released in partnership with Microsoft, was trained on the Books3 database. They are asking the court to bar any further misuse of their intellectual property and are seeking substantial damages from Meta and Microsoft.

Neither Microsoft nor Meta has issued a formal statement on the matter. Bloomberg News, on the other hand, firmly denies having trained its commercial language model, BloombergGPT, on the Books3 database, asserting that it relies solely on its proprietary data.

Meta and Microsoft are not the only targets: other technology companies, including OpenAI and Google, have previously faced allegations over the data used to train their AI. In later public statements, these companies emphasized their commitment to training models on copyright-free data and content explicitly cleared for online distribution, hoping to avoid further disputes.