Meta accused of pirating books to train its AI

Monday 13 January 2025, 05:29 AM

The case is likely to reignite tensions between creators and the artificial intelligence industry. This week, a group of authors, including Ta-Nehisi Coates and comedian Sarah Silverman, is suing Meta. At issue: the alleged unauthorized use of their works to train Llama, the company's language model that powers its chatbots.

LibGen, a database already found liable

The court documents, which cite internal exchanges at Meta, point to Mark Zuckerberg's role. The CEO allegedly approved the use of LibGen, or Library Genesis, a Russian “shadow library” containing tens of terabytes of digitized books, including copyrighted works.

Although the database was ordered to pay $30 million for copyright infringement in 2024, it was nevertheless incorporated into Meta's work, despite internal warnings that doing so was illegal.

An internal memo that sparks debate

According to The Guardian, an internal company memo refers to Mark Zuckerberg by his initials: “After escalation to MZ, the AI team received the green light to use LibGen.” According to other documents, the decision sparked debate within the company.


Meta engineers had in fact expressed reservations about accessing this data, especially using company equipment. Aware that the approach was illegal and harmful to authors and publishers, they also warned of possible repercussions for… the company's image: “Media coverage mentioning the use of a database that we know to be pirated, such as LibGen, risks harming our negotiations.”

A revived case

This complaint follows an initial action brought by the same authors in 2023, which was dismissed by federal Judge Vince Chhabria. The new complaint relies on evidence that strengthens the copyright infringement claims. The plaintiffs are also seeking to add a computer fraud claim, which they argue is warranted in this situation.

Judge Chhabria, however, remains cautious, saying he doubts the strength of the fraud and copyright management information claims.

A legal framework that is still unclear

The controversy rekindles a crucial debate: the use of protected texts to train artificial intelligence models. Creators and publishers warn of the risks to their income, while tech giants exploit these databases without a clear legal framework.

A recent study by the Authors' Licensing and Collecting Society (ALCS) found that 77% of authors do not know whether their works have been used to train AI without their consent, and 91% want the right to be consulted beforehand.


For authors and publishers alike, the stakes are high: protecting their rights and establishing a viable business model in a world where generative AI is disrupting the creative ecosystem. But without swift legislation, legal battles are likely to pile up…

Image credits: Alpha Photo CC BY-NC 2.0

By Louella Boulland