Nvidia Defends Shadow Libraries Amid Growing Legal Pressure Over AI Training Data : Book News : Books & Review

May 29, 2024 08:32 AM EDT

Shadow libraries such as Z-Library and Library Genesis (Libgen) face mounting legal pressure for pirating books. The US Department of Justice charged Z-Library's alleged operators with criminal copyright infringement, while textbook publishers sued Libgen for mass distribution of copyrighted works.

Despite these pressures, Nvidia, the AI chipmaker benefiting from the AI boom, has emerged as an unexpected defender of these libraries.

Legal Battle Over Use of Shadow Libraries in AI Training

Nvidia's defense came to light in a lawsuit brought by book authors, who allege that data repositories, including notorious shadow libraries, were used to create the Books3 dataset that trained Nvidia's AI platform NeMo. In its court response, Nvidia denied the characterization of these repositories as shadow libraries and contended that hosting or distributing data from them does not necessarily violate the US Copyright Act.

The authors claimed their works were part of a dataset of approximately 196,640 books used to help train NeMo to simulate ordinary written language. This dataset was taken down in October due to reported copyright infringement.

The authors argued that the takedown was an implicit admission by Nvidia that it had used the dataset to train NeMo, thereby infringing their copyrights. They sought unspecified damages for people in the US whose copyrighted works were used to train NeMo's large language models over the past three years.

The authors linked the company to shadow libraries, emphasizing their significance to the AI-training community for hosting vast amounts of unlicensed copyrighted material. Nvidia disputed the classification of these libraries and was prepared to defend its use of their content.

This stance hinges on the court agreeing that transforming published works into the weights that govern AI outputs qualifies as fair use. The authors argued that these weights are derived from protected expression in the training dataset, copied without their consent or compensation.

In contrast, companies like OpenAI have started licensing content from publishers to avoid such copyright disputes.

Nvidia's Stance on AI Training and Intellectual Property Law

The company argued that its AI training methods constitute fair use, describing the process as highly transformative. According to Nvidia, training involves adjusting numerical parameters and weights, which are not direct copies of the copyrighted works.

At Nvidia's AI tech conference, deputy general counsel Iain Cunningham expressed his view that intellectual property (IP) law will not extend to creations generated by AI models. Speaking on March 18 during a session on the ethical challenges of technology, Cunningham referenced the US Copyright Office's stance that AI-generated content cannot be copyrighted.

Cunningham emphasized that IP law protects human intellectual effort, a principle he considers unlikely to change. He highlighted the difficulty of distinguishing between human contributions and machine-generated content in AI creations. Since AI creations lack traditional human intellectual effort, he argued that it does not make sense for AI models to 'own property.'

According to Cunningham, the core purpose of IP law is to incentivize human creativity, and machines need no such incentive. Extending IP protection to AI-generated content would therefore be illogical. He acknowledged that courts and decision-makers face increasing complexity in determining which parts of AI-generated creations are protectable under IP law, especially as users incorporate their own inputs into AI models.

Ongoing Legal Battles Amid High Profits

Until courts or lawmakers resolve this issue, companies using the Books3 dataset will likely face lawsuits from rights holders. These rights holders view AI models as exacerbating the harm caused by illegal shadow libraries.

Matthew Oppenheim, a lawyer for the textbook publishers suing Libgen, described it as a thieves' den of illegal books and called the site's conduct 'massively illegal.' Meanwhile, creators of these sites, like Anna of Anna's Archive, have embraced the term 'shadow library' and continue to advocate for the free distribution of information.

Nvidia's defense aligns with its financial interests, given its significant profits from the AI sector. In the first quarter of its fiscal year 2025 alone (the three months ended April 28, 2024), Nvidia reported a record $26 billion in revenue. For AI companies aiming to maximize profits and dominate the market, freely accessible data from shadow libraries remains an attractive option. This economic incentive drives their resistance to copyright claims and fuels ongoing legal battles over the legitimacy of shadow libraries and the data they provide.
