A proposed class action lawsuit filed in federal court in Northern California accuses Apple of unlawfully using copyrighted books to train its AI models. The case adds to the ongoing debate over the legal and ethical implications of AI training datasets.
As reported by Reuters, authors Grady Hendrix and Jennifer Robertson claim that Apple has incorporated a pirated dataset that includes their works. The lawsuit states, “But Apple is building part of this new enterprise using Books3, a dataset of pirated copyrighted books that includes the published works of Plaintiffs and the Class. Apple used Books3 to train its OpenELM language models. Apple also likely trained its Foundation Language Models using this same pirated dataset.”
The accusations stem from Apple's own documentation for OpenELM, an open-source model released on Hugging Face last year. That documentation lists RedPajama as one of the datasets used to train the model. According to the lawsuit, RedPajama includes Books3, which the plaintiffs describe as “a known body of pirated books.”
The authors are asking the court to allow the lawsuit to proceed as a class action against Apple. They seek several remedies following a jury trial, including:
Approval for the action to proceed as a class action, with plaintiffs serving as class representatives and their counsel as class counsel;
Statutory damages, compensatory damages, restitution, disgorgement, and any other relief permissible by law or equity;
A permanent injunction barring Apple from the alleged unlawful, unfair, and infringing conduct;
An order for the destruction of all Apple Intelligence or other Large Language Models (LLMs) that include the plaintiffs’ and class members’ works, as provided under 17 U.S.C. § 503(b);
Costs, expenses, and attorney fees as permitted by law;
Any other relief the court finds appropriate and just.
The lawsuit arrives amid mixed outcomes in similar cases. Anthropic recently reached a record $1.5 billion settlement in a closely related matter. Meta, meanwhile, faced a similar case that ended in a ruling favorable to the company: the judge determined that Meta's use of copyrighted books for AI training was protected under the doctrine of fair use. That view has been echoed by notable figures, including President Donald Trump, who remarked that it is impractical to expect a successful AI program while requiring payment for every article and book studied.
The ongoing debate raises a crucial question: should authors be compensated for the use of their books in training AI models? We invite you to share your thoughts in the comments section below.