Microsoft is facing a lawsuit from a group of authors who allege the tech giant used pirated copies of their books to train its artificial intelligence system, Megatron. Filed in a New York federal court, the complaint claims Microsoft infringed the copyrights of authors such as Kai Bird, Jia Tolentino, and Daniel Okrent by incorporating their works, without permission, into a dataset used to develop its AI model.
This legal action adds to a growing number of lawsuits filed by authors, publishers, and media outlets targeting major tech companies like Meta, Anthropic, and OpenAI for allegedly misusing copyrighted content in the development of generative AI systems. The case highlights the mounting legal challenges facing AI firms as they rely on vast amounts of textual data to train their algorithms.
The lawsuit comes on the heels of a California judge’s ruling in a separate case against Anthropic. The judge concluded that using copyrighted works for AI training might qualify as fair use under U.S. copyright law, but that using pirated copies could still constitute infringement. This ruling was the first of its kind in the U.S. and sets an important precedent as similar cases unfold.
The authors accuse Microsoft of using a dataset containing nearly 200,000 pirated books to develop Megatron, enabling it to replicate the styles, themes, and voices of the original works. They argue that this practice unfairly exploits the labor and creativity of content creators to generate new content through AI that closely mimics human authorship.
While Microsoft has not yet responded to the allegations, tech companies in similar lawsuits have maintained that their AI systems make “fair use” of copyrighted material, framing that use as transformative rather than exploitative. The authors are seeking a court order to halt Microsoft’s alleged infringement and statutory damages of up to $150,000 for each misused work.
Source: Reuters
