U.S. Copyright Office Releases Pre-Publication Report on AI Training and Copyright Law

Written by Jeremy Werner

Jeremy is an experienced journalist, skilled communicator, and constant learner with a passion for storytelling and a track record of crafting compelling narratives. He has a diverse background in broadcast journalism, AI, public relations, data science, and social media management.
Posted on 05/23/2025

The U.S. Copyright Office has released a pre-publication version of a long-anticipated report examining whether using copyrighted works to train generative artificial intelligence (AI) systems constitutes infringement. The report, “Copyright and Artificial Intelligence, Part 3: Generative AI Training,” outlines the Office’s initial conclusions and recommendations on one of the most contentious issues in modern copyright law.

The report marks the third installment in the Copyright Office’s broader initiative on AI and copyright. It focuses specifically on the legal, economic, and policy questions surrounding how generative AI models are trained using massive volumes of copyrighted content—including books, images, songs, and news articles.

The Office concludes that many of the acts involved in AI training, including copying and organizing copyrighted works into datasets, may be prima facie infringing unless justified under a recognized exception like fair use. “Making commercial use of vast troves of copyrighted works to produce expressive content that competes with them… goes beyond established fair use boundaries,” the report states.

At the same time, the Office refrains from recommending new legislation—at least for now. It argues that the licensing market for AI training data is still in flux and should be allowed to develop. Several platforms and rights-holder organizations are experimenting with direct and collective licensing frameworks, which the Office sees as a more adaptable near-term solution.

Only if these market-based approaches prove unworkable, the Office notes, should Congress consider interventions like extended collective licensing. Such mechanisms are used in some countries to allow mass use of works when direct licensing is impractical, while still ensuring that creators receive compensation.

A key takeaway is that the legality of AI training is fact-specific. Fair use defenses, for instance, must be evaluated based on how transformative the use is, how much of the work was taken, whether it harms the work’s market, and whether there are viable licensing alternatives. The Office also expresses concern that unauthorized training could “diminish incentives to create” or interfere with licensing markets for creative works.

The report also outlines broader international developments. It highlights how the European Union, United Kingdom, Japan, Israel, and Singapore have taken different approaches to regulating AI training. Some countries permit text and data mining under specific exceptions; others rely more heavily on fair use principles. These differing frameworks, the Office warns, could lead to regulatory uncertainty and trade friction.

The report acknowledges that while generative AI can offer major benefits—from expanding creative tools to driving economic growth—it also poses serious risks to authors, performers, and other rights holders. “Striking the right balance is essential,” the Office writes, adding that government, industry, and civil society must work together to ensure innovation does not come at the cost of creative labor.

In addition to the legal analysis, the Office calls for greater transparency in how AI systems are trained. It encourages Congress to consider policies that would mandate disclosures from AI developers—such as information about data sources and model behavior—to ensure accountability.

This pre-publication version reflects extensive public consultation, including over 10,000 comments from stakeholders and dozens of listening sessions held in 2023 and 2024. The final version of the report may include updates based on further review and policy developments.

As policymakers continue to debate AI regulation, the Copyright Office’s evolving position will likely shape how Congress, courts, and companies approach the increasingly blurred line between human creativity and machine generation. The Office has pledged to monitor the licensing landscape and issue further guidance as needed.

For now, the Office’s message is clear: generative AI developers cannot assume that scraping the internet for training material is risk-free—and creators’ rights must remain central to the future of innovation.

Need Help?

If you have questions or concerns about navigating the global AI regulatory landscape, don’t hesitate to reach out to BABL AI. Their audit experts can offer valuable insight and help ensure you’re informed and compliant.
