Copyright Office Report on AI Training and Fair Use

Can AI Use Copyrighted Material Under Fair Use?

Can AI systems rely on the fair use doctrine when training on copyrighted works? That question sits at the center of ongoing copyright debates.

Fair use is a doctrine in U.S. copyright law, codified at 17 U.S.C. § 107, that permits limited use of copyrighted material without the owner's permission in certain circumstances. To decide whether a use is fair, courts weigh four factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the potential market for the work.

These factors are guidelines rather than a mechanical checklist. Courts weigh them together, and no single factor guarantees fair use on its own.

Common Examples of Fair Use

Fair use often applies to criticism, commentary, and news reporting, and it commonly applies to teaching, scholarship, and research. For example, a reviewer may quote a copyrighted book without permission when critiquing that book. Still, fair use is context-specific: uses outside these categories may qualify, while uses for a listed purpose may still fail.

AI Training Raises New Fair Use Questions

Generative AI systems are trained on massive datasets that often include copyrighted material. This raises a key question: can AI training qualify as fair use and so avoid infringement? The U.S. Copyright Office (USCO) addressed this question in a recent report.

Do AI Models Infringe Their Training Data?

The USCO report concludes that AI training implicates several of the copyright owner's exclusive rights, including the right of reproduction and the right to prepare derivative works.

The report also examines a critical issue: are a model's weights themselves infringing copies?

Developers argue that models contain only numerical values, not copies of the copyrighted works they were trained on. Others disagree, pointing to AI outputs that closely resemble training data.

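According to the USCO, similarity matters. When a model's outputs are substantially similar to works in its training data, infringement concerns grow stronger, because that similarity suggests the weights may encode protected expression. In those cases, the report finds a compelling argument that copying the model weights may implicate the reproduction and derivative work rights.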

How the Fair Use Defense Applies to AI

A central question in the fair use analysis, under the first factor, is whether the use is "transformative." The report treats transformativeness as a spectrum. Training a model on a large, diverse dataset to generate outputs across many different contexts is more likely to be transformative. By contrast, training a model to generate outputs substantially similar to copyrighted works is "at best, modestly transformative."

Why the Human Learning Analogy Falls Short

The USCO rejected the argument that AI training is inherently transformative because it is analogous to human learning. The report asserts that this analogy rests on a faulty premise: fair use does not excuse every act of copying simply because the copying serves a learning purpose. As the report notes, a student could not rely on fair use to copy all of the books in a library. The analogy also breaks down technically. Humans retain imperfect, filtered impressions of the works they encounter, while AI training relies on exact digital copies. The report observes that the structure of copyright's exclusive rights is premised on these human limitations.

What Comes Next for AI and Copyright

More than 40 cases concerning AI developers' use of copyrighted material are currently pending. There will be no single answer to whether unauthorized use of copyrighted works for AI training is fair use; outcomes will turn on the facts of each case. Stakeholders are also discussing some form of licensing scheme for AI training data. For now, the USCO recommends allowing voluntary licensing markets to continue developing without government intervention.