From: "Potter, Peter" <[log in to unmask]>
Date: Thu, 21 Mar 2024 17:28:12 +0000

Hi Rachael,

 

Wow! Thanks for responding so thoroughly to my questions. I’ll bet that others will agree with me in saying that you’ve helped to bring clarity to any number of concerns that come up again and again in discussions about AI and copyright.

 

I won’t go into great detail here because I’m sure we actually agree in principle on the major points you’ve made. I simply want to clarify what I was trying to say (unartfully, it seems) in my last message:

 

My point about authors being protected by copyright law from having their works exploited (i.e., used) for TDM and AI was not to question the principle of fair use—or even to question the University of California’s position on TDM/AI. As far as I’m concerned, researchers at the University of California (and anywhere else, for that matter) are entirely within their rights to use TDM/AI to the extent that it falls within the bounds of fair use. My concern is with those who might use TDM/AI (primarily generative AI) in ways that go beyond fair use and therefore violate copyright.

 

So, you are absolutely correct when you say, a few paragraphs down in your response, that what I actually want to avoid is others profiting from using licensed products with generative AI to make new works. But, again, my concern here is definitely not with institutions like the University of California licensing content for fair scholarly and research uses. My real concern is with commercial AI systems that are able to ingest the increasingly massive body of open access HSS scholarship on the web to generate new works—without crediting the original authors, much less remunerating them in instances where remuneration is required. In short, what recourse do authors have should they suspect that their intellectual property is being used in ways that violate the open license?

 

Thanks, again.

Peter Potter

 


From: Rachael G Samberg <[log in to unmask]>

Date: Wed, 20 Mar 2024 15:34:50 -0700

Dear Peter,


Many thanks for reading our blog post. I write now to address what seem to be misunderstandings in your response, and I hope I manage to address them in a way that illuminates more alignment than you think. Because legal concepts can be tricky, and because the readership for this listserv is mostly non-lawyers, I will also try to parse some concepts and terms that got confused in your reply. 

 

First, though, I want to make clear that I am writing this reply in my personal capacity, and not on behalf of the University of California. With that said:

 

You refer to the right of authors not to have their copyrighted works “exploited” by TDM and AI usage. This is not correct. Setting aside the loaded use of the term “exploited” (I’ll just presume you meant “used”!), the fair use of copyrighted works is expressly authorized by 17 USC § 107. And that fair use provision does not afford copyright owners any right to opt out of allowing other people to use their works fairly.

 

Of course, this is for good reason: If content creators were able to opt out, the provision for fair use would be eviscerated, and little content would be available to build upon for the advancement of science and the useful arts. In all events, fair use exists as an affirmative statutory right for authors and readers alike, so that anyone can use copyrighted works fairly—regardless of whether any individual creator wanted them to be used or not.

 

In turn, your message suggests that scholarship published with a CC-BY-NC-ND license should be protected against “derivative uses” like TDM and AI. I’ll explain why this isn’t so.

 

  • I’ll start with the misplaced reference to “derivative uses.” A use is a use, and uses that are fair uses are statutorily protected. Assuming you instead meant “derivative work,” then it’s important to understand that conducting TDM, and using AI to conduct that TDM, does not create a derivative work. A “derivative work” refers specifically to the creation of a new work that incorporates a previously existing work. TDM, and using non-generative AI to conduct TDM, provides insights and understandings about existing works through the creation of derived data, results, metadata, and analysis; these are not “derivative works.” Moreover, someone other than the copyright owner may create a derivative work if its creation falls under an exception like fair use, as TDM research does. In all events, creation of derivative works is typically already expressly precluded by content license agreements anyway.

 

  • Turning next to the stated desire to safeguard “CC-BY-NC-ND” works against any TDM and AI uses (presumably because you think that such a CC license indicates authorial intent not to have the work be used), one should understand that the affirmative application of any Creative Commons license—or no application of a license at all (i.e., “all rights reserved”)—has no bearing at all on whether fair uses are permitted to be made of the work. A Creative Commons license applies and comes into play only when one is going beyond statutory exemptions like fair use. See: https://creativecommons.org/licenses/by/4.0/legalcode.en: “Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations [e.g., 17 USC § 107] apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.” (See also the explanatory FAQ, confirming the same.) As such: Authors cannot use CC licenses to control TDM and AI fair uses, and conversely scholars needn’t worry about whether a work has a Creative Commons license so long as the scholar is making a fair use.

 

If I may be permitted to reflect on what I think you actually intended to express: It’s that you wish to prohibit any use of generative AI, because the outputs of generative AI might exceed fair use (a point no one disputes). In a bit, I’ll explain why banning all generative AI use and training merely to prevent certain outputs is overreaching. But in the meantime, I want to focus on the implications of that “no generative AI” sentiment for the rest of TDM and non-generative AI research uses; that is, to underscore how TDM and non-generative AI simply don’t come into play for what seems to have concerned you.

  • TDM: TDM research relies on automated methodologies to surface “latent” trends or information across large volumes of data. Every single court case that has addressed TDM research in the contexts at issue here has found it to be fair use. There is no “profiting” from conducting fair use TDM in the manner at issue in our licenses.
  • Non-Generative AI: TDM research methodologies can, but do not necessarily need to, rely on AI systems to extract this information. Sometimes algorithms can just be used to detect word proximity or frequency, or conduct sentiment analysis. In other instances, an AI model might be needed as part of the process. For instance, I’ve been working with a professor for several years as he studies trends in literature and film. Right now we have a Mellon Grant project for him to study such matters as the representation of guns in cinema. In order for him to assess how common guns are, and the types of circumstances in which guns appear, he has to find instances of guns in thousands of hours of movie footage. To do that, he needs an algorithm to search for and identify those guns. But first he has to show an AI tool what a gun looks like, by showing it some stills of guns from a small corpus of films, so that the AI tool can learn how to identify a gun before it then goes and looks for other instances of guns in a much larger body of works. This is a classification and regression technique called discriminative modeling, and it involves AI, but not generative AI: as part of his TDM research, the AI is not creating new images or footage of guns. And once again, scholars have lawfully relied on this kind of non-generative AI training within TDM for years under the fair use doctrine.
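To make the non-generative TDM point concrete, here is a minimal sketch (the three-film corpus and its sentences are invented for illustration) of the kind of word-frequency analysis described above. Note that the output is derived data *about* the works, not a new work incorporating them:

```python
# A minimal sketch of non-generative TDM: counting word frequencies
# across a small corpus to surface latent trends. The corpus and
# terms here are invented for illustration.
from collections import Counter
import re

corpus = {
    "film_a": "the detective drew a gun and the crowd scattered",
    "film_b": "a quiet gun lay on the table beside the letters",
    "film_c": "no gun appears in this pastoral romance at all",
}

def term_frequencies(text):
    """Tokenize on word boundaries and count occurrences."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Aggregate counts across the corpus -- this is the "derived data"
# the message describes, not a derivative work.
totals = Counter()
for doc in corpus.values():
    totals += term_frequencies(doc)

print(totals["gun"])  # → 3: how often the term appears corpus-wide
```

None of the original expression survives in the output, which is why analyses like this yield facts and statistics rather than derivative works.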

 

So with this understanding, perhaps we can refine what you want: Perhaps what you actually want to avoid is scholars profiting from using your licensed products with generative AI to make new works. No problem: We’re not licensing your content for scholars to do that anyway. We’re licensing your content for fair scholarly and research uses. Any acts beyond fair use, or whatever additional rights are carved out in the agreement, would violate the license agreement anyway.

 

Okay, let’s refine the wishlist further: Maybe you don’t want scholars to use generative AI in a way that releases trained AI to the public. No problem again: Our adaptable licensing language can preclude that. Indeed, with language that we have already successfully secured with publishers, we impose commercially reasonable security measures and prohibit the public release or exchange of any generative AI tool that has been trained, or any data from such a generative AI tool. Certainly more aggressive licensing language could preclude the training of a third-party generative AI tool altogether—though there would be no need for such measures, as long as the license agreement prohibited the public release or third-party exchange of any trained tool or its data, and added further assurances of appropriate security measures.

 

To that end, I think one thing that is lost in your message is the difference between use of a generative AI tool, and training of a generative AI tool. Using a generative AI tool means: You have a corpus of works, you ask the AI a question about the works, it tells you the answer. Training AI differs in that the act of asking the AI questions, and the content you show it to answer the questions, actually helps the AI learn how to give better answers or improves the tool in some way. And this is where your message muddles the notion of “plagiarism.” The mere use of generative AI without also training the underlying generative AI tool has no implications for whether a publisher’s content will be (to use your word) “plagiarized”—i.e. there is no embedding of licensed content in the tool even if public release of a tool were ever authorized. 
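The use-versus-training distinction above can be shown schematically in a few lines. This is a purely illustrative toy class (not any real AI system): querying the model only reads its state, while training is the sole operation that writes anything from the content into it.

```python
# Toy illustration (not any real AI system) of use vs. training:
# inference leaves the model's parameters untouched, while training
# updates them -- only training can "embed" anything of the content.
class ToyModel:
    def __init__(self):
        self.weights = {"baseline": 1.0}  # the model's learned state

    def use(self, query):
        """Inference: read-only; returns an answer, changes nothing."""
        return len(query) * self.weights["baseline"]

    def train(self, example):
        """Training: the example alters the model's stored state."""
        self.weights[example] = self.weights.get(example, 0.0) + 1.0

model = ToyModel()
before = dict(model.weights)

model.use("what themes recur in this corpus?")
assert model.weights == before   # use: state unchanged

model.train("gun")
assert model.weights != before   # training: state changed
```

In these terms, merely *using* a tool on licensed content leaves nothing of that content behind in the tool, which is why use alone cannot result in the content being embedded or later regurgitated.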

 

The availability of copyrighted works for use in TDM (and AI reliance for TDM) in scientific research is already a reality that authors and publishers face in the European Union. The European Parliament considered whether to let copyright owners opt out of having their works used for TDM or AI, and decided unequivocally in the scientific research context not to grant any such right, and further not to allow contracts to take away these rights. See article 3 of the EU’s Directive on Copyright in the Digital Single Market (preserves the right for scholars within research organizations and cultural heritage institutions to conduct TDM for scientific research); article 7 (proscribes publishers from invalidating this exception by license agreements); and the new AI regulations which affirm that publishers cannot override Article 3 / scientific research AI training rights. Publishers must preserve fair use-equivalent research exceptions for TDM and AI within the EU. Through the licensing protections we’ve outlined, they can do so in the United States, too.

 

I hope this response furthers the understanding of how licenses will be used effectively to safeguard publishers’ (and authors’) financial interests, while also supporting scholarly research in accordance with statutory fair use rights.

 

Best,

Rachael



-- 

Rachael G. Samberg, J.D., MLIS

Scholarly Communication Officer & Program Director

Office of Scholarly Communication Services

University of California, Berkeley

Doe Library, 189 Annex

Berkeley, CA  94720-6000

Pronouns: she/her



 

On Mon, Mar 18, 2024 at 8:04PM LIBLICENSE <[log in to unmask]> wrote:

From: "Potter, Peter" <[log in to unmask]>

Date: Sun, 17 Mar 2024 23:57:18 +0000

Thanks to Rachael and the team at UC’s OSC for sharing this document. I came away from it with a much better understanding of the concerns of libraries as they try to account for TDM and AI when negotiating licenses for electronic resources.

 

It does, however, raise a question for me about the other side of the fair use argument—namely, the rights of authors to not have their copyrighted works exploited by TDM and AI usage. This is especially pertinent in the humanities and social sciences where much of the OA scholarship is published with a CC BY-NC-ND license because of authors’ (and publishers’) concerns about others profiting from derivative use of a work. Increasingly, I am hearing from authors who want to know the extent to which the “no derivatives” part of a CC license protects them against TDM and AI usage, specifically generative outputs. I’m curious to know what folks think about the fair use question when it comes to authors specifically.

 

The UC OSC document acknowledges publishers’ concerns about misuse of licensed materials but then it seems to brush those concerns aside on the grounds that publishers “already can—and do—impose robust and effective contractual restrictions” on such misuses. But the document also admits that “overall fair use of generative AI outputs cannot always be predicted in advance,” which of course is exactly what authors are concerned about—usage by AI that is unpredictable and perhaps impossible to keep track of because of the increasingly sophisticated nature of AI. How does one prove plagiarism when AI systems have ingested and learned from so much content that it’s impossible to tease out what came from each individual source?

 

 

From: Rachael G Samberg <[log in to unmask]>

Date: Wed, 13 Mar 2024 05:38:00 -0700

Fair use rights to conduct text mining & use artificial intelligence tools are essential for UC research & teaching. Learn how University of California libraries negotiate to preserve these rights in electronic resource agreements: https://osc.universityofcalifornia.edu/2024/03/fair-use-tdm-ai-restrictive-agreements/

 

Best,

Rachael G. Samberg

Timothy Vollmer

Samantha Teremi