LIBLICENSE-L Archives

LibLicense-L Discussion Forum

LIBLICENSE-L@LISTSERV.CRL.EDU

From: Rachael G Samberg <[log in to unmask]>
Date: Wed, 20 Mar 2024 15:34:50 -0700

Dear Peter,


Many thanks for reading our blog post. I write now to address what seem to
be misunderstandings in your response, and I hope to do so in a way that
illuminates more alignment than you might think. Because legal concepts can
be tricky, and because the readership of this listserv is mostly
non-lawyers, I will also try to parse some concepts and terms that became
conflated in your reply.

First, though, I want to make clear that I am writing this reply in my
personal capacity, and not on behalf of the University of California. With
that said:

You refer to the right of authors not to have their copyrighted works
“exploited” by TDM and AI usage. This is not correct. Setting aside the
loaded use of the term “exploited” (I’ll just presume you meant “used”!),
the fair use of copyrighted works is expressly authorized by 17 USC § 107.
And that fair use provision does not afford copyright owners any right to
opt out of allowing other people to use their works fairly. Of course, this
is for good reason: If content creators were able to opt out, the provision
for fair use would be eviscerated, and little content would be available to
build upon for the advancement of science and the useful arts. In all
events, fair use exists as an affirmative statutory right for authors and
readers alike, so that anyone can use copyrighted works fairly—regardless
of whether any individual creator wanted them to be used or not.

In turn, your message suggests that scholarship published with a
CC-BY-NC-ND license should be protected against “derivative uses” like TDM
and AI. I’ll explain why this isn’t so.


   -

   I’ll start with the misplaced reference to “derivative uses.” A use is a
   use, and uses that are fair uses are statutorily protected. Assuming you
   instead meant “derivative work,” it is important to understand that
   conducting TDM, and using AI to conduct that TDM, does not create a
   derivative work. A “derivative work” refers specifically to a new work
   that incorporates a previously existing work. TDM, and the use of
   non-generative AI to conduct TDM, yields insights and understandings
   about existing works through the creation of derived data, results,
   metadata, and analysis; these are not “derivative works.” Moreover, even
   someone other than the copyright owner may create a derivative work when
   that creation falls under an exception like fair use, as TDM research
   does. In all events, the creation of derivative works is typically
   already expressly precluded by content license agreements anyway.



   -

   Turning next to the stated desire to safeguard “CC-BY-NC-ND” works
   against any TDM and AI uses (presumably because you think that such a CC
   license indicates authorial intent not to have the work used), one
   should understand that applying any Creative Commons license, or
   applying no license at all (i.e., “all rights reserved”), has no bearing
   on whether fair uses may be made of the work. A Creative Commons license
   applies and comes into play only when one goes beyond statutory
   exceptions like fair use. See
   https://creativecommons.org/licenses/by/4.0/legalcode.en: “Exceptions
   and Limitations. For the avoidance of doubt, where Exceptions and
   Limitations [e.g., 17 USC § 107] apply to Your use, this Public License
   does not apply, and You do not need to comply with its terms and
   conditions.” (See also the explanatory FAQ
   <https://creativecommons.org/faq/#do-creative-commons-licenses-affect-exceptions-and-limitations-to-copyright-such-as-fair-dealing-and-fair-use>,
   which confirms the same.) As such, authors cannot use CC licenses to
   control TDM and AI fair uses, and conversely, scholars needn’t worry
   about whether a work has a Creative Commons license so long as they are
   making a fair use.


If I may be permitted to reflect on what I think you actually intended to
express: it’s that you wish to prohibit any use of generative AI, because
the outputs of generative AI might exceed fair use (as no one disputes). I
will explain in a moment why banning all generative AI use and training
merely to prevent certain outputs is overreaching. But first, I want to
focus on what that “no generative AI” sentiment implies for the rest of TDM
and non-generative AI research uses; that is, to underscore how TDM and
non-generative AI simply don’t come into play for what seems to have
concerned you.

   -

   TDM: TDM research relies on automated methodologies to surface “latent”
   trends or information across large volumes of data. Every single court case
   that has addressed TDM research in the contexts at issue here has found it
   to be fair use. There is no “profiting” from conducting fair use TDM in the
   manner at issue in our licenses.
   -

   Non-Generative AI: TDM research methodologies can rely on AI systems to
   extract this information, but they do not have to. Sometimes algorithms
   can simply be used to detect word proximity or frequency, or to conduct
   sentiment analysis. In other instances, an AI model might be needed as
   part of the process. For instance, I’ve been working with a professor
   for several years as he studies trends in literature and film. Right now
   we have a Mellon grant project for him to study such matters as the
   representation of guns in cinema. In order for him to assess how common
   guns are, and the types of circumstances in which guns appear, he has to
   find instances of guns in thousands of hours of movie footage. To do
   that, he needs an algorithm to search for and identify those guns. But
   first he has to show an AI tool what a gun looks like, using stills of
   guns from a small corpus of films, so that the tool can learn to
   identify a gun before it goes looking for other instances of guns in a
   much larger body of works. This is a classification technique called
   discriminative modeling (a minimal illustrative sketch follows this
   list). It involves AI as part of his TDM research, but not generative
   AI: the model is not creating new images or footage of guns. And once
   again, scholars have lawfully relied on this kind of non-generative AI
   training within TDM for years under the fair use doctrine.
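
To make that concrete, here is a minimal, purely illustrative sketch in
Python (using scikit-learn; the professor’s actual tools are not described
in this thread, and short text snippets stand in for film stills just for
brevity). It shows a classic TDM step, counting word frequencies, followed
by a discriminative-modeling step: the model learns from a handful of
labeled examples and is then applied to a larger, unlabeled body of texts,
without generating anything new.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Classic TDM step: count word frequencies across a small corpus (no AI).
    corpus = [
        "the sheriff drew his revolver",      # labeled below: gun present
        "the detective holstered a pistol",   # labeled below: gun present
        "the couple walked along the beach",  # labeled below: no gun
    ]
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(corpus)   # term-frequency matrix
    print(vectorizer.get_feature_names_out())   # vocabulary surfaced by TDM

    # Discriminative-modeling step: learn from a few labeled examples.
    labels = [1, 1, 0]                          # 1 = gun present, 0 = no gun
    model = LogisticRegression().fit(counts, labels)

    # Apply the trained (non-generative) model to a larger, unlabeled body of
    # works; it classifies what already exists and creates nothing new.
    new_texts = ["she reached for the shotgun", "they ordered coffee and talked"]
    predictions = model.predict(vectorizer.transform(new_texts))
    print(list(zip(new_texts, predictions)))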


So with this understanding, perhaps we can refine what you want: what you
actually want to avoid is scholars profiting from using your licensed
products with generative AI to make new works. No problem: we’re not
licensing your content for scholars to do that. We’re licensing your
content for fair scholarly and research uses. Any acts beyond fair use, or
beyond whatever additional rights are carved out in the agreement, would
violate the license agreement anyway.

Okay, let’s refine the wishlist further: Maybe you don’t want scholars to
use generative AI in a way that releases trained AI to the public. No
problem again: Our adaptable licensing language can preclude that. Indeed,
with language that we have already successfully secured with publishers, we
impose commercially reasonable security measures and prohibit the public
release or exchange of any generative AI tool that has been trained, or any
data from such a generative AI tool. Certainly more aggressive licensing
language could preclude the training of a third-party generative AI tool
altogether—though there would be no need for such measures, as long as the
license agreement prohibited the public release or third-party exchange of
any trained tool or its data, and added further assurances of appropriate
security measures.

To that end, I think one thing that is lost in your message is the
difference between using a generative AI tool and training one. Using a
generative AI tool means you have a corpus of works, you ask the AI a
question about the works, and it tells you the answer. Training differs in
that the act of asking the AI questions, and the content you show it to
answer those questions, actually helps the AI learn how to give better
answers or otherwise improves the tool. And this is where your message
muddles the notion of “plagiarism.” The mere use of generative AI, without
also training the underlying tool, has no implications for whether a
publisher’s content will be (to use your word) “plagiarized”: no licensed
content becomes embedded in the tool, even if public release of a tool were
ever authorized.
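
For the technically inclined, a toy sketch (in Python with PyTorch; purely
illustrative, and not a description of any actual tool or model covered by
our licenses) may help make that distinction concrete: in the first half
the model’s weights stay frozen and only an answer comes out, while in the
second half the same content is used to update the weights, so the tool
itself changes.

    import torch
    import torch.nn as nn

    model = nn.Linear(8, 8)        # toy stand-in for a generative AI tool
    prompt = torch.randn(1, 8)     # stand-in for a question about licensed works

    # "Using" the tool: inference only. The weights stay frozen, and nothing
    # about the queried content is retained inside the model afterward.
    model.eval()
    with torch.no_grad():
        answer = model(prompt)

    # "Training" the tool: the content is used to update the weights, so the
    # model itself changes (learns) based on what it was shown.
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    target = torch.randn(1, 8)     # stand-in for the desired answer
    loss = nn.functional.mse_loss(model(prompt), target)
    loss.backward()
    optimizer.step()               # the model's parameters are now different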

The availability of copyrighted works for use in TDM (and AI reliance for
TDM) in scientific research is already a reality that authors and
publishers face in the European Union. The European Parliament considered
whether to let copyright owners opt out of having their works used for TDM
or AI, and decided unequivocally in the scientific research context not to
grant any such right, and further not to allow contracts to take away these
rights. See article 3 of the EU’s Directive on Copyright in the Digital
Single Market
<https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32019L0790>
(preserves the right for scholars within research organizations and
cultural heritage institutions to conduct TDM for scientific research);
article 7 (prohibits publishers from overriding this exception through
license agreements); and the new AI regulations which affirm that publishers
cannot override Article 3 / scientific research AI training rights
<https://papers.ssrn.com/abstract=4740268>. Publishers must preserve fair
use-equivalent research exceptions for TDM and AI within the EU. Through
the licensing protections we’ve outlined, they can do so in the United
States, too.

I hope this response furthers understanding of how licenses can be used
effectively to safeguard publishers’ (and authors’) financial interests
while also supporting scholarly research in accordance with statutory fair
use rights.


Best,

Rachael


-- 
Rachael G. Samberg, J.D., MLIS
Scholarly Communication Officer & Program Director
Office of Scholarly Communication Services
University of California, Berkeley
Doe Library, 189 Annex
Berkeley, CA  94720-6000
Pronouns: she/her



On Mon, Mar 18, 2024 at 8:04 PM LIBLICENSE <[log in to unmask]> wrote:

> From: "Potter, Peter" <[log in to unmask]>
> Date: Sun, 17 Mar 2024 23:57:18 +0000
>
> Thanks to Rachael and the team at UC’s OSC for sharing this document. I
> came away from it with a much better understanding of the concerns of
> libraries as they try to account for TDM and AI when negotiating licenses
> for electronic resources.
>
>
>
> It does, however, raise a question for me about the other side of the fair
> use argument—namely, the rights of authors to not have their copyrighted
> works exploited by TDM and AI usage. This is especially pertinent in the
> humanities and social sciences where much of the OA scholarship is
> published with a CC BY-NC-ND license because of authors’ (and publishers’)
> concerns about others profiting from derivative use of a work.
> Increasingly, I am hearing from authors who want to know the extent to
> which the “no derivatives” part of a CC license protects them against TDM
> and AI usage, specifically generative outputs. I’m curious to know what
> folks think about the fair use question when it comes to authors
> specifically.
>
>
>
> The UC OSC document acknowledges publishers’ concerns about misuse of
> licensed materials but then it seems to brush those concerns aside on the
> grounds that publishers “already can—and do—impose robust and effective
> contractual restrictions” on such misuses. But the document also admits
> that “overall fair use of generative AI outputs cannot always be predicted
> in advance,” which of course is exactly what authors are concerned
> about—usage by AI that is unpredictable and perhaps impossible to keep
> track of because of the increasingly sophisticated nature of AI. How does
> one prove plagiarism when AI systems have ingested and learned from so much
> content that it’s impossible to tease out what came from each individual
> source?
>
>
>
>
> From: Rachael G Samberg <[log in to unmask]>
>
> Date: Wed, 13 Mar 2024 05:38:00 -0700
>
> Fair use rights to conduct text mining & use artificial intelligence tools
> are essential for UC research & teaching. Learn how University of
> California libraries negotiate to preserve these rights in electronic
> resource agreements:
> https://osc.universityofcalifornia.edu/2024/03/fair-use-tdm-ai-restrictive-agreements/
>
>
>
> Best,
>
> Rachael G. Samberg
>
> Timothy Vollmer
>
> Samantha Teremi
>

