From: "Potter, Peter" <[log in to unmask]>
Date: Thu, 21 Mar 2024 17:28:12 +0000

Hi Rachael,



Wow! Thanks for responding so thoroughly to my questions. I’ll bet that
others will agree with me in saying that you’ve helped to bring clarity to
any number of concerns that come up again and again in discussions about AI
and copyright.



I won’t go into great detail here because I’m sure we actually agree in
principle on the major points you’ve made. I simply want to clarify what I
was trying to say (unartfully, it seems) in my last message:



My point about authors being protected by copyright law from having their
works exploited (i.e., used) for TDM and AI was not to question the
principle of fair use—or even to question the University of California’s
position on TDM/AI. As far as I’m concerned, researchers at the University
of California (and anywhere else, for that matter) are entirely within
their rights to use TDM/AI to the extent that it falls within the bounds of
fair use. My concern is with those who might use TDM/AI (primarily
generative AI) in ways that go beyond fair use and therefore violate
copyright.



So, you are absolutely correct when you say, a few paragraphs down in your
response, that what I actually want to avoid is others profiting from using
licensed products with *generative AI* to make new works. But, again, my
concern here is definitely not with institutions like the University of
California licensing content for fair scholarly and research uses. My real
concern is with commercial AI systems that are able to ingest the
increasingly massive body of open access HSS scholarship on the web to
generate new works—without crediting the original authors, much less
remunerating them in instances where remuneration is required. In short,
what recourse do authors have should they suspect that their intellectual
property is being used in ways that violate the open license?



Thanks, again.

Peter Potter




From: Rachael G Samberg <[log in to unmask]>

Date: Wed, 20 Mar 2024 15:34:50 -0700

Dear Peter,


Many thanks for reading our blog post. I write now to address what seem to
be misunderstandings in your response, and I hope I manage to address them
in a way that illuminates more alignment than you think. Because legal
concepts can be tricky, and because the readership for this listserv is
mostly non-lawyers, I will also try to parse some concepts and terms that
got confused in your reply.



First, though, I want to make clear that I am writing this reply in my
personal capacity, and not on behalf of the University of California. With
that said:



You refer to the right of authors not to have their copyrighted works
“exploited” by TDM and AI usage. This is not correct. Setting aside the
loaded use of the term “exploited” (I’ll just presume you meant “used”!),
the fair use of copyrighted works is expressly authorized by 17 USC § 107.
And that fair use provision does not afford copyright owners any right to
opt out of allowing other people to use their works fairly.



Of course, this is for good reason: If content creators were able to opt
out, the provision for fair use would be eviscerated, and little content
would be available to build upon for the advancement of science and the
useful arts. In all events, fair use exists as an affirmative statutory
right for authors and readers alike, so that anyone can use copyrighted
works fairly—regardless of whether any individual creator wanted them to be
used or not.



In turn, your message suggests that scholarship published with a
CC-BY-NC-ND license should be protected against “derivative uses” like TDM
and AI. I’ll explain why this isn’t so.



   - I’ll start with the misplaced reference to “derivative uses.” A use is
   a use, and uses that are fair uses are statutorily protected. Assuming you
   instead meant “derivative work,” then it’s important to understand that
   conducting TDM, and using AI to conduct that TDM, does not create a
   derivative work. A “derivative work” refers specifically to the creation of
   a new work that incorporates a previously existing work. TDM, and using
   non-generative AI to conduct TDM, provides insights and understandings
   about existing works through the creation of derived data, results,
   metadata, and analysis; these are not “derivative works.” And in all
   events, the right of someone other than the copyright owner to create a
   derivative work is *permitted* if the creation of the derivative work
   falls under an exception like fair use, as TDM research does. Moreover,
   the creation of derivative works is typically already expressly precluded
   by content license agreements anyway.



   - Turning next to the stated desire to safeguard “CC-BY-NC-ND” works
   against any TDM and AI uses (presumably because you think that such a CC
   license indicates authorial intent not to have the work be used), one
   should understand that the affirmative application of any Creative Commons
   license—or no application of a license at all (i.e., “all rights
   reserved”)—has no bearing at all on whether fair uses are permitted to be
   made of the work. A Creative Commons license applies and comes into play
   only when one is going beyond statutory exemptions like fair use. See:
   https://creativecommons.org/licenses/by/4.0/legalcode.en: “Exceptions
   and Limitations. For the avoidance of doubt, where Exceptions and
   Limitations [e.g., 17 USC § 107] apply to Your use, this Public License
   does not apply, and You do not need to comply with its terms and
   conditions.” (See also the explanatory FAQ
   <https://creativecommons.org/faq/#do-creative-commons-licenses-affect-exceptions-and-limitations-to-copyright-such-as-fair-dealing-and-fair-use>,
   confirming the same.) As such: Authors cannot use CC licenses to control
   TDM and AI fair uses, and conversely scholars needn’t worry about whether a
   work has a Creative Commons license so long as the scholar is making a fair
   use.



If I may be permitted to reflect on what I think you actually intended to
express: It’s that you wish to prohibit any use of *generative AI*, because
the *outputs* of generative AI might exceed fair use (as no one disputes).
I’ll explain in a bit why banning all generative AI use and training merely
to prevent certain outputs is overreaching. But in
the meantime, I want to focus on the implications of that “no generative
AI” sentiment for the rest of TDM and non-generative AI research uses; that
is, to underscore how TDM and non-generative AI simply don’t come into play
for what seems to have concerned you.

   - *TDM:* TDM research relies on automated methodologies to surface
   “latent” trends or information across large volumes of data. Every single
   court case that has addressed TDM research in the contexts at issue here
   has found it to be fair use. There is no “profiting” from conducting fair
   use TDM in the manner at issue in our licenses.
   - *Non-Generative AI*: TDM research methodologies can, but do not
   necessarily, rely on AI systems to extract this information.
   Sometimes algorithms can simply be used to detect word proximity or
   frequency, or to conduct sentiment analysis. In other instances, an AI model
   might be needed as part of the process. For instance, I’ve been working
   with a professor for several years as he studies trends in literature and
   film. Right now we have a Mellon Grant project for him to study such
   matters as the representation of guns in cinema. In order for him to assess
   how common guns are, and the types of circumstances in which guns appear,
   he has to find instances of guns in thousands of hours of movie footage. To
   do that, he needs an algorithm to search for and identify those guns. But
   first he has to show an AI tool what a gun looks like, by showing it some
   stills of guns from a small corpus of films, so that the AI tool can learn
   how to identify a gun before it then goes and looks for other instances of
   guns in a much larger body of works. This is a classification and
   regression technique called discriminative modeling. It involves AI, but
   not generative AI, as part of his TDM research: the AI is not creating new
   images or footage of guns. And once again, scholars have lawfully
   relied on this kind of non-generative AI training within TDM for years
   under the fair use doctrine.
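For readers less familiar with what “discriminative modeling” means in practice, here is a toy sketch in Python. Everything in it is invented for illustration: the two-number “feature vectors” stand in for measurements extracted from film stills, the labels are hand-assigned, and a simple nearest-centroid rule stands in for a real classifier. It is not the actual research pipeline, but it shows the shape of the technique: the model learns a decision rule from labeled examples and then classifies unseen frames, without ever generating new images.

```python
def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def train(labeled_frames):
    """Compute one centroid per class label (a nearest-centroid classifier)."""
    by_label = {}
    for features, label in labeled_frames:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}

def classify(model, features):
    """Assign the label whose centroid is closest to the new frame's features."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: dist2(model[label], features))

# Hand-labeled training stills (hypothetical feature vectors).
training = [
    ([0.9, 0.8], "gun"), ([0.8, 0.9], "gun"),
    ([0.1, 0.2], "no gun"), ([0.2, 0.1], "no gun"),
]
model = train(training)
print(classify(model, [0.85, 0.7]))   # a frame resembling the "gun" examples
print(classify(model, [0.15, 0.25]))  # a frame resembling the "no gun" examples
```

The point of the sketch is the one made above: the model’s output is a label (derived data about an existing work), not a new creative work.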



So with this understanding, perhaps we can refine what you want: Perhaps
what you actually want to avoid is scholars profiting from using your
licensed products with *generative AI* to make new works. No problem: We’re
not licensing your content for scholars to do that anyway. We’re licensing
your content for fair scholarly and research uses. Any acts beyond fair
use, or beyond whatever additional rights are carved out in the agreement,
would violate the license agreement.



Okay, let’s refine the wishlist further: Maybe you don’t want scholars to
use generative AI in a way that releases trained AI to the public. No
problem again: Our adaptable licensing language can preclude that. Indeed,
with language that we have already successfully secured with publishers, we
impose commercially reasonable security measures and prohibit the public
release or exchange of any generative AI tool that has been trained, or any
data from such a generative AI tool. Certainly more aggressive licensing
language could preclude the training of a third-party generative AI tool
altogether—though there would be no need for such measures, as long as the
license agreement prohibited the public release or third-party exchange of
any trained tool or its data, and added further assurances of appropriate
security measures.



To that end, I think one thing that is lost in your message is the
difference between *use* of a generative AI tool, and *training* of a
generative AI tool. *Using* a generative AI tool means: You have a corpus
of works, you ask the AI a question about the works, it tells you the
answer. *Training AI* differs in that the act of asking the AI questions,
and the content you show it to answer the questions, actually helps the AI
learn how to give better answers or improves the tool in some way. And this
is where your message muddles the notion of “plagiarism.” The mere *use* of
generative AI without also *training* the underlying generative AI tool has
no implications for whether a publisher’s content will be (to use your
word) “plagiarized”—i.e., there is no embedding of licensed content in the
tool even if public release of a tool were ever authorized.



The availability of copyrighted works for use in TDM (and AI reliance for
TDM) in scientific research is already a reality that authors and
publishers face in the European Union. The European Parliament considered
whether to let copyright owners opt out of having their works used for TDM
or AI, and decided unequivocally in the scientific research context not to
grant any such right, and further not to allow contracts to take away these
rights. See Article 3 of the EU’s Directive on Copyright in the Digital
Single Market
<https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32019L0790>
(preserving the right for scholars within research organizations and
cultural heritage institutions to conduct TDM for scientific research);
Article 7 (prohibiting publishers from invalidating this exception by
license agreements); and the new AI regulations, which affirm that publishers
cannot override Article 3 / scientific research AI training rights
<https://papers.ssrn.com/abstract=4740268>. Publishers must preserve fair
use-equivalent research exceptions for TDM and AI within the EU. Through
the licensing protections we’ve outlined, they can do so in the United
States, too.



I hope this response furthers the understanding of how licenses will be
used effectively to safeguard publishers’ (and authors’) financial
interests, while also supporting scholarly research in accordance with
statutory fair use rights.



Best,

Rachael



-- 

Rachael G. Samberg, J.D., MLIS

Scholarly Communication Officer & Program Director

Office of Scholarly Communication Services

University of California, Berkeley

Doe Library, 189 Annex

Berkeley, CA  94720-6000

Pronouns: she/her





On Mon, Mar 18, 2024 at 8:04 PM LIBLICENSE <[log in to unmask]> wrote:

From: "Potter, Peter" <[log in to unmask]>

Date: Sun, 17 Mar 2024 23:57:18 +0000

Thanks to Rachael and the team at UC’s OSC for sharing this document. I
came away from it with a much better understanding of the concerns of
libraries as they try to account for TDM and AI when negotiating licenses
for electronic resources.



It does, however, raise a question for me about the other side of the fair
use argument—namely, the rights of authors to not have their copyrighted
works exploited by TDM and AI usage. This is especially pertinent in the
humanities and social sciences where much of the OA scholarship is
published with a CC BY-NC-ND license because of authors’ (and publishers’)
concerns about others profiting from derivative use of a work.
Increasingly, I am hearing from authors who want to know the extent to
which the “no derivatives” part of a CC license protects them against TDM
and AI usage, specifically generative outputs. I’m curious to know what
folks think about the fair use question when it comes to authors
specifically.



The UC OSC document acknowledges publishers’ concerns about misuse of
licensed materials but then it seems to brush those concerns aside on the
grounds that publishers “already can—and do—impose robust and effective
contractual restrictions” on such misuses. But the document also admits
that “overall fair use of generative AI outputs cannot always be predicted
in advance,” which of course is exactly what authors are concerned
about—usage by AI that is unpredictable and perhaps impossible to keep
track of because of the increasingly sophisticated nature of AI. How does
one prove plagiarism when AI systems have ingested and learned from so much
content that it’s impossible to tease out what came from each individual
source?





From: Rachael G Samberg <[log in to unmask]>

Date: Wed, 13 Mar 2024 05:38:00 -0700

Fair use rights to conduct text mining & use artificial intelligence tools
are essential for UC research & teaching. Learn how University of
California libraries negotiate to preserve these rights in electronic
resource agreements:
https://osc.universityofcalifornia.edu/2024/03/fair-use-tdm-ai-restrictive-agreements/



Best,

Rachael G. Samberg

Timothy Vollmer

Samantha Teremi