From: "Thatcher, Sanford Gray"
Date: Mon, 29 Jan 2024 08:33:38 +0000

I beg to differ with this claim that "transformative use" supports seeing training AI on material from the New York Times (and the novels of the seven trade book authors who have also sued OpenAI) as fair use.

The basic argument appears to rest on an unquestioning belief in the social utility of AI technology. Let me remind people that, in deciding the suit against Texaco in 1995, Judge Jon Newman admitted the social utility of photocopying but nevertheless found the use of this mechanical copying to be copyright infringement. One can also question just how socially useful AI is when it learns how to write scientific papers and then attribute their authorship to scientists who did not in fact write them, or when it generates the "deep fakes" that have bedeviled people ranging from President Biden to Taylor Swift. Technology is not inherently good. Social media platforms once seemed very promising but have been creating havoc and divisiveness in our political world of late. And those of us who saw the movie about Oppenheimer will remember how mixed his views were about the technology that led to the creation of the atomic and hydrogen bombs. And think about how "intelligent" AI is really going to be when it trains on only 3% of the world's stored knowledge, the vast majority of which lies locked behind firewalls controlled by journal publishers. So, you'll excuse me if I don't bow down and worship AI technology uncritically.

Transformative use is a concept invented by Judge Pierre Leval in a now classic Harvard Law Review article published in 1990. It was first applied in a major case by the US Supreme Court in 1994, in the case involving rap group 2 Live Crew's parody of the song "Pretty Woman." The court ruled that if the copying done were itself an act of creativity, it could be seen as fair use because of its transformative nature, and Leval's article was quoted to that effect. But then a series of cases in the Silicon Valley-friendly Ninth Circuit applied (misapplied, in my view) this concept to circumstances where socially useful indexes and other products were used algorithmically, allowing for subsequent creators to make use of them, though the acts of making these products were not in themselves creative any more than the photocopying machines in the Texaco case were. This mistake was then carried over in the decisions about HathiTrust and Google. In the latter case Judge Chin explicitly argued that it was the later creative uses facilitated by the mass copying that justified calling it fair use, in direct contradiction to the precedent set by the Texaco decision in the same circuit. So much for precedent!

The ARL seems so enamored of technology that in its Code of Best Practices in Fair Use for Academic and Research Libraries (2012) it put forward the alarming argument that, because scientific journal articles and novels were not originally aimed at students in the college classroom, for which use they have been "repurposed," it therefore should be fair use for libraries to make as many copies as needed to supply these materials to undergraduates in coursepacks, with no permission needed from the publishers or authors and no compensation paid to them. This to my mind is the reductio ad absurdum of the whole line of argument libraries have been using to justify massive amounts of copying with no regard to the fourth factor's emphasis on the effects of such copying on the market. It was the preeminence of the fourth factor that we in the Association of American University Presses (as it was then called) emphasized in my testimony before a Congressional committee hearing about fair use in 1973. Had that emphasis been adopted, it would have led to a much clearer and cleaner jurisprudence of fair use than the tangled mess we have on our hands today.

I go into greater detail on all this in an article I wrote for the 50th anniversary issue of the Journal of Scholarly Publishing, which is available "open access" at this site: https://scholarsphere.psu.edu/resources/460d0813-f82b-400a-abc8-137bf9d1f647

Sandy Thatcher


From: Ann Shumelda Okerson
Date: Fri, 26 Jan 2024 21:35:09 -0500

Training Generative AI Models on Copyrighted Works Is Fair Use

by Katherine Klosek, Director of Information Policy and Federal Relations, Association of Research Libraries (ARL), and Marjory S. Blumenthal, Senior Policy Fellow, American Library Association (ALA) Office of Public Policy and Advocacy | January 23, 2024

. . . But as champions of fair use, free speech, and freedom of information, libraries have a stake in maintaining the balance of copyright law so that it is not used to block or restrict access to information. We [LCA] drafted the principles on AI and copyright in response to efforts to amend copyright law to require licensing schemes for generative AI that could stunt the development of this technology, and undermine its utility to researchers, students, creators, and the public. The LCA principles hold that copyright law as applied and interpreted by the Copyright Office and the courts is flexible and robust enough to address issues of copyright and AI without amendment. The LCA principles also make the careful and critical distinction between input to train an LLM, and output—which could potentially be infringing if it is substantially similar to an original expressive work.

More here: