From: Sandy Thatcher <[log in to unmask]>
Date: Tue, 6 Nov 2012 09:44:15 -0600

The amicus brief cited here makes a reasonable case for treating mass digitization as fair use. It is not too much of a stretch to see the types of uses scholars make of the data generated by mass digitization as "transformative" in the way that concept has come to be introduced into copyright law by the pioneering work of Judge Pierre Leval (whom the authors cite). I would urge these caveats, however:

First, the authors of the brief make much of the fact/expression dichotomy that has come to be embedded in copyright jurisprudence and is explicitly sanctioned in Sec. 102 of the Copyright Act of 1976. But this concept can be pushed too far, making all scholarly writing seem more akin to the "factual" data of a telephone directory (Feist 1991) than to the "expressive" prose of fiction. Should scholars be deprived of all copyright protection just because their work is more factual than expressive? And I daresay some scholarly writing is more creatively expressive than some dull fiction writing. (See point #3 below.)

Second, they cite the line of cases interpreting fair use by the Ninth Circuit and (building on the Ninth's interpretation) the Fourth Circuit. But the Ninth Circuit cases involving the digitization of images, especially Perfect 10, do not work as comfortably as precedents as the authors of the brief seem to think. They argue that "Allowing Intermediate Copying in Order to Enable Nonexpressive Uses Does Not Harm the Market for the Original Works in a Legally Cognizable Manner, As The Practice Does Not Implicate the Works' Expressive Aspects in Any Way." But the court in the Perfect 10 case chose to ignore a real market for thumbnail images that Perfect 10 was already developing by licensing their use on cell phones. That use would have been for exactly the same "expressive" purpose as the original, not just for indexing.
By allowing Google to assemble a collection of such images, the court effectively killed Perfect 10's licensing business. Even the Fourth Circuit's decision about Turnitin can be questioned in this manner. While Turnitin can generate a finding of possible plagiarism from its database of student papers, any examination of actual infringement of any particular paper would require reading it in detail and making a line-by-line comparison with the allegedly infringing paper, hence making the same use of it as the original (though admittedly for a somewhat different purpose).

Third, many people, including most reporters, have thought that publishers' objection to Google's library project had something to do with the "snippets" that Google allowed users to see. It did not. The primary objection was to Google's delivery of a digital file of each book copied to the library that provided it (as well as Google's effort to substitute "opt out" for "opt in" as the standard approach to copyright). That had a direct impact on the potential market for digitized copies that publishers could have sold to libraries. Now, admittedly, a lot depends on what the libraries felt they could do with those copies. But, as we have seen with the HathiTrust case, libraries are expanding their ideas of what uses they can make under fair use, including potentially uses of orphan works that are the same kinds of uses that are made of the originals in their expressive capacity, as the brief's authors would put it.
As we move along this slippery slope, we eventually reach the position enunciated in the ARL's Code of Best Practices in Fair Use for Academic and Research Libraries, where uses of scholarly monographs (and journal articles) through e-reserves are to be considered fair use because these works are being used, so it is claimed, for a purpose different from the one the authors originally intended, even though the kind of use--the reading of the actual content line by line--is exactly the same! The result of this approach would be the destruction of almost the entire market for paperbacks issued by academic publishers for course use. While few authors of monographs make much, if any, money from the sale of their books in hardback to libraries, quite a few of them derive significant income from the sale of their books in paperback for course use (some even earning amounts into six figures), and I would be surprised if they were happy about an interpretation of fair use that deprived them of such income--not to mention the publishers who depend on this income to sustain the whole system of scholarly publishing.

Fourth, rather than head down this slippery slope and turn fair use in general, and "transformative use" in particular, into a completely muddled and all-encompassing umbrella concept for justifying just about every type of copying imaginable, a saner approach would be to do what Public Knowledge has recommended in its Copyright Reform Act project, viz., urging Congress to amend the law by explicitly sanctioning certain limited but important kinds of transient or incidental copying. Here is an excerpt from a white paper written by people associated with the Berkeley Law group that summarizes this approach:

Specifically, the proposed reform provides an exemption to the exclusive right of reproduction provided to copyright owners under § 106 of the Copyright Act for some incidental copies.
Not all intermediate copies are covered by the reform; there are three targeted limitations that ensure that the reform effectively protects the interests of copyright owners. First, the exemption is limited to incidental or transient copies. This restriction prevents potential infringers from creating copies, such as permanent or secondary duplications, that possess substantial value outside of their necessity to a particular end use. Second, these copies must be an integral and essential part of a technological process. This condition prevents copyists from circumventing copyright protection by secondarily attaching incidental or transient copies to some technological process. Finally, the primary purpose of the copy must be to enable a lawful use. This restriction forces evaluation of the end use that the copy facilitates, requiring that the end use be evaluated in light of the property rights of copyright owners. By limiting the exemption in this fashion, Congress can protect the interests of both copyright holders and consumers.

Dena Chen et al., "Providing an Incidental Copies Exemption for Service Providers and End-Users," March 31, 2011. Click on Report 5 here: http://www.publicknowledge.org/cra/

Sandy Thatcher

________________________________

From: Ann Shumelda Okerson <[log in to unmask]>
Date: Mon, 5 Nov 2012 04:54:29 -0500

Forwarded by Paul Zarins, of Stanford University Library, below is a message from Glen Worthey, Stanford's Digital Humanities Librarian.

________________________________

From: "Glen Worthey" <[log in to unmask]>
To: "Pavils Zarins" <[log in to unmask]>
Sent: Thursday, November 1, 2012 5:02:54 PM
Subject: Re: Fwd: Suggested Readings in Text Mining?

My bias will be pretty obvious to you -- but as far as I'm concerned, regarding text mining specifically for humanities research, Matt Jockers is the very best.
Here is a set of several highly relevant blog posts from him: http://www.matthewjockers.net/category/tm/ -- the best and most entertaining of which is basically a chapter from his book /Macroanalysis: Digital Methods and Literary History/ (due out early next year): http://www.matthewjockers.net/2011/09/29/the-lda-buffet-is-now-open-or-latent-dirichlet-allocation-for-english-majors/

I suspect that Ann (and others on the Liblicense list) may be especially interested in this: Matt was also co-author (on behalf of digital humanities and legal scholars) of an amicus brief that was filed in the Authors Guild v. HathiTrust case: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2102542 and which was frequently cited by the judge in his decision. Obviously, text mining is not the main focus of this brief, but it does play a strikingly prominent role in what turned out to be a very important legal document.

Finally, as just a portal into the huge world of text mining for humanities research, see this very helpful "progressive" (that is, progressing from "beginner" to "expert" level) review article with links aplenty: "Topic Modeling for Humanists: A Guided Tour" http://www.scottbot.net/HIAL/?p=19113

(Note that, for some purposes -- though not all! -- "topic modeling" is a rough synonym for "text mining." It's probably better characterized as a subset of text mining, but I believe at the moment it's one of the more actively pursued subsets, at least in digital humanities.)

Hope this helps,
Glen
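[Editor's note: to make concrete what the "nonexpressive" uses discussed in this thread look like in practice, here is a minimal sketch -- a toy illustration only, not LDA topic modeling or any method from Jockers's work. The document names and example sentences are invented for illustration. The point is that text mining reduces prose to word-frequency data: the expressive qualities (word order, phrasing, style) are discarded, and only factual counts remain.]

```python
import re
from collections import Counter

def bag_of_words(text):
    """Reduce a text to nonexpressive data: word frequencies only.

    The original expression (word order, phrasing, style) is discarded;
    what remains is the kind of factual data that text mining operates on.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))

# Two made-up toy "documents" (hypothetical examples):
doc_a = "The whale swam. The whale dived deep."
doc_b = "Deep reading differs from counting words."

freq_a = bag_of_words(doc_a)
print(freq_a.most_common(2))                            # [('the', 2), ('whale', 2)]
print(sorted(set(freq_a) & set(bag_of_words(doc_b))))   # ['deep']
```

Even this crude word-count representation supports comparisons across texts (shared vocabulary, relative frequencies) without reproducing any readable expression; topic models such as the LDA Jockers describes infer latent themes from exactly this kind of count data, at the scale that mass digitization makes possible.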