From: Ann Shumelda Okerson <[log in to unmask]>
Date: Mon, 5 Nov 2012 04:54:29 -0500

Forwarded by Paul Zarins, of Stanford University Library, below is a message from Glen Worthey, Stanford's Digital Humanities Librarian.

________________________________
From: "Glen Worthey" <[log in to unmask]>
To: "Pavils Zarins" <[log in to unmask]>
Sent: Thursday, November 1, 2012 5:02:54 PM
Subject: Re: Fwd: Suggested Readings in Text Mining?

My bias will be pretty obvious to you -- but as far as I'm concerned, regarding text mining specifically for humanities research, Matt Jockers is the very best. Here is a set of several highly relevant blog posts from him:

http://www.matthewjockers.net/category/tm/

The best and most entertaining of these is basically a chapter from his book /Macroanalysis: Digital Methods and Literary History/ (due out early next year):

http://www.matthewjockers.net/2011/09/29/the-lda-buffet-is-now-open-or-latent-dirichlet-allocation-for-english-majors/

I suspect that Ann (and others on the Liblicense list) may be especially interested in this: Matt was also co-author (on behalf of digital humanities and legal scholars) of an amicus brief that was filed in the Authors Guild v. HathiTrust case:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2102542

The judge cited it frequently in his decision. Obviously, text mining is not the main focus of this brief, but it does play a strikingly prominent role in what turned out to be a very important legal document.

Finally, as just a portal into the huge world of text mining for humanities research, see this very helpful "progressive" (that is, progressing from "beginner" to "expert" level) review article with links aplenty:

"Topic Modeling for Humanists: A Guided Tour"
http://www.scottbot.net/HIAL/?p=19113

(Note that, for some purposes -- though not all! -- "topic modeling" is a rough synonym for "text mining."
It's probably better characterized as a subset of text mining, but I believe at the moment it's one of the more actively pursued subsets, at least in digital humanities.)

Hope this helps,
Glen