From: kalev leetaru <[log in to unmask]>
Date: Thu, 21 Jun 2018 22:31:43 -0400

Christina, indeed, and I was actually behind two of the large data grant programs, as well as advising several of the large publishers on their TDM offerings (and on what may eventually be rolled into public TDM offerings).

My apologies, I should have clarified that I was speaking with regard to the upper tier of DHASS - that last mile of large and ultra-large projects that aren't supported under typical TDM offerings, as well as the legal and technical limitations posed by many of the current TDM models. That is, the kinds of projects that need to bulk ingest the entirety of ProQuest's Historical Newspaper archive, or the entirety of Nexis' national and international holdings, or the live stream of all Bloomberg content, or year-long access to the Decahose, and so on.

TDM for run-of-the-mill DHASS projects, as you correctly point out, was a solved problem even when I began working with HASS communities just under two decades ago. The challenge lies in those final large-scale projects, few of which fit current models. To put it another way, HathiTrust's wonderful APIs work spectacularly for 99% of projects, but there's a reason they support both a research cloud and boxing up their entire PD corpus and shipping it to researchers - there are projects that legitimately need access to absolutely every page of the entire PD corpus.

Precious few publishers support projects that involve the export of their entire in-copyright holdings, especially holdings spanning large numbers of legal jurisdictions, and there have been many complications and legal conflicts, ranging from content licensing to authorship agreements by the original publishers. It's an immensely complex landscape that looks very different from the publisher and librarian sides of the equation, and thus, as I said in my note, I'm primarily interested in hearing from publishers. That said, I would also be very interested in hearing from academic libraries that have successfully negotiated large TDM offerings, involving the shipment of, or cloud access to, corpora on the order of many tens or hundreds of millions or billions of documents for a single analysis, about the legal and technical solutions they found.

Kalev


On Thu, Jun 21, 2018 at 8:00 PM, LIBLICENSE <[log in to unmask]> wrote:
From: "Pikas, Christina K." <[log in to unmask]>
Date: Thu, 21 Jun 2018 11:12:40 +0000

(Not endorsing any vendor listed below)

I feel like you may have missed the past 5-10 years of activity on this front? Digital humanities is a well-established field with lots of active participants. CrossRef has TDM support (http://tdmsupport.crossref.org/). PubMed Central is of course used a lot and there was talk about the public access requirement creating new opportunities. Elsevier has a number of APIs although most of their use cases are for research evaluation and institutional repository uses.

Maybe publishers on the list would like to point you to their help pages.

Christina


From: kalev leetaru <[log in to unmask]>
Date: Wed, 20 Jun 2018 14:53:30 -0400

Thought many of you on the list would find of interest my latest piece, which summarizes a number of thoughts, comments, situations and ideas I've heard from both the academic and publisher worlds regarding the state and future of publishers supporting TDM:

Would be very interested in hearing offline from publishers on this list about any TDM offerings they currently have, ones they are planning or considering launching, or past offerings they shut down and why.

Kalev