From: kalev leetaru <[log in to unmask]>
Date: Thu, 21 Jun 2018 22:31:43 -0400

Christina, indeed, and I was actually behind two of the large data grant
programs, as well as advising several of the large publishers on their TDM
offerings (and what may eventually be rolled into public TDM offerings).

My apologies, I should have clarified that I was speaking with regards to
the upper tier of DHASS - that last mile of the large and ultra-large
projects that aren't supported under typical TDM offerings, as well as the
legal and technical limitations posed by many of the current TDM models.
I.e., the kinds of projects that need to bulk ingest the entirety of
ProQuest's Historical Newspaper archive or the entirety of Nexis' national
and international holdings or the live stream of all Bloomberg content or
year-long access to the Decahose and so on.

TDM for run-of-the-mill DHASS projects, as you correctly point out, was a
solved problem even when I began working with HASS communities just under
two decades ago. The challenge is in those final large-scale projects, few
of which align themselves with current models. To put it another way,
HathiTrust's wonderful APIs work spectacularly for 99% of projects, but
there's a reason they support both a research cloud and boxing up their
entire PD corpus and shipping it to researchers - there are those projects
that legitimately need access to absolutely every page of the entire PD
corpus.

Precious few publishers support projects that involve the export of their
entire in-copyright holdings, especially those spanning large numbers of
legal jurisdictions, and there has been a lot of complexity and legal
conflict, ranging from content licensing to authorship agreements by the
original publishers. It's an immensely complex landscape that looks very
different from the publisher and librarian sides of the equation, and thus
as I said in my note I'm primarily interested in hearing from publishers.
That said, I would also be very interested in hearing from academic
libraries that have successfully negotiated large TDM offerings, involving
the shipment of or cloud access to corpuses on the order of many tens or
hundreds of millions or billions of documents for a single analysis, in
terms of the legal and technical solutions they found.

Kalev


On Thu, Jun 21, 2018 at 8:00 PM, LIBLICENSE <[log in to unmask]> wrote:

> From: "Pikas, Christina K." <[log in to unmask]>
> Date: Thu, 21 Jun 2018 11:12:40 +0000
>
> (Not endorsing any vendor listed below)
>
>
>
> I feel like you may have missed the past 5-10 years of activity on this
> front? Digital humanities is a well-established field with lots of active
> participants. CrossRef has TDM support (http://tdmsupport.crossref.org/).
> PubMed Central is of course used a lot and there was talk about the public
> access requirement creating new opportunities. Elsevier has a number of
> APIs although most of their use cases are for research evaluation and
> institutional repository uses.
>
>
>
> Maybe publishers on the list would like to point you to their help pages.
>
>
>
> Christina
>
>
>
>
> From: kalev leetaru <[log in to unmask]>
>
> Date: Wed, 20 Jun 2018 14:53:30 -0400
>
> Thought many of you on the list would find of interest my latest piece,
> which summarizes a number of thoughts, comments, situations and ideas I've
> heard from both the academic and publisher worlds regarding the state and
> future of publishers supporting TDM:
>
>
>
> https://www.forbes.com/sites/kalevleetaru/2018/06/20/how-the-cloud-could-empower-the-future-of-academic-publisher-text-and-data-mining/
>
>
>
> Would be very interested in hearing offline from publishers on this list as
> to any TDM offerings they currently have, ones they are planning or
> considering launching or past offerings they shut down and why.
>
>
>
> Kalev
>