From: "Evans, Gwen"
Date: Thu, 5 May 2016 15:49:50 +0000

OhioLINK users (which include almost all higher ed institutions in Ohio, with Ohio State being the largest, as well as the Cleveland Clinic, which is a heavy user) downloaded 11.5 million articles in 2015 (from publisher platforms and our locally loaded Electronic Journal Center). This is only from the OhioLINK consortially licensed content, but that includes several big publishers (Elsevier, Wiley, Springer, ACS et al.). As a system, we don't have the same research intensity that CDL covers, but 11.5 million downloads for one state is about 24% of the 47 million downloads recorded globally for SciHub. Whoever is downloading from SciHub in Columbus (no other Ohio region has similar activity) represents less than 1% of the comparable statewide academically sanctioned activity in just the OhioLINK collective packages, and that percentage would shrink further if sanctioned activity for the locally subscribed Ohio State resources were added.

We are interested in trying to figure out how much of that use is really driven by frustrated academics. We downloaded the data set because we were curious to see whether we could identify the use in Columbus, OH. Columbus is the home of Ohio State, sure, but it is also the state capital and home to Battelle Research, a number of large medical systems, Chemical Abstracts Service, and several corporate headquarters in manufacturing, and, to paraphrase a failed US VP candidate, "I can see a large Abbott Labs facility from my window!"

The dataset doesn't include the actual IP addresses (for obvious reasons laid out in the article), just the geolocation tag, so we couldn't do a comparison without a lot of extra work. But I fully agree that trying to find and get sanctioned articles easily is a miserable experience for academics, and that this is certainly due to publisher restrictions and the authentication mechanisms they require.
It's infuriating that the publisher quoted in the article blames libraries. We have clear data and studies showing that many, many researchers go directly to big publisher platforms, not through the library website or other sanctioned portals. Michael Clarke's blog post on The Scholarly Kitchen yesterday is precisely to the point: https://scholarlykitchen.sspnet.org/2016/05/04/accessing-publisher-resources-via-a-mobile-device-a-users-journey/. It's the publishers that throw up the very first hurdles to sanctioned access along the most common path to the content. If a user goes directly from a Google search to the publisher platform and can't get the article, why are you pointing the finger at us? We didn't build those systems.

And those systems aren't necessarily standardized across vendors, and campus IT authentication and proxy systems certainly aren't standardized either (and if you think libraries are always consulted before a campus chooses an authentication system, think again). Even in states and systems that are trying to implement eduroam, that is a lengthy process that involves real IT investment and cooperation across many different institutions.

Notice that it is the institutions that have to invest real dollars (in authentication, proxies, VPNs, staffing and management, and discovery tools) to make content more seamlessly accessible to the very researchers who provided it to the publishers at no cost. So, to Ivy's, Ann's, and Lisa's points, there are many literally uncalculated cost benefits to a fully open access model for higher education that go beyond the obvious ones.

I also think it's a bit unfair to expect the relatively narrow higher ed library market to deliver results comparable to Google's. Reliable article-level indexing and normalization at scale isn't cheap or easy.
And OhioLINK (since we locally load publisher content into our own multi-publisher journal platform) can attest that the metadata delivered by publishers varies in quality, type, standardization, etc., and requires a lot of work. We also have no idea how often SciHub fails to deliver an article from a search even when it is in fact in their database. Their ease of use comes primarily from the lack of authentication barriers: if I find it, I get it.

Best,
Gwen

(Okay, I can no longer see Abbott from my window because we moved offices, but I used to be able to!)

Gwen Evans
Executive Director, OhioLINK
http://www.ohiolink.edu/
1224 Kinnear Rd
Columbus, Ohio 43212
ORCID ID: 0000-0002-4560-0435

Per Ohio Revised Code, this communication and any attachments may constitute a public record. (http://codes.ohio.gov/orc/149.43)

To: LibLicense-L Discussion Forum
Subject: Re: "Who's Downloading Pirated Papers? Everyone" (Sci-Hub Data)
From: kalev leetaru
Date: Wed, 4 May 2016 08:51:30 -0400

In case it is of interest, here is my take on SciHub and the trend it represents in academic publishing: http://www.forbes.com/sites/kalevleetaru/2016/04/29/the-future-of-open-access-why-has-academia-not-embraced-the-internet-revolution/

To second Ann's comments, one of the most striking things to me about that Science piece is just how heavily SciHub is apparently being used at Western academic institutions, which likely have legal subscriptions to the journals in question. To me, that stands as testament to just how awful current academic library journal subscription search systems are. From 15 years at various institutions, public and private, I can personally attest to how impossible it can be simply to find the full text of a particular journal article, even when you know your institution subscribes to the journal and issue in question.
Or you search and find 10 different copies from 10 different services the institution subscribes to, but some are abstracts only, some are ASCII text only with no figures, and so on. Or you want to see an entire issue of a journal and find multiple subscriptions that purport to include the journal, but when you browse through, after clicking through screen after screen, you find that some subscriptions have time delays and so don't include the most recent issue, or start or end on particular dates, or include only samples of the journal. It's a huge, huge mess today.

It's not libraries' exclusive fault, but I do think there is immense room for improvement. Even I ended up in the habit of going to Google Scholar first to have it link me into my library's subscriptions, since it at least seemed able to track down whether my library had a copy and connect me directly to the best copy with full text and images. Google Scholar is far from perfect, but as a researcher who does intense deep dives into the literature, I think it is the model that libraries simply have to adopt to stay relevant and serve their communities. It simply can't be that I spend half an hour to an hour (sometimes several hours) just trying to track down a journal article in the myriad mess of a typical academic library's e-subscription systems. I should be able to search for the article and jump right to the best available copy with a single click.

While slightly tangential, the Science article also alludes to the possibility that SciHub downloaders are using it for text mining. That is another area where academic libraries need to play a much bigger role in helping academics. I have always found libraries to be highly adversarial when it comes to connecting researchers with publishers to explore possible collaborations, and in fact libraries have been the primary obstacle for me in my 15 years of data mining in the academic world.
Instead, I've always had to reach out directly to publishers after my home institution's library pushed back, saying it was not their job to help connect researchers, or otherwise would not invest any effort of any kind in helping make those connections. I've found publishers to be extremely helpful and open in supporting large-scale data mining when approached, from 21 billion words of academic literature (http://dlib.org/dlib/september14/leetaru/09leetaru.html) to my dissertation (http://www.kalevleetaru.com/Publish/Leetaru_Dissertation_Can_We_Forecast_Conflict-Dissertation.pdf) to myriad other initiatives I oversee (http://blog.gdeltproject.org/); see more in my NFAIS opening keynote (http://kalevleetaru.com/Publish/ISU2015-Leetaru-Mining-Libraries.pdf). But in every case I reached out directly to the publishers after my home institution's library failed to be of any help in forging connections and collaborations.

Libraries have a lot to offer there, in helping connect scholars with publishers and assisting in that process to ensure legal data mining that benefits all sides, but they need to recognize that if they put their foot down and say it is not their role, researchers will simply go around them directly to the publishers, further reducing the library's role in academic life.

To me, SciHub appears to be less a service for poorer nations to access scholarship that is financially inaccessible to their institutions, and more a reaction to the just plain horrific state of access to academic scholarship today, from the extreme costs of subscriptions to the awful state of library access portals.

~Kalev
http://kalevleetaru.com
http://blog.gdeltproject.org/