From: Marcin Wojnarski
Date: Wed, 5 Feb 2014 15:10:38 +0100
As a data mining specialist, I've been following the various discussions
about mining scholarly publications for some time, and I've
noticed considerable confusion about the legal nature of text
mining and the true origin of the restrictions related to it.
1) Restrictions imposed on text mining are technical, not legal.
Publishers impose technical limits on how much content can be
downloaded in a given period of time, and if someone downloads too
much, the university may get cut off from the publisher's servers. This
is regulated legally, of course, but only in the agreement signed
between the university and the publisher, not by general law, and least
of all by copyright. The exact terms are a matter of mutual agreement
between the parties - they can agree on whatever they want - so blaming
copyright for limited bandwidth to the publisher's servers, as is often
done in discussions about data mining of academic papers, is
unreasonable.
2) Restrictions apply to subscription content alone. There is
no way to impose restrictions on mining Open Access content, even if
OA means only "free" OA. What is more, if I get access to a paper
illegally and mine it, I can be accused only of illegal copying,
not of text mining. That's because copyright law has nothing to do
with mining; these are two different things.
Data mining concerns the *information* contained in the paper,
not the paper itself, whereas copyright protects only the paper
as a creative work, in its literal and graphical form, not the
information contained in it. It's important to see the distinction.
Ross Mounce was right to say that "the right to read is the right
to mine". I would go further: mining does NOT need any right. Data
mining is just another name for collecting statistics. And it's my
*personal freedom* to collect whatever stats I want, from whatever
papers I want; nobody can forbid me from doing this. Thus, if I'm lucky
enough to see the paper - on whatever legal basis, or even none at all
- it's only my business what I do with information that I obtained in
Marcin Wojnarski, Founder and CEO, TunedIT
TunedIT - Online Laboratory for Intelligent Algorithms
On 02/05/2014 12:20 AM, LIBLICENSE wrote:
> From: Ivy Anderson
> Date: Tue, 4 Feb 2014 05:32:56 +0000
> This short article from Nature News may be of interest to LibLicense readers:
> Elsevier opens its papers to text-mining
> Researchers welcome easier access for harvesting content, but some
> spurn tight controls.
> Richard Van Noorden
> 03 February 2014
> Ivy Anderson
> Director of Collections, California Digital Library
> University of California