From: Bernie Reilly
Date: Fri, 7 Feb 2014 19:03:10 +0000
Good discussion. It is interesting to watch the legal and technical
responses to text mining, and attempts to circumvent them, outside the
scholarly journals world. Linguists at the University of
Pennsylvania, for example, have been successfully capturing and mining
the content of news broadcasts from the Middle East, although on a
relatively small scale. In the commercial sector, aggregators, news
reader services, and even data journalists routinely mine and scrape
content from open web sites. This has mostly been in the service of
repackaging the information and creating new products and services.
There's definitely been pushback from some media organizations whose
content is being mined. A number of European news organizations, for
example, have sued and/or set up technical barriers against Google,
which crawls and mines thousands of online news sites and displays
headlines and other content in its Google News service.
There is also a lot of this in the financial world, where, using
proprietary applications, Bloomberg, Reuters, Dow Jones, and others
regularly mine news feeds and big data from government sources.
Interesting times and a complex issue.
From: Sandy Thatcher
Date: Thu, 6 Feb 2014 13:54:55 -0600
Maybe so, but as Michael Carroll cautioned in his post on this
subject, "In Europe, the legal situation is more complicated because
of database rights and the absence of fair use."
> From: Marcin Wojnarski
> Date: Wed, 5 Feb 2014 15:10:38 +0100
> As a data mining specialist, I've followed the different discussions
> about mining scholarly publications for some time already, and I've
> noticed that there is considerable confusion about the legal nature of text
> mining and the true origin of restrictions related to it.
> 1) Restrictions imposed on text mining are technical, not legal.
> Publishers impose technical limits on how much content can be
> downloaded in a given period of time, and if someone downloads too
> much, the university may get cut off from the publisher's servers. This is
> regulated legally, of course, but only in the agreement signed between
> the university and the publisher, not by general law, and least of all by
> copyright. What exact terms are signed is a matter of mutual agreement
> between the parties - they can agree on whatever they want - so blaming
> copyright for limited bandwidth to the publisher's server, as is often done
> in discussions about data mining of academic papers, is unreasonable.
> 2) Restrictions are related to subscription content alone. There is
> no way to impose restrictions on mining Open Access content, even if
> OA means only "free" OA. Even more: if I get access to a paper
> illegally and mine it, I can only be accused of illegal copying, but
> not of text mining. That's because copyright law has nothing to do
> with mining; these are two different things.
> Data mining is related to *information* contained in the paper, and
> not to the paper itself; whereas the copyright protects only the paper
> as a creative work, in its literal and graphical form, not the
> information contained in it. It's important to see the distinction.
> Ross Mounce was right to say that "the right to read is the right
> to mine". I would go even further: mining does NOT need any right. Data
> mining is just another name for collecting statistics. And it's my
> *personal freedom* to collect whatever stats I want, from whatever
> papers I want; nobody can forbid me from doing this. Thus, if I'm lucky
> enough to see the paper - on whatever legal basis, or even none at all
> - it's only my business what I do with information that I obtained in
> this way.
> Marcin Wojnarski
> Marcin Wojnarski, Founder and CEO, TunedIT http://tunedit.org
> http://www.facebook.com/TunedIT http://twitter.com/TunedIT