LIBLICENSE-L Archives

LibLicense-L Discussion Forum

LIBLICENSE-L@LISTSERV.CRL.EDU

Subject:
From:
LIBLICENSE <[log in to unmask]>
Reply To:
LibLicense-L Discussion Forum <[log in to unmask]>
Date:
Sun, 17 Jun 2012 19:25:41 -0400
From: Bernie Reilly <[log in to unmask]>
Date: Fri, 15 Jun 2012 21:44:19 +0000

An interesting item on text mining and analysis, specifically of
Wikipedia, in today's New York Times.
http://bits.blogs.nytimes.com/2012/06/14/how-big-data-sees-wikipedia/

Demonstrates the advantages of open content for this kind of
application.  Kalev Leetaru is the programmer behind the Cline Center
project mentioned in my earlier posting.

Bernie Reilly
CRL Global Resources

-----Original Message-----
From: Bernie Reilly <[log in to unmask]>
Date: Thu, 14 Jun 2012 02:35:46 +0000

CRL has seen a growing demand for text-mining outside of the world of
science e-journals.  Much of this work seems to bypass the
standard licenses and permissions, yet it is happening nonetheless.  The
efforts I speak of are in the humanities, social sciences, and
business research communities. Much of it is being done with open Web
and commercially published content, particularly published news texts.

The most sophisticated and ambitious work we've seen is in the fields
of business, public affairs, and political science.  The pioneers in
this, not surprisingly, are the U.S. intelligence agencies and their
contractors, who are continuously mining the Web for evidence of
threats to American interests in foreign postings and broadcasts.  But
business researchers are close behind them.  Matthew Gentzkow and
Jesse M. Shapiro at the University of Chicago's Booth School of
Business, in their 2010 study "What Drives Media Slant? Evidence from
U.S. Daily Newspapers," used automated processing of texts from
speeches published in The Congressional Record (online version) and
newspaper articles from a large ProQuest database to systematically
map demographic trends affecting big-city newspaper publishing. (See
the article here:
http://faculty.chicagobooth.edu/jesse.shapiro/research/biasmeas.pdf)

Researchers at the Berkman Center for Internet & Society at
Harvard have done some large-scale computer-assisted processing of
open Web content (e.g., blogs and Twitter feeds), relying heavily on
tools and processing services originally developed for the business world.
In 2008 the Berkman Center, assisted by a firm called Morningside
Analytics, published an impressive high-level analysis of political
chatter on the Iranian blogosphere.  They "mapped" the topics
discussed in more than 60,000 blogs over a period of several months,
detecting thought trends, spheres of influence, and so forth.  (See
the report at http://cyber.law.harvard.edu/publications/2008/Mapping_Irans_Online_Public.)

Then there are the home-grown academic projects, one of the most
robust being the Cline Center for Democracy's Societal Infrastructure
and Development Project (SID). For several years now, the Cline Center
has been mining a massive corpus of post-World War II news text,
gathered from various commercial publishers, in order to test certain
theories about economic development and civil society.  (See
http://www.clinecenter.illinois.edu/research/sid-project.html.)

This activity suggests that building the rights to text mining and
computer-assisted analysis into standard licenses for commercial
databases is going to be difficult without knowing a lot more about
the practices of this kind of research.  CRL recently hosted a forum
"New Horizons in Primary Source Research" that featured two very good
presentations about mining and analysis of primary text content by
researchers in the humanities and social sciences:  "Analysis and
Visualization Using Large Bodies of Electronic Text," by Elizabeth Long
and Peter Leonard of the University of Chicago, and "Old News, New
Research: Observations from the Field," by Debora Cheney of Penn
State.  (You can find links to the presentations here:
http://www.crl.edu/events/7447.)

Hope you don't consider this too tangential.

Bernie Reilly
CRL Global Resources
