LISTSERV - LIBLICENSE-L Archives

From: Sally Morris
Date: Tue, 19 Feb 2013 21:21:56 +0000

This type of error is going to make text mining very difficult...

Sally Morris
South House, The Street, Clapham, Worthing, West Sussex, UK  BN13 3UU

-----Original Message-----
From: Jan Velterop
Date: Tue, 19 Feb 2013 08:37:47 +0000

Poor language and spelling errors are rife in the published
literature, regardless of the business model. Errors range from
author-originated to typesetting-introduced, and clearly peer review
and copy editing (if any) are not adequate to deal with them. As an
example, because it is very easy to check, I'd like to mention the β
vs ß problem (using the latter, the German sharp s, in place of the
former, the Greek beta). Just search any publisher platform for ß and
you'll find plenty of instances where it obviously should have been β.
Errors like this, and in e.g. the spelling of chemical structures,
require extra, sometimes extraordinarily complicated, efforts to
interpret them properly when the literature is being machine-read. And
the literature will have to be machine-read more and more due to the
'overwhelm' of scientific articles being published, beyond the
reasonable ability of most researchers to read, making machine
analysis imperative. (This is an interesting reference with regard to
the 'overwhelm': Alan G Fraser and Frank D Dunstan, "On the
impossibility of being expert", BMJ 2010; 341, doi:
http://dx.doi.org/10.1136/bmj.c6815, published 14 December 2010.)

Fortunately there are extremely clever people able to develop
algorithms to deal with many such errors, but it is a great shame that
the errors make it into the literature, into the 'version of record',
in the first place at the scale they do.
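
To give a feel for what even the simplest such algorithm might look
like, here is a minimal sketch in Python for the β vs ß case. It rests
on one assumption that is not part of the discussion above: in German
text ß follows a letter and never starts a word, so a ß that is not
preceded by a letter is probably a mistyped β. The names
SUSPECT_SHARP_S and fix_sharp_s are hypothetical, chosen only for this
illustration.

```python
import re

# Assumption: a genuine German sharp s (U+00DF) always follows a letter.
# [^\W\d_] matches a single Unicode letter, so the negative lookbehind
# selects every ß that is NOT preceded by a letter (start of text,
# after a space, hyphen, digit, bracket, and so on).
SUSPECT_SHARP_S = re.compile(r"(?<![^\W\d_])ß")


def fix_sharp_s(text: str) -> str:
    """Replace each suspicious ß with the Greek small letter beta (U+03B2)."""
    return SUSPECT_SHARP_S.sub("β", text)


if __name__ == "__main__":
    sample = "ß-catenin and TGF-ß1 on Goethestraße"
    print(fix_sharp_s(sample))
```

The heuristic leaves ordinary German words such as "Goethestraße"
alone, but it also misses a ß glued directly onto a preceding letter
(for example a gene or protein abbreviation written without a hyphen).
Catching those cases would need a dictionary or domain knowledge,
which is where the extra effort mentioned above comes in.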

Jan Velterop