LISTSERV - LIBLICENSE-L Archives

From: Jan Velterop
Date: Fri, 22 Feb 2013 09:51:07 +0000

Well, indeed, Sally. And it can be much worse still. See Peter
Murray-Rust's example of a fake epsilon that's actually a doubly
mirrored italic 3:

http://blogs.ch.cam.ac.uk/pmr/2013/02/21/why-should-we-continue-to-pay-typesetterspublishers-lots-of-money-to-process-and-even-destroy-science-and-a-puzzle-for-you/

Jan Velterop

On 20 Feb 2013, at 21:01, LIBLICENSE wrote:

> From: Sally Morris
> Date: Tue, 19 Feb 2013 21:21:56 +0000
>
> This type of error is going to make text mining very difficult...
>
> Sally Morris
> South House, The Street, Clapham, Worthing, West Sussex, UK  BN13 3UU
>
> -----Original Message-----
> From: Jan Velterop
> Date: Tue, 19 Feb 2013 08:37:47 +0000
>
> Poor language and spelling errors are rife in the published
> literature, regardless of the business model. Errors range from
> author-originated to typesetting-introduced, and clearly peer review
> and copy editing (if any) are not adequate to deal with them. As an
> example, because it is very easy to check, I'd like to mention the β
> vs ß problem (using the latter, the German sharp s, for the former,
> the Greek letter beta). Just search any publisher platform for ß and you'll find
> plenty of instances where it obviously should have been β. Errors like
> this, and in e.g. the spelling of chemical structures, require extra,
> sometimes extraordinarily complicated, efforts to interpret them
> properly when the literature is being machine-read. And the literature
> will have to be machine-read more and more, because the 'overwhelm' of
> scientific articles being published is beyond the reasonable ability of
> most researchers to read, which makes machine analysis imperative. (This
> is an interesting reference with regard to the 'overwhelm': Alan G Fraser
> and Frank D Dunstan, "On the impossibility of being expert", BMJ 2010;
> 341, doi: http://dx.doi.org/10.1136/bmj.c6815, published 14 December
> 2010.)
>
> Fortunately there are extremely clever people able to develop
> algorithms to deal with many such errors, but it is a great shame that
> they make it into the literature — into the 'version of record' — in
> the first place at the scale they do.
>
> Jan Velterop
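
To illustrate the kind of algorithm mentioned above for the β vs ß problem, here is a minimal Python sketch. It is not taken from the thread or from any particular text-mining tool. It rests on one assumption: in German, ß occurs only after a lowercase letter inside a word, so a ß that starts a token or follows a digit, hyphen, space, or capital letter is probably a mistyped β. The function names are hypothetical, and the heuristic will miss cases such as a ß fused directly onto a lowercase word.

```python
import re

SHARP_S = "\u00df"  # ß, LATIN SMALL LETTER SHARP S
BETA = "\u03b2"     # β, GREEK SMALL LETTER BETA

# Assumption of this sketch: a genuine German ß is always preceded by a
# lowercase letter (as in "Straße"). Any other ß is treated as suspect.
SUSPECT = re.compile(r"(?<![a-zäöü])" + SHARP_S)


def find_suspect_sharp_s(text):
    """Return the character offsets of every ß that is likely meant as β."""
    return [m.start() for m in SUSPECT.finditer(text)]


def repair_beta(text):
    """Replace suspect ß characters with β, leaving German words untouched."""
    return SUSPECT.sub(BETA, text)


if __name__ == "__main__":
    sample = "TGF-ß1 and ß-catenin on Schloßstraße"
    print(find_suspect_sharp_s(sample))
    print(repair_beta(sample))
```

In practice a flagged ß would more safely be reported for human review than silently rewritten, since the rule is only a heuristic.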