From: Jan Velterop
Date: Fri, 22 Feb 2013 09:51:07 +0000

Well, indeed, Sally. And it can be much worse still. See Peter
Murray-Rust's example of a fake epsilon that's actually a doubly
mirrored italic 3:

http://blogs.ch.cam.ac.uk/pmr/2013/02/21/why-should-we-continue-to-pay-typesetterspublishers-lots-of-money-to-process-and-even-destroy-science-and-a-puzzle-for-you/

Jan Velterop

On 20 Feb 2013, at 21:01, LIBLICENSE wrote:

> From: Sally Morris
> Date: Tue, 19 Feb 2013 21:21:56 +0000
>
> This type of error is going to make text mining very difficult...
>
> Sally Morris
> South House, The Street, Clapham, Worthing, West Sussex, UK BN13 3UU
>
> -----Original Message-----
> From: Jan Velterop
> Date: Tue, 19 Feb 2013 08:37:47 +0000
>
> Poor language and spelling errors are rife in the published
> literature, regardless of the business model. Errors range from
> author-originated to typesetting-introduced, and clearly peer review
> and copy editing (if any) are not adequate to deal with them. As an
> example, because it is very easy to check, I'd like to mention the β
> vs ß problem (using the latter, the German sharp s, for the former,
> the Greek beta). Just search any publisher platform for ß and you'll
> find plenty of instances where it obviously should have been β. Errors
> like this, and in e.g. the spelling of chemical structures, require
> extra, sometimes extraordinarily complicated, efforts to interpret
> them properly when the literature is being machine-read. And the
> literature will have to be machine-read more and more due to the
> 'overwhelm' of scientific articles being published, beyond the
> reasonable ability of most researchers to read, making machine
> analysis imperative. (This is an interesting reference with regard to
> the 'overwhelm': Alan G Fraser and Frank D Dunstan, "On the
> impossibility of being expert", BMJ 2010; 341,
> doi: http://dx.doi.org/10.1136/bmj.c6815, published 14 December 2010.)
>
> Fortunately there are extremely clever people able to develop
> algorithms to deal with many such errors, but it is a great shame that
> they make it into the literature, into the 'version of record', in
> the first place at the scale they do.
>
> Jan Velterop
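
For readers who want to try the β vs ß check described above, here is a minimal sketch in Python. It is not taken from the thread or from any publisher's tooling. It rests on one assumption: in German text ß does not start a word, so a ß that is not preceded by a letter (as in "ß-carotene" or "TGF-ß1") is probably a mistyped β. The function names are hypothetical.

```python
import re

# Assumed heuristic (not from the thread): a ß with no letter directly
# before it is unlikely to be a genuine German sharp s, so treat it as a
# probable Greek beta. [^\W\d_] matches any Unicode letter.
SUSPECT_SHARP_S = re.compile(r"(?<![^\W\d_])ß")


def find_suspect_sharp_s(text):
    """Return the character offsets of each ß that is probably meant as β."""
    return [m.start() for m in SUSPECT_SHARP_S.finditer(text)]


def fix_suspect_sharp_s(text):
    """Replace each suspect ß with β, leaving word-internal ß untouched."""
    return SUSPECT_SHARP_S.sub("β", text)


if __name__ == "__main__":
    sample = "TGF-ß1 and ß-carotene were measured on Goethestraße."
    print(find_suspect_sharp_s(sample))
    print(fix_suspect_sharp_s(sample))
```

The heuristic will miss a ß glued directly to a letter (for example "TGFß") and could misfire on unusual typography, so flagged hits are best reviewed by a person rather than replaced blindly.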