From: Sally Morris
Date: Tue, 19 Feb 2013 21:21:56 +0000

This type of error is going to make text mining very difficult...

Sally Morris
South House, The Street, Clapham, Worthing, West Sussex, UK BN13 3UU

-----Original Message-----
From: Jan Velterop
Date: Tue, 19 Feb 2013 08:37:47 +0000

Poor language and spelling errors are rife in the published literature, regardless of the business model. Errors range from author-originated to typesetting-introduced, and clearly peer review and copy editing (if any) are not adequate to deal with them.

As an example, because it is very easy to check, I'd like to mention the β vs ß problem (using the latter, the German sharp s, in place of the former, the Greek beta). Just search any publisher platform for ß and you'll find plenty of instances where it obviously should have been β.

Errors like this, and in e.g. the spelling of chemical structures, require extra, sometimes extraordinarily complicated, efforts to interpret them properly when the literature is being machine-read. And the literature will have to be machine-read more and more due to the 'overwhelm' of scientific articles being published, beyond the reasonable ability of most researchers to read, making machine analysis imperative.

(This is an interesting reference with regard to the 'overwhelm': Alan G Fraser and Frank D Dunstan, "On the impossibility of being expert", BMJ 2010; 341, doi: http://dx.doi.org/10.1136/bmj.c6815, published 14 December 2010.)

Fortunately there are extremely clever people able to develop algorithms to deal with many such errors, but it is a great shame that they make it into the literature, into the 'version of record', in the first place at the scale they do.

Jan Velterop
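
As a rough illustration of the kind of algorithm referred to above, here is a minimal Python sketch for the β vs ß case only. It rests on one stated assumption: in German spelling ß always follows another letter inside a word, so an ß that starts a token, stands alone, or follows a digit or hyphen is probably a mistyped β. The names `SUSPECT_SHARP_S`, `flag_suspect_sharp_s` and `repair_sharp_s` are hypothetical, and the heuristic will miss cases such as "Aß" written for "Aβ", where the ß does follow a letter.

```python
import re

# Assumption (heuristic, not a complete rule): a genuine German sharp s is
# always preceded by a letter. [^\W\d_] matches any Unicode letter, so the
# negative lookbehind selects ß that is NOT preceded by a letter.
SUSPECT_SHARP_S = re.compile(r"(?<![^\W\d_])ß")


def flag_suspect_sharp_s(text):
    """Return character offsets of ß that probably should have been β."""
    return [match.start() for match in SUSPECT_SHARP_S.finditer(text)]


def repair_sharp_s(text):
    """Replace only the flagged ß characters with β; leave the rest alone."""
    return SUSPECT_SHARP_S.sub("β", text)


if __name__ == "__main__":
    sample = "ß-catenin and TGF-ß1 near Straße"
    print(flag_suspect_sharp_s(sample))
    print(repair_sharp_s(sample))
```

On the sample string this flags the ß in "ß-catenin" and "TGF-ß1" and leaves "Straße" untouched. A real pipeline would need context beyond the single preceding character, which is the extra effort the message describes.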