Xerox Scanners / Photocopiers Randomly Alter Numbers

News|The FontFeed|Yves Peters 2013-08-06 02:16:17

Thanks to a tip from Jörg Haubrichs, software developer at FontShop International, I read an insane article this weekend that frankly blew my typographic mind. Last Wednesday German computer scientist David Kriesel made a bizarre discovery. After scanning a construction plan on a Xerox WorkCentre and printing it, he noticed that the seemingly perfectly reproduced plan contained incorrect numbers. He only caught the mistake because the copy labeled one room as about 22 square meters, whereas the adjacent – visibly larger – room was labeled as only 14 square meters. The Xerox WorkCentre had somehow changed the numbers whilst scanning. At first glance the scanned images looked perfectly fine; only upon closer inspection did some of the numbers on them turn out to be incorrect. The implications of these baffling substitutions are far more serious and far-ranging than one would suspect, because the issue defies the expectations of the user – copies that are supposed to be identical are far from it.

On his website David Kriesel explains at length and in detail how he stumbled upon the anomalies in Xerox WorkCentre 7535 and 7556 machines, and analyses what causes the problem – a compression algorithm randomly replaces patches of pixel data in an almost unnoticeable way. Apparently Xerox machines use JBIG2 for compression, an algorithm that creates a dictionary of image patches it considers similar. As long as the error generated by these patches is not too high, the machine reuses them instead of using the original image data. This would also explain why the error occurs when letters or numbers are scanned at a moderate resolution that is still readable for the human eye. When the letter size is close to the patch size of JBIG2, complete letters and even blocks of letters that look similar to the machine are replaced.
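To make the mechanism a little more tangible, here is a toy sketch in Python of dictionary-based patch substitution. It is not the actual JBIG2 algorithm and certainly not Xerox's implementation: the 3×5 pixel glyphs, the pixel-count error measure, the `threshold` parameter and all function names are assumptions made up purely for illustration.

```python
from typing import List, Tuple

# A patch is a tiny bitmap: one string per row, '#' = ink, '.' = paper.
Patch = Tuple[str, ...]

# Hypothetical 3x5 glyphs. The 6 and the 8 differ in a single pixel.
SIX: Patch = ("###", "#..", "###", "#.#", "###")
EIGHT: Patch = ("###", "#.#", "###", "#.#", "###")
ONE: Patch = ("..#", "..#", "..#", "..#", "..#")


def mismatch(a: Patch, b: Patch) -> int:
    """Count the pixels that differ between two equally sized patches."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(patches: List[Patch], threshold: int) -> Tuple[List[Patch], List[int]]:
    """Store each patch as an index into a dictionary of patches.

    A patch reuses an existing dictionary entry when the two differ by at
    most `threshold` pixels (an assumed error measure); otherwise it is
    added to the dictionary as a new entry.
    """
    dictionary: List[Patch] = []
    indices: List[int] = []
    for patch in patches:
        for i, known in enumerate(dictionary):
            if mismatch(patch, known) <= threshold:
                indices.append(i)  # "close enough": reuse the stored patch
                break
        else:
            dictionary.append(patch)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary: List[Patch], indices: List[int]) -> List[Patch]:
    """Rebuild the page from the dictionary entries."""
    return [dictionary[i] for i in indices]


PAGE = [EIGHT, SIX, ONE, SIX]

# With no tolerance the copy is identical to the original.
LOSSLESS = decode(*encode(PAGE, threshold=0))

# With a tolerance of one pixel the 6s are silently stored as the 8
# that was seen first, so they come back as 8s.
LOSSY = decode(*encode(PAGE, threshold=1))
```

In this sketch the substituted digit is a complete, clean 8 rather than a smudged 6, which is why this kind of error is so hard to spot by eye.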

A cost overview scanned on the WorkCentre 7535. At first glance the copy looks correct, but when you realise values in such tables are usually sorted in ascending order you notice the wrong numbers. This is not a simple pixel error either, because you can clearly see the characteristic dent on the left side of the 8 in contrast with the smooth curve on the 6.

Image courtesy of David Kriesel's website

Why is this issue so crucial? First of all, these are widespread machines, commonly used in service centres and copy shops, and Xerox seemed to be unaware of the issue until David Kriesel notified them last Wednesday. Second, the error existed in a very old version of the software that was installed on the machine, and had not been fixed in the most recent software update. If you relied on Xerox WorkCentres for your copies you can only wonder how many incorrect documents – even though they looked correct – you produced these past few years. Did you pass them on to others? What dangers do errors in the numbers on those documents represent? Can you be sued for such errors? Indeed, you have to appreciate the issue goes well beyond merely financial problems created by swapped numbers on invoices, accounting spreadsheets and other financial documents. Numbers on documents can also have life-or-death importance, which should not be underestimated. Imagine copies of construction plans for a building or a bridge where the numbers have been altered. In a worst-case scenario architectural structures like these could collapse and claim victims. The same goes for medication, where incorrect doses could lead to serious consequences for the patient's health, or even death.

As a test David Kriesel printed series of numbers in 7pt Arial, scanned them, OCRed them and compared them to the original ones. Observe how the sixes around the false eights look correct. Here too the false eights contain the characteristic dent, meaning complete image patches were replaced again.
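The comparison step of such a test can be sketched in a few lines of Python. The function name and the sample digit strings below are invented for illustration; they are not David Kriesel's data or tooling, and the sketch assumes the OCR output has the same length as the original.

```python
from collections import Counter


def substitutions(original: str, scanned: str) -> Counter:
    """Tally (original, scanned) character pairs wherever the two texts differ.

    Assumes both strings have the same length, i.e. characters were only
    replaced, never dropped or inserted.
    """
    if len(original) != len(scanned):
        raise ValueError("texts must have the same length")
    return Counter((o, s) for o, s in zip(original, scanned) if o != s)


# Made-up sample: two of the sixes come back as eights.
original = "6566 8686 6166"
scanned = "6568 8688 6166"
errors = substitutions(original, scanned)
```

A tally like this shows at a glance which digits get confused with which, rather than just that the copy differs.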

Image courtesy of David Kriesel's website

Part of the problem also lies in the widespread use of Helvetica and its clone Arial as default typefaces in digital documents. This ties in with the ongoing criticism they receive, most recently for the persistent use of Helvetica as the system font in iOS. Even though the resolution on David's construction plan is not very fine, the numbers are perfectly distinguishable. That is, for the human eye, because we can interpret character forms and detect subtle differences in them far better than machines – that's why captchas exist. The Xerox WorkCentre however has more difficulties reading the closed forms of the neo-grotesque numbers, which causes it to mix up 6, 8 and 9 for example. This once again suggests that – contrary to the misguided belief of hordes of turtle-necked goatee-stroking hipsters – Helvetica is a pretty crappy typeface, especially for text, and so is Arial by association. And sorry, my dear architect friends – Eurostile is even worse.

To figure out (pardon the pun) what clear number shapes should look like, all you need to do is look at typefaces specifically designed to be used in applications where the numbers are of utmost importance. One of the seminal designs in that respect is Chauncey H. Griffith's Bell Gothic, first used in Manhattan's Fall 1937 phone directory. Notice how different those same three numbers are, and how they can be easily told apart by both man and machine.

And now that we've broached the subject of numbers that need to be read correctly by machines, the yardstick here is Adrian Frutiger's iconic OCR-B. Here again you can see how the 6, 8 and 9 are completely different from each other.

When numbers have to be distinguished while traveling at high speed, instant recognition is equally imperative. This is why Interstate, for example, is also a good reference for well-shaped numerals.

So next time you need to specify type for text, look beyond the letters and also examine the shapes of the numerals. As for me, I will follow the development of this fascinating story with interest.

For more information on numerals read my two reference posts on The FontFeed:

The FontFeed | Figuring Out Numerals
The FontFeed | Figuring Out Numerals – The Sequel

Header image: Xerox WorkCentre 7545, courtesy of Xerox
