SketchyApothecary t1_j8qjemm wrote

This isn't limited to Xerox. Lots of scanning recognition devices/programs have trouble differentiating 6s and 8s in some fonts when trying to convert images to text fields, and occasionally other numbers get mixed up as well.

33

n1gr3d0 t1_j8qt5x3 wrote

The fun part is that this was not about recognition. Scanning shouldn't do any OCR, so in that context any meaningful character manipulation (like replacing one character with another) looks shady as hell. Thankfully it turned out to be just an overly zealous compression algorithm.

63

surelythisisfree t1_j8rdhnl wrote

Most copier manufacturers have their own PDF compression that generally puts a scanned page between 50 and 150 kB, down from about a meg if they don't do anything fancy. I only realised after years of working with them that it isolates things that look like letters and basically averages each letter's representation on the page to allow better compression. The only reason I realised was a bug in a released firmware that only affected compact PDF (it was pulled within 24 hours). The bug basically made the letters not line up in a row on each page, so they'd move up and down the line a bit.
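
To make the idea concrete, here is a minimal Python sketch of that kind of symbol matching. It is not any vendor's actual algorithm: the 4x5 bitmaps, the `compress` function and the `max_diff` threshold are all made up for illustration, and it reuses the first matching glyph rather than averaging as described above. It only shows how a loose threshold can make an 8 come out as a 6.

```python
# Toy glyph bitmaps (hypothetical, 4x5 pixels; '#' = ink, '.' = paper).
SIX = ("####", "#...", "####", "#..#", "####")
EIGHT = ("####", "#..#", "####", "#..#", "####")
ONE = ("..#.", ".##.", "..#.", "..#.", ".###")


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def compress(glyphs, max_diff):
    """Store each distinct-looking glyph once and reference it by index.

    A glyph within max_diff pixels of an already stored symbol is replaced
    by that symbol. This is an assumption-laden sketch, not a real codec.
    """
    dictionary = []   # symbols stored once per page
    placements = []   # (position on page, index into dictionary)
    for pos, glyph in enumerate(glyphs):
        for idx, symbol in enumerate(dictionary):
            if hamming(glyph, symbol) <= max_diff:
                placements.append((pos, idx))
                break
        else:
            dictionary.append(glyph)
            placements.append((pos, len(dictionary) - 1))
    return dictionary, placements


page = [SIX, EIGHT, ONE, EIGHT]

# Exact matching: three symbols stored, nothing is altered.
strict_dict, strict_places = compress(page, max_diff=0)

# Loose matching: the 8 differs from the 6 by one pixel, so every 8
# gets drawn with the stored 6. Smaller file, wrong digits.
loose_dict, loose_places = compress(page, max_diff=2)

print(len(strict_dict), len(loose_dict))
print(loose_places)
```

With the loose threshold the dictionary shrinks from three symbols to two, which is where the size saving comes from, and it is also exactly how one digit can silently turn into another.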

10

Gathorall t1_j8r7dsg wrote

That tracks. It's orders of magnitude easier (though it shouldn't matter nowadays) to tell the head to put "8 in black" in a certain spot than to specify the precise location and color of every constituent dot.

−3

purchankruly t1_j8rxyk1 wrote

Why oh why did we switch from Roman numerals?!

3

girhen t1_j8soe26 wrote

M̅C̅C̅X̅X̅X̅I̅V̅DLXVII - >!1,234,567!<

M̅M̅M̅C̅D̅L̅V̅MDCCLXXXIX - >!3,456,789!<
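
For anyone who wants to check the arithmetic, here is a small Python sketch. It assumes the leading part of each numeral (MCCXXXIV and MMMCDLV) is written under a vinculum, i.e. an overline meaning "times 1000", since that is the reading that gives the stated values. The function names are made up for this example.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a plain Roman numeral (no overlines) to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        # Subtractive notation: a smaller symbol before a larger one
        # is subtracted (IV = 4, CD = 400).
        if i + 1 < len(numeral) and value < ROMAN[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def vinculum_to_int(overlined, plain):
    """Overlined part counts in thousands (assumed convention)."""
    return roman_to_int(overlined) * 1000 + roman_to_int(plain)


print(vinculum_to_int("MCCXXXIV", "DLXVII"))     # 1234567
print(vinculum_to_int("MMMCDLV", "MDCCLXXXIX"))  # 3456789
```

The second one works out as 3,455 thousands plus 1,789, which is why there is a plain M sitting right after the overlined part.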

I know you're joking, but it's always nice to add perspective.

Also, some of my coworkers frequently have to read legal documents in paragraphs, sections, etc and convert lines to outlines. Nothing like deciphering section i from section i in an outline - meaning a section after h and before j at one level vs the first Roman numeral at another. It's a PITA.

9