n1gr3d0 t1_j8qt5x3 wrote

The fun part is that it was not about recognition. Scanning shouldn't do any OCR, so in that context any meaningful character manipulation (like replacing one character with another) looks shady as hell. Thankfully it turned out to be just an overly zealous compression algorithm.

63

surelythisisfree t1_j8rdhnl wrote

Most copier manufacturers have their own PDF compression that generally puts a scanned page between 50 and 150 kB, down from about a meg if they don't do anything fancy. Only after years of working with them did I realise how it works: it isolates things that look like letters and basically averages each letter's representations on the page to allow better compression. The only reason I realised was a bug in a released firmware (quickly pulled within 24 hours) that only affected compact PDF. The bug basically made the letters not line up in a row on each page, so they'd move up and down the line a bit.
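
To make the idea concrete, here is a rough sketch of that kind of symbol matching. It is not any manufacturer's actual algorithm: the 3x5 glyph bitmaps, the `hamming` and `cluster` helpers, and the `max_diff` threshold are all made up for illustration, and it reuses the first glyph of each group as the stored shape instead of averaging. The point is only that once "close enough" shapes share one stored bitmap, a threshold that is too loose can turn a 6 into an 8.

```python
from typing import List, Tuple

# A glyph is a tiny bitmap: rows of '#' (ink) and '.' (paper), all the same size.
Glyph = Tuple[str, ...]

EIGHT: Glyph = ("###", "#.#", "###", "#.#", "###")
SIX: Glyph = ("###", "#..", "###", "#.#", "###")
ONE: Glyph = (".#.", "##.", ".#.", ".#.", "###")


def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def cluster(glyphs: List[Glyph], max_diff: int) -> List[int]:
    """Greedily assign each glyph to the first stored shape within max_diff pixels.

    Returns one label per glyph; glyphs with the same label would be drawn
    from the same stored bitmap when the page is reconstructed.
    """
    stored: List[Glyph] = []  # one representative bitmap per group (hypothetical dictionary)
    labels: List[int] = []
    for g in glyphs:
        for i, rep in enumerate(stored):
            if hamming(g, rep) <= max_diff:
                labels.append(i)
                break
        else:
            # No stored shape is close enough, so this glyph starts a new group.
            stored.append(g)
            labels.append(len(stored) - 1)
    return labels


page = [EIGHT, SIX, ONE, EIGHT]
strict = cluster(page, max_diff=0)  # only identical bitmaps share a shape
loose = cluster(page, max_diff=1)   # a one-pixel difference is "close enough"
print(strict, loose)
```

With the strict threshold the 6 keeps its own shape; with the loose one it lands in the same group as the 8 and would be redrawn as an 8, with no OCR involved anywhere.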

10

Gathorall t1_j8r7dsg wrote

That tracks. It's orders of magnitude easier (though that shouldn't matter nowadays) to tell the head to put an "8 in black" in a certain spot than to specify the precise location and color of every constituent dot.

−3