Date : Mon, 23 Jun 2008 00:47:45 +0100
From : jgh@... (Jonathan Graham Harston)
Subject: The Micro User
"Chris Thornley" wrote:
> You could OCR them instead. With recent improvements, like with OmniPage 16
> that seems to come with most recent scanners like the Canon range. The
> recognition quality is very good and the output could easily be compressed
> as pdf for archive purposes.
Now that I've retypeset the article I've removed the page scans I
had on the Electron second processor which showed the sort of
images I use. At http://mdfs.net/Info/Comp/BBC/Display is a full
page scan of another article.
What I do is scan the page at 600 dpi 256-tone greyscale. I then
OCR from that image. I save the content as plain text and edit the
text to correct it.
To extract the diagrams I then resample the 600dpi image down to
300dpi and reduce the colour depth from 256 to 16. Then I manually
edit the remaining 16-entry colour palette so the first n colours
are full black and the final 12-n colours are full white, giving
4-level greyscale. I then convert back up to 256 colours and then
back down to 16 colours to merge all the blacks and whites to
single palette values, and save as a GIF. This gives 2 bits per
pixel, which squashes down very efficiently into whole bytes.
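As a rough illustration of the diagram steps above, here is a minimal
pure-Python sketch. It is not the image-editor procedure itself: it
swaps the manual palette editing for fixed thresholds, and the function
names and threshold values (64, 128, 192) are assumptions made for the
example. It halves the resolution, maps 256 greys to 4 levels, and
shows why 2 bits per pixel fit four pixels into each byte.

```python
# Sketch only: a stand-in for the image-editor steps described above.
# Thresholds and function names are illustrative assumptions.

def halve_resolution(pixels):
    """Average each 2x2 block of 8-bit grey values (600 dpi -> 300 dpi)."""
    height = len(pixels) // 2
    width = len(pixels[0]) // 2
    return [
        [
            (pixels[2 * y][2 * x] + pixels[2 * y][2 * x + 1]
             + pixels[2 * y + 1][2 * x] + pixels[2 * y + 1][2 * x + 1]) // 4
            for x in range(width)
        ]
        for y in range(height)
    ]


def to_four_levels(pixels, black_below=64, white_from=192):
    """Map 0..255 grey to 4 levels: 0 = black, 1 and 2 = greys, 3 = white."""
    def level(value):
        if value < black_below:
            return 0
        if value >= white_from:
            return 3
        return 1 if value < 128 else 2
    return [[level(v) for v in row] for row in pixels]


def pack_2bpp(row):
    """Pack 4-level pixels four to a byte, leftmost pixel in the top bits."""
    padded = row + [3] * (-len(row) % 4)  # pad with white to a whole byte
    out = bytearray()
    for i in range(0, len(padded), 4):
        a, b, c, d = padded[i:i + 4]
        out.append(a << 6 | b << 4 | c << 2 | d)
    return bytes(out)


# Two scan lines of eight pixels become one line of four pixels, one byte.
scan = [[0, 0, 255, 255, 100, 100, 150, 150]] * 2
levels = to_four_levels(halve_resolution(scan))
packed = pack_2bpp(levels[0])
```

GIF applies its own LZW compression on top of the palette indices, so
the byte packing here only illustrates the 2-bits-per-pixel arithmetic,
not the actual file layout.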
The final stage is to typeset an MSWord version, importing the
previously extracted text and diagrams. This is usually the
fiddliest part, fighting with Word to get it to behave itself. I don't
aim for a photographic reproduction of the original, but a
reproduction of the contents and layout. For instance, I've got
better things to waste my life on than fighting to get columns of
text to hold exactly the same as the original and the pagination
to be exactly the same.
The biggest part of the job is proofreading the text, confirming
scanned listings are correct and ensuring circuit diagrams are
correct. Micro User seems notorious for publishing incorrect
schematics where I have to essentially redesign the thing to
correct it.
So far I've done this mainly with coprocessor manuals and
documentation. See
http://mdfs.net/Software/Tube/32016
http://mdfs.net/Software/Tube/6502/Electron
http://mdfs.net/Software/Tube/ARM
http://mdfs.net/Docs/Books/IEEEFS
http://mdfs.net/Info/Comp/BBC/Display
I'm currently trawling through the SJ MDFS manual at
http://mdfs.net/Docs/Books/SJMDFS
--
J.G.Harston - jgh@... - mdfs.net/User/JGH
There are three food groups: brown, green and ice cream.