Date : Mon, 23 Jun 2008 21:11:10 +0100
From : C.J.Thornley@... (Chris Thornley)
Subject: The Micro User
You don't have to do all these stages. OCR is really advanced these days,
making this all very simple to do without all this tedious messing around.
Scanning, layout and proofreading are built in, with the option of preserving
the original layout or selecting a different type of final layout. OCR has
moved on from what it was in the past.
Chris
/> Christopher J. Thornley is cjt@...
( //------------------------------------------------------,
(*)OXOXOXOXO(*>=*=O=S=U=0=3=6=*=--------- >
( \\------------------------------------------------------'
\> Home Page :-http://www.coolrose.fsnet.co.uk
-----Original Message-----
From: bbc-micro-bounces+c.j.thornley=coolrose.fsnet.co.uk@...
[mailto:bbc-micro-bounces+c.j.thornley=coolrose.fsnet.co.uk@...
uk] On Behalf Of Jonathan Graham Harston
Sent: 23 June 2008 00:48
To: bbc-micro@...
Subject: Re: [BBC-Micro] The Micro User
"Chris Thornley" wrote:
> You could OCR them instead. With recent improvements, like with Omni
> page 16 that seems to come with most recent scanners like the canon
> range. The recognition quality is very good and the output could
> easily be compressed as pdf for archive purposes.
Now that I've retypeset the article I've removed the page scans I had on the
Electron second processor which showed the sort of images I use. At
http://mdfs.net/Info/Comp/BBC/Display is a full page scan of another
article.
What I do is scan the page at 600 dpi 256-tone greyscale. I then OCR from
that image. I save the content as plain text and edit the text to correct
it.
To extract the diagrams I then resample the 600dpi image down to 300dpi and
reduce the colour depth from 256 to 16. Then I manually edit the remaining
16-entry colour palette so the first n colours are full black and the final
12-n colours are full white, giving 4-level greyscale. I then convert back
up to 256 colours and then back down to 16 colours to merge all the blacks
and whites to single palette values, and save as a GIF. This gives 2 bits
per colour which squashes down very efficiently into whole bytes.
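
As a rough illustration of the steps above (not the actual tool chain, which
is an image editor driven by hand), the following Python sketch works on a
flat list of 8-bit greyscale values. The function names, the choice of n = 5,
and the way the four untouched middle palette entries are split into two grey
levels are all assumptions made for the example. The 2-bit packing only shows
the "four pixels per byte" arithmetic; a real GIF applies LZW compression
instead.

```python
def downsample_2x(pixels, width, height):
    """Halve the resolution (e.g. 600dpi -> 300dpi) by averaging 2x2 blocks."""
    out = []
    for y in range(0, height - 1, 2):
        for x in range(0, width - 1, 2):
            total = (pixels[y * width + x] + pixels[y * width + x + 1]
                     + pixels[(y + 1) * width + x]
                     + pixels[(y + 1) * width + x + 1])
            out.append(total // 4)
    return out, width // 2, height // 2


def to_16_levels(pixels):
    """Reduce 256 grey tones to 16 palette indices (0 = darkest)."""
    return [p >> 4 for p in pixels]


def build_level_map(n_black, n_white):
    """Map 16 palette indices onto 4 levels (0 = black .. 3 = white).

    The first n_black entries become black and the last n_white entries
    become white. Splitting the middle entries evenly into two greys is
    an assumption of this sketch.
    """
    middle = 16 - n_black - n_white
    if n_black < 0 or n_white < 0 or middle < 0:
        raise ValueError("n_black + n_white must be between 0 and 16")
    dark = middle // 2
    return [0] * n_black + [1] * dark + [2] * (middle - dark) + [3] * n_white


def pack_2bpp(levels):
    """Pack 2-bit levels four to a byte, first pixel in the top bits."""
    packed = bytearray()
    for i in range(0, len(levels), 4):
        group = list(levels[i:i + 4])
        group += [0] * (4 - len(group))  # pad the final byte with black
        byte = 0
        for level in group:
            byte = (byte << 2) | level
        packed.append(byte)
    return bytes(packed)


if __name__ == "__main__":
    # Hypothetical 4x2 test image: left half black, right half white.
    scan = [0, 0, 255, 255,
            0, 0, 255, 255]
    small, w, h = downsample_2x(scan, 4, 2)
    n = 5  # arbitrary illustrative value for "n"
    level_map = build_level_map(n, 12 - n)
    levels = [level_map[i] for i in to_16_levels(small)]
    print(w, h, levels, pack_2bpp(levels).hex())
```
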
The final stage is to typeset an MSWord version, importing the previously
extracted text and diagrams. This is usually the fiddliest, fighting with
Word to get it to behave itself. I don't aim for a photographic reproduction
of the original, but a reproduction of the contents and layout. For
instance, I've got better things to waste my life on than fighting to get
columns of text to hold exactly the same as the original and the pagination
to be exactly the same.
The biggest part of the job is proofreading the text, confirming scanned
listings are correct and ensuring circuit diagrams are correct. Micro User
seems notorious for publishing incorrect schematics where I have to
essentially redesign the thing to correct it.
So far I've done this mainly with coprocessor manuals and
documentation. See
http://mdfs.net/Software/Tube/32016
http://mdfs.net/Software/Tube/6502/Electron
http://mdfs.net/Software/Tube/ARM
http://mdfs.net/Docs/Books/IEEEFS
http://mdfs.net/Info/Comp/BBC/Display
I'm currently trawling through the SJ MDFS manual at
http://mdfs.net/Docs/Books/SJMDFS
--
J.G.Harston - jgh@... - mdfs.net/User/JGH There are three
food groups: brown, green and ice cream.