Date : Sun, 05 Jun 2005 19:27:57 +0000
From : Jules Richardson <julesrichardsonuk@...>
Subject: Re: Best prog for scanning manuals etc?
On Sun, 2005-06-05 at 16:20 +0100, neil f wrote:
> I have a few interesting, possibly unique, pieces of
> literature/manuals I'd like to scan (mostly Beeb robotics/tech
> related) and put into a Beeb robotics section on my website - and
> thereby do my bit for the cause, so to speak.
>...
> Anyway, what software do other people use or recommend?
Can't comment on the OCR side of it, but for scans of documents I prefer
separate TIFF images, preferably in a TAR archive (as more machines will
read TAR straight off than something like ZIP - WinZip handles TAR
format under Windows for anyone who didn't know, and of course it's a
standard thing in Unix-land).
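As a rough sketch of the bundling step, assuming the page scans already
sit in one directory as .tif/.tiff files: the function name and the file
names below are made up for illustration, and only Python's standard
tarfile module is used.

```python
import tarfile
from pathlib import Path


def bundle_scans(scan_dir, archive_path):
    """Pack every .tif/.tiff file in scan_dir into an uncompressed TAR archive.

    Returns the page file names in the order they were added.
    """
    # Collect the pages first, sorted by name, so page order is predictable.
    pages = sorted(
        p for p in Path(scan_dir).iterdir()
        if p.suffix.lower() in (".tif", ".tiff")
    )
    # Mode "w" writes a plain (uncompressed) TAR file.
    with tarfile.open(str(archive_path), "w") as tar:
        for page in pages:
            # Store only the file name, not the full path on the scanning machine.
            tar.add(str(page), arcname=page.name)
    return [p.name for p in pages]
```

Calling something like bundle_scans("scans", "manual.tar") would leave
one TAR file with one TIFF per page; zero-padded names (page001.tif,
page002.tif, ...) keep the pages in reading order.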
Scan at 300dpi or better and use several shades of grey - even on pages
with just text - if you want to give a later OCR process a better chance
of doing a good job (if you have to save at bi-level be *very* sure
you've got threshold settings right and that the source pages are in
good condition - free from marks etc.)
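To illustrate why the threshold matters so much, here is a toy sketch
in plain Python. It is not what any particular scanning package does;
the grey values and the function name are invented for the example, and
8-bit greys are assumed (0 = black, 255 = white).

```python
def to_bilevel(grey_rows, threshold):
    """Map 8-bit grey values to 1 (ink) or 0 (paper).

    Anything darker than the threshold counts as ink.
    """
    return [[1 if value < threshold else 0 for value in row] for row in grey_rows]


# Hypothetical strip from a scanned page:
# clean paper, a faint stain, a light stroke, solid ink, clean paper.
strip = [[250, 180, 120, 30, 245]]

# Threshold too high: the stain is turned into "ink".
too_high = to_bilevel(strip, 200)
# Threshold too low: the light stroke disappears.
too_low = to_bilevel(strip, 100)
# A middle setting keeps the stroke and drops the stain.
middle = to_bilevel(strip, 128)
```

Once the page is saved at bi-level that decision is baked in, whereas a
greyscale scan lets a later OCR pass pick its own threshold.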
Regarding documents that *have* been OCRed, I suppose I prefer RTF or
HTML formats - because at least those are pretty much raw text with a
bit of markup, and so can be indexed and searched easily using third-party
tools (unlike Word, PDF etc. where it can be more complex).
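As a small sketch of how little work the HTML case needs, the following
pulls the plain text out of a page with Python's standard html.parser
module and does a case-insensitive search on it. The class and function
names are invented here, and the sample markup in the usage note is made
up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags and ignore the markup itself."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip chunks that are only whitespace between tags.
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(markup):
    """Return the text content of an HTML string, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.chunks)


def contains(markup, phrase):
    """Case-insensitive search for phrase in the text of an HTML string."""
    return phrase.lower() in html_to_text(markup).lower()
```

For example, contains("<h1>User Port</h1><p>The <b>robot arm</b>
connects here.</p>", "robot arm") finds the phrase even though it sits
inside its own tag.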
Not sure what the status of RTF is regarding images though; anyone know
if it provides a way of referencing images within the text?
cheers
Jules