Date : Mon, 16 Aug 2004 16:39:35 +0000
From : "W.Scholten" <whs@...>
Subject: compression of scans
Jules Richardson wrote:
> I can probably get a 20-30% reduction in storage space needed if I
> adjust the black / white thresholds on the greyscale images - because
> some of the source paper was pretty thin there's some text bleed through
> from the opposite sides of pages; adjusting the thresholds should get
> rid of that
Not quite, usually. For this you need to use a black background when
scanning, or an image analysis program.
> and result in better file compression. I just need to pick
> the worst-quality page I can find and make sure that the text is still
> OCR-able if I do that.
My compression program does that: a 4 to 8 times reduction in size,
going from an LZW-compressed gif of the original to an LZW-compressed
gif of the edited image.
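To illustrate why cleaning up the grey levels helps compression, here is
a minimal sketch. It is not the compression program mentioned above; the
function names, the 60/190 thresholds and the synthetic test page are all
assumptions made up for the example, and zlib (DEFLATE) stands in for the
LZW coder used in gif. It only shows the general effect: once paper noise
is pushed to pure white and ink to pure black, the data compresses far
better. Bleed-through that is darker than the white threshold would
survive this step, which is why thresholds alone are usually not enough.

```python
import random
import zlib


def clamp_levels(pixels, black=60, white=190):
    """Push near-black to 0 and near-white to 255; leave mid-greys alone.

    `pixels` is a bytes object of 8-bit grey values. The two thresholds
    are illustrative guesses, not values taken from the original post.
    """
    out = bytearray(len(pixels))
    for i, value in enumerate(pixels):
        if value <= black:
            out[i] = 0
        elif value >= white:
            out[i] = 255
        else:
            out[i] = value
    return bytes(out)


def synthetic_page(width=200, height=100, seed=0):
    """Hypothetical test page: noisy light paper with some dark text rows."""
    rng = random.Random(seed)
    rows = []
    for y in range(height):
        if y % 10 < 2:  # two dark "text" rows out of every ten
            rows.append(bytes(rng.randint(0, 50) for _ in range(width)))
        else:  # noisy paper background
            rows.append(bytes(rng.randint(200, 255) for _ in range(width)))
    return b"".join(rows)


page = synthetic_page()
cleaned = clamp_levels(page)

# zlib is used here only as a convenient stand-in for gif's LZW.
before = len(zlib.compress(page))
after = len(zlib.compress(cleaned))
print(f"before: {before} bytes, after: {after} bytes")
```

On a real scan the gain depends on how much of the page is noisy
background, so the thresholds need checking against the worst page, as
suggested in the quoted message.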
I only use unix (FreeBSD/OpenBSD) so if you're interested, you'd need to
compile it for whatever OS you use. It also only handles gif, as the IO
routines were easier to use and as most programs store tiffs
uncompressed. I use a simple script to convert all tiffs from the
scanner in a given directory to gif (ImageMagick), then enhance them.
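A conversion step like the one described could look roughly like this.
This is a sketch, not the actual script: the function names are
hypothetical, and it assumes ImageMagick's classic `convert` command is
on the PATH (newer ImageMagick 7 installs call it `magick` instead). The
enhancement step is not shown.

```python
import subprocess
from pathlib import Path


def tiff_to_gif_commands(directory):
    """Return one ImageMagick command per TIFF file found in `directory`."""
    commands = []
    for src in sorted(Path(directory).iterdir()):
        if src.suffix.lower() in (".tif", ".tiff"):
            dst = src.with_suffix(".gif")  # same name, .gif extension
            commands.append(["convert", str(src), str(dst)])
    return commands


def convert_all(directory):
    """Run the conversions; raises CalledProcessError if one of them fails."""
    for command in tiff_to_gif_commands(directory):
        subprocess.run(command, check=True)
```

Calling `convert_all("scans")` would write a gif next to every tiff in a
directory named `scans` (a made-up name). Building the command list
separately makes it easy to print and check before anything is run.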
See the message about the scans being back online for examples. A4 at
300dpi takes about 250K per image on average in my applications.
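To put the 250K figure in context, the raw size of an A4 page at 300dpi
can be worked out as below. The assumptions are mine: A4 taken as
210 x 297 mm, "250K" read as 250 x 1024 bytes, and both 8-bit greyscale
and 1-bit raw sizes shown because the message does not say which the
stored images use.

```python
MM_PER_INCH = 25.4


def pixels_at_dpi(width_mm, height_mm, dpi):
    """Pixel dimensions of a page of the given size scanned at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


width_px, height_px = pixels_at_dpi(210, 297, 300)  # A4
raw_grey_bytes = width_px * height_px               # 8 bits per pixel
raw_bilevel_bytes = width_px * height_px // 8       # 1 bit per pixel
stored_bytes = 250 * 1024                           # assumed meaning of "250K"

print(f"{width_px} x {height_px} pixels")
print(f"raw 8-bit grey: {raw_grey_bytes} bytes, "
      f"about {raw_grey_bytes / stored_bytes:.0f}x the stored size")
print(f"raw 1-bit:      {raw_bilevel_bytes} bytes, "
      f"about {raw_bilevel_bytes / stored_bytes:.1f}x the stored size")
```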
Regards,
Wouter
--
BBC/Atom/magazine scans:
http://8-bit.summerfield-technology.co.uk/