Date : Mon, 25 Jun 2001 12:24:03 +0100
From : "Fewell, Steve" <Steve.Fewell@...>
Subject: Re: Scanning of BBC Magazines
Well, I would suggest listing the OCR images by Magazine Name, Magazine
Issue, and article. Maybe the regular features (hints & tips, etc...) and
features (relating to the programs on the disk) could be OCR'd first? Then
anyone missing a particular issue could quickly browse the articles in that
issue.
Maybe to save space, the images could be converted to text, and made into
Word or HTML documents. This would lose the original magazine format
though, and would require a lengthy process to extract the text and pictures
from the image.
I guess that for AU we wouldn't need to OCR the yellow pages, as the
programs are available on disk. What do the others think?
Having an index of all articles with search words sounds like a lengthy
task, but I'm sure it would prove to be very useful.
Steve.
-----Original Message-----
From: Mark Usher [mailto:mu.list@...]
Sent: 23 June 2001 11:06
To: bbc-micro@...
Subject: [BBC-Micro] Scanning of BBC Magazines
Hi,
I suppose it is about time I jumped in here.
I have been considering how to scan and put the magazines on-line. As you
can imagine there is rather a lot of material. It is also mostly in colour.
If we are just talking scans of magazine pages to browse through, as if it
were a real magazine, only online, then each image would have to be approx.
500-600 kb, so as to allow sufficient quality when zooming in.
While this is not prohibitive for some people to view over the web, it would
be for others, especially when some of the magazines have more than 150
pages.
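To put rough numbers on the figures above, here is a small sketch of the
arithmetic. It assumes "kb" means kilobytes, and the 56 kbit/s modem speed
is purely an illustrative assumption, as are the function names.

```python
def issue_size_kb(pages, kb_per_page):
    """Total size of one scanned issue, in kilobytes."""
    return pages * kb_per_page


def download_minutes(size_kb, link_kbit_per_s):
    """Ideal transfer time in minutes, ignoring protocol overhead.

    Assumes 1 kilobyte = 8 kilobits.
    """
    return size_kb * 8 / link_kbit_per_s / 60


# A 150-page issue at the low and high ends of the 500-600 kb range.
low = issue_size_kb(150, 500)    # 75000 kb, roughly 75 MB
high = issue_size_kb(150, 600)   # 90000 kb, roughly 90 MB

# Assumed 56 kbit/s dial-up link (not a figure from this thread).
print(round(download_minutes(low, 56)), "to",
      round(download_minutes(high, 56)), "minutes per issue")
```

On that assumed link a whole issue would take somewhere around 3 to 3.6
hours, which is why page-at-a-time viewing or better compression matters.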
There have been some recent advances with viewing technology, especially as
the US censuses from 1790-1920 are all being scanned and put on line, and
LizardTech.com have developed some very good compression systems and a
viewer.
If high quality scans were to be made, the best way to do it would be to
dissect each magazine and then use a sheet feeder device. That would cut the
amount of work tremendously, and get rid of the annoying "center folds",
although some center folds can be quite pleasing I am told :-)
Once images had been done and stored, then any OCR'ing of the most
interesting articles could be done, as well as program listings that haven't
already been put on disk.
Another alternative is to use a document archival system that is online, and
use that to index the individual scans by keyword etc.
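As a rough idea of what indexing scans by keyword could look like, here is a
minimal sketch. It is not any particular archival system; the record fields
(magazine, issue, page, keywords) and the sample entries are hypothetical.

```python
from collections import defaultdict


def build_index(scans):
    """Map each lower-cased keyword to the set of scans that mention it."""
    index = defaultdict(set)
    for scan in scans:
        location = (scan["magazine"], scan["issue"], scan["page"])
        for word in scan["keywords"]:
            index[word.lower()].add(location)
    return index


def lookup(index, word):
    """Return matching (magazine, issue, page) tuples in sorted order."""
    return sorted(index.get(word.lower(), ()))


# Hypothetical sample records, made up for illustration only.
scans = [
    {"magazine": "Example Mag", "issue": 1, "page": 12,
     "keywords": ["Hints", "Tips"]},
    {"magazine": "Example Mag", "issue": 2, "page": 40,
     "keywords": ["hints", "listing"]},
]

index = build_index(scans)
print(lookup(index, "HINTS"))
```

Keeping the index separate from the images means the keywords can be added
gradually, without touching the scans themselves.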
These are just my current thoughts on the matter.
Mark