Date : Fri, 01 May 2009 22:18:34 +0100
From : philpem@... (Philip Pemberton)
Subject: Preservation of information (floppy discs, etc.)
('Scuse me if my usual sunny demeanour is absent; I'm currently cursing at
Thunderbird, which seems to be completely unable to thread messages from this
list properly.)
All this talk of the Domesday data recovery has got me thinking about another
'dead or dying' format -- the venerable floppy disc. Or rather, I've been
thinking about dealing with the issue for some time. At this point, the
project is sitting at the top of my List Of Things To Do When I Have Round
Tuits Available...
It seems that besides the Catweasel (which IMO is a half-assed bodge of a
"solution" with too much extraneous fluff) there really isn't anything that
can do a decent job of reading data from floppy discs of an unknown format.
There's basically no API or hardware documentation for the newer models, and
the readback accuracy is pretty poor as well.
But first, a quick introduction for those who might not know about the
low-level aspects of floppy discs...
A floppy disc is a thin, flexible disc coated in magnetic material. Its surface
is divided into a number of concentric "tracks" (usually 40 or 80 for BBC
discs). If it makes it any easier, think of these as 40 or 80 endless-loop
tapes glued together. All of the tapes move at the same rate, and you get a
signal (the index pulse) each time they pass the beginning. For the sake of
this example, let's assume we have a mythical disc with only one track.
Now, let's say you want to store some data on one track of that disc:
10101010 00000000 00001110 10100110
If you store that data exactly as-is and read it back, assuming your drive and
disc are absolutely perfect, you'll get exactly what you wrote. There's a
catch here, of course -- in the real world, things are rarely perfect. That
spinning disc won't always spin at the same rate, because of:
* Friction between the disc and the liner / casing
* Friction inside the drive motor
* Subtle variations in the speed controller's reference frequency
* Temperature variations, noise, etc.
So when you read back that first run of zeroes, it's difficult to tell whether
you have 12, 13 or even 14 of them. If you have the original data, you know
there were 13, but in most situations you won't have it.
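To make the 12/13/14 ambiguity concrete, here is a small sketch. It assumes a reader that only sees where the 1s are and measures the run of zeroes as the gap between them, using a fixed nominal cell length. The 4 µs cell and the ±7.5% timing error are made-up, deliberately exaggerated numbers, and `apparent_zero_run` is a hypothetical helper.

```python
# Illustrative only: the cell length and error figures are made up, and the
# errors are exaggerated to make the effect visible.
CELL_US = 4.0  # assumed nominal bit-cell length, in microseconds


def apparent_zero_run(true_zeroes: int, stretch: float) -> int:
    """Return how many zeroes a reader would count for a run of `true_zeroes`.

    Without encoding, only the 1s are visible, so the run is measured as the
    time between the two 1s that bound it: true_zeroes + 1 bit cells.
    `stretch` is the fractional timing error on that gap (positive if the
    disc is running slow, so the gap looks longer).
    """
    gap_us = (true_zeroes + 1) * CELL_US * (1.0 + stretch)
    cells = round(gap_us / CELL_US)  # reader assumes the nominal cell length
    return cells - 1


# The first run of zeroes in 10101010 00000000 00001110 ... is 13 long.
for stretch in (-0.075, 0.0, 0.075):
    print(f"timing error {stretch:+.1%}: counted {apparent_zero_run(13, stretch)} zeroes")
```

With these numbers the loop reports 12, 13 and 14 zeroes, so an error of well under one cell per cell adds up to a whole miscounted bit over a long enough run.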
So data is encoded before it goes onto the disc. The two "big dogs" are FM and
MFM. FM replaces each "1" with a "11" sequence and each "0" with a "10"
sequence, so the first bit of every pair is a clock bit that is always 1. That
guarantees a transition at least every two bit positions, so the reader never
has to count a long run of zeroes. It also means you can decode the data with a
shift register: shift in two bits; if the first bit is a '0', you're out of
sync and need to drop a bit; if it's a '1', the second bit is the actual data.
Simple enough, but you lose half your storage capacity. MFM increases the
storage density by relaxing the rules a little (but I can't remember the rules
off-hand).
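A minimal sketch of both encodings at the bit-cell level, where a 1 cell means "transition here". The FM decoder follows the simple slip-a-bit rule described above. That rule cannot spot misalignment inside a run of 1s (every cell is then 1), which is one reason real formats add address marks with deliberately missing clock bits. For MFM, the standard rule is that a clock 1 is written only between two 0 data bits. This keeps one to three empty cells between transitions, so the cells can be half as long as FM's for the same minimum transition spacing. Function names are illustrative.

```python
def fm_encode(bits):
    """FM: each data bit is preceded by a clock cell that is always 1."""
    cells = []
    for b in bits:
        cells += [1, b]
    return cells


def fm_decode(cells):
    """Decode FM with the simple rule: look at two cells; if the first (clock)
    cell is 0 we're out of sync, so slip one cell and try again."""
    bits, i = [], 0
    while i + 1 < len(cells):
        if cells[i] == 0:
            i += 1                  # out of sync: drop a cell
            continue
        bits.append(cells[i + 1])   # clock OK: the next cell is the data bit
        i += 2
    return bits


def mfm_encode(bits, prev=0):
    """MFM: a clock 1 is written only between two 0 data bits.
    `prev` is the last data bit written before this block."""
    cells = []
    for b in bits:
        clock = 1 if prev == 0 and b == 0 else 0
        cells += [clock, b]
        prev = b
    return cells


def mfm_decode(cells):
    """With cell alignment known, MFM data bits are simply the odd cells."""
    return cells[1::2]


data = [int(c) for c in "10101010000000000000111010100110"]
print("FM :", "".join(map(str, fm_encode(data))))
print("MFM:", "".join(map(str, mfm_encode(data))))
```

Both streams are 64 cells long for 32 data bits. The MFM one never has two adjacent 1s, which is what lets its cells be packed twice as tightly.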
Strictly speaking, though, you don't store a "1" or a "0" directly; you store
a magnetic flux transition (or the lack of one). The bit rate determines how
many bit cells, and therefore how many possible transitions, pass under the
head each second.
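A sketch of going from cells to what the drive actually delivers: the spacing between transitions. It assumes double-density MFM, where 250 kbit/s of data means 500,000 cells per second, or 2 µs per cell. The helper names are hypothetical.

```python
def cells_to_intervals(cells):
    """Convert a bit-cell stream (1 = flux transition) into the spacing between
    successive transitions, counted in bit cells."""
    positions = [i for i, c in enumerate(cells) if c]
    return [b - a for a, b in zip(positions, positions[1:])]


def intervals_to_cells(intervals):
    """Inverse of cells_to_intervals, starting from a transition."""
    cells = [1]
    for n in intervals:
        cells += [0] * (n - 1) + [1]
    return cells


CELL_US = 2.0  # double-density MFM: 250 kbit/s of data = 500,000 cells/s
cells = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
print([n * CELL_US for n in cells_to_intervals(cells)])  # spacing in microseconds
```

This prints `[4.0, 6.0, 8.0]`, the three pulse spacings you expect from a double-density MFM disc.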
My plan was to measure the time between flux transitions at a resolution of
around 14 bits, plus a few flag bits. You can store the timing info in a file,
send it over't'internet (sorry, Yorkshire accent creeping in again -- "over the
internet"), and someone else with another one of these boxes can use the
timing information to make a copy of your disc. The accuracy depends on how
well the reading and writing drives hold their speed (and how closely they
match each other), but most standard disc controllers should have no trouble
reading the resulting discs.
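One possible file layout for that timing info, purely as an assumption: each record is a little-endian 16-bit word with 14 bits of interval and two flag bits. `FLAG_INDEX` and `FLAG_OVERFLOW` are hypothetical names, and so is the idea of chaining overflow words for very long gaps. At an assumed 50 MHz sample clock (20 ns ticks), an 8 µs MFM interval is 400 ticks, well inside 14 bits.

```python
import struct

TIMING_BITS = 14
TIMING_MASK = (1 << TIMING_BITS) - 1  # 0x3FFF
FLAG_INDEX = 1 << 14      # hypothetical: index pulse seen during this interval
FLAG_OVERFLOW = 1 << 15   # hypothetical: counter saturated, interval continues


def pack_intervals(ticks, index_positions=()):
    """Pack transition intervals (in timer ticks) into 16-bit little-endian words."""
    index_positions = set(index_positions)
    words = []
    for n, t in enumerate(ticks):
        flags = FLAG_INDEX if n in index_positions else 0
        while t > TIMING_MASK:          # too long for 14 bits: emit overflow words
            words.append(FLAG_OVERFLOW | TIMING_MASK | flags)
            flags = 0
            t -= TIMING_MASK
        words.append(t | flags)
    return struct.pack(f"<{len(words)}H", *words)


def unpack_intervals(blob):
    """Reverse of pack_intervals: returns (ticks, index_positions)."""
    words = struct.unpack(f"<{len(blob) // 2}H", blob)
    ticks, index_positions, acc = [], [], 0
    for w in words:
        if w & FLAG_INDEX:
            index_positions.append(len(ticks))
        acc += w & TIMING_MASK
        if not w & FLAG_OVERFLOW:       # last word of this interval
            ticks.append(acc)
            acc = 0
    return ticks, index_positions
```

Long gaps (for example unformatted areas) cost extra words instead of forcing a wider counter, which keeps the common case at two bytes per transition.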
Of course, if the discs are standard FM or MFM (or even something else), you
can just decode the data (i.e. go from FM to bits), re-encode the bits and
write them back. Effectively this is re-timing, and it's what those little
active USB repeaters do -- decode the signal, then re-encode it and spit it
back out. It avoids the problem you get when dubbing a computer game tape (the
old-school playground pastime): every time you copy the tape, you get more
hiss, so by the time you've got to a 5th, 6th, whatever'th generation copy,
the computer won't load it ("R Tape Loading Error"). Straight low-level copies
of discs suffer from the same problem.
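A sketch of the difference between a raw copy and a re-timed one. It assumes a fixed cell length of 100 ticks and no drift tracking. `copy_generation` stands in for a copying pass, and its per-pass error is invented for illustration.

```python
def retime(measured_ticks, cell_ticks):
    """Snap each measured transition interval to the nearest whole number of
    bit cells and return the ideal interval. No drift tracking: assumes the
    timing error is small compared with half a bit cell."""
    return [max(1, round(t / cell_ticks)) * cell_ticks for t in measured_ticks]


def copy_generation(intervals, jitter):
    """Hypothetical 'analogue' copy: every interval picks up some timing error."""
    return [t + j for t, j in zip(intervals, jitter)]


original = [200, 300, 400]   # 2, 3 and 4 cells at an assumed 100 ticks per cell
jitter = [30, -30, 30]       # made-up error added by each copying pass, in ticks

raw = retimed = original
for generation in range(5):
    raw = copy_generation(raw, jitter)                       # errors accumulate
    retimed = retime(copy_generation(retimed, jitter), 100)  # errors discarded

print("raw copy:", raw)
print("re-timed:", retimed)
```

After five generations the raw intervals are `[350, 150, 550]`, i.e. 3.5, 1.5 and 5.5 cells, sitting right on the decision boundaries. The re-timed copy is still `[200, 300, 400]`.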
I've got most of this figured out on a theoretical level, and I actually have
a piece of software on my laptop that can do MFM decoding and byte
synchronisation (but only for discs that follow the IBM System/34 format, i.e.
PC and BBC discs and others written with NEC 765 or WD1770 controllers in MFM
mode). It works by looking at the timing data, keeping track of the current
offset from "ideal", and figuring out which bit value is most likely.
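This is not the software described above, just one way to sketch the same idea. It is a crude software data separator that rounds each interval to the most likely number of cells, then nudges its cell-length estimate by a fraction (`gain`, an arbitrary choice) of the offset from ideal. It then looks for 0x4489, the MFM pattern for 0xA1 with a missing clock bit, which System/34-style MFM formats write before each ID and data field, and decodes bytes from there. All names are hypothetical.

```python
SYNC = [int(c) for c in format(0x4489, "016b")]  # MFM 0xA1 with a missing clock bit


def separate(intervals, nominal_cell, gain=0.05):
    """Crude software data separator.

    Turns measured transition intervals (in ticks) into a bit-cell stream,
    nudging the cell-length estimate toward whatever the data suggests so that
    slow speed drift is tracked. Returns (cells, final_cell_estimate).
    """
    cell = float(nominal_cell)
    cells = []
    for t in intervals:
        n = max(1, round(t / cell))      # most likely number of cells
        cells += [0] * (n - 1) + [1]     # n-1 empty cells, then the transition
        error = t - n * cell             # offset from the "ideal" position
        cell += gain * error / n         # spread the correction over n cells
    return cells, cell


def find_sync(cells):
    """Return the index just past the first 0x4489 sync word, or -1."""
    for i in range(len(cells) - len(SYNC) + 1):
        if cells[i:i + len(SYNC)] == SYNC:
            return i + len(SYNC)
    return -1


def mfm_bytes(cells, start, count):
    """Decode `count` bytes; `start` must point at a clock cell."""
    data = cells[start + 1::2]           # data cells sit at the odd offsets
    out = []
    for k in range(count):
        byte = 0
        for bit in data[8 * k:8 * k + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return out
```

In use, the captured intervals go through `separate`, then `find_sync` locates the address mark, and `mfm_bytes` reads the field that follows. A real decoder would also check the CRC and handle the usual three consecutive sync words.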
My plan was to get the hardware running, document it, deal with the software
as part of my university final-year project, then bully the Dean of School
into signing a copyright release and publish the whole thing as OSS/FS.
Comments and criticism welcome at the usual address...
--
Phil.
philpem@...
http://www.philpem.me.uk/