Date   : Fri, 01 May 2009 22:18:34 +0100
From   : philpem@... (Philip Pemberton)
Subject: Preservation of information (floppy discs, etc.)

('Scuse me if my usual sunny demeanour is absent, I'm currently cursing at 
Thunderbird on account of it seemingly being completely unable to thread 
messages from this list properly)

All this talk of the Domesday data recovery has got me thinking about another 
'dead or dying' format -- the venerable floppy disc. Or rather, I've been 
thinking about dealing with the issue for some time. At this point, the 
project is sitting at the top of my List Of Things To Do When I Have Round 
Tuits Available...

It seems that besides the Catweasel (which IMO is a half-assed bodge of a 
"solution" with too much extraneous fluff) there really isn't anything that 
can do a decent job of reading data from floppy discs of an unknown format. 
There's basically no API or hardware documentation for the newer models, and 
the readback accuracy is pretty poor as well.

But first, a quick introduction for those who might not know about the 
low-level aspects of floppy discs...

A floppy disc consists of a spinning disc coated in magnetic material, which 
is split up into a given number of "tracks" (usually 40 or 80 for BBC discs). 
If it makes it any easier, think of this as 40 or 80 endless-loop tapes glued 
together. All of the tapes move at the same rate, and you get a signal when 
they're at the beginning. For the sake of this example, let's assume we have a 
mythical disc with only one track.

Now, let's say you want to store some data on one track of that disc:
   10101010 00000000 00001110 10100110

If you store that data exactly as-is and read it back, assuming your drive and 
disc are absolutely perfect, you'll get exactly what you wrote. There's a 
catch here, of course -- in the real world, things are rarely perfect. That 
spinning disc won't always spin at the same rate, because of:
   * Friction between the disc and the liner / casing
   * Friction inside the drive motor
   * Subtle variations in the speed controller's reference frequency
   * Temperature variations, noise, etc.

So when you read back that first block of zeroes, it's difficult to tell if 
you have 12, 13 or even 14 zeroes. If you have the original data, you know 
that there were 13, but in most situations that's not going to be the case.
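
To put rough numbers on that, here is a small Python sketch. The 4 microsecond bit cell and the 5% speed error are made-up illustrative figures, not measurements from any real drive, and `apparent_run_length` is just a name invented for the example.

```python
# Illustrative figures only: both values below are assumptions.
NOMINAL_CELL_US = 4.0   # assumed length of one bit cell, in microseconds

def apparent_run_length(true_cells, speed_error):
    """How many cells a reader that assumes nominal speed would count.

    speed_error is fractional: +0.05 means the gap lasts 5% longer than
    it should (disc turning slow), -0.05 means 5% shorter.
    """
    gap_us = true_cells * NOMINAL_CELL_US * (1.0 + speed_error)
    return round(gap_us / NOMINAL_CELL_US)

for err in (-0.05, 0.0, 0.05):
    count = apparent_run_length(13, err)
    print(f"{err:+.0%} speed error: 13 zeroes read back as {count}")
```

With nothing but silence to measure, a few percent of speed error is enough to turn 13 zeroes into 12 or 14.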

So data on discs is encoded first. The two "big dogs" are FM and MFM. FM 
replaces "1"s with a "11" sequence, and replaces "0"s with a "10" sequence; 
the leading "1" in each pair is a clock bit. What this means is that you 
decode data by using a shift register. Shift in two bits; if the first bit 
is a '0' then you're out of sync and need to drop a bit; if it's a '1' then 
your second bit is the actual data. Simple enough, but you lose half your 
storage capacity. MFM increases the storage density by relaxing the rules a 
little (but I can't remember the rules off-hand).
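
As a sketch of the above (not anybody's production code), FM encoding and the "shift in two bits, drop one if out of sync" decoder might look like this in Python. The MFM rule in `mfm_encode` is not from this post; it is the rule as commonly documented (a clock bit is written only between two "0" data bits), and all function names are invented for the example.

```python
def fm_encode(data_bits):
    """FM: each data bit is preceded by a clock bit that is always 1."""
    out = []
    for bit in data_bits:
        out += [1, bit]
    return out

def fm_decode(raw_bits):
    """Take bits two at a time; a 0 in the clock position means we are
    out of step, so drop one bit and try again."""
    data = []
    i = 0
    while i + 1 < len(raw_bits):
        clock, value = raw_bits[i], raw_bits[i + 1]
        if clock == 0:
            i += 1          # out of sync: drop a bit
            continue
        data.append(value)  # in sync: second bit is the data
        i += 2
    return data

def mfm_encode(data_bits, previous_bit=0):
    """MFM as commonly documented (an assumption here, not from the post):
    the clock bit is 1 only when this data bit and the previous one are 0."""
    out = []
    for bit in data_bits:
        clock = 1 if (previous_bit == 0 and bit == 0) else 0
        out += [clock, bit]
        previous_bit = bit
    return out

# The example data from earlier in the post.
data = [int(c) for c in "10101010000000000000111010100110"]
encoded = fm_encode(data)
assert fm_decode(encoded) == data

# Start one bit late: the decoder emits a bogus bit, then resyncs as soon
# as a 0 data bit turns up in the clock position.
print(fm_decode(encoded[1:]))
```

Note that the FM decoder can only notice it is out of step when a "0" data bit lands where a clock bit should be, so a long run of "1"s would hide the problem.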

Strictly speaking, though, you don't store a "1" or a "0" as such: a "1" in 
the encoded stream is written as a magnetic transition, and a "0" as the 
absence of one. Your bit rate sets the most transitions you can have in a 
second.
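
Seen that way, a track is really a list of gaps between transitions. A minimal Python sketch, assuming each "1" in the encoded stream marks one transition; the 4 microsecond cell is an illustrative figure and `transition_gaps` is a made-up name:

```python
CELL_US = 4.0  # assumed cell length in microseconds; purely illustrative

def transition_gaps(encoded_bits):
    """Gaps, in bit cells, between successive 1s (i.e. transitions)."""
    gaps = []
    last = None
    for position, bit in enumerate(encoded_bits):
        if bit:
            if last is not None:
                gaps.append(position - last)
            last = position
    return gaps

fm_bits = [1, 1, 1, 0, 1, 0]        # FM encoding of data bits 1, 0, 0
gaps = transition_gaps(fm_bits)     # gaps in cells
print(gaps)
print([g * CELL_US for g in gaps])  # the same gaps as times
```

For FM the gaps only ever come out as one or two cells, which is what makes it so easy to keep in step.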

My plan was to measure the time between transitions at a resolution of around 
14 bits, plus a few flag bits. You can store the timing info in a file, send this 
over't'internet (sorry, Yorkshire accent creeping in again -- "over the 
internet"), and someone else with another one of these boxes can use the 
timing information to make a copy of your disc. The accuracy depends on how 
well the reading and writing drives hold their speed (and how close they are 
to each other), but most standard disc controllers should have no trouble 
reading the copies.
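
One way such a timing file could be laid out is sketched below in Python. Everything about the layout is hypothetical: the post only says "around 14 bits, plus a few flag bits", so the 16-bit word, the meaning of the two flags and the little-endian byte order are assumptions made for the sake of the example.

```python
import struct

# Hypothetical layout: 14-bit timer count in the low bits, two flags on top.
COUNT_BITS = 14
COUNT_MASK = (1 << COUNT_BITS) - 1   # 0x3FFF
FLAG_INDEX = 1 << 14                 # assumed: index pulse seen during this gap
FLAG_OVERFLOW = 1 << 15              # assumed: gap too long for the counter

def pack_sample(count, index=False):
    """Pack one transition-to-transition time into a 16-bit word."""
    word = min(count, COUNT_MASK)
    if count > COUNT_MASK:
        word |= FLAG_OVERFLOW
    if index:
        word |= FLAG_INDEX
    return word

def unpack_sample(word):
    """Return (count, index_flag, overflow_flag)."""
    return word & COUNT_MASK, bool(word & FLAG_INDEX), bool(word & FLAG_OVERFLOW)

def samples_to_bytes(words):
    """Serialise the words as little-endian unsigned 16-bit values."""
    return struct.pack(f"<{len(words)}H", *words)

def bytes_to_samples(blob):
    return list(struct.unpack(f"<{len(blob) // 2}H", blob))

words = [pack_sample(160, index=True), pack_sample(240), pack_sample(20000)]
blob = samples_to_bytes(words)
print([unpack_sample(w) for w in bytes_to_samples(blob)])
```

Two bytes per transition keeps the file simple to parse on the receiving end.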

Of course, if the discs are standard FM or MFM (or even something else), you 
can just decode the data (i.e. go from FM to bits), re-encode the bits and 
write them back. Effectively this is re-timing, and is what those little 
active USB repeaters do -- decode the signal, then re-encode it and spit it 
back out. Without it, copying is like dubbing a computer game tape (the old 
school playground pastime): every time you copy the tape, you get more hiss, 
so by the time you've got to a 5th, 6th, whatever'th generation copy, the 
computer won't load the tape ("R Tape Loading Error"). You get the same 
problem when low-level copying discs from the raw timing data.
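
The difference between a raw timing copy and a re-timed one can be sketched with a toy simulation in Python. The cell length, the per-copy jitter and the six generations are all invented numbers, and real drives will not misbehave this neatly.

```python
import random

CELL = 2.0      # assumed nominal bit cell, in microseconds
JITTER = 0.3    # assumed worst-case timing error added by each copy

def copy_raw(intervals, rng):
    """A raw timing copy: each generation adds its own timing error."""
    return [t + rng.uniform(-JITTER, JITTER) for t in intervals]

def retime(intervals):
    """Snap every interval back to a whole number of cells."""
    return [max(1, round(t / CELL)) * CELL for t in intervals]

rng = random.Random(1)                       # fixed seed so runs repeat
original = [2 * CELL, 3 * CELL, 4 * CELL] * 50
raw = retimed = original
for generation in range(6):
    raw = copy_raw(raw, rng)                 # errors pile up
    retimed = retime(copy_raw(retimed, rng)) # errors removed every time

worst_raw = max(abs(a - b) for a, b in zip(raw, original))
worst_retimed = max(abs(a - b) for a, b in zip(retimed, original))
print(f"worst error after 6 raw copies:      {worst_raw:.2f} us")
print(f"worst error after 6 re-timed copies: {worst_retimed:.2f} us")
```

As long as each copy's error stays under half a cell, the re-timed chain comes out identical to the original, while the raw chain keeps drifting.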

I've got most of this figured out on a theoretical level, and I actually have 
a piece of software on my laptop that can do MFM decoding and byte 
synchronisation (but only for discs that follow the IBM System 34 format, i.e. 
PC and BBC discs and others written with NEC 765 or WD1770 controllers in MFM 
mode). That works by looking at the timing data, keeping track of the current 
offset from "ideal", and figuring out which bit value is most likely.
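
That software isn't shown here, so the following is only a guess at the general idea, written as a short Python sketch: pick the most likely number of cells for each gap, then nudge the estimated cell length towards what was actually seen. The `gain` value and the function name are assumptions.

```python
def intervals_to_bits(intervals, cell, gain=0.1):
    """Turn transition-to-transition times into encoded bits while
    tracking how far the drive is running from the ideal cell length."""
    bits = []
    for t in intervals:
        n = max(1, round(t / cell))   # most likely number of cells in this gap
        bits += [0] * (n - 1) + [1]   # n-1 empty cells, then a transition
        error = t / n - cell          # observed cell length minus our estimate
        cell += gain * error          # move the estimate a little that way
    return bits

# Gaps of 2, 3, 4 and 2 cells read from a drive running 3% slow.
intervals = [g * 2.06 for g in (2, 3, 4, 2)]
print(intervals_to_bits(intervals, cell=2.0))
```

The output is still the encoded stream (clock and data bits together), so it would need an FM or MFM decoding pass and byte synchronisation on top.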

My plan was to get the hardware running, document it, deal with the software 
as part of my university final-year project, then bully the Dean of School 
into signing a copyright release and release the whole thing as OSS/FS.

Comments and criticism welcome at the usual address...

-- 
Phil.
philpem@...          
http://www.philpem.me.uk/