Date : Wed, 23 Aug 2006 21:07:27 +0100
From : Sprow <info@...>
Subject: Re: Hard disk backup - another challenge
In article <44ECA2AA.2090909@...>,
Philip Pemberton <philpem@...> wrote:
> Sprow wrote:
> > Just a crude run length encoding is required, can't be more than a
> > couple of lines of code.
> > Reserve a byte as a token eg. FF. Output the token plus the repeated
> > byte plus the number of repeats (maybe a 2 byte field?). Escape the
> > token with itself to send just the token.
>
> Why not use Packbits?
Indeed, that's equally applicable, and I'm sure there are a good number of
other algorithms which are lightweight (for handy BASIC implementation) and
taylored to the typical contents of a harddisc.
[snip]
> It's a lot more efficient than normal RLE - in RLE, if you have to encode
> a group of (say) ten different bytes, you prefix all ten of them with a
> 0x01 byte (which means "a one byte long run follows"). That ends up
> doubling the size of the run. With Packbits, you encode that as a single
> -9 byte, followed
> by the ten bytes of data, which means you're only adding another byte to
> the data. Only a ten percent increase, versus 100% for RLE.
That would be an especially lame RLE implementation. Reread my random
proposal and note the only pathalogical case is a run of FF's, or more
generally a run of your chosen tokens. As you have (some) a priori knowledge
of the contents of your disc a sensible token can be chosen for a good
overall gain: choosing 00 and EE are probably both just as bad.
So a run of 10 random bytes (not including the token) encodes as 10 bytes in
the output stream,
Sprow.