Date : Sun, 17 Sep 2006 08:47:46 +0100
From : Sprow <info@...>
Subject: Re: Compressed ROMFS?
In article <20060916095434.53696.qmail@...>,
Greg Cook <debounce@...> wrote:
> Did anyone ever come up with a system for compressing files in ROMFS
> images? I sketched out a dictionary based algorithm yesterday but the
> hard part is finding the repeating substrings.
I thought about it, and proposed it as a means of getting more ARM code into
ROMs which are used to store apps for the ARM coprocessor (though this was
more for ROMs holding ARM code than ROMFS format data):
http://www.sprow.co.uk/bbc/hardware/armcopro/armroms.pdf
It's always going to be a trade off between getting a good enough
compression ratio to be able to fit in the extra code you need to do the
compression (and any tables or dictionaries).
I had a quick bash this morning at Huffman coding some data, using about
100k of 6502 binaries (ROMs) and 100k of BASIC (various) for statistics
gathering. The frequency distributions of the symbols are quite different,
I've included some screenshots and the BASIC program in
http://www.sprow.co.uk/bbc/software/zips/huffstat.zip
sorry no manual yet!
For tokenised BASIC you can save 21%
For 6502 binaries you can save 10%
Assuming it's a 16k ROM, and it's full, and the ROMFS image doesn't span
multiple images, you save about 3440 bytes and 1630 bytes respectively. I'd
be pretty sure I could write a Huffman decoder in that space, though the
table is going to be about 700 bytes I think.
Maybe I've missed something in the generation of the binary tree, as &20 was
the most frequent symbol I'm surprised it got assigned a 4 bit code. Can
anyone spot the mistake?
Sprow.