Memory-efficient decompression for embedded computers

There are many compression systems that perform well, but need a signficant amount of RAM while decompressing. This is not always available on small computers.

Here the cost of compression can be quite high, because we're always compressing on a powerful host and decompressing on a much smaller machine. This is in contrast with fast LZ schemes like LZJB, which attempt to make compression as fast as possible.

This is a simple LZ-type compression system that is quite efficient, but uses almost no RAM for decompression. The scheme is part of the LZ family of algorithms: tokens in the compressed stream are either literal bytes, or backwards references to previously uncompressed data. Each backwards reference consists of an offset and a length field. When the decompressor encounters this reference, it loops for the given number of bytes, copying earlier data.

So pseudocode for a decompressor is simply:

while not done:
  get next token
  if token is a literal byte:
    append to output
  else:
    get offset (O) and length (L)
    for L bytes:
      copy preceding byte from -O to output

Simple LZ algortithms (e.g. LZRW1 and LZJB) use fixed-width offset and length fields. More sophisticated schemes (e.g. RNC) Huffman encode the offset and length values. However, this requires that a Huffman decode structure be loaded into memory. In the case of RNC the structure occupies several hundred bytes. This scheme attempts to match RNC's efficiency by leaving the field widths flexible: their values are specified at the start of the compressed data. However, by avoiding Huffman coding, the compressor's RAM requirement is kept tiny.

Because of this flexibility, the compressor may choose any combination of bit sizes for the offset and length fields. By executing a brute-force search, it can find the optimum field sizes for any given source.

An example

Given the 613 byte opening sentence of A Tale of Two Cities:

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to heaven, we were all going direct the other way - in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.

the compressor can compute total size in bytes for various bit-widths for the L and O fields:

  L=1 L=2 L=3 L=4 L=5 L=6 L=7
O=0 687 688 689 690 691 692 692
O=1 677 680 682 685 688 691 693
O=2 604 620 635 651 666 682 683
O=3 560 585 613 641 669 671 673
O=4 534 569 608 646 650 655 659
O=5 445 455 478 471 474 481 488
O=6 451 417 404 398 401 409 418
O=7 414 391 382 379 382 394 405
O=8 414 390 379 379 384 397 410

The best result is 379 bytes for O=7 and L=4. These paramaters give a lookback of 128 bytes and a length of 2-17 bytes (see below). Looking at the start of the compressed output:

I         # Appending literal bytes at first...
t
(space)
w
a
s
(space)
t
h
e
(space)
b
e
s
(-13, 2)  # Copy 2 bytes at offset -13, so appends "t "
o
f
(-12, 2)  # Copy 2 bytes at offset -12, so appends " t"
i
m
(-10, 2)  # Copy 2 bytes at offset -10, so appends "es"
,
(space)
i
(-26, 10) # Copy 10 bytes at offset -26, so appends "t was the "

In this case, each literal byte takes 9 bits and each backward reference takes (1 + 7 + 4 = 12) bits, so even backwards references of just two bytes are worthwhile, giving 12 bits against 18. Depending on field sizes this break-even point can be 1, 2 or 3 bytes. The decoder needs to know this value (M) so that it can efficiently represent lengths.

Encoded data format

The encoded data is treated as a stream of packed bits. The compressed header is:

4 bits offset size (O)
4 bits length size (L)
2 bits minimum length (M)
16 bits length of uncompressed data

Then each token is either a literal byte:

1 bit zero, meaning literal byte
8 bits byte value

or a backwards reference:

1 bit one, meaning backwards reference
O bits offset
L bits length

Since the encoder will never emit a backwards reference with a length less than M, it can squeeze a litte more information into the length field by subtracting M from the length when encoding, and adding M in decode. For example a 4-bit length field with M=2 (as in the text example above) can represent lengths from 2-17 bytes inclusive.

Python compressor

Using this Python script compress.py, you can compress from a source file with:

compress.py --lookback 7 --length 4 --name itwas sourcefile itwas.h

This writes a C++ initializer to file itwas.h:

static PROGMEM prog_uchar itwas[] = {

0x2e,  0x01,  0xd2,  0x92,  0xe4,  0x82,  0x80,  0x3b,  0x43,  0xce,  0x08,  0xb8,  0xb0,  0x60,  0x8a,  0x80,
0x11,  0x53,  0xce,  0x31,  0xc0,  0x9e,  0x59,  0x34,  0xb0,  0x64,  0x1b,  0x09,  0x68,  0x10,  0xb0,  0xcc,
0x0c,  0xee,  0xec,  0x39,  0x65,  0x7d,  0x8b,  0x32,  0x64,  0xde,  0x01,  0xcd,  0xc8,  0x9d,  0x25,  0x67,
0x26,  0xec,  0xd9,  0xb6,  0x74,  0xa6,  0x32,  0x63,  0xcf,  0x9e,  0x4d,  0x0e,  0xb0,  0x60,  0xb7,  0x04,
0x9c,  0xf3,  0x9a,  0x29,  0x07,  0xf6,  0x8c,  0x59,  0x20,  0xe8,  0xc2,  0x37,  0xc8,  0x07,  0xa6,  0xcc,
0xb2,  0x3f,  0x9b,  0x59,  0xb2,  0x63,  0xcc,  0x89,  0x29,  0x13,  0xae,  0x05,  0xc0,  0x85,  0x67,  0x41,
0x73,  0x66,  0xfa,  0x07,  0xec,  0xd9,  0xf5,  0x12,  0x19,  0x96,  0xcc,  0x59,  0x70,  0xc9,  0xfe,  0x6c,
0x21,  0xc2,  0x90,  0x13,  0xeb,  0xf7,  0xe7,  0xcc,  0x81,  0xd3,  0x2a,  0x30,  0xef,  0x22,  0x16,  0xec,
0x39,  0x30,  0xed,  0x6c,  0xee,  0x19,  0xc0,  0x85,  0x29,  0xa7,  0x2c,  0x62,  0xc2,  0xb4,  0x0c,  0x18,
0xb2,  0xe4,  0x94,  0x0b,  0xdc,  0x99,  0x22,  0x60,  0xc1,  0x90,  0x09,  0x02,  0xa6,  0xdc,  0xd2,  0x81,
0x27,  0x17,  0x16,  0x1d,  0xc4,  0x88,  0x29,  0x33,  0xf6,  0x9c,  0x52,  0x81,  0x2b,  0xe7,  0x1c,  0x67,
0xc7,  0x3e,  0xf3,  0xdb,  0x30,  0x08,  0x4a,  0x91,  0x21,  0x1b,  0x36,  0x08,  0x98,  0xb3,  0x6f,  0x23,
0x26,  0xb3,  0xc0,  0x94,  0x31,  0x17,  0x04,  0x5c,  0xd8,  0xcf,  0x03,  0x53,  0x86,  0xb3,  0xc0,  0xae,
0x97,  0x8b,  0xbf,  0x09,  0x8a,  0x89,  0xc5,  0x1e,  0x2c,  0xe1,  0x23,  0x20,  0xe0,  0xce,  0x90,  0x27,
0x02,  0xb4,  0x08,  0xfa,  0x00,  0x01,  0x67,  0x16,  0xcf,  0xc0,  0xa5,  0x09,  0xce,  0x21,  0x13,  0x38,
0xa4,  0x03,  0x4b,  0xf6,  0x4c,  0xb2,  0x90,  0x73,  0x2a,  0xcc,  0x00,  0x33,  0x86,  0x9c,  0x10,  0xb0,

0x61,  0xc9,  0x3a,  0x0d,  0xb2,  0xb8,  0x0f,  0x38,  0x33,  0x65,  0x77,  0x02,  0x45,  0x52,  0x47,  0x0c,
0x63,  0x80,  0x33,  0x7b,  0xb6,  0x73,  0xc8,  0x6c,  0x14,  0xb8,  0x34,  0x00,  0x3b,  0xf6,  0x2c,  0x39,
0xb3,  0x64,  0xca,  0x39,  0x05,  0x18,  0x72,  0xad,  0x87,  0x45,  0x60,  0xc9,  0x25,  0x06,  0xf7,  0x30,
0x19,  0x92,  0x80,  0xe9,  0x12,  0xb0,  0x67,  0x57,  0xc2,  0x8c,  0x98,  0x3e,  0x00,  0x73,  0x04,  0x53,
0xc0,  0x38,  0x1c,  0xb8,  0xa5,  0xc2,  0x28,  0x30,  0xab,  0x00,  0x04,  0xcc,  0xd9,  0x3f,  0x61,  0x38,
0x46,  0x13,  0x53,  0x6e,  0x2c,  0xd9,  0xa4,  0x42,  0x31,  0x20,  0x78,  0x22,  0xce,  0x5c,  0x6f,  0x90,
0xcd,  0x19,  0xb0,  0xac,  0x02,  0x04,  0x4c,  0x98,  0x32,  0x6f,  0x85,  0x6b,  0x66,  0xfc,  0x0c,  0x1c,
0x18,  0x72,  0xba,  0x86,  0x45,  0x0c,  0x02,  0x36,  0x3c,  0xd1,  0x81,  0x02,
};

The command-line utility has options for binary output. See ./compress.py --help for details.

Arduino decompressor

For the Arduino, the C++ code for a decompress-to-RAM function can be:

static void decompress(byte *dst, byte *src)
{
  BS.begin(src);
  byte O = GDFB.getn(4);
  byte L = GDFB.getn(4);
  byte M = GDFB.getn(2);
  byte *end = dst + BS.getn(16);
  while (dst != end) {
    if (BS.get1() == 0) {
      *dst++ = BS.getn(8);
    } else {
      int offset = -BS.getn(O) - 1;
      int length = BS.getn(L) + minlen;
      while (length--) {
        *dst = *(dst + offset);
        dst++;
      }
    }
  }
}

BS is a simple bitstream object that, once initialized, delivers the next single bit (get1) and multi-bit (getn) value from the source. Here is a possibe implementation that uncompresses from flash storage:

class Flashbits {
public:
  void begin(prog_uchar *s) {
    src = s;
    mask = 0x01;
  }
  byte get1(void) {
    byte r = (pgm_read_byte_near(src) & mask) != 0;
    mask <<= 1;
    if (!mask) {
      mask = 1;
      src++;
    }
    return r;
  }
  unsigned short getn(byte n) {
    unsigned short r = 0;
    while (n--) {
      r <<= 1;
      r |= get1();
    }
    return r;
  }
private:
  prog_uchar *src;
  byte mask;
};

static Flashbits BS;