As part of the HTML Course we discussed image compression for formats like GIF, PNG and JPG. The simplest to actually work through in detail was the LZ Compression used in GIF images.

As illustration we used some text, 365 bytes long:

So after breakfast they went round to see Piglet and Pooh explained
as they went that Piglet was a very small animal who did not like 
bouncing and asked Tigger not to be too bouncy just at first. 
And Tigger who had been hiding behind trees said that a Tigger was 
only bouncy before breakfast and that as soon as they had had 
breakfast they became quiet and refined.

Basic algorithm:

prefix= emptystring;
repeat
  get nextcharacter;
  if prefix + nextcharacter is in the Code Table
  then prefix = prefix + nextcharacter
  else {
      add prefix + nextcharacter to the Code Table;
      output the code of the prefix from the Code Table
      set the prefix to the nextcharacter
      }
until complete;
output the code of the last prefix

so after breakfast ...

Prefix Next Comment
Empty s Set Prefix to s
s o Put 'so' in Code Table, output code of 's'
o space Put 'o ' in Code Table, output code for 'o'
space a Put ' a' in Code table and output code for space
  • Even the small Pooh example will take up a large amount of space
  • Will use a simple example first
  • Suppose it is Autumn and nuts are falling from the trees
  • I say to my friend Sally:

Tall Sally, all balls fall all Fall

  • Rather contrived but probably more similar to an image
  • Will use _ to represent space, ignore uppercase
  • tall_sally_all_balls_fall_all_fall
  • Opportunity to have single code for all and all_

Click on the Next button to move the coding along.

Tall Sally, all balls fall all Fall tall_sally_all_balls_fall_all_fall LZ Coder LZ Coder 00 _ 01 a 02 b 03 c 04 d 05 e 06 f 07 g 08 h 09 i 10 j 11 k 12 l 13 m 14 n 15 o 16 p 17 q 18 r 19 s 20 t 21 u 22 v 23 w 24 x 25 y 26 z 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 Prefix Next Character + Look Up = Look Up No Add to Table Output Prefix Prefix = Next Char Yes Add Input Character to Prefix t t Output Next

After this we compressed the complete piece of text showing how the code table built up. Conclusions were something like:

  • Takes a while to build up significant entries in the Code Table
  • Worse off than before?
  • Yes if we have to transmit 220 code values but we don't!
  • Maximum code value we have to send is 221 so all the output numbers fit into a byte
  • 0-26 entries are all less than 32 so could fit into 5 bits each
  • 27-46 entries are all less than 64 so could fit into 6 bits each
  • 47-111 entries are all less than 128 so could fit into 7 bits each
  • 112 onwards are all less than 256 so could fit into 8 bits each
  • The original 365 bytes can be compressed to about 200 bytes, CR = 1.8
  • We have enough information to reconstruct the Code Table that comes after the basic symbols
  • Because we were not too greedy and output the code of the prefix and not prefix+next character
  • This is the smart part invented by Lempel and Ziv
  • As we decode the input stream, make up the Code Table that created it