# LZW Encoding

#### by Sanju 2009-06-29 19:04:00

LZW Encoding

Let us take an example:

TOBEORNOTTOBEORTOBEORNOT#

The # is a marker used to show that the end of the message has been reached. Clearly, then, we have 27 symbols in our alphabet (the 26 capital letters A through Z, plus the # character).

A computer will render these as strings of bits; 5-bit strings are needed to give sufficient combinations to encompass the entire dictionary. As the dictionary grows, the strings will need to grow in length to accommodate the additional entries.

A 5-bit string gives 25 = 32 possible combinations of bits, and so when the 33rd dictionary word is created, the algorithm will have to start using 6-bit strings.

Since the all-zero string 00000 is used, and is labeled "0", the 33rd dictionary entry will be labeled 32.

# = 00000
A = 00001
B = 00010
C = 00011
.
.
.
Z = 11010

If we weren't using LZW, and just sent the message as it stands (25 symbols at 5 bits each), it would require 125 bits.We will be able to compare this figure to the LZW output later. Apply LZW:

```
Current  Next  Output Value           Extended
Sequence Char  (# of bits)            Dictionary
NULL     T
T        O     20 = 10100 = 5 bits    27:  TO <-- Don't forget, we originally had 27 symbols, so the next one is 28th.
O        B     15 = 01111 = 5 bits    28:  OB
B        E      2 = 00010 = 5 bits    29:  BE
E        O      5 = 00101 = 5 bits    30:  EO
O        R     15 = 01111 = 5 bits    31:  OR
R        N     18 = 010010 = 6 bits   32:  RN <-- start using 6-bit strings
N        O     14 = 001110 = 6 bits   33:  NO
O        T     15 = 001111 = 6 bits   34:  OT
T        T     20 = 010100 = 6 bits   35:  TT
TO       B     28 = 011100 = 6 bits   36:  TOB
BE       O     30 = 011110 = 6 bits   37:  BEO
OR       T     32 = 100000 = 6 bits   38:  ORT
TOB      E     37 = 100101 = 6 bits   39:  TOBE
EO       R     31 = 011111 = 6 bits   40:  EOR
RN       O     33 = 100001 = 6 bits   41:  RNO
OT       #     35 = 100011 = 6 bits   42:  OT#
#               0 = 000000 = 6 bits
Total Length = 5*5 + 12*6 = 97 bits.
```

In using LZW we have made a saving of 28 bits out of 125 -- we have reduced the message by almost 22%.