Let us take an example:

TOBEORNOTTOBEORTOBEORNOT#

The # is a marker used to show that the end of the message has been reached. Clearly, then, we have 27 symbols in our alphabet (the 26 capital letters A through Z, plus the # character).

A computer will render these as strings of bits; 5-bit strings are needed to give sufficient combinations to encompass the entire dictionary. As the dictionary grows, the strings will need to grow in length to accommodate the additional entries.

A 5-bit string gives 25 = 32 possible combinations of bits, and so when the 33rd dictionary word is created, the algorithm will have to start using 6-bit strings.

Since the all-zero string 00000 is used, and is labeled "0", the 33rd dictionary entry will be labeled 32.

# = 00000

A = 00001

B = 00010

C = 00011

.

.

.

Z = 11010

A = 00001

B = 00010

C = 00011

.

.

.

Z = 11010

If we weren't using LZW, and just sent the message as it stands (25 symbols at 5 bits each), it would require 125 bits.We will be able to compare this figure to the LZW output later. Apply LZW:

Current Next Output Value Extended

Sequence Char (# of bits) Dictionary

NULL T

T O 20 = 10100 = 5 bits 27: TO <-- Don't forget, we originally had 27 symbols, so the next one is 28th.

O B 15 = 01111 = 5 bits 28: OB

B E 2 = 00010 = 5 bits 29: BE

E O 5 = 00101 = 5 bits 30: EO

O R 15 = 01111 = 5 bits 31: OR

R N 18 = 010010 = 6 bits 32: RN <-- start using 6-bit strings

N O 14 = 001110 = 6 bits 33: NO

O T 15 = 001111 = 6 bits 34: OT

T T 20 = 010100 = 6 bits 35: TT

TO B 28 = 011100 = 6 bits 36: TOB

BE O 30 = 011110 = 6 bits 37: BEO

OR T 32 = 100000 = 6 bits 38: ORT

TOB E 37 = 100101 = 6 bits 39: TOBE

EO R 31 = 011111 = 6 bits 40: EOR

RN O 33 = 100001 = 6 bits 41: RNO

OT # 35 = 100011 = 6 bits 42: OT#

# 0 = 000000 = 6 bits

Total Length = 5*5 + 12*6 = 97 bits.

In using LZW we have made a saving of 28 bits out of 125 -- we have reduced the message by almost 22%.

You must LOGIN to add comments