LZW Compression

The LZW compression method maps strings of text characters to numeric codes. To begin with, all characters that may occur in the text are assigned a code. For example, suppose the text file to be compressed is the string :
aaabbbbbbaabaaba

This string is composed of the characters `a' and `b'. `a' is assigned the code 0 and `b' the code 1. The mapping between character strings and their codes is stored in a dictionary. Each dictionary entry has two fields : "key" and "code". The character string represented by "code" is stored in the field "key". The initial dictionary for our example is given by the first two columns of the table below (i.e., codes "0" and "1") :
code 0 1 2 3 4 5 6 7
key a b aa aab bb bbb bbba aaba
Beginning with the dictionary initialized as above, the LZW compressor repeatedly finds the longest prefix "p" of the unencoded part of the input file that is in the dicitonary and outputs its code. If there is a next character "c" in the input file, then "pc" ("pc" is the prefix string "p" followed by the character "c") is assigned the next code and inserted into the dicitionary. This strategy is called the LZW rule.

Let us now try this rule on our example string. The longest prefix of the input that is in the dictionary is "a". Its code,0, is the output, and the string "aa" is assigned the code and entered into the dictionary. "aa" is the longest prefix of the remaining string that is in the dictionary. Its code, 2, is output; the string "aab" is assigned the code 3 and entered into the dictionary. Following the output of the code 2, the code for "b" is output; "bb" is assigned code 4 and entered into the dictionary. Then the code for "bb" is output, and "bbb" is entered into the table with code 5. Next the code 5 is output, and "bbba" is entered with code 6. Then the code 3 is output for "aab", and "aaba" is entered into the dictionary with code 7. Finally the code 7 is output for the remaining string "aaba". Our sample string is encoded as the string 0214537.