The LZW compression method maps strings of text characters to
numeric codes. To begin with,
all characters that may occur in the text are assigned a code. For example,
suppose the text file to be compressed is the string :
aaabbbbbbaabaaba
This string is composed of the characters `a' and `b'. `a' is assigned the
code 0 and `b' the code 1. The mapping between character strings and
their codes is stored in a dictionary. Each dictionary entry has two
fields : "key" and "code". The character string represented by "code"
is stored in the field "key". The initial dictionary for our
example is given by the first two columns of the table below (i.e.,
codes "0" and "1") :
code |
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
key |
a |
b |
aa |
aab |
bb |
bbb |
bbba |
aaba |
Beginning with the dictionary initialized as above, the LZW compressor
repeatedly finds the longest prefix "p" of the unencoded part of the
input file that is in the dicitonary and outputs its code. If there is a
next character "c" in the input file, then "pc" ("pc" is the prefix
string "p" followed by the character "c") is assigned the next code and
inserted into the dicitionary. This strategy is called the LZW rule.
Let us now try this rule on our example string. The longest prefix of the
input that is in the dictionary is "a". Its code,0, is the output, and
the string "aa" is assigned the code and entered into the dictionary.
"aa" is the longest prefix of the remaining string that is in the
dictionary. Its code, 2, is output; the string "aab" is assigned the code
3 and entered into the dictionary. Following the output of the code 2,
the code for "b" is output; "bb" is assigned code 4 and entered into the
dictionary. Then the code for "bb" is output, and "bbb" is entered into the
table with code 5. Next the code 5 is output, and "bbba" is entered with
code 6. Then the code 3 is output for "aab", and "aaba" is entered into the
dictionary with code 7. Finally the code 7 is output for the remaining
string "aaba". Our sample string is encoded as the string 0214537.