From: Phil.Norman@genedata.com (Phil Norman)
Subject: Re: Is "LZW" the most efficient theoretically possible compression technique?
Date: 5 Jan 2000 11:43:17 +0100
Newsgroups: comp.compression,sci.math,sci.physics

On Mon, 03 Jan 2000 16:09:07 -0500, DSM wrote:
>
>Is LZW the best which can be done?

No, most definitely not. Due to the way LZW encodes information, it
wastes, on average, half a bit for every code it outputs. This is
because it always outputs an integer number of bits for every code,
and since it's creating these codes on the fly, it cannot output any
codes whose values are higher than that of the code it's creating at
the time. Therefore there are always some values which can be encoded
in the number of bits the encoder is using, but will never actually
be encoded because the relevant dictionary entries do not exist yet.
Since these values are implicitly known to the decompressor, the
ability to output such values is a waste of bits. Bolting on an
adaptive Huffman encoder (or an arithmetic encoder) would help.

>A practical mind would insist "once redundancy
>in a data set has been cut by the LZW (or similar)
>compression algorithm, you can not shorten the
>set without loss of data."

I'd disagree with this. If a further compressor is specifically
designed to work on LZW-encoded data (and therefore knows how the LZW
algorithm works), then it is quite possible for it to compress the
data further, if only by exploiting the fact (mentioned above) that
LZW is inherently redundant in its use of fixed-length output codes
(fixed-length on a short-term basis, that is).

It's helpful to realise that there are several different types of
compression. Two of the basic forms are pattern matching and value
prediction. LZW is an example of pattern matching, while Huffman and
arithmetic encoding are both forms of prediction (they predict that
the values seen most often so far will keep turning up, and so assign
those values shorter codes). It is possible to generate a data stream
which contains no repeated patterns but a very strong difference in
frequency between different characters, just as it is possible to
create a data stream with the opposite characteristics. Generally
speaking, a combination of both of these types of compression yields
better results than either one on its own. LZW is simply a
pattern-matching compressor, whereas zip deflate (for example) uses
both pattern matching (an LZ77 variant) and value prediction
(Huffman).

>However, could an algorithm that searches for extremely
>subtle patterns in the data set triumph over conventional
>(LZW and others) compression techniques?

Generally speaking, taking a compressed file and trying to compress
it again is not going to work. However, writing a better compressor
than LZW (even if this is done simply by adding further stages of
compression to the LZW algorithm, like Huffman or arithmetic coding)
is not hard.

Cheers,
Phil
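
P.S. To put a rough number on that "half a bit" figure, here's a
hypothetical back-of-the-envelope sketch (illustrative code only, not
taken from any real LZW implementation). It assumes a GIF-style
encoder: 256 literal codes plus clear and end-of-data codes, with the
output code width growing from 9 up to 12 bits as the dictionary
fills.

    /* Rough sketch: estimate how many bits a plain LZW encoder wastes
     * by always writing whole, fixed-width codes.  Assumes a GIF-style
     * setup with 258 initial codes and a 12-bit dictionary limit. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double wasted = 0.0;
        long   codes  = 0;
        int    width  = 9;      /* 258 initial codes need 9 bits    */
        long   next   = 258;    /* next dictionary slot to be built */

        /* Pretend the encoder emits one code per new dictionary entry. */
        while (next < 4096) {
            /* Only 'next' distinct code values can legally occur here,
             * yet we still spend 'width' bits on each one; the
             * difference is the overhead described above. */
            wasted += width - log2((double)next);
            codes++;
            next++;
            if (next == (1L << width) && width < 12)
                width++;
        }
        printf("average waste: %.2f bits per code\n", wasted / codes);
        return 0;
    }

Compile with something like 'cc waste.c -lm'; it prints an average in
the region of 0.4-0.5 bits per code, which is where the half-a-bit
estimate comes from.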
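
P.P.S. If you want to see the pattern-matching-plus-prediction
combination without writing either stage yourself, zlib's deflate
does both: LZ77 matching first, then Huffman coding of the resulting
literals, lengths and distances. A minimal example, assuming zlib is
installed (link with -lz); the input string is just made-up test data:

    /* Sketch using zlib's convenience wrapper around deflate. */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        const char   *text   = "the quick brown fox jumps over the lazy dog "
                               "the quick brown fox jumps over the lazy dog";
        unsigned char out[256];
        uLongf        outlen = sizeof(out);

        /* compress() runs LZ77 pattern matching, then Huffman-codes
         * the literal/length/distance stream it produces. */
        if (compress(out, &outlen, (const Bytef *)text,
                     (uLong)strlen(text)) != Z_OK) {
            fprintf(stderr, "compress failed\n");
            return 1;
        }
        printf("%lu bytes in, %lu bytes out\n",
               (unsigned long)strlen(text), (unsigned long)outlen);
        return 0;
    }

The repeated sentence is mopped up by the LZ77 stage, and the skewed
character frequencies by the Huffman stage.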