Sometimes you save more space by not compressing data

Can compressed data be longer than the uncompressed data?

Q: I am working on a highly secure application and need to compress data such as string and byte arrays. I am using the java.util.zip.* classes, but I am having some problems.

First, when using the Deflater and Inflater classes, I get DataFormatExceptions when the string is less than 30 characters long.

Second, I have a question about the compression itself. I am using ByteArrayOutputStream and DeflaterOutputStream. I noticed that compressdata.length() > OriginalData.length(), where OriginalData is the uncompressed data. It doesn’t seem to make sense that the compressed length is longer than the uncompressed length. Can this be right?

A: To answer the first part of your question, I tested one string shorter than 30 characters and one longer than 30 characters. The only time I could get a DataFormatException was when the Inflater and Deflater were constructed with different nowrap values. Be sure that the Inflater and Deflater specify nowrap the same way. If the Deflater sets nowrap to false, the Inflater must do the same. Likewise, if the Deflater sets it to true, the Inflater must set it to true.

Whether to set nowrap to true or false depends on your needs. Setting nowrap to true omits the ZLIB header and checksum from the compressed data. Setting it to false keeps them. Either way, the Inflater’s nowrap must match the compressed input. Otherwise, as we have seen, you will get a DataFormatException.
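The matching rule can be checked with a small sketch. The class name NowrapDemo, its helper methods, and the sample string are made up for illustration; only the Deflater and Inflater constructors and methods come from java.util.zip. The sketch assumes the whole input fits in memory and treats any DataFormatException as a failed round trip.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class NowrapDemo {

    // Compress the input; nowrap=true omits the ZLIB header and checksum.
    static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress; nowrap must describe how the data was actually compressed.
    static byte[] inflate(byte[] compressed, boolean nowrap) throws DataFormatException {
        Inflater inflater = new Inflater(nowrap);
        // The Inflater Javadoc asks for one extra dummy byte of input when nowrap is true.
        byte[] input = nowrap ? Arrays.copyOf(compressed, compressed.length + 1) : compressed;
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("input ended before the stream was complete");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Hypothetical helper: true only if the data survives compress + decompress.
    static boolean roundTrips(byte[] data, boolean deflaterNowrap, boolean inflaterNowrap) {
        try {
            return Arrays.equals(data, inflate(deflate(data, deflaterNowrap), inflaterNowrap));
        } catch (DataFormatException e) {
            return false; // mismatched nowrap settings end up here
        }
    }

    public static void main(String[] args) {
        byte[] data = "The quick brown fox".getBytes(StandardCharsets.UTF_8);
        System.out.println("deflate nowrap=false, inflate nowrap=false: " + roundTrips(data, false, false));
        System.out.println("deflate nowrap=true,  inflate nowrap=true:  " + roundTrips(data, true, true));
        System.out.println("deflate nowrap=true,  inflate nowrap=false: " + roundTrips(data, true, false));
        System.out.println("deflate nowrap=false, inflate nowrap=true:  " + roundTrips(data, false, true));
    }
}
```

Note that the length of the string plays no part here: a short string round-trips as long as both sides agree on nowrap.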

Your second question raises an important fact about data compression. As strange as it may seem, the compressed data can be larger than the uncompressed data. Depending on your Deflater settings, the Deflater may add a header and checksum to the compressed data. These are used to decode the data and check it for errors. If you deal with very small strings, it is likely that not much real compression has gone on. Cutting a string of 30 characters to 15, while a 50 percent reduction, saves only 15 characters. With savings that small, the added header can make the compressed string longer than the original. You will not see the benefits of compression until your uncompressed data reaches a certain size. It’s hard to say what this size is, but in general it is the point where (compressed size + header size) < uncompressed size. If your data is not large enough, you’re wasting time using compression.
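To see the effect for yourself, a sketch along the following lines compares input and output lengths for a short string and for a long, repetitive buffer. SmallInputDemo and compressedLength are hypothetical names, and the sample inputs are arbitrary; the sketch assumes the default DeflaterOutputStream settings, which keep the ZLIB header and checksum.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;

public class SmallInputDemo {

    // Hypothetical helper: compress with the default settings and return the output length.
    static int compressedLength(byte[] input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the stream finishes the compression and flushes the remaining bytes.
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        // A short string with nothing repeated: the header and checksum outweigh any savings.
        byte[] small = "Hello, compression!".getBytes(StandardCharsets.UTF_8);

        // A long, highly repetitive buffer: the overhead is negligible next to the savings.
        byte[] large = new byte[10_000];
        Arrays.fill(large, (byte) 'a');

        System.out.println("small: " + small.length + " -> " + compressedLength(small));
        System.out.println("large: " + large.length + " -> " + compressedLength(large));
    }
}
```

A check like this on representative data is one way to decide whether compression is worth applying at all, or whether to skip it below some input length.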

You may also want to consider some of the other compression settings. Some settings are optimized for speed, while others achieve better compression but take longer to run. So the settings that you choose go a long way in determining the final size of your compressed data.
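As one illustration, assuming the settings in question are the compression levels that Deflater exposes, the sketch below compresses the same data at three levels and prints the resulting sizes. LevelDemo, compressedLength, and sampleData are hypothetical names, and the generated sample text is invented for the example; results on your own data will differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class LevelDemo {

    // Hypothetical helper: compress at the given level and return the output length.
    static int compressedLength(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf); // only the byte count matters here
        }
        deflater.end();
        return total;
    }

    // Made-up sample data: repetitive text lines, similar to a log file.
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("record ").append(i).append(" status=ok\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        System.out.println("original:         " + data.length);
        System.out.println("NO_COMPRESSION:   " + compressedLength(data, Deflater.NO_COMPRESSION));
        System.out.println("BEST_SPEED:       " + compressedLength(data, Deflater.BEST_SPEED));
        System.out.println("BEST_COMPRESSION: " + compressedLength(data, Deflater.BEST_COMPRESSION));
    }
}
```

Deflater.NO_COMPRESSION is included only as a baseline: it stores the data unchanged and still adds the header and checksum, so its output is always slightly larger than the input.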

Tony Sintes is a principal consultant at BroadVision. Tony, a Sun-certified Java 1.1 programmer and Java 2 developer, has worked with Java since 1997.

Source: www.infoworld.com