ACM logo

The ACM Journal of Experimental Algorithmics


The Effect of Flexible Parsing for Dynamic Dictionary-Based Data Compression

by

Yossi Matias

and

Nasir Rajpoot

and

S. Cenk Sahinalp


Abstract:

We report on the performance evaluation of greedy parsing with a single step look-ahead (which we call flexible parsing or FP) as an alternative to the commonly used greedy parsing (with no look-ahead) scheme. Greedy parsing is the basis of most popular compression programs includiung the Unix compress and gzip, but it usually results in far from optimal parsing/compression with regard to the dictionary construction scheme in use. Flexible parsing, however, is optimal, i.e., it partitions any given input into the smallest number of phrases possible, for dictionary schemes that satisfy the prefix property throughout their execution.

We focus on the application of FP in the context of the LZW variant of the Lempel-Ziv dictionary construction method, which is of considerable practical interest. We implement two compression algorithms which use (i) FP with LZW dictionary (LZW-FP) and (ii) FP with an alternative flexible dictionary (FPA). Our implementations are based on novel online data structures enabling us to use linear time and space. We test our implementations on a collection of input sequences which include textual files, DNA sequences, medical images, and speudorandom binary files, and compare our results with compress and gzip. Our results demonstrate that flexible parsing is especially useful for non-textual data, on which it improves over the compression rates of compress and gzip by up to 20% and 35%, respectively.

Keywords: Experimental Algorithms; parsing; data compression.

Sources: