Book ChapterDOI

Move-to-front, distance coding, and inversion frequencies revisited

TL;DR: This paper analyzes Move-to-Front, Distance Coding and Inversion Frequencies from the point of view of how effective they are in the task of compressing low-entropy strings, that is, strings which have many regularities and are therefore highly compressible.
Abstract
Move-to-Front, Distance Coding and Inversion Frequencies are three somewhat related techniques used to process the output of the Burrows-Wheeler Transform. In this paper we analyze these techniques from the point of view of how effective they are in the task of compressing low-entropy strings, that is, strings which have many regularities and are therefore highly compressible. This is a non-trivial task since many compressors have non-constant overheads that become non-negligible when the input string is highly compressible. Because of the properties of the Burrows-Wheeler transform, being locally optimal ensures an algorithm compresses low-entropy strings effectively. Informally, local optimality implies that an algorithm is able to effectively compress an arbitrary partition of the input string. We show that in their original formulation neither Move-to-Front, nor Distance Coding, nor Inversion Frequencies is locally optimal. Then, we describe simple variants of the above algorithms which are locally optimal. To achieve local optimality with Move-to-Front it suffices to combine it with Run Length Encoding. To achieve local optimality with Distance Coding and Inversion Frequencies we use a novel "escape and re-enter" strategy. Since we build on previous results, our analyses are simple and shed new light on the inner workings of the three techniques considered in this paper.
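The abstract's claim that Move-to-Front becomes locally optimal once combined with Run Length Encoding can be illustrated with a minimal sketch. The function names below (`mtf_encode`, `rle0`) are illustrative, not from the paper; the zero-run encoding shown is one common way to pair RLE with MTF output, since equal adjacent symbols become runs of zeros after MTF.

```python
def mtf_encode(s, alphabet):
    """Move-to-Front: emit each symbol's current index, then move it to the front."""
    table = list(alphabet)
    out = []
    for ch in s:
        i = table.index(ch)
        out.append(i)
        table.pop(i)
        table.insert(0, ch)
    return out

def mtf_decode(ranks, alphabet):
    """Inverse transform: look up each index, output the symbol, move it to the front."""
    table = list(alphabet)
    out = []
    for i in ranks:
        ch = table.pop(i)
        out.append(ch)
        table.insert(0, ch)
    return "".join(out)

def rle0(ranks):
    """Collapse runs of zeros (repeated symbols after MTF) into (0, run_length) pairs."""
    out = []
    i = 0
    while i < len(ranks):
        if ranks[i] == 0:
            j = i
            while j < len(ranks) and ranks[j] == 0:
                j += 1
            out.append((0, j - i))
            i = j
        else:
            out.append((ranks[i], 1))
            i += 1
    return out
```

For example, `mtf_encode("aabbbc", "abc")` yields `[0, 0, 1, 0, 0, 2]`, and `rle0` compacts the zero runs to `[(0, 2), (1, 1), (0, 2), (2, 1)]`, which is why MTF+RLE handles the long runs typical of BWT output well.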


Citations
Journal ArticleDOI

The myriad virtues of wavelet trees

TL;DR: A novel framework, called Pruned Wavelet Trees, is proposed that aims for the best combination of wavelet trees of properly-designed shapes and compressors, either binary (like Run-Length encoders) or non-binary (like Huffman and Arithmetic encoders).
Journal Article

The Myriad Virtues of Wavelet Trees

TL;DR: This paper provides a complete theoretical analysis of a wide class of compression algorithms based on Wavelet Trees and proves high-order entropy bounds for the challenging combination of Burrows-Wheeler Transform and Wavelet trees.
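Since several of the citing papers above analyze wavelet trees, a minimal pointer-based sketch may help; this is an illustrative implementation of the standard structure (a balanced binary partition of the alphabet with one bitvector per node), not code from any of these papers, and it uses naive prefix sums where a real index would use constant-time rank on compressed bitvectors.

```python
class WaveletTree:
    """Minimal wavelet tree over a string, supporting rank(c, i) queries."""

    def __init__(self, s, alphabet=None):
        self.alphabet = sorted(set(s)) if alphabet is None else alphabet
        if len(self.alphabet) == 1:
            self.bits = None  # leaf: a single symbol, no bitvector needed
            return
        mid = len(self.alphabet) // 2
        left_set = set(self.alphabet[:mid])
        # One bit per position: 0 = symbol goes to left child, 1 = right child.
        self.bits = [0 if c in left_set else 1 for c in s]
        self.left = WaveletTree([c for c in s if c in left_set],
                                self.alphabet[:mid])
        self.right = WaveletTree([c for c in s if c not in left_set],
                                 self.alphabet[mid:])

    def rank(self, c, i):
        """Number of occurrences of c among the first i symbols."""
        if self.bits is None:
            return i  # leaf: every position holds c
        mid = len(self.alphabet) // 2
        ones = sum(self.bits[:i])  # naive rank; a real index precomputes this
        if c in self.alphabet[:mid]:
            return self.left.rank(c, i - ones)
        return self.right.rank(c, ones)
```

A query such as `WaveletTree("abracadabra").rank("a", 11)` descends one bitvector per level, which is the mechanism the entropy analyses in these papers account for level by level.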
Journal ArticleDOI

Balancing and clustering of words in the Burrows-Wheeler transform

TL;DR: Empirical observations suggest that balance is the combinatorial property of the input word that ensures optimal BWT compression; this hypothesis is corroborated by experiments on "real" text, using local entropy as a measure of the degree of balance of a word.
Proceedings ArticleDOI

Wavelet Trees: From Theory to Practice

TL;DR: It is shown that the run-length $\delta$ coding size of wavelet trees achieves the 0-order empirical entropy size of the original string with leading constant 1, when the string's 0-order empirical entropy is asymptotically less than the logarithm of the alphabet size.
Journal ArticleDOI

Words with Simple Burrows-Wheeler Transforms

TL;DR: An alternative proof of this result is given, and the words over the alphabet whose Burrows-Wheeler Transforms have a simple form are described; these words share some properties with standard words.
References
Book

Elements of information theory

TL;DR: The author examines the role of entropy, inequality, and randomness in the design and construction of codes.

A Block-sorting Lossless Data Compression Algorithm

TL;DR: A block-sorting lossless data compression algorithm is presented, together with an implementation whose performance is compared with widely available data compressors running on the same hardware.
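Since all three techniques in the paper process Burrows-Wheeler Transform output, a naive sketch of the transform may be useful. This is an illustrative quadratic-time version via sorted rotations with an assumed sentinel `$`; practical implementations build the BWT from a suffix array instead.

```python
def bwt(s, sentinel="$"):
    """Burrows-Wheeler Transform: last column of the sorted rotations of s + sentinel."""
    s = s + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last, sentinel="$"):
    """Invert the BWT by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]
```

For instance, `bwt("banana")` gives `"annb$aa"`: the transform clusters equal symbols into runs, which is exactly what makes MTF, Distance Coding, and Inversion Frequencies effective as second-stage encoders.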
Journal ArticleDOI

Universal codeword sets and representations of the integers

TL;DR: An application is the construction of a uniformly universal sequence of codes for countable memoryless sources, in which the nth code has a ratio of average codeword length to source rate bounded by a function of n for all sources with positive rate.
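The universal integer codes introduced in this reference are easy to sketch. Below is an illustrative implementation (names are my own) of the Elias gamma code, which spends about 2 lg n bits on a positive integer n, and the Elias delta code, which gamma-codes the length of n's binary representation and so needs only lg n + O(lg lg n) bits.

```python
def elias_gamma(n):
    """Elias gamma code: (len-1) zeros, then the binary representation of n."""
    assert n >= 1
    b = bin(n)[2:]                       # binary without the '0b' prefix
    return "0" * (len(b) - 1) + b

def elias_delta(n):
    """Elias delta code: gamma-code the bit length, then the bits after the leading 1."""
    assert n >= 1
    b = bin(n)[2:]
    return elias_gamma(len(b)) + b[1:]
```

For example, `elias_gamma(5)` is `"00101"` and `elias_delta(5)` is `"01101"`; codes of this kind are the natural back end for the integer streams that Distance Coding and Inversion Frequencies produce.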
Journal ArticleDOI

Compressed full-text indexes

TL;DR: The relationship between text entropy and regularities that show up in index structures and permit compressing them are explained and the most relevant self-indexes are covered, focusing on how they exploit text compressibility to achieve compact structures that can efficiently solve various search problems.
Proceedings ArticleDOI

High-order entropy-compressed text indexes

TL;DR: A novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet σ, where each symbol is encoded by lg|σ| bits.