scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Compression of individual sequences via variable-rate coding

01 Sep 1978-IEEE Transactions on Information Theory (IEEE)-Vol. 24, Iss: 5, pp 530-536
TL;DR: The proposed concept of compressibility is shown to play a role analogous to that of entropy in classical information theory where one deals with probabilistic ensembles of sequences rather than with individual sequences.
Abstract: Compressibility of individual sequences by the class of generalized finite-state information-lossless encoders is investigated. These encoders can operate in a variable-rate mode as well as a fixed-rate one, and they allow for any finite-state scheme of variable-length-to-variable-length coding. For every individual infinite sequence x a quantity \rho(x) is defined, called the compressibility of x , which is shown to be the asymptotically attainable lower bound on the compression ratio that can be achieved for x by any finite-state encoder. This is demonstrated by means of a constructive coding theorem and its converse that, apart from their asymptotic significance, also provide useful performance criteria for finite and practical data-compression tasks. The proposed concept of compressibility is also shown to play a role analogous to that of entropy in classical information theory where one deals with probabilistic ensembles of sequences rather than with individual sequences. While the definition of \rho(x) allows a different machine for each different sequence to be compressed, the constructive coding theorem leads to a universal algorithm that is asymptotically optimal for all sequences.
Citations
More filters
Book
15 May 1999
TL;DR: In this article, the authors present a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, which provides an up-to-date student oriented treatment of the subject.
Abstract: From the Publisher: This is a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective. The advent of the Internet and the enormous increase in volume of electronically stored information generally has led to substantial work on IR from the computer science perspective - this book provides an up-to-date student oriented treatment of the subject.

9,923 citations

Book
06 Oct 2003
TL;DR: A fun and exciting textbook on the mathematics underpinning the most dynamic areas of modern science and engineering.
Abstract: Fun and exciting textbook on the mathematics underpinning the most dynamic areas of modern science and engineering.

8,091 citations

01 Jan 2019
TL;DR: The book presents a thorough treatment of the central ideas and their applications of Kolmogorov complexity with a wide range of illustrative applications, and will be ideal for advanced undergraduate students, graduate students, and researchers in computer science, mathematics, cognitive sciences, philosophy, artificial intelligence, statistics, and physics.
Abstract: The book is outstanding and admirable in many respects. ... is necessary reading for all kinds of readers from undergraduate students to top authorities in the field. Journal of Symbolic Logic Written by two experts in the field, this is the only comprehensive and unified treatment of the central ideas and their applications of Kolmogorov complexity. The book presents a thorough treatment of the subject with a wide range of illustrative applications. Such applications include the randomness of finite objects or infinite sequences, Martin-Loef tests for randomness, information theory, computational learning theory, the complexity of algorithms, and the thermodynamics of computing. It will be ideal for advanced undergraduate students, graduate students, and researchers in computer science, mathematics, cognitive sciences, philosophy, artificial intelligence, statistics, and physics. The book is self-contained in that it contains the basic requirements from mathematics and computer science. Included are also numerous problem sets, comments, source references, and hints to solutions of problems. New topics in this edition include Omega numbers, KolmogorovLoveland randomness, universal learning, communication complexity, Kolmogorov's random graphs, time-limited universal distribution, Shannon information and others.

3,361 citations

Journal ArticleDOI
TL;DR: The state of the art in data compression is arithmetic coding, not the better-known Huffman method, which gives greater compression, is faster for adaptive models, and clearly separates the model from the channel encoding.
Abstract: The state of the art in data compression is arithmetic coding, not the better-known Huffman method. Arithmetic coding gives greater compression, is faster for adaptive models, and clearly separates the model from the channel encoding.

3,188 citations

01 Jan 1994
TL;DR: A block-sorting, lossless data compression algorithm, and the implementation of that algorithm and the performance of the implementation with widely available data compressors running on the same hardware are compared.
Abstract: The charter of SRC is to advance both the state of knowledge and the state of the art in computer systems. From our establishment in 1984, we have performed basic and applied research to support Digital's business objectives. Our current work includes exploring distributed personal computing on multiple platforms, networking , programming technology, system modelling and management techniques, and selected applications. Our strategy is to test the technical and practical value of our ideas by building hardware and software prototypes and using them as daily tools. Interesting systems are too complex to be evaluated solely in the abstract; extended use allows us to investigate their properties in depth. This experience is useful in the short term in refining our designs, and invaluable in the long term in advancing our knowledge. Most of the major advances in information systems have come through this strategy, including personal computing, distributed systems, and the Internet. We also perform complementary work of a more mathematical flavor. Some of it is in established fields of theoretical computer science, such as the analysis of algorithms, computational geometry, and logics of programming. Other work explores new ground motivated by problems that arise in our systems research. We have a strong commitment to communicating our results; exposing and testing our ideas in the research and development communities leads to improved understanding. Our research report series supplements publication in professional journals and conferences. We seek users for our prototype systems among those with whom we have common interests, and we encourage collaboration with university researchers. This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission an acknowledgment of the authors and individual contributors to the work; and all applicable portions of the copyright notice. Copying, reproducing, or republishing for any other purpose shall require a license with payment of fee to the Systems Research Center. All rights reserved. Authors' abstract We describe a block-sorting, lossless data compression algorithm, and our implementation of that algorithm. We compare the performance of our implementation with widely available data compressors running on the same hardware. The algorithm works by applying a reversible transformation to a block of input …

2,753 citations

References
More filters
Book
01 Jan 1968
TL;DR: This chapter discusses Coding for Discrete Sources, Techniques for Coding and Decoding, and Source Coding with a Fidelity Criterion.
Abstract: Communication Systems and Information Theory. A Measure of Information. Coding for Discrete Sources. Discrete Memoryless Channels and Capacity. The Noisy-Channel Coding Theorem. Techniques for Coding and Decoding. Memoryless Channels with Discrete Time. Waveform Channels. Source Coding with a Fidelity Criterion. Index.

6,684 citations

Journal ArticleDOI
TL;DR: The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable- to-block codes designed to match a completely specified source.
Abstract: A universal algorithm for sequential data compression is presented. Its performance is investigated with respect to a nonprobabilistic model of constrained sources. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.

5,844 citations

Journal ArticleDOI
TL;DR: A new approach to the problem of evaluating the complexity ("randomness") of finite sequences is presented, related to the number of steps in a self-delimiting production process by which a given sequence is presumed to be generated.
Abstract: A new approach to the problem of evaluating the complexity ("randomness") of finite sequences is presented. The proposed complexity measure is related to the number of steps in a self-delimiting production process by which a given sequence is presumed to be generated. It is further related to the number of distinct substrings and the rate of their occurrence along the sequence. The derived properties of the proposed measure are discussed and motivated in conjunction with other well-established complexity criteria.

2,473 citations

Journal ArticleDOI
TL;DR: The finite-state complexity of a sequence plays a role similar to that of entropy in classical information theory (which deals with probabilistic ensembles of sequences rather than an individual sequence).
Abstract: A quantity called the {\em finite-state} complexity is assigned to every infinite sequence of elements drawn from a finite sot. This quantity characterizes the largest compression ratio that can be achieved in accurate transmission of the sequence by any finite-state encoder (and decoder). Coding theorems and converses are derived for an individual sequence without any probabilistic characterization, and universal data compression algorithms are introduced that are asymptotically optimal for all sequences over a given alphabet. The finite-state complexity of a sequence plays a role similar to that of entropy in classical information theory (which deals with probabilistic ensembles of sequences rather than an individual sequence). For a probabilistic source, the expectation of the finite state complexity of its sequences is equal to the source's entropy. The finite state complexity is of particular interest when the source statistics are unspecified.

202 citations

Journal ArticleDOI
TL;DR: The application of the tests to finite deterministic automata is discussed and a method of constructing a decoder for a given finite automaton that is information lossless of finite order, is described.
Abstract: A coding graph is a model which contains all the types of finite automata and codes as special cases. A test for information losslessness and for information losslessness of finite order of a coding graph is described. Efficient methods of computation are given which make the calculation simple and mechanizable. The application of the tests to finite deterministic automata is discussed and a method of constructing a decoder for a given finite automaton that is information lossless of finite order, is described.

57 citations