Journal ArticleDOI

A space efficient direct access data structure

01 Mar 2017-Journal of Discrete Algorithms (Elsevier)-Vol. 43, pp 26-37
TL;DR: The pruning procedure is improved, and empirical evidence is given that when memory storage is the main concern, the suggested data structure outperforms other direct access techniques such as those due to Külekci, DACs and sampling, at the cost of a slowdown as compared to DACs and fixed-length encoding.
About: This article is published in Journal of Discrete Algorithms. The article was published on 2017-03-01 and is currently open access. It has received 15 citations to date. The article focuses on the topics: Canonical Huffman code & Huffman coding.
Citations
Book ChapterDOI
01 Apr 2021
TL;DR: A new dynamic Huffman encoding approach is proposed that provably always performs at least as well as static Huffman coding, and may be better than standard dynamic Huffman coding for certain files.
Abstract: Huffman coding is known to be optimal, yet its dynamic version may yield smaller compressed files. The best known bound is that the number of bits used by dynamic Huffman coding to encode a message of n characters is at most n bits more than the number of bits required by static Huffman coding. In particular, dynamic Huffman coding can also generate a larger encoded file than the static variant, though in practice the file is often, but not always, smaller. We propose here a new dynamic Huffman encoding approach that provably always performs at least as well as static Huffman coding, and may be better than standard dynamic Huffman coding for certain files. This is achieved by reversing the direction of the references of the encoded elements to those forming the model of the encoding: instead of pointing backwards, they look into the future.
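As a point of reference for the static baseline the abstract compares against, static Huffman construction can be sketched in a few lines (an illustrative sketch, not the paper's algorithm; all names here are our own):

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a static Huffman code for `text`; returns {symbol: bitstring}."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate single-symbol alphabet
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tiebreak, tree); a tree is a symbol or a (left, right) pair.
    heap = [(w, i, s) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            code[tree] = prefix
    walk(heap[0][2], "")
    return code
```

Dynamic variants rebuild or adjust this tree as the message is processed; the approach above instead fixes the code once from the full frequency table.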

8 citations

Journal ArticleDOI
TL;DR: It is found that there is much variability in the randomness of the output of lossless compression techniques, and evidence is presented that arithmetic coding may produce an output identical to that of Huffman coding.
Abstract: It seems reasonable to expect from a good compression method that its output should not be further compressible, because it should behave essentially like random data. We investigate this premise for a variety of known lossless compression techniques, and find that, surprisingly, there is much variability in the randomness, depending on the chosen method. Arithmetic coding seems to produce perfectly random output, whereas that of Huffman or Ziv-Lempel coding still contains many dependencies. In particular, the output of Huffman coding has already been proven to be random under certain conditions, and we present evidence here that arithmetic coding may produce an output that is identical to that of Huffman.
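The premise above, that a good compressor's output should look essentially random, can be probed empirically. A rough sketch using zeroth-order byte entropy (zlib is chosen here only as a readily available DEFLATE implementation, not as one of the paper's test subjects):

```python
import math
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Empirical zeroth-order entropy of a byte string, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

raw = b"ab" * 5000           # highly repetitive input: 1 bit/byte of entropy
packed = zlib.compress(raw)  # DEFLATE = LZ77 parsing + Huffman coding
```

The compressed stream is far shorter and its byte distribution is much closer to uniform, though, as the abstract notes, zeroth-order statistics alone cannot expose the residual dependencies that distinguish Huffman or Ziv-Lempel output from truly random data.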

7 citations

Journal ArticleDOI
TL;DR: An alternative to compressed suffix arrays is introduced, based on representing a sequence of integers using Fibonacci encodings, thereby reducing the space requirements of state-of-the-art implementations of the suffix array, while retaining the searching functionalities.
Abstract: An alternative to compressed suffix arrays is introduced, based on representing a sequence of integers using Fibonacci encodings, thereby reducing the space requirements of state-of-the-art implementations of the suffix array, while retaining the searching functionalities. Empirical tests support the theoretical space complexity improvements and show that there is no deterioration in the processing times.
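A minimal sketch of the Fibonacci encoding of a single integer, the building block the abstract refers to (illustrative only; the cited paper's suffix-array representation builds considerably more machinery on top of it):

```python
def fibonacci_encode(n: int) -> str:
    """Fibonacci code of n >= 1: the Zeckendorf representation written from
    F(2) upward, terminated with an extra '1' so every codeword ends in '11'
    (which can never occur inside a Zeckendorf representation)."""
    fibs = [1, 2]                      # F(2)=1, F(3)=2, ...
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    usable = fibs[:-1]
    bits = ["0"] * len(usable)
    remainder = n
    for i in range(len(usable) - 1, -1, -1):  # greedy, largest Fibonacci first
        if usable[i] <= remainder:
            bits[i] = "1"
            remainder -= usable[i]
    return "".join(bits) + "1"
```

Because no two consecutive Fibonacci numbers appear in a Zeckendorf representation, the terminating "11" makes the code prefix-free and self-delimiting, which is what enables direct access into a concatenated stream of codewords.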

7 citations


Cites methods from "A space efficient direct access dat..."

  • ...Range decoding in WT is proposed in [12]....


Journal ArticleDOI
TL;DR: This research shows experimental results of different integer encoders such as Rice, Simple9, Simple16, PForDelta codes, and DACs and a method to determine an appropriate k value for building a k2-raster compact data structure with competitive performance is discussed.
Abstract: This paper examines the various variable-length encoders that provide integer encoding to hyperspectral scene data within a k²-raster compact data structure. This compact data structure leads to a compression ratio similar to that produced by some of the classical compression techniques. It also provides direct access for query to its data elements without requiring any decompression. The selection of the integer encoder is critical for obtaining a competitive performance considering both the compression ratio and access time. In this research, we show experimental results of different integer encoders such as Rice, Simple9, Simple16, PForDelta codes, and DACs. Further, a method to determine an appropriate k value for building a k²-raster compact data structure with competitive performance is discussed.
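Of the integer encoders compared above, the Rice code is the simplest to sketch: a quotient in unary followed by k remainder bits (an illustrative bitstring version with a hand-picked parameter, not the paper's implementation):

```python
def rice_encode(n: int, k: int) -> str:
    """Rice code of n >= 0 with parameter k: the quotient n >> k in unary
    (q ones and a terminating zero), then the remainder in exactly k bits."""
    q = n >> k
    r = n & ((1 << k) - 1)
    remainder_bits = format(r, "b").zfill(k) if k > 0 else ""
    return "1" * q + "0" + remainder_bits
```

The choice of k governs the quotient/remainder trade-off, mirroring the paper's point that picking the right parameters (there, the k of the k²-raster and the encoder itself) is what makes the compression ratio and access time competitive.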

6 citations

Journal ArticleDOI
TL;DR: In this paper, a dynamic Huffman encoding is proposed which, instead of basing itself on the information gathered from the already processed portion of the file, as traditional adaptive codings do, rather uses the information that is still to come.

4 citations

References
Journal ArticleDOI
TL;DR: An application is the construction of a uniformly universal sequence of codes for countable memoryless sources, in which the n-th code has a ratio of average codeword length to source rate bounded by a function of n for all sources with positive rate.
Abstract: Countable prefix codeword sets are constructed with the universal property that assigning messages in order of decreasing probability to codewords in order of increasing length gives an average codeword length, for any message set with positive entropy, less than a constant times the optimal average codeword length for that source. Some of the sets also have the asymptotically optimal property that the ratio of average codeword length to entropy approaches one uniformly as entropy increases. An application is the construction of a uniformly universal sequence of codes for countable memoryless sources, in which the n-th code has a ratio of average codeword length to source rate bounded by a function of n for all sources with positive rate; the bound is less than two for n = 0 and approaches one as n increases.
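The Elias gamma code is one concrete member of the family of universal prefix codeword sets described above (a sketch for illustration; the cited paper defines and analyzes several such sets):

```python
def elias_gamma(n: int) -> str:
    """Elias gamma code of n >= 1: len(bin(n)) - 1 zeros, then n in binary.
    The leading zeros announce the length, making the code prefix-free."""
    b = format(n, "b")
    return "0" * (len(b) - 1) + b
```

The codeword length is roughly 2 lg n + 1 bits, so shorter (more probable, under the universal assignment) messages get shorter codewords without any knowledge of the source distribution.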

1,306 citations

Proceedings ArticleDOI
12 Jan 2003
TL;DR: A novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet σ, where each symbol is encoded by lg|σ| bits.
Abstract: We present a novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet σ, where each symbol is encoded by lg |σ| bits. We show that compressed suffix arrays use just nH_h + σ bits, while retaining full text indexing functionalities, such as searching any pattern sequence of length m in O(m lg |σ| + polylog(n)) time. The term H_h ≤ lg |σ| denotes the h-th-order empirical entropy of the text, which means that our index is nearly optimal in space apart from lower-order terms, achieving asymptotically the empirical entropy of the text (with a multiplicative constant 1). If the text is highly compressible so that H_h = o(1) and the alphabet size is small, we obtain a text index with o(m) search time that requires only o(n) bits. Further results and tradeoffs are reported in the paper.
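For contrast with the compressed variant above, a plain (uncompressed) suffix array with pattern counting can be sketched in a few lines; it supports the same searches but occupies O(n lg n) bits rather than space close to the empirical entropy (a didactic sketch, not the cited implementation):

```python
from bisect import bisect_left, bisect_right

def suffix_array(text: str):
    """Plain suffix array: start positions of all suffixes, in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def count_occurrences(text: str, sa, pattern: str) -> int:
    """Count occurrences of `pattern` by binary search over sorted suffixes.
    Materializing the length-m prefixes keeps the sketch short; a real index
    compares against the text in place instead."""
    m = len(pattern)
    prefixes = [text[i:i + m] for i in sa]  # sorted, because sa is
    return bisect_right(prefixes, pattern) - bisect_left(prefixes, pattern)
```

Every occurrence of a pattern is a prefix of some suffix, so all matches form one contiguous run in the sorted order; compressed suffix arrays preserve exactly this search structure while representing the permutation succinctly.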

818 citations


"A space efficient direct access dat..." refers background in this paper

  • ...A space efficient direct access data structure — Gilad Baruch (Computer Science Department, Bar Ilan University, Ramat Gan 52900, Israel), Shmuel T. Klein (Computer Science Department, Bar Ilan University), Dana Shapira (Computer Science Department, Ariel University, Ariel 40700, Israel)...


Book
01 Jan 1935

768 citations

Proceedings ArticleDOI
30 Oct 1989
TL;DR: Data structures that represent static unlabeled trees and planar graphs are developed; for trees they are asymptotically optimal, in that no other structure encodes n-node trees with fewer bits per node as n grows without bound.
Abstract: Data structures that represent static unlabeled trees and planar graphs are developed. The structures are more space efficient than conventional pointer-based representations, but (to within a constant factor) they are just as time efficient for traversal operations. For trees, the data structures described are asymptotically optimal: there is no other structure that encodes n-node trees with fewer bits per node, as n grows without bound. For planar graphs (and for all graphs of bounded page number), the data structure described uses linear space: it is within a constant factor of the most succinct representation.
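The flavor of such succinct tree representations can be illustrated with the balanced-parentheses encoding, which spends 2 bits per node (a sketch of one classical encoding in this spirit; the cited paper's own constructions also cover level-order bitmaps and planar graphs):

```python
def encode_tree(tree) -> str:
    """Balanced-parentheses encoding of an ordered tree: write '(' on entering
    a node and ')' on leaving it. Here a tree is a list of child subtrees,
    so [] is a leaf. The output uses exactly 2 bits (characters) per node."""
    return "(" + "".join(encode_tree(child) for child in tree) + ")"
```

An n-node tree thus takes 2n bits, matching the information-theoretic lower bound of roughly 2n - Θ(lg n) bits up to lower-order terms; traversal operations can then be supported on the bitstring with o(n) bits of auxiliary structures.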

759 citations


"A space efficient direct access dat..." refers background in this paper

  • ...A space efficient direct access data structure — Gilad Baruch (Computer Science Department, Bar Ilan University, Ramat Gan 52900, Israel), Shmuel T. Klein (Computer Science Department, Bar Ilan University), Dana Shapira (Computer Science Department, Ariel University, Ariel 40700, Israel)...
