Showing papers on "Compressed pattern matching" published in 1999

Open Access

Proceedings Article

N.J. Larsson¹, Alistair Moffat²

Lund University¹, University of Melbourne²

29 Mar 1999

TL;DR: A compression scheme is developed that is a combination of a simple but powerful phrase derivation method and a compact dictionary encoding that is highly efficient, particularly in decompression, and has characteristics that make it a favorable choice when compressed data is to be searched directly.

Abstract: Dictionary-based modelling is the mechanism used in many practical compression schemes. We use the full message (or a large block of it) to infer a complete dictionary in advance, and include an explicit representation of the dictionary as part of the compressed message. Intuitively, the advantage of this offline approach is that with the benefit of having access to all of the message, it should be possible to optimize the choice of phrases so as to maximize compression performance. Indeed, we demonstrate that very good compression can be attained by an offline method without compromising the fast decoding that is a distinguishing characteristic of dictionary-based techniques. Several nontrivial sources of overhead, in terms of both computation resources required to perform the compression, and bits generated into the compressed message, have to be carefully managed as part of the offline process. To meet this challenge, we have developed a novel phrase derivation method and a compact dictionary encoding. In combination these two techniques produce the compression scheme RE-PAIR, which is highly efficient, particularly in decompression.
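The phrase derivation idea behind RE-PAIR (recursive pairing) can be illustrated with a minimal sketch: repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal until no pair occurs twice. The function names `repair_compress` and `repair_expand` are hypothetical, the loop below is a naive quadratic version rather than the efficient data structures the paper develops, and the compact dictionary encoding is not shown.

```python
from collections import Counter

def repair_compress(text):
    """Naive recursive pairing. Terminals are 1-char strings, nonterminals are ints."""
    seq = list(text)
    rules = {}          # nonterminal -> (left symbol, right symbol)
    next_symbol = 0
    while len(seq) >= 2:
        # Count adjacent pairs. Overlapping runs such as "aaa" are counted
        # naively here, which is a simplification for this sketch.
        counts = Counter(zip(seq, seq[1:]))
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        rules[next_symbol] = pair
        out, i = [], 0
        while i < len(seq):
            # Replace non-overlapping occurrences left to right.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_symbol += 1
    return seq, rules

def repair_expand(seq, rules):
    """Decode by expanding nonterminals with an explicit stack."""
    out = []
    stack = list(reversed(seq))
    while stack:
        sym = stack.pop()
        if isinstance(sym, int):
            left, right = rules[sym]
            stack.append(right)
            stack.append(left)
        else:
            out.append(sym)
    return "".join(out)

if __name__ == "__main__":
    message = "abracadabra abracadabra"
    seq, rules = repair_compress(message)
    print(len(message), "symbols ->", len(seq), "symbols,", len(rules), "rules")
    print(repair_expand(seq, rules) == message)
```

Decoding only follows rule references, which gives some intuition for why decompression in this family of schemes is cheap compared with the offline phrase selection.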

228 citations

Book Chapter

A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text

Gonzalo Navarro¹, Mathieu Raffinot²

University of Chile¹, Institut Gaspard Monge²

22 Jul 1999

TL;DR: A general technique for string matching when the text comes as a sequence of blocks is developed, which abstracts the essential features of Ziv-Lempel compression and presents the first algorithm to find all the matches of a pattern in a text compressed using LZ77.

Abstract: We address the problem of string matching on Ziv-Lempel compressed text. The goal is to search a pattern in a text without uncompressing it. This is a highly relevant issue to keep compressed text databases where efficient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts the essential features of Ziv-Lempel compression. We then apply the scheme to each particular type of compression. We present the first algorithm to find all the matches of a pattern in a text compressed using LZ77. When we apply our scheme to LZ78, we obtain a much more efficient search algorithm, which is faster than uncompressing the text and then searching on it. Finally, we propose a new hybrid compression scheme which is between LZ77 and LZ78, being in practice as good to compress as LZ77 and as fast to search in as LZ78.
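To make the "text as a sequence of blocks" view concrete, here is a minimal sketch of an LZ78 parse, in which every block is a reference to an earlier block plus one new character. The names `lz78_parse` and `lz78_blocks_to_text` are hypothetical, and this shows only the block structure a compressed search would scan, not the search algorithm from the paper.

```python
def lz78_parse(text):
    """Split text into LZ78 blocks of the form (index of earlier phrase, new char).

    Index 0 denotes the empty phrase.
    """
    dictionary = {"": 0}
    blocks = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch                      # keep extending a known phrase
        else:
            blocks.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Leftover input that is already a known phrase: emit it as one last
        # block (prefix reference plus final character).
        blocks.append((dictionary[current[:-1]], current[-1]))
    return blocks

def lz78_blocks_to_text(blocks):
    """Rebuild the text by concatenating the phrase each block defines."""
    phrases = [""]
    for ref, ch in blocks:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])

if __name__ == "__main__":
    blocks = lz78_parse("abababa")
    print(blocks)
    print(lz78_blocks_to_text(blocks))
```

Because each block is described entirely by one earlier block and one character, information about how the pattern interacts with a block can in principle be derived from the information stored for the referenced block, which is the property a block-based search exploits.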

123 citations

Book Chapter

Shift-And Approach to Pattern Matching in LZW Compressed Text

Takuya Kida¹, Masayuki Takeda¹, Ayumi Shinohara¹, Setsuo Arikawa¹

Kyushu University¹

22 Jul 1999

TL;DR: This paper considers the Shift-And approach to the problem of pattern matching in LZW compressed text, and gives a new algorithm that solves it, and shows that the algorithm is indeed fast when a pattern length is at most 32, or the word length.

Abstract: This paper considers the Shift-And approach to the problem of pattern matching in LZW compressed text, and gives a new algorithm that solves it. The algorithm is indeed fast when a pattern length is at most 32, or the word length. After an O(m + |Σ|) time and O(|Σ|) space preprocessing of a pattern, it scans an LZW compressed text in O(n + r) time and reports all occurrences of the pattern, where n is the compressed text length, m is the pattern length, and r is the number of the pattern occurrences. Experimental results show that it runs approximately 1.5 times faster than a decompression followed by a simple search using the Shift-And algorithm. Moreover, the algorithm can be extended to the generalized pattern matching, to the pattern matching with k mismatches, and to the multiple pattern matching, like the Shift-And algorithm.
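For reference, the classical Shift-And algorithm on uncompressed text, which is the baseline the abstract compares against, can be sketched as follows. The function name `shift_and_search` is hypothetical. Python integers are unbounded, so the sketch does not enforce the restriction that the pattern length fit in one machine word (32 in the abstract); in a fixed-width language that restriction is what makes each step a constant number of word operations. The LZW-compressed variant from the paper is not shown.

```python
def shift_and_search(pattern, text):
    """Return the start positions of all occurrences of pattern in text.

    Bit i of `state` is set when pattern[:i + 1] matches the text ending
    at the current position.
    """
    m = len(pattern)
    if m == 0:
        return []
    # Preprocessing: one bit mask per pattern character.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)
    state = 0
    hits = []
    for j, ch in enumerate(text):
        # Extend every partial match by one character and start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits

if __name__ == "__main__":
    print(shift_and_search("aba", "ababa"))  # [0, 2]
```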

58 citations

A Boyer-Moore type algorithm for compressed pattern matching

Yusuke Shibata¹, Tetsuya Matsumoto¹, Masayuki Takeda¹, Ayumi Shinohara¹, Setsuo Arikawa¹

Kyushu University¹

01 Jan 1999

57 citations

Proceedings Article

Almost-optimal fully LZW-compressed pattern matching

Leszek Gasieniec¹, W. Rytter²

University of Liverpool¹, University of Warmia and Mazury in Olsztyn²

29 Mar 1999

TL;DR: The techniques used can be used in design of efficient algorithms for a wide range of the most typical string problems, in the compressed LZW setting, including: computing a period of a word, finding repetitions, symmetries, counting subwords, and multi-pattern matching.

Abstract: Given two strings: pattern P and text T of lengths |P|=M and |T|=N, a string matching problem is to find all occurrences of pattern P in text T. A fully compressed string matching problem is the string matching problem with input strings P and T given in compressed forms p and t respectively, where |p|=m and |t|=n. We present first, almost-optimal, string matching algorithms for LZW-compressed strings running in: (1) O((n+m)log(n+m)) time on a single processor machine; and (2) Õ(n+m) work on a (n+m)-processor PRAM. The techniques used can be used in design of efficient algorithms for a wide range of the most typical string problems, in the compressed LZW setting, including: computing a period of a word, finding repetitions, symmetries, counting subwords, and multi-pattern matching.

51 citations

Proceedings Article

A unifying framework for compressed pattern matching

Takuya Kida¹, Yusuke Shibata¹, Masayuki Takeda¹, Ayumi Shinohara¹, Setsuo Arikawa¹

Kyushu University¹

21 Sep 1999

TL;DR: A general framework which is suitable to capture an essence of compressed pattern matching according to various dictionary based compressions is introduced, and a compressed pattern matching algorithm for the framework is proposed.

Abstract: We introduce a general framework which is suitable to capture an essence of compressed pattern matching according to various dictionary based compressions, and propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW) (J. Ziv and A. Lempel, 1978), byte-pair encoding, and the static dictionary based method. Technically, our pattern matching algorithm extends that for LZW compressed text presented by A. Amir et al. (1996).

46 citations

Book Chapter

Pattern Matching in Text Compressed by Using Antidictionaries

Yusuke Shibata¹, Masayuki Takeda¹, Ayumi Shinohara¹, Setsuo Arikawa¹

Kyushu University¹

22 Jul 1999

TL;DR: An algorithm is shown which preprocesses a pattern of length m and an antidictionary M in O(m² + ||M||) time, and then scans a compressed text of length n in O(n + r) time to find all pattern occurrences.

Abstract: In this paper we focus on the problem of compressed pattern matching for the text compression using antidictionaries, which is a new compression scheme proposed recently by Crochemore et al. (1998). We show an algorithm which preprocesses a pattern of length m and an antidictionary M in O(m² + ||M||) time, and then scans a compressed text of length n in O(n + r) time to find all pattern occurrences, where ||M|| is the total length of strings in M and r is the number of the pattern occurrences.
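As background for the compression scheme this abstract refers to, the following is a brute-force sketch of antidictionary coding on a binary text: whenever the text read so far ends with a forbidden word minus its last bit, the next bit is forced and can be omitted. The names `forced_bit`, `dca_encode`, and `dca_decode` are hypothetical, the antidictionary is assumed to be given and to contain only words that never occur in the text, and the decoder is assumed to know the original length. The pattern matching algorithm from the paper is not shown.

```python
def forced_bit(prefix, antidict):
    """Return the bit forced after `prefix`, or None if the next bit is free.

    If prefix ends with u and u + c is a forbidden word, the next bit
    cannot be c, so it must be the other bit.
    """
    for word in antidict:
        u, c = word[:-1], word[-1]
        if prefix.endswith(u):
            return "1" if c == "0" else "0"
    return None

def dca_encode(text, antidict):
    """Keep only the bits that the antidictionary cannot predict."""
    out = []
    for i, bit in enumerate(text):
        if forced_bit(text[:i], antidict) is None:
            out.append(bit)
    return "".join(out)

def dca_decode(code, n, antidict):
    """Rebuild n bits, reading from `code` only where no bit is forced."""
    out = ""
    bits = iter(code)
    while len(out) < n:
        f = forced_bit(out, antidict)
        out += f if f is not None else next(bits)
    return out

if __name__ == "__main__":
    text = "01110101"
    antidict = ["00", "0110", "1011", "1111"]   # none of these occur in text
    code = dca_encode(text, antidict)
    print(code)                                  # 01
    print(dca_decode(code, len(text), antidict) == text)
```

The quadratic suffix checks here stand in for the automaton a practical implementation would build from the antidictionary.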

34 citations