
Showing papers on "Compressed pattern matching" published in 1999


Proceedings ArticleDOI
29 Mar 1999
TL;DR: A compression scheme is developed that is a combination of a simple but powerful phrase derivation method and a compact dictionary encoding that is highly efficient, particularly in decompression, and has characteristics that make it a favorable choice when compressed data is to be searched directly.
Abstract: Dictionary-based modelling is the mechanism used in many practical compression schemes. We use the full message (or a large block of it) to infer a complete dictionary in advance, and include an explicit representation of the dictionary as part of the compressed message. Intuitively, the advantage of this offline approach is that with the benefit of having access to all of the message, it should be possible to optimize the choice of phrases so as to maximize compression performance. Indeed, we demonstrate that very good compression can be attained by an offline method without compromising the fast decoding that is a distinguishing characteristic of dictionary-based techniques. Several nontrivial sources of overhead, in terms of both computation resources required to perform the compression, and bits generated into the compressed message, have to be carefully managed as part of the offline process. To meet this challenge, we have developed a novel phrase derivation method and a compact dictionary encoding. In combination these two techniques produce the compression scheme RE-PAIR, which is highly efficient, particularly in decompression.

228 citations
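
The phrase derivation idea behind RE-PAIR, as commonly described, is to replace the most frequent adjacent pair of symbols with a fresh symbol and repeat until no pair occurs twice. The following is a minimal sketch of that idea only, assuming a naive quadratic rescan per round; it does not reproduce the efficient data structures or the compact dictionary encoding the abstract refers to, and the names `repair_compress` and `repair_expand` are hypothetical.

```python
from collections import Counter


def repair_compress(text):
    """Naive pair replacement: returns (sequence, rules).

    rules maps each new symbol to the pair (left, right) it stands for.
    New symbols are tuples ("R", k), so they never collide with characters.
    """
    seq = list(text)
    rules = {}
    next_id = 0
    while True:
        # Count adjacent pairs (overlapping occurrences are counted too,
        # which is a simplification of this sketch).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        sym = ("R", next_id)
        next_id += 1
        rules[sym] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def repair_expand(seq, rules):
    """Expand the sequence back to text using an explicit stack."""
    out = []
    stack = list(reversed(seq))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)  # pushed first so that left is popped first
            stack.append(left)
        else:
            out.append(s)
    return "".join(out)
```

Decoding only follows dictionary rules, which is one way to see why decompression for this family of schemes can be fast.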


Book ChapterDOI
22 Jul 1999
TL;DR: A general technique for string matching when the text comes as a sequence of blocks is developed, which abstracts the essential features of Ziv-Lempel compression and presents the first algorithm to find all the matches of a pattern in a text compressed using LZ77.
Abstract: We address the problem of string matching on Ziv-Lempel compressed text. The goal is to search for a pattern in a text without uncompressing it. This is highly relevant for maintaining compressed text databases that can still be searched efficiently. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts the essential features of Ziv-Lempel compression. We then apply the scheme to each particular type of compression. We present the first algorithm to find all the matches of a pattern in a text compressed using LZ77. When we apply our scheme to LZ78, we obtain a much more efficient search algorithm, which is faster than uncompressing the text and then searching on it. Finally, we propose a new hybrid compression scheme which lies between LZ77 and LZ78, in practice compressing as well as LZ77 and being as fast to search as LZ78.

123 citations
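
To make the "text as a sequence of blocks" view concrete, here is a minimal sketch of a textbook LZ78 parse, in which every block is an earlier block extended by one character. It illustrates the block structure only, not the search algorithm from the entry above; `lz78_parse` and `lz78_expand` are hypothetical names.

```python
def lz78_parse(text):
    """Split text into LZ78 blocks (ref, ch): block = block[ref] + ch.

    Reference 0 denotes the empty block. A final block may carry ch == ""
    when the text ends in the middle of a known phrase.
    """
    dictionary = {"": 0}
    blocks = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the longest known phrase
        else:
            blocks.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        blocks.append((dictionary[phrase], ""))
    return blocks


def lz78_expand(blocks):
    """Rebuild the text by replaying the block definitions."""
    phrases = [""]
    out = []
    for ref, ch in blocks:
        p = phrases[ref] + ch
        phrases.append(p)
        out.append(p)
    return "".join(out)


print(lz78_parse("abababab"))
# [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, '')]
```

A compressed-domain search would work over these `(ref, ch)` blocks directly rather than over the expanded text.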


Book ChapterDOI
22 Jul 1999
TL;DR: This paper considers the Shift-And approach to the problem of pattern matching in LZW compressed text, gives a new algorithm that solves it, and shows that the algorithm is fast when the pattern length is at most 32, or the word length.
Abstract: This paper considers the Shift-And approach to the problem of pattern matching in LZW compressed text, and gives a new algorithm that solves it. The algorithm is fast when the pattern length is at most 32, or the word length. After an O(m + |Σ|) time and O(|Σ|) space preprocessing of a pattern, it scans an LZW compressed text in O(n + r) time and reports all occurrences of the pattern, where n is the compressed text length, m is the pattern length, and r is the number of pattern occurrences. Experimental results show that it runs approximately 1.5 times faster than decompression followed by a simple search using the Shift-And algorithm. Moreover, like the Shift-And algorithm, it can be extended to generalized pattern matching, to pattern matching with k mismatches, and to multiple pattern matching.

58 citations
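
For reference, the classical Shift-And algorithm on uncompressed text can be sketched as below. This is the baseline the entry above compares against, not the LZW-domain algorithm itself; the function name is hypothetical. Python integers are unbounded, so the 32-bit word limit mentioned in the abstract does not arise here, although the bit-parallel speed advantage depends on the state fitting in a machine word.

```python
def shift_and_search(pattern, text):
    """Return the start positions of all occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit i set exactly when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)
    state = 0  # bit i set: pattern[0..i] matches the text ending here
    hits = []
    for pos, ch in enumerate(text):
        # Extend every partial match by one character and start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits


print(shift_and_search("aba", "ababa"))  # [0, 2]
```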



Proceedings ArticleDOI
29 Mar 1999
TL;DR: The techniques can be used to design efficient algorithms for a wide range of the most typical string problems in the compressed LZW setting, including computing a period of a word, finding repetitions and symmetries, counting subwords, and multi-pattern matching.
Abstract: Given two strings, a pattern P and a text T of lengths |P| = M and |T| = N, the string matching problem is to find all occurrences of P in T. The fully compressed string matching problem is the string matching problem with the input strings P and T given in compressed forms p and t respectively, where |p| = m and |t| = n. We present the first almost-optimal string matching algorithms for LZW-compressed strings, running in (1) O((n+m) log(n+m)) time on a single-processor machine, and (2) Õ(n+m) work on an (n+m)-processor PRAM. The techniques can be used to design efficient algorithms for a wide range of the most typical string problems in the compressed LZW setting, including computing a period of a word, finding repetitions and symmetries, counting subwords, and multi-pattern matching.

51 citations
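
The gap between the uncompressed length N and the compressed length n in the entry above can be seen with a small textbook LZW coder. This is a sketch under simplifying assumptions (the initial table is the sorted set of characters in the input, and codes are kept as a list of integers rather than packed bits); it is not the matching algorithm from the paper, and the function names are hypothetical.

```python
def lzw_compress(text):
    """Return (alphabet, codes) for a textbook LZW encoding of text."""
    alphabet = sorted(set(text))
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    phrase = ""
    for ch in text:
        if phrase + ch in table:
            phrase += ch
        else:
            codes.append(table[phrase])
            table[phrase + ch] = len(table)  # learn the new phrase
            phrase = ch
    if phrase:
        codes.append(table[phrase])
    return alphabet, codes


def lzw_expand(alphabet, codes):
    """Invert lzw_compress by rebuilding the same phrase table."""
    if not codes:
        return ""
    table = list(alphabet)
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # The code refers to the phrase being defined in this very step.
            entry = prev + prev[0]
        table.append(prev + entry[0])
        out.append(entry)
        prev = entry
    return "".join(out)


alphabet, codes = lzw_compress("ab" * 50)
print(len("ab" * 50), len(codes))  # N versus n for a highly repetitive text
```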


Proceedings ArticleDOI
21 Sep 1999
TL;DR: A general framework that captures the essence of compressed pattern matching across various dictionary-based compression methods is introduced, and a compressed pattern matching algorithm for the framework is proposed.
Abstract: We introduce a general framework that captures the essence of compressed pattern matching across various dictionary-based compression methods, and propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes compression methods such as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW) (J. Ziv and A. Lempel, 1978), byte-pair encoding, and the static dictionary based method. Technically, our pattern matching algorithm extends the one for LZW compressed text presented by A. Amir et al. (1996).

46 citations


Book ChapterDOI
22 Jul 1999
TL;DR: An algorithm is shown which preprocesses a pattern of length m and an antidictionary M in O(m^2 + ||M||) time, and then scans a compressed text of length n in O(n + r) time to find all pattern occurrences.
Abstract: In this paper we focus on the problem of compressed pattern matching for text compression using antidictionaries, a new compression scheme proposed recently by Crochemore et al. (1998). We show an algorithm which preprocesses a pattern of length m and an antidictionary M in O(m^2 + ||M||) time, and then scans a compressed text of length n in O(n + r) time to find all pattern occurrences, where ||M|| is the total length of the strings in M and r is the number of pattern occurrences.

34 citations
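
As a rough illustration of the antidictionary idea in the entry above: an antidictionary is a set of words that never occur in the text, so whenever the text read so far ends with all but the last bit of such a word, the next bit is forced and need not be stored. The sketch below assumes a binary text given as a string of "0" and "1", an antidictionary supplied by the caller, and a decoder that is told the original length. All names are hypothetical, and this naive version scans every forbidden word at each position instead of using an automaton.

```python
def _forced_bit(prefix, anti):
    """Return the bit forced after prefix by some forbidden word, or None."""
    for word in anti:
        if prefix.endswith(word[:-1]):
            # word[:-1] + word[-1] is forbidden, so the other bit must follow.
            return "1" if word[-1] == "0" else "0"
    return None


def ad_encode(text, anti):
    """Drop every bit of text that the antidictionary makes predictable."""
    out = []
    for i, bit in enumerate(text):
        forced = _forced_bit(text[:i], anti)
        if forced is None:
            out.append(bit)
        elif forced != bit:
            raise ValueError("text contains a word from the antidictionary")
    return "".join(out)


def ad_decode(compressed, anti, length):
    """Rebuild a text of the given length from the stored bits."""
    stored = iter(compressed)
    out = ""
    while len(out) < length:
        forced = _forced_bit(out, anti)
        out += forced if forced is not None else next(stored)
    return out


anti = {"00", "11"}  # assumed antidictionary: no two equal adjacent bits
print(ad_encode("01010101", anti))  # 0
print(ad_decode("0", anti, 8))      # 01010101
```

Searching such a compressed text directly is harder than decoding it, because the stored bits alone do not show where the omitted bits fall.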