Showing papers on "Compressed pattern matching" published in 1997


Journal ArticleDOI
Udi Manber
TL;DR: This article presents a text compression scheme that speeds up string matching by searching the compressed file directly; frequently searched files can remain compressed indefinitely, saving space while also allowing faster search.
Abstract: A new text compression scheme is presented in this article. The main purpose of this scheme is to speed up string matching by searching the compressed file directly. The scheme requires no modification of the string-matching algorithm, which is used as a black box; any string-matching procedure can be used. Instead, the pattern is modified; only the outcome of the matching of the modified pattern against the compressed file is decompressed. Since the compressed file is smaller than the original file, the search is faster both in terms of I/O time and processing time than a search in the original file. For typical text files, we achieve about 30% reduction of space and slightly less of search time. A 30% space saving is not competitive with good text compression schemes, and thus the scheme should not be used where space is the predominant concern. The intended applications of this scheme are files that are searched often, such as catalogs, bibliographic files, and address books. Such files are typically not compressed, but with this scheme they can remain compressed indefinitely, saving space while allowing faster search at the same time. A particular application to an information retrieval system that we developed is also discussed.
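
The idea in the abstract above (compress the pattern the same way as the file, hand both to an unmodified matcher, and decompress only around the hits) can be illustrated with a small pair-substitution sketch. This is not the paper's implementation. It assumes a hand-picked, hypothetical pair table `PAIRS` whose first characters and second characters form disjoint sets, so that whether two adjacent characters are merged never depends on context, and it assumes the code characters never occur in the text. All function names are made up for illustration.

```python
def compress(text, pairs):
    """Replace each listed character pair with its one-character code."""
    out, i = [], 0
    while i < len(text):
        code = pairs.get(text[i:i + 2])
        if code is not None:
            out.append(code)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def decompress(ctext, pairs):
    """Expand every code back into its two-character pair."""
    unpair = {code: pair for pair, code in pairs.items()}
    return "".join(unpair.get(ch, ch) for ch in ctext)


def find_all(haystack, needle):
    """Black-box matcher; any exact string-matching routine would do."""
    i = haystack.find(needle)
    while i != -1:
        yield i
        i = haystack.find(needle, i + 1)


def search_compressed(ctext, pattern, pairs):
    """Return offsets in the compressed text where the pattern occurs."""
    if not pattern:
        raise ValueError("empty pattern")
    firsts = {p[0] for p in pairs}
    seconds = {p[1] for p in pairs}
    if firsts & seconds:
        raise ValueError("sketch assumes disjoint first/second characters")
    # A leading character that may be merged with the text before the match,
    # or a trailing one that may be merged with the text after it, is dropped
    # from the searched core and checked afterwards.
    lo = 1 if pattern[0] in seconds else 0
    hi = len(pattern) - (1 if pattern[-1] in firsts else 0)
    core = pattern[lo:hi]
    if not core:
        raise ValueError("pattern too short for this sketch")
    ccore = compress(core, pairs)
    hits = []
    for j in find_all(ctext, ccore):
        end = j + len(ccore)
        # Decompress only the one symbol on each side of the candidate.
        before = decompress(ctext[max(j - 1, 0):j], pairs)
        after = decompress(ctext[end:end + 1], pairs)
        if before.endswith(pattern[:lo]) and after.startswith(pattern[hi:]):
            hits.append(j)
    return hits


# Hypothetical pair table: firsts {t, e} and seconds {h, r, space} are disjoint.
PAIRS = {"th": "\x01", "er": "\x02", "e ": "\x03"}
ctext = compress("the other theme", PAIRS)
print(len(ctext), search_compressed(ctext, "her", PAIRS))
```

The matcher itself (`str.find` here) is untouched; only the pattern is transformed, which mirrors the black-box property the abstract emphasizes. A real scheme would also need to choose the pair table from text statistics, which this sketch leaves out.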

151 citations



Journal Article
TL;DR: An O(n^4 log n) time algorithm is shown for the pattern matching problem for strings which are succinctly described in terms of straight-line programs, in which the constants are symbols and the only operation is the concatenation.
Abstract: We investigate the time complexity of the pattern matching problem for strings which are succinctly described in terms of straight-line programs, in which the constants are symbols and the only operation is the concatenation. Most strings of descriptive size n are of exponential length with respect to n. We show an O(n^4 log n) time algorithm for this problem. The crucial point in our algorithm is the succinct representation of all periods of a (possibly long) string described in this manner. We also show a (rather straightforward) result that a very simple extension of the pattern-matching problem for shortly described strings is NP-complete.
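
To make the notion of a straight-line program concrete, here is a minimal sketch of the representation only, not of the matching algorithm from the abstract. The encoding is an assumption chosen for illustration: rule `i` is either a single character or a tuple `(j, k)` with `j, k < i`, meaning X_i = X_j X_k. The helper names `slp_lengths` and `char_at` are hypothetical. The sketch shows how a description of size n can denote a string of exponential length while individual characters remain accessible without expanding it.

```python
def slp_lengths(rules):
    """Length of the string derived by each rule, computed bottom-up."""
    lengths = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)          # terminal rule: one symbol
        else:
            j, k = rule                # concatenation rule X_i = X_j X_k
            lengths.append(lengths[j] + lengths[k])
    return lengths


def char_at(rules, lengths, i, pos):
    """Character at index pos of the string derived by rule i."""
    while not isinstance(rules[i], str):
        j, k = rules[i]
        if pos < lengths[j]:
            i = j                      # position lies in the left part
        else:
            pos -= lengths[j]          # position lies in the right part
            i = k
    return rules[i]


# Fibonacci words: X0 = "b", X1 = "a", X_i = X_{i-1} X_{i-2}.
fib = ["b", "a", (1, 0), (2, 1), (3, 2)]
fib_len = slp_lengths(fib)
print("".join(char_at(fib, fib_len, 4, p) for p in range(fib_len[4])))

# Repeated doubling: 61 rules describe a string of 2**60 symbols.
doubling = ["a"] + [(i, i) for i in range(60)]
print(slp_lengths(doubling)[-1] == 2 ** 60)
```

Each `char_at` call walks one root-to-leaf path, so its cost is bounded by the number of rules rather than by the length of the described string.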

91 citations


Journal ArticleDOI
TL;DR: This work achieves optimal time by proving new properties of two-dimensional periodicity, which enable duels that require no witness, and presents the first optimal two-dimensional compressed matching algorithm.

26 citations