Showing papers by "Richard Cole" published in 2004

Proceedings Article•DOI•

Dictionary matching and indexing with errors and don't cares

Richard Cole¹, Lee-Ad Gottlieb¹, Moshe Lewenstein²•Institutions (2)

New York University¹, Bar-Ilan University²

13 Jun 2004

TL;DR: This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly.

Abstract: This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of "don't care" characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution. The performance bounds all have a similar character. For example, for the indexing problem with n = |t| and m = |p|, the query time for k substitutions is \(O(m + (c_1 \log n)^k / k! + \#\,\text{matches})\), with a data structure of size \(O(n (c_2 \log n)^k / k!)\) and a preprocessing time of \(O(n (c_2 \log n)^k / k!)\), where \(c_1, c_2 > 1\) are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.
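To make the three problem variants in this abstract concrete, here is a minimal brute-force sketch in Python for the mismatch case only. It is not the paper's data structure and achieves none of the stated bounds; it only pins down what each query is supposed to return. All function names (`hamming_within`, `index_query`, `dictionary_query`, `dictionary_matching`) are hypothetical and chosen for illustration.

```python
def hamming_within(a, b, k):
    """Return True if equal-length strings a and b differ in at most k positions."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def index_query(t, p, k):
    """Indexing: start positions in text t where p matches a substring with <= k mismatches."""
    m = len(p)
    return [i for i in range(len(t) - m + 1) if hamming_within(t[i:i + m], p, k)]


def dictionary_query(words, p, k):
    """Dictionary queries: dictionary strings that match p in their entirety with <= k mismatches."""
    return [w for w in words if hamming_within(w, p, k)]


def dictionary_matching(words, p, k):
    """Dictionary matching: (position, word) pairs where a substring of p matches a whole word."""
    hits = []
    for w in words:
        for i in range(len(p) - len(w) + 1):
            if hamming_within(p[i:i + len(w)], w, k):
                hits.append((i, w))
    return hits


if __name__ == "__main__":
    print(index_query("abcabd", "abc", 1))
    print(dictionary_query(["cat", "cot", "dog", "cart"], "cut", 1))
    print(dictionary_matching(["ab", "cd"], "abxd", 1))
```

Each query here scans the whole input, so the cost grows with n (or the total dictionary size) times m. The point of the paper's preprocessing is to avoid that scan at query time.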

301 citations

Journal Article•DOI•

Faster Suffix Tree Construction with Missing Suffix Links

Richard Cole, Ramesh Hariharan¹•Institutions (1)

Indian Institute of Science¹

01 Jan 2004-SIAM Journal on Computing

TL;DR: This work adds a new back-propagation component to McCreight's algorithm and gives a high probability hashing scheme for large degrees, which gives the first randomized linear time algorithm for constructing suffix trees for parameterized strings.

Abstract: We consider suffix tree construction for situations with missing suffix links. Two examples of such situations are suffix trees for parameterized strings and suffix trees for two-dimensional arrays. These trees also have the property that the node degrees may be large. We add a new back-propagation component to McCreight's algorithm and also give a high probability hashing scheme for large degrees. We show that these two features enable construction of suffix trees for general situations with missing suffix links in O(n) time, with high probability. This gives the first randomized linear time algorithm for constructing suffix trees for parameterized strings.

27 citations

Book Chapter•DOI•

The average case analysis of partition sorts

Richard Cole¹, David C. Kandathil¹•Institutions (1)

New York University¹

14 Sep 2004

TL;DR: This paper introduces a new family of in-place sorting algorithms, the partition sorts, which are appealing both for their relative simplicity and their efficient performance: \(O(n \log n)\) operations on the average and \(O(n \log^2 n)\) operations in the worst case.

Abstract: This paper introduces a new family of in-place sorting algorithms, the partition sorts. They are appealing both for their relative simplicity and their efficient performance. They perform \(\Theta(n \log n)\) operations on the average, and \(\Theta(n \log^2 n)\) operations in the worst case.

4 citations

Journal Article•DOI•

Parallel two dimensional witness computation

Richard Cole¹, Zvi Galil², Ramesh Hariharan³, S. Muthukrishnan⁴, Kunsoo Park⁵•Institutions (5)

Courant Institute of Mathematical Sciences¹, Columbia University², Indian Institute of Science³, Rutgers University⁴, Seoul National University⁵

10 Jan 2004-Information & Computation

TL;DR: An optimal parallel CRCW-PRAM algorithm to compute witnesses for all non-period vectors of an m1 × m2 pattern is given and yields a work optimal algorithm for 2D pattern matching.

Abstract: An optimal parallel CRCW-PRAM algorithm to compute witnesses for all non-period vectors of an m1 × m2 pattern is given. The algorithm takes O(log log m) time and does O(m1 × m2) work, where m = max{m1, m2}. This yields a work optimal algorithm for 2D pattern matching which takes O(log log m) preprocessing time and O(1) text processing time.
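The abstract does not define a witness, so the following sketch assumes the usual definition from the pattern matching literature: a vector (r, c) is a period of the pattern if the pattern agrees with its own copy shifted by (r, c) wherever the two overlap, and a witness for a non-period vector is any position where they disagree. The sequential brute-force function below (the name `witness` is hypothetical) only illustrates what is being computed; it is not the parallel CRCW-PRAM algorithm from the paper.

```python
def witness(pattern, r, c):
    """Return a position (i, j) where pattern[i][j] differs from pattern[i + r][j + c],
    or None if (r, c) is a period (no disagreement inside the overlap).

    pattern is a list of equal-length strings (an m1 x m2 array).
    This definition of witness is an assumption, not taken from the abstract.
    """
    m1, m2 = len(pattern), len(pattern[0])
    for i in range(m1):
        for j in range(m2):
            i2, j2 = i + r, j + c
            # Only compare cells that lie inside both the pattern and its shifted copy.
            if 0 <= i2 < m1 and 0 <= j2 < m2 and pattern[i][j] != pattern[i2][j2]:
                return (i, j)
    return None


if __name__ == "__main__":
    p = ["abab", "abab"]
    print(witness(p, 0, 2))  # shift by two columns: a period of this pattern
    print(witness(p, 0, 1))  # shift by one column: not a period
```

Running this check for every candidate vector costs far more than the O(m1 × m2) total work quoted in the abstract, which is why a dedicated algorithm is needed.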