Conference

Combinatorial Pattern Matching

About: Combinatorial Pattern Matching is an academic conference. It publishes mainly in the areas of String (computer science) and Time complexity. Over its lifetime, the conference has published 977 papers, which have received 22,247 citations.

Topics: String (computer science), Time complexity, String searching algorithm, Substring, Pattern matching

Papers

Journal Article•DOI•

Approximate string-matching with q-grams and maximal matches

Esko Ukkonen¹•Institutions (1)

University of Helsinki¹

06 Jan 1992

TL;DR: Two string distance functions that are computable in linear time give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for edit distance based string matching.

Abstract: We study approximate string matching in connection with two string distance functions that are computable in linear time. The first function is based on the so-called $q$-grams. An algorithm is given for the associated string matching problem that finds the locally best approximate occurrences of pattern $P$, $|P|=m$, in text $T$, $|T|=n$, in time $O(n\log (m-q))$. The occurrences with distance $\leq k$ can be found in time $O(n\log k)$. The other distance function is based on finding maximal common substrings and allows a form of approximate string matching in time $O(n)$. Both distances give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for the edit distance based string matching.

665 citations
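
As a rough illustration of the first distance function in the abstract above, the sketch below computes a q-gram distance as the L1 distance between q-gram count profiles and compares it with the unit-cost edit distance. The function names are hypothetical, and this is only the distance itself, not the paper's $O(n\log(m-q))$ matching algorithm. It assumes the standard form of the lower bound, namely that the q-gram distance is at most $2q$ times the edit distance.

```python
from collections import Counter


def qgram_profile(s: str, q: int) -> Counter:
    """Count every length-q substring (q-gram) of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(x: str, y: str, q: int) -> int:
    """L1 distance between the q-gram profiles of x and y."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit distance via the standard dynamic program."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        cur = [i]
        for j, cy in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                # delete cx
                           cur[j - 1] + 1,             # insert cy
                           prev[j - 1] + (cx != cy)))  # match or substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    x, y, q = "kitten", "sitting", 2
    dq, ed = qgram_distance(x, y, q), edit_distance(x, y)
    # Assumed lower bound: dq / (2q) <= ed, so a large dq rules out a close match.
    print(dq, ed, dq <= 2 * q * ed)
```

Because the profile can be maintained in a sliding window, a filter of this kind can discard text regions cheaply before running the more expensive edit distance computation on the survivors.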

Book Chapter•DOI•

Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications

Toru Kasai¹, Gunho Lee², Hiroki Arimura¹,³, Setsuo Arikawa¹, Kunsoo Park²•Institutions (3)

Kyushu University¹, Seoul National University², PRESTO, Japan Science and Technology Corporation³

01 Jul 2001

TL;DR: It is shown that the algorithm is crucial to the effective use of block-sorting compression and a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information is presented.

Abstract: We present a linear-time algorithm to compute the longest common prefix information in suffix arrays. As two applications of our algorithm, we show that our algorithm is crucial to the effective use of block-sorting compression, and we present a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information.

520 citations
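
The linear-time longest-common-prefix computation described above is commonly presented along the following lines. This is a minimal sketch with hypothetical function names. The suffix array is built by naive sorting purely for brevity, so only `lcp_kasai` runs in linear time. The convention that `lcp[r]` compares the suffixes of rank `r - 1` and `r` is an assumption of this sketch.

```python
def suffix_array_naive(s: str) -> list[int]:
    """Start positions of the suffixes of s in lexicographic order (not linear time)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_kasai(s: str, sa: list[int]) -> list[int]:
    """lcp[r] = length of the longest common prefix of the suffixes ranked r-1 and r."""
    n = len(s)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0  # current match length, reused across iterations
    for i in range(n):  # visit suffixes in text order
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # predecessor of suffix i in sorted order
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix in text order shares at least h - 1 characters
    return lcp


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array_naive(text)
    print(sa, lcp_kasai(text, sa))
```

The key point is that `h` drops by at most one per step and never exceeds the text length, so the total work of the inner `while` loop is linear.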

Book Chapter•DOI•

Identifying and Filtering Near-Duplicate Documents

Andrei Z. Broder

21 Jun 2000

TL;DR: The algorithm for filtering near-duplicate documents discussed here has been successfully implemented and has been used for the last three years in the context of the AltaVista search engine.

Abstract: The mathematical concept of document resemblance captures well the informal notion of syntactic similarity. The resemblance can be estimated using a fixed size "sketch" for each document. For a large collection of documents (say hundreds of millions) the size of this sketch is of the order of a few hundred bytes per document. However, for efficient large scale web indexing it is not necessary to determine the actual resemblance value: it suffices to determine whether newly encountered documents are duplicates or near-duplicates of documents already indexed. In other words, it suffices to determine whether the resemblance is above a certain threshold. In this talk we show how this determination can be made using a "sample" of less than 50 bytes per document. The basic approach for computing resemblance has two aspects: first, resemblance is expressed as a set (of strings) intersection problem, and second, the relative size of intersections is evaluated by a process of random sampling that can be done independently for each document. The process of estimating the relative size of intersection of sets and the threshold test discussed above can be applied to arbitrary sets, and thus might be of independent interest. The algorithm for filtering near-duplicate documents discussed here has been successfully implemented and has been used for the last three years in the context of the AltaVista search engine.

465 citations
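
The two aspects named in the abstract above, resemblance as a set intersection problem and estimation by independent random sampling, can be illustrated with a small min-hash sketch. This is one common variant and not necessarily the implementation used at AltaVista. The word-level shingle width, the number of hash functions, the use of seeded BLAKE2b as a stand-in for random permutations, and all function names are assumptions made for illustration.

```python
import hashlib


def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Set of all runs of w consecutive words in text."""
    words = text.split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: set, b: set) -> float:
    """Exact resemblance |A ∩ B| / |A ∪ B| of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _hash(item: object, seed: int) -> int:
    """Deterministic 64-bit hash of item under the given seed."""
    data = f"{seed}:{item!r}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(items: set, num_hashes: int = 128) -> list[int]:
    """Fixed-size sketch: the minimum hash value per seed (items must be non-empty)."""
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_resemblance(s1: list[int], s2: list[int]) -> float:
    """Fraction of sketch positions that agree, an estimate of the resemblance."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)


if __name__ == "__main__":
    a = shingles("the quick brown fox jumps over the lazy dog")
    b = shingles("the quick brown fox jumps over the lazy cat")
    print(resemblance(a, b))
    print(estimate_resemblance(minhash_sketch(a), minhash_sketch(b)))
```

Each sketch is computed from one document alone, so a newly crawled document can be compared against indexed ones without access to their full text. A threshold test then amounts to checking whether the agreeing fraction exceeds a chosen cutoff.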

Book Chapter•DOI•

Space efficient linear time construction of suffix arrays

Pang Ko¹, Srinivas Aluru¹•Institutions (1)

Iowa State University¹

25 Jun 2003

TL;DR: This work presents a linear time algorithm to sort all the suffixes of a string over a large alphabet of integers, which improves upon the best known direct algorithm for suffix sorting, which takes O(n log n) time.

Abstract: We present a linear time algorithm to sort all the suffixes of a string over a large alphabet of integers. The sorted order of suffixes of a string is also called suffix array, a data structure introduced by Manber and Myers that has numerous applications in pattern matching, string processing, and computational biology. Though the suffix tree of a string can be constructed in linear time and the sorted order of suffixes derived from it, a direct algorithm for suffix sorting is of great interest due to the space requirements of suffix trees. Our result improves upon the best known direct algorithm for suffix sorting, which takes O(n log n) time. We also show how to construct suffix trees in linear time from our suffix sorting result. Apart from being simple and applicable for alphabets not necessarily of fixed size, this method of constructing suffix trees is more space efficient.

305 citations
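
To make the data structure in the abstract above concrete, the sketch below builds a suffix array by plain sorting and uses it for pattern matching by binary search. It is explicitly not the linear-time construction from the paper: the naive sort shown here is far slower and serves only to show what "the sorted order of suffixes" is and why it is useful. The function names are hypothetical.

```python
def suffix_array_naive(s: str) -> list[int]:
    """Start positions of all suffixes of s, sorted lexicographically (naive sort)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_occurrences(s: str, sa: list[int], p: str) -> list[int]:
    """All start positions of pattern p in s, found by binary search over sa."""
    m = len(p)

    def prefix(k: int) -> str:
        # First m characters of the suffix with rank k.
        return s[sa[k]:sa[k] + m]

    # First rank whose length-m prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First rank whose length-m prefix is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= p:
            lo = mid + 1
        else:
            hi = mid

    # Suffixes starting with p occupy one contiguous block of ranks.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array_naive(text)
    print(sa, find_occurrences(text, sa, "ana"))
```

The search works because truncating sorted suffixes to a fixed length keeps them in non-decreasing order, so every suffix beginning with the pattern falls in a single range of the array.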

Book Chapter•DOI•

Proximity Matching Using Fixed-Queries Trees

Ricardo Baeza-Yates¹, Walter Cunto, Udi Manber², Sun Wu³•Institutions (3)

University of Chile¹, University of Arizona², National Chung Cheng University³

05 Jun 1994

TL;DR: This work presents a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close to a query element under some distance function.

Abstract: We present a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close, under some distance function, to a query element. Fixed-queries trees can be used for any distance function, not necessarily even a metric, as long as it satisfies the triangle inequality. We give an analysis of several performance parameters of fixed-queries trees and experimental results that support the analysis. Fixed-queries trees are particularly efficient for applications in which comparing two elements is expensive.

207 citations
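
A simplified reading of the fixed-queries idea described above is sketched below: every node on the same level compares against the same fixed key, children are indexed by the distance to that key, and the triangle inequality prunes children during a range search. The bucket size, the choice of keys, the use of Hamming distance on equal-length strings, and all names are assumptions for illustration, not details taken from the paper.

```python
from typing import Callable, Union

Dist = Callable[[str, str], int]
Node = Union[list, dict]  # leaf bucket (list) or internal node (distance -> child)


def hamming(a: str, b: str) -> int:
    """Number of differing positions (strings assumed to have equal length)."""
    return sum(x != y for x, y in zip(a, b))


def build_fq_tree(elements: list[str], keys: list[str], dist: Dist,
                  level: int = 0, bucket_size: int = 2) -> Node:
    """Split elements by distance to keys[level]; the same key serves a whole level."""
    if len(elements) <= bucket_size or level == len(keys):
        return list(elements)  # leaf bucket
    groups: dict[int, list[str]] = {}
    for e in elements:
        groups.setdefault(dist(e, keys[level]), []).append(e)
    return {d: build_fq_tree(g, keys, dist, level + 1, bucket_size)
            for d, g in groups.items()}


def range_search(tree: Node, keys: list[str], dist: Dist,
                 query: str, radius: int) -> list[str]:
    """All stored elements within the given radius of query."""
    qd = [dist(query, k) for k in keys]  # one query-to-key distance per level

    def visit(node: Node, level: int) -> list[str]:
        if isinstance(node, list):
            return [e for e in node if dist(query, e) <= radius]
        hits: list[str] = []
        for d, child in node.items():
            # Triangle inequality: dist(query, e) >= |d - qd[level]| for e in child.
            if abs(d - qd[level]) <= radius:
                hits.extend(visit(child, level + 1))
        return hits

    return visit(tree, 0)


if __name__ == "__main__":
    data = ["0000", "0001", "0011", "0111", "1111", "1000", "1100", "1010"]
    keys = ["0000", "1111"]
    tree = build_fq_tree(data, keys, hamming)
    print(sorted(range_search(tree, keys, hamming, "0001", 1)))
```

Since the keys are fixed per level, the query is compared with each key only once regardless of how many branches are followed, which is where the savings come from when the distance function is expensive.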

Performance

Metrics

Papers: 977

Citations: 22,247

No. of papers from the Conference in previous years
Year	Papers
2021	16
2020	38
2019	44
2018	39
2017	53
2016	45