Boyer-Moore approach to approximate string matching

doi:10.1007/3-540-52846-6_103

Book Chapter, pp 348-359

TLDR

A generalized Boyer-Moore algorithm is proposed for two versions of approximate string matching: the k mismatches problem, and the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes).

Abstract:

The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm solves the problem in expected time O(kn(1/(m − k)+k / c)) where c is the size of the alphabet. A related algorithm is developed for the k differences problem where the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes).
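
To make the two problem statements concrete, here is a minimal Python sketch of straightforward baseline solutions. It is not the generalized Boyer-Moore algorithm of the chapter: the first function is a plain scan over all alignments, and the second is the standard column-wise dynamic programming for approximate matching. The function names `k_mismatches` and `k_differences` are illustrative assumptions, not names from the source.

```python
from typing import List


def k_mismatches(text: str, pattern: str, k: int) -> List[int]:
    """Start positions where pattern aligns with text with at most k mismatches."""
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[start + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already has too many mismatches
        if mismatches <= k:
            hits.append(start)
    return hits


def k_differences(text: str, pattern: str, k: int) -> List[int]:
    """End positions (exclusive) of text substrings within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any substring
    # of text that ends at the current text position.
    col = list(range(m + 1))
    hits = []
    for pos, ch in enumerate(text, start=1):
        prev_diag = col[0]  # col[0] stays 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # match or change
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # pattern character missing from the text
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            hits.append(pos)
    return hits
```

For example, `k_mismatches("abcabd", "abd", 1)` reports the alignments at positions 0 and 3, and `k_differences("abcdef", "bde", 1)` reports end position 5, since `bcde` is one deletion away from `bde`. Both baselines take O(mn) time in the worst case, which is the cost the chapter's algorithm aims to improve on in the expected case.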

Citations

Journal Article

Approximate string-matching with q -grams and maximal matches

Esko Ukkonen

TL;DR: Two string distance functions that are computable in linear time give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for edit distance based string matching.

Patent

System and methods for searching and matching databases

David Brown, +1 more

TL;DR: Elements are converted to terms with the Soundex function and then compared against an index of terms to determine which database records relate to the input search data. Through statistical analysis, each match record is given a record weight, which may be used to calculate how close the input data is to that match record.

Book Chapter

Two algorithms for approximate string matching in static texts

Petteri Jokinen, +1 more

TL;DR: The problem is to find all approximate occurrences P′ of a pattern string P in a text string T such that the edit distance between P and P′ is ≤ k. The scheme considered first preprocesses T to make subsequent searches with different P fast.

Journal Article

Faster Approximate String Matching

Ricardo Baeza-Yates, +1 more

- 01 Feb 1999 -

Algorithmica

TL;DR: The algorithm simulates a nondeterministic finite automaton built from the pattern, using the text as input. The resulting algorithms are shown to be among the fastest for typical text searching, and the fastest in some cases.

Book Chapter

Approximate String-Matching over Suffix Trees

Esko Ukkonen

TL;DR: It is shown how the searches can be done fast using the suffix tree of T augmented with the suffix links as the preprocessed form of T and applying dynamic programming over the tree.

References

Journal Article

Efficient string matching: an aid to bibliographic search

Alfred V. Aho, +1 more

- 01 Jun 1975 -

Communications of The ACM

TL;DR: A simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text that has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

Journal Article

The String-to-String Correction Problem

Robert A. Wagner, +1 more

- 01 Jan 1974 -

Journal of the ACM

TL;DR: An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.

Journal Article

Fast Pattern Matching in Strings

Donald E. Knuth, +2 more

- 01 Jun 1977 -

SIAM Journal on Computing

TL;DR: An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings. A theoretical application of the algorithm shows that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha^R\}^*$, can be recognized in linear time.

Journal Article

A fast string searching algorithm

Robert S. Boyer, +1 more

- 01 Oct 1977 -

Communications of The ACM

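TL;DR: An algorithm is presented that searches for the location, i, of the first occurrence of a character string in another string. The algorithm has the unusual property that, in most cases, not all of the first i characters of the searched string are inspected.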

Journal Article

Algorithms for approximate string matching

Esko Ukkonen

- 01 Mar 1985 -

Information & Computation

TL;DR: An improved algorithm with reduced time and space requirements is developed, together with algorithms that can be used in conjunction with extended edit operation sets, including, for example, transposition of adjacent characters.

Related Papers (5)

Approximate string-matching with q -grams and maximal matches

Fast Pattern Matching in Strings

Fast text searching: allowing errors

A new approach to text searching

A Space-Economical Suffix Tree Construction Algorithm