Book Chapter

Boyer-Moore approach to approximate string matching

TLDR
The Boyer-Moore algorithm is generalized to approximate string matching in two settings: the k mismatches problem, and the k differences problem, in which the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes).
Abstract
The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm solves the problem in expected time O(kn(1/(m − k) + k/c)), where c is the size of the alphabet. A related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes).
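To make the two problem statements concrete, here is a minimal Python sketch of brute-force baselines for each. It is not the generalized Boyer-Moore algorithm of the chapter: it uses no shift tables and runs in O(mn) worst-case time. The function names are hypothetical and chosen for illustration only. The k differences baseline uses the standard dynamic-programming recurrence for approximate matching and reports end positions of occurrences.

```python
def k_mismatch_occurrences(pattern, text, k):
    """Start positions where pattern aligns with text with at most k mismatches.

    Brute-force baseline (assumption: illustrative only, no Boyer-Moore shifts).
    """
    m, n = len(pattern), len(text)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        # Scan the alignment right to left, the direction Boyer-Moore uses.
        for i in range(m - 1, -1, -1):
            if pattern[i] != text[start + i]:
                mismatches += 1
                if mismatches > k:
                    break
        else:
            # Loop finished without exceeding k mismatches.
            hits.append(start)
    return hits


def k_difference_end_positions(pattern, text, k):
    """End positions j such that some substring of text ending at j is within
    edit distance k (insertions, deletions, changes) of pattern.

    Column-by-column dynamic programming; row 0 is fixed at 0 so that an
    occurrence may start anywhere in the text.
    """
    m = len(pattern)
    col = list(range(m + 1))  # distances against the empty text prefix
    ends = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # value of row 0 in the previous column (always 0)
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (pattern[i - 1] != ch),  # match or change
                old + 1,                             # extra text character
                col[i - 1] + 1,                      # missing pattern character
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_mismatch_occurrences("abc", "abcxbcabd", 1))
    print(k_difference_end_positions("abc", "abc", 1))
```

Both functions examine every text position, which is exactly the cost a Boyer-Moore style approach tries to avoid by skipping alignments that cannot yield a match.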


Citations
Journal Article

Approximate string-matching with q -grams and maximal matches

TL;DR: Two string distance functions that are computable in linear time give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for edit-distance-based string matching.
Patent

System and methods for searching and matching databases

TL;DR: The Soundex function is used to convert elements to terms, which are then compared against an index of terms to determine which database records relate to the input search data. Through statistical analysis, matching records are given a record weight, which may be used to calculate how close the input data actually is to each matching record.
Book Chapter

Two algorithms for approximate string matching in static texts

TL;DR: A scheme in which T is first preprocessed so that subsequent searches with different P are fast; the task is to find all approximate occurrences P′ of a pattern string P in a text string T such that the edit distance between P and P′ is ≤ k.
Journal Article

Faster Approximate String Matching

TL;DR: The algorithm is based on the simulation of a nondeterministic finite automaton built from the pattern, with the text as input. The algorithms are shown to be among the fastest for typical text searching, and the fastest in some cases.
Book Chapter

Approximate String-Matching over Suffix Trees

TL;DR: It is shown how the searches can be done fast using the suffix tree of T augmented with the suffix links as the preprocessed form of T and applying dynamic programming over the tree.
References
Journal Article

Efficient string matching: an aid to bibliographic search

TL;DR: A simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text; it has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
Journal Article

The String-to-String Correction Problem

TL;DR: An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.
Journal Article

Fast Pattern Matching in Strings

TL;DR: An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings, showing that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time.
Journal Article

A fast string searching algorithm

TL;DR: The algorithm has the unusual property that, in most cases, not all of the first i characters of the string being searched are inspected, where i is the location of the first occurrence of the pattern.
Journal Article

Algorithms for approximate string matching

TL;DR: An improved algorithm that works in time and in space O and algorithms that can be used in conjunction with extended edit operation sets, including, for example, transposition of adjacent characters.