Topic
Approximate string matching
About: Approximate string matching is a research topic. Over the lifetime, 1903 publications have been published within this topic receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.
Papers published on a yearly basis
Papers
More filters
••
19 May 2002TL;DR: The crucial new idea underlying the first three results above is that of confirming matches by convolving vectors obtained by coding characters in the alphabet with non-boolean entries; in contrast, almost all previous pattern matching algorithms consider only boolean codes for the alphabet.
Abstract: (MATH) This paper obtains the following results on pattern matching problems in which the text has length n and the pattern has length mAn O(nlog m) time deterministic algorithm for the String Matching with Wildcards problems, even when the alphabet is large.An O(klog2 m) time Las Vegas algorithm for the Sparse String Matching with Wildcards problem, where k«n is the number of non-zeros in the text. We also give Las Vegas algorithms for the higher dimensional version of this problem.As an application of the above, an O(nlog2 m) time Las Vegas algorithm for the Subset Matching and Tree Pattern Matching problems, and a Las Vegas algorithm for the Geometric Pattern Matching problem.Finally, an O(nlog2 m) time deterministic algorithm for Subset Matching and Tree Pattern Matching..The crucial new idea underlying the first three results above is that of confirming matches by convolving vectors obtained by coding characters in the alphabet with non-boolean (i.e., rational or even complex) entries; in contrast, almost all previous pattern matching algorithms consider only boolean codes for the alphabet. The crucial new idea underlying the fourth result is a simpler method of shifting characters which ensures that each character occurs as a singleton in some shift.
159 citations
01 Jan 2001
TL;DR: It is shown that gapped q-grams can provide orders of magnitude faster and/or more efficient filtering than contiguous q- grams and a filter parameter called threshold have to be optimized.
Abstract: A popular and well-studied class of filters for approximate string matching compares substrings of length q, the q-grams, in the pattern and the text to identify text areas that contain potential matches. A generalization of the method that uses gapped q-grams instead of contiguous substrings is mentioned a few times in literature but has never been analyzed in any depth. In this paper, we report the first results of a study on gapped q-grams. We show that gapped q-grams can provide orders of magnitude faster and/or more efficient filtering than contiguous q-grams. To achieve these results the arrangement of the gaps in the q-gram and a filter parameter called threshold have to be optimized. Both of these tasks are nontrivial combinatorial optimization problems for which we present efficient solutions. We concentrate on the k mismatches problem, i.e, approximate string matching with the Hamming distance.
159 citations
••
TL;DR: The string matching with mismatches problem as discussed by the authors is that of finding the number of mismatches between a pattern P of length m and every length m substring of the text T.
155 citations
••
TL;DR: A new method for the recognition of arbitrary two-dimensional shapes based on string edit distance computation is described, which is invariant under translation, rotation, scaling and partial occlusion.
154 citations
••
TL;DR: This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms and special attention is given to the methods for the construction of data structures that efficiently support primitive operations needed in approximatestring matching.
153 citations