Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
Proceedings Article
19 May 2002
TL;DR: The crucial new idea underlying the first three results above is that of confirming matches by convolving vectors obtained by coding characters in the alphabet with non-boolean entries; in contrast, almost all previous pattern matching algorithms consider only boolean codes for the alphabet.
Abstract: This paper obtains the following results on pattern matching problems in which the text has length n and the pattern has length m:
1. An O(n log m) time deterministic algorithm for the String Matching with Wildcards problem, even when the alphabet is large.
2. An O(k log^2 m) time Las Vegas algorithm for the Sparse String Matching with Wildcards problem, where k ≪ n is the number of non-zeros in the text. We also give Las Vegas algorithms for the higher-dimensional version of this problem.
3. As an application of the above, an O(n log^2 m) time Las Vegas algorithm for the Subset Matching and Tree Pattern Matching problems, and a Las Vegas algorithm for the Geometric Pattern Matching problem.
4. Finally, an O(n log^2 m) time deterministic algorithm for Subset Matching and Tree Pattern Matching.
The crucial new idea underlying the first three results above is that of confirming matches by convolving vectors obtained by coding characters in the alphabet with non-boolean (i.e., rational or even complex) entries; in contrast, almost all previous pattern matching algorithms consider only boolean codes for the alphabet. The crucial new idea underlying the fourth result is a simpler method of shifting characters which ensures that each character occurs as a singleton in some shift.

159 citations
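The idea of confirming matches by convolving non-boolean character codes can be illustrated with a small sketch. It assumes the well-known cubic-sum coding (the wildcard coded as 0, every other character as a positive integer), under which the sum over j of p_j t_{i+j} (p_j - t_{i+j})^2 is zero exactly at matching alignments; this is one standard way to apply non-boolean codes and is not necessarily the construction used in the paper. The wildcard symbol `?` and all function names are hypothetical, and the correlations are computed directly in O(nm) time rather than with FFT-based convolution.

```python
# Sketch: confirming wildcard matches with non-boolean character codes.
# Assumption: the cubic-sum coding below (wildcard -> 0, other characters ->
# positive integers) is a standard illustration, not necessarily the paper's.

WILDCARD = "?"  # hypothetical wildcard symbol for this sketch


def encode(s: str) -> list[int]:
    """Code the wildcard as 0 and every other character as a positive integer."""
    return [0 if ch == WILDCARD else ord(ch) + 1 for ch in s]


def correlate(p: list[int], t: list[int]) -> list[int]:
    """c[i] = sum_j p[j] * t[i + j] for every alignment i of p inside t.

    Computed directly in O(nm); an FFT-based convolution is the fast route.
    """
    m, n = len(p), len(t)
    return [sum(p[j] * t[i + j] for j in range(m)) for i in range(n - m + 1)]


def wildcard_matches(text: str, pattern: str) -> list[int]:
    """Offsets i where pattern matches text[i:i + m]; wildcards allowed in both.

    S(i) = sum_j p_j * t_{i+j} * (p_j - t_{i+j})**2
         = sum_j p_j^3 t_{i+j} - 2 p_j^2 t_{i+j}^2 + p_j t_{i+j}^3,
    i.e. three correlations. Each term is >= 0 and is 0 iff p_j == 0,
    t_{i+j} == 0, or p_j == t_{i+j}, so S(i) == 0 exactly at matches.
    """
    p, t = encode(pattern), encode(text)
    if len(p) > len(t):
        return []
    p2, p3 = [x ** 2 for x in p], [x ** 3 for x in p]
    t2, t3 = [x ** 2 for x in t], [x ** 3 for x in t]
    sums = [
        a - 2 * b + c
        for a, b, c in zip(correlate(p3, t), correlate(p2, t2), correlate(p, t3))
    ]
    return [i for i, s in enumerate(sums) if s == 0]
```

For example, `wildcard_matches("abcabd", "ab?")` returns `[0, 3]`. Python integers are exact, so the zero test is safe here; an FFT-based version working in floating point would need a rounding tolerance.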

01 Jan 2001
TL;DR: It is shown that gapped q-grams can provide orders of magnitude faster and/or more efficient filtering than contiguous q-grams, provided that the arrangement of the gaps and a filter parameter called the threshold are optimized.
Abstract: A popular and well-studied class of filters for approximate string matching compares substrings of length q, the q-grams, in the pattern and the text to identify text areas that contain potential matches. A generalization of the method that uses gapped q-grams instead of contiguous substrings is mentioned a few times in the literature but has never been analyzed in any depth. In this paper, we report the first results of a study on gapped q-grams. We show that gapped q-grams can provide orders of magnitude faster and/or more efficient filtering than contiguous q-grams. To achieve these results, the arrangement of the gaps in the q-gram and a filter parameter called the threshold have to be optimized. Both of these tasks are nontrivial combinatorial optimization problems for which we present efficient solutions. We concentrate on the k mismatches problem, i.e., approximate string matching with the Hamming distance.

159 citations
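A gapped q-gram filter for the k mismatches problem can be sketched as follows. The shape notation ('#' for a position that is read, '.' for a gap), the comparison of q-grams only at equal offsets in pattern and window, and the threshold formula are all assumptions of this sketch. The simple bound (m - s + 1) - k·w, for a shape of span s and weight w, never discards a true match, but it is generally weaker than the optimized thresholds the abstract refers to. Windows that survive the filter are then verified directly.

```python
# Sketch: gapped q-gram filter for approximate matching under Hamming distance.
# Assumptions (not from the paper): a shape is a string of '#' (read) and '.'
# (gap); q-grams are compared only at equal offsets in pattern and window; the
# threshold is the simple lossless bound below rather than an optimized one.


def gapped_qgram(s: str, start: int, shape: str) -> str:
    """Characters of s at the '#' positions of shape, placed at index start."""
    return "".join(s[start + d] for d, ch in enumerate(shape) if ch == "#")


def threshold(m: int, shape: str, k: int) -> int:
    """Lower bound on agreeing gapped q-grams for a window with <= k mismatches.

    There are m - span + 1 q-gram offsets, and one mismatch can spoil at most
    `weight` of them (one per '#' position), so at least
    (m - span + 1) - k * weight offsets still agree.
    """
    span, weight = len(shape), shape.count("#")
    return (m - span + 1) - k * weight


def candidate_windows(text: str, pattern: str, shape: str, k: int) -> list[int]:
    """Window starts that pass the filter; a superset of the true matches."""
    m, span = len(pattern), len(shape)
    t = threshold(m, shape, k)
    pat_grams = [gapped_qgram(pattern, i, shape) for i in range(m - span + 1)]
    survivors = []
    for w in range(len(text) - m + 1):
        agree = sum(
            gapped_qgram(text, w + i, shape) == g for i, g in enumerate(pat_grams)
        )
        if agree >= t:
            survivors.append(w)
    return survivors


def mismatches(text: str, w: int, pattern: str) -> int:
    """Hamming distance between pattern and text[w:w + len(pattern)]."""
    return sum(a != b for a, b in zip(text[w:w + len(pattern)], pattern))
```

With a pattern of length 8 and the shape `"##.#"` (span 4, weight 3), k = 1 gives a threshold of 5 - 3 = 2 agreeing gapped q-grams out of 5 offsets. The final answer is `[w for w in candidate_windows(T, P, "##.#", 1) if mismatches(T, w, P) <= 1]`, so only surviving windows pay for full verification.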

Journal Article
TL;DR: The string matching with mismatches problem, as discussed by the authors, is that of finding the number of mismatches between a pattern P of length m and every length-m substring of the text T.

155 citations
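The mismatch counts for all alignments can be obtained by reducing the problem to one correlation per alphabet symbol: for each symbol c, correlating the boolean indicator vectors of c in P and in T counts the positions where both hold c, and summing over symbols gives the number of matches at each alignment. The sketch below illustrates that standard reduction; it is not presented as the method of this paper, the function names are hypothetical, and the correlations are computed directly rather than with FFTs.

```python
# Sketch: count mismatches between P and every length-m window of T by
# reducing to one boolean correlation per alphabet symbol. Correlations are
# computed directly here; FFT-based convolution is what makes this fast.


def correlate(p: list[int], t: list[int]) -> list[int]:
    """c[i] = sum_j p[j] * t[i + j] for every alignment i of p inside t."""
    m, n = len(p), len(t)
    return [sum(p[j] * t[i + j] for j in range(m)) for i in range(n - m + 1)]


def mismatch_counts(text: str, pattern: str) -> list[int]:
    """Entry i is the number of j with pattern[j] != text[i + j]."""
    m, n = len(pattern), len(text)
    if m > n:
        return []
    matches = [0] * (n - m + 1)
    # Only symbols that occur in the pattern can produce a match.
    for c in set(pattern):
        p_ind = [1 if ch == c else 0 for ch in pattern]  # boolean code for c
        t_ind = [1 if ch == c else 0 for ch in text]
        for i, v in enumerate(correlate(p_ind, t_ind)):
            matches[i] += v
    return [m - x for x in matches]
```

For example, `mismatch_counts("abcab", "ab")` returns `[0, 2, 2, 0]`. The cost grows with the number of distinct pattern symbols, which is why this boolean reduction is attractive mainly for small alphabets.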

Journal Article
TL;DR: A new method for the recognition of arbitrary two-dimensional shapes based on string edit distance computation is described, which is invariant under translation, rotation, scaling and partial occlusion.

154 citations
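The rotation invariance mentioned in this TL;DR can be illustrated, under assumptions, with a cyclic edit distance. The sketch assumes each contour has already been encoded as a string of boundary symbols (the encoding is hypothetical and not described here), uses unit edit costs, and treats a rotation of the shape as a cyclic shift of its string; scaling and partial occlusion are not modeled. Minimizing over the rotations of one string is enough, because an alignment of two rotated strings can be cut where the first string begins and reassembled into an alignment of the unrotated first string with a rotation of the second, at the same cost.

```python
# Sketch: rotation-tolerant comparison of contours encoded as cyclic strings.
# Assumptions: the boundary-symbol encoding is hypothetical, edits cost 1, and
# rotating a shape corresponds to cyclically shifting its string.


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance using a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def cyclic_edit_distance(a: str, b: str) -> int:
    """Minimum edit distance between a and any cyclic rotation of b."""
    if not b:
        return len(a)
    return min(levenshtein(a, b[k:] + b[:k]) for k in range(len(b)))
```

For example, `cyclic_edit_distance("abcd", "cdab")` is 0, whereas the plain `levenshtein("abcd", "cdab")` is positive. This naive version tries every rotation, costing O(|a|·|b|^2); faster cyclic edit distance algorithms exist but are beyond a sketch.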

Journal Article
TL;DR: This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms, giving special attention to methods for constructing data structures that efficiently support the primitive operations needed in approximate string matching.

153 citations
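As a baseline for the techniques this survey covers, the textbook dynamic-programming algorithm for matching with at most k differences (unit-cost edit distance), commonly attributed to Sellers, fits in a few lines. This is an illustrative sketch, not code from the surveyed paper: it keeps one DP column per text position and reports every end position j at which some substring text[s:j] is within distance k of the pattern, in O(nm) time.

```python
# Sketch: classic O(nm) dynamic programming for approximate string matching
# with at most k differences (insertions, deletions, substitutions).
# Illustrative baseline only; not taken from the surveyed paper.


def approximate_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """End positions j such that some text[s:j] is within edit distance k."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any suffix of
    # text[:j] for the current j; at j = 0 that suffix is empty.
    col = list(range(m + 1))
    hits = [0] if col[m] <= k else []
    for j, tc in enumerate(text, start=1):
        new = [0]  # row 0 stays 0: an occurrence may start at any position
        for i in range(1, m + 1):
            new.append(min(col[i] + 1,                         # text char unmatched
                           new[i - 1] + 1,                     # pattern char unmatched
                           col[i - 1] + (pattern[i - 1] != tc)))  # substitute or match
        col = new
        if col[m] <= k:
            hits.append(j)
    return hits
```

For example, `approximate_occurrences("abcxefg", "cde", 1)` returns `[5]`, since `"abcxefg"[2:5] == "cxe"` is one substitution away from `"cde"`. Keeping row 0 at zero in every column is what turns a plain edit-distance computation into a search over all starting positions.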


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance metrics
No. of papers in the topic in previous years

Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39