Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have been published within this topic, receiving 62352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers published on a yearly basis (chart covering 1973 to 2023)

Papers

Patent

System and method for improved string matching under noisy channel conditions

Kevyn Collins-Thompson¹, Charles B. Schweizer¹

Microsoft¹

30 Jul 2001

TL;DR: A system and method for improving string matching in a noisy channel environment is described. The system identifies candidates within the textual file that may match the query string and analyzes the probability that each candidate matches a user-defined string.

Abstract: Described is a system and method for improving string matching in a noisy channel environment. The invention provides a method for identifying string candidates and analyzing the probability that the string candidate matches a user-defined string. In one implementation, a find engine receives a query string, converts an image file into a textual file, and identifies each instance of the query string in the textual file. The find engine identifies candidates within the textual file that may match the query string. The find engine refers to a confusion table to help identify whether candidates that are near matches to the query string are actually matches to the query string but for a common recognition error. Candidates meeting a probability threshold are identified as matches to the query string. The invention further provides for analysis options including word heuristics, language models, and OCR confidences.
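
The confusion-table idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the table entries, the probabilities `P_CORRECT` and `P_OTHER`, and the function names are all hypothetical, and only single-character substitutions over equal-length windows are modelled.

```python
# Hypothetical confusion table: probability that the recognizer outputs the
# second character when the true character is the first. Values are made up.
CONFUSION = {
    ("l", "1"): 0.05,
    ("o", "0"): 0.05,
    ("s", "5"): 0.03,
}
P_CORRECT = 0.9    # assumed probability that a character is recognized correctly
P_OTHER = 0.001    # assumed probability of any error not listed in the table


def match_probability(query, candidate):
    """Probability that `candidate` is a noisy rendering of `query`."""
    if len(query) != len(candidate):
        return 0.0
    p = 1.0
    for intended, observed in zip(query, candidate):
        if intended == observed:
            p *= P_CORRECT
        else:
            p *= CONFUSION.get((intended, observed), P_OTHER)
    return p


def find_matches(query, text, threshold):
    """Return (start, candidate, probability) for windows meeting the threshold."""
    n = len(query)
    hits = []
    for start in range(len(text) - n + 1):
        candidate = text[start:start + n]
        p = match_probability(query, candidate)
        if p >= threshold:
            hits.append((start, candidate, p))
    return hits
```

Calling `find_matches("clock", "the c1ock and the clock", 0.01)` under these assumed probabilities accepts both the exact occurrence and the `c1ock` near match, because the `l` to `1` confusion is listed in the table. Since the score is a product over characters, a usable threshold depends on the query length.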

43 citations

Journal Article

A comparison of three string matching algorithms

G. De V. Smit¹

Stellenbosch University¹

01 Jan 1982 - Software: Practice and Experience

TL;DR: It is shown that the Boyer-Moore algorithm is extremely efficient in most cases and that, contrary to the impression one might get from the analytical results, the Knuth-Morris-Pratt algorithm is not significantly better on the average than the straightforward algorithm.

Abstract: Three string matching algorithms (straightforward, Knuth-Morris-Pratt and Boyer-Moore) are examined and their time complexities discussed. A comparison of their actual average behaviour is made, based on empirical data presented. It is shown that the Boyer-Moore algorithm is extremely efficient in most cases and that, contrary to the impression one might get from the analytical results, the Knuth-Morris-Pratt algorithm is not significantly better on the average than the straightforward algorithm.
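
A comparison of this kind can be reproduced in miniature by counting character comparisons. The Python sketch below is an illustration under stated assumptions, not the paper's experimental setup: the function names are hypothetical, patterns are assumed non-empty, and the third function is the simplified Horspool variant of Boyer-Moore (bad-character shifts only) rather than the full algorithm.

```python
def naive(text, pat):
    """Straightforward search: try every alignment, compare left to right."""
    hits, comps = [], 0
    n, m = len(text), len(pat)
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comps += 1
            if text[i + j] != pat[j]:
                break
            j += 1
        if j == m:
            hits.append(i)
    return hits, comps


def kmp(text, pat):
    """Knuth-Morris-Pratt: never re-reads text characters."""
    m = len(pat)
    fail = [0] * m            # fail[i]: longest proper border of pat[:i + 1]
    k = 0
    for i in range(1, m):
        while k and pat[i] != pat[k]:
            k = fail[k - 1]
        if pat[i] == pat[k]:
            k += 1
        fail[i] = k
    hits, comps, k = [], 0, 0
    for i, c in enumerate(text):
        while True:
            comps += 1
            if c == pat[k]:
                k += 1
                break
            if k == 0:
                break
            k = fail[k - 1]   # fall back to a shorter border and retry
        if k == m:
            hits.append(i - m + 1)
            k = fail[k - 1]
    return hits, comps


def horspool(text, pat):
    """Horspool variant of Boyer-Moore: compare right to left, skip by last char."""
    n, m = len(text), len(pat)
    # Shift for each character: distance from its rightmost occurrence
    # (excluding the final pattern position) to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pat[:-1])}
    hits, comps, i = [], 0, 0
    while i <= n - m:
        j = m - 1
        while j >= 0:
            comps += 1
            if text[i + j] != pat[j]:
                break
            j -= 1
        if j < 0:
            hits.append(i)
        i += shift.get(text[i + m - 1], m)
    return hits, comps
```

Each function returns the match positions together with the number of character comparisons, so the three counts can be printed side by side for any text and pattern of interest. The counts depend strongly on the alphabet size and pattern length, which is why empirical data is needed alongside worst-case analysis.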

43 citations

Journal Article

Improving an Algorithm for Approximate Pattern Matching

Gonzalo Navarro¹, Ricardo Baeza-Yates¹

University of Chile¹

01 Oct 2001 - Algorithmica

TL;DR: This work shows an excellent example of a complex and theoretical analysis of algorithms used for design and for practical algorithm engineering, instead of the common practice of first designing an algorithm and then analyzing it.

Abstract: We study a recent algorithm for fast on-line approximate string matching. This is the problem of searching a pattern in a text allowing errors in the pattern or in the text. The algorithm is based on a very fast kernel which is able to search short patterns using a nondeterministic finite automaton, which is simulated using bit-parallelism. A number of techniques to extend this kernel for longer patterns are presented in that work. However, the techniques can be integrated in many ways and the optimal interplay among them is by no means obvious. The solution to this problem starts at a very low level, by obtaining basic probabilistic information about the problem which was not previously known, and ends integrating analytical results with empirical data to obtain the optimal heuristic. The conclusions obtained via analysis are experimentally confirmed. We also improve many of the techniques and obtain a combined heuristic which is faster than the original work. This work shows an excellent example of a complex and theoretical analysis of algorithms used for design and for practical algorithm engineering, instead of the common practice of first designing an algorithm and then analyzing it.
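
To make the phrase "nondeterministic finite automaton, which is simulated using bit-parallelism" concrete, here is a minimal Python sketch. It does not implement the algorithm studied in the paper; it swaps in the simpler, well-known row-wise simulation in the style of Wu and Manber (Shift-And extended with k error rows). The function name is hypothetical, and the pattern is assumed non-empty.

```python
def bitparallel_search(pattern, text, k):
    """Return end positions in `text` where `pattern` matches with <= k errors.

    Bit j of rows[i] is set when pattern[:j + 1] matches some suffix of the
    text read so far with at most i insertions, deletions or substitutions.
    """
    m = len(pattern)
    full = (1 << m) - 1
    masks = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << j)

    # Before reading any text, a prefix of length j matches the empty string
    # with j deletions, so row i starts with its i lowest bits set.
    rows = [((1 << i) - 1) & full for i in range(k + 1)]
    accept = 1 << (m - 1)
    ends = []

    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = rows[0]                      # row i-1 before this character
        rows[0] = ((rows[0] << 1) | 1) & b      # exact Shift-And step
        for i in range(1, k + 1):
            old = rows[i]
            rows[i] = (
                (((old << 1) | 1) & b)          # match
                | prev_old                      # extra character in the text
                | ((prev_old | rows[i - 1]) << 1)  # substitution or skipped pattern char
                | 1                             # first pattern char absorbed as an error
            ) & full
            prev_old = old
        if rows[k] & accept:
            ends.append(pos)
    return ends
```

For example, `bitparallel_search("abc", "xabdx", 1)` reports end positions 2 and 3: `ab` is one deletion away from `abc`, and `abd` is one substitution away. Python integers are unbounded, so this sketch does not show the word-size limit that motivates the techniques for longer patterns mentioned in the abstract.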

43 citations

Fast string searching with suffix trees

Mark Nelson

01 Jan 1996

43 citations

Journal Article

Optimization techniques for string selection and comparison problems in genomics

Cláudio N. Meneses¹, Carlos Augusto Fernandes de Oliveira, Panos M. Pardalos

University of Florida¹

06 Jun 2005 - IEEE Engineering in Medicine and Biology Magazine

TL;DR: The paper presents a detailed view of the most important problems occurring in the area of string comparison and selection, using the Hamming distance measure.

Abstract: This article discusses optimization issues occurring in the area of genomics, such as string comparison and selection problems. With this objective, an important part of the existing results in this area is discussed. The problems that are of interest in this paper include the closest string problem (CSP), closest substring problem (CSSP), farthest string problem (FSP), farthest substring problem (FSSP), and far from most string problem (FFMSP). The paper presents a detailed view of the most important problems occurring in the area of string comparison and selection, using the Hamming distance measure.
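
As a small illustration of the Hamming distance measure and the closest string problem named above, the Python sketch below enumerates every candidate string. It is a brute-force baseline under stated assumptions (equal-length inputs, a tiny alphabet, hypothetical function names), not one of the optimization techniques the paper surveys; the search space grows exponentially with string length.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, alphabet):
    """Brute-force CSP: find a string minimizing the maximum Hamming distance.

    Returns (center, radius). Tries all len(alphabet) ** length candidates,
    so it is only usable for very short strings.
    """
    length = len(strings[0])
    best, best_radius = None, length + 1
    for cand in product(alphabet, repeat=length):
        s = "".join(cand)
        radius = max(hamming(s, t) for t in strings)
        if radius < best_radius:
            best, best_radius = s, radius
    return best, best_radius
```

For the inputs `["ACGT", "ACGA", "ACCT"]` over the alphabet `"ACGT"`, the sketch returns the center `ACGT` with radius 1. The farthest string problem can be sketched the same way by maximizing the minimum distance instead.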

43 citations

Performance

Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
