
Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have appeared within this topic, receiving 62352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers published on a yearly basis

[Chart: papers per year, 1973 to 2023]

Papers

Journal Article

Fast exact string matching algorithms

Thierry Lecroq¹

University of Rouen¹

30 May 2007-Information Processing Letters

TL;DR: A very fast new family of string matching algorithms based on hashing q-grams is proposed; these algorithms are the fastest in many cases, in particular on small alphabets.

122 citations

Patent

Efficient fuzzy match for evaluating data records

Surajit Chaudhuri¹, Kris Ganjam¹, Venkatesh Ganti¹, Rajeev Motwani¹

Microsoft¹

20 Jun 2003

TL;DR: This patent discloses a similarity function based on token substrings, referred to as q-grams, that overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.

Abstract: To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.
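The abstract does not spell out the disclosed similarity function, so the following is only a generic sketch of q-gram based fuzzy matching against a reference table. It scores two strings by the overlap of their q-gram multisets (a Dice-style coefficient). The names `qgrams`, `qgram_similarity`, and `best_match`, the `#` padding character, and the default `q = 3` are assumptions made for illustration and are not taken from the patent.

```python
from collections import Counter


def qgrams(text, q=3):
    """Multiset of overlapping q-grams of the lowercased, '#'-padded string."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def qgram_similarity(a, b, q=3):
    """Dice-style overlap of the two q-gram multisets, a value in [0, 1]."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:  # only possible when q == 1 and both strings are empty
        return 1.0
    shared = sum((ga & gb).values())  # multiset intersection
    return 2 * shared / total


def best_match(record, reference, q=3):
    """Return (reference entry, score) with the highest q-gram similarity."""
    scored = ((entry, qgram_similarity(record, entry, q)) for entry in reference)
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical reference table and a misspelled incoming value.
    table = ["Microsoft Windows", "Apple macOS"]
    print(best_match("Microsft Windws", table))
```

A linear scan over the reference table is used here for brevity; a practical system would typically index the q-grams so that only candidate tuples sharing some q-grams are scored.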


120 citations

Proceedings Article

Efficient approximate and dynamic matching of patterns using a labeling paradigm

Süleyman Cenk Sahinalp¹, Uzi Vishkin

University of Maryland, College Park¹

14 Oct 1996

TL;DR: The authors show that this general method based on assigning labels to some of the substrings of a given string is also useful for several central problems in the area of string processing: approximate string matching, dynamic dictionary matching, and dynamic text indexing.


Abstract: A key approach in string processing algorithmics has been the labeling paradigm, which is based on assigning labels to some of the substrings of a given string. If these labels are chosen consistently, they can enable fast comparisons of substrings. Until the first optimal parallel algorithm for suffix tree construction was given by the authors in 1994, the labeling paradigm was considered not to be competitive with other approaches. They show that this general method is also useful for several central problems in the area of string processing: approximate string matching, dynamic dictionary matching, and dynamic text indexing. The approximate string matching problem deals with finding all substrings of a text which match a pattern "approximately", i.e., with at most m differences. The differences can be in the form of inserted, deleted, or replaced characters. The text indexing problem deals with finding all occurrences of a pattern in a text, after the text is preprocessed. In the dynamic text indexing problem, updates to the text in the form of insertions and deletions of substrings are permitted. The dictionary matching problem deals with finding all occurrences of each pattern of a set of patterns in a text, after the pattern set is preprocessed. In the dynamic dictionary matching problem, insertions and deletions of patterns to the pattern set are permitted.
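To make the problem statement concrete, here is a minimal sketch of the classic dynamic-programming approach to approximate string matching (often attributed to Sellers). It is not the labeling method of the paper, only a baseline that solves the same problem in time proportional to the product of the pattern and text lengths. The function name `approximate_occurrences` and the parameter `max_differences` (the bound the abstract calls m) are assumptions chosen for this example.

```python
def approximate_occurrences(pattern, text, max_differences):
    """End positions j (exclusive) such that some substring of text ending
    at j matches pattern with at most max_differences insertions,
    deletions, or replacements."""
    m = len(pattern)
    # column[i] = fewest edits turning pattern[:i] into some substring of
    # text that ends at the current text position. Before any text is
    # read, pattern[:i] can only be matched by deleting all i characters.
    column = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        diagonal = column[0]  # value for (i - 1, j - 1)
        column[0] = 0         # a match may start anywhere in the text
        for i in range(1, m + 1):
            previous = column[i]  # value for (i, j - 1)
            cost = 0 if pattern[i - 1] == ch else 1
            column[i] = min(
                diagonal + cost,    # match or replace
                previous + 1,       # extra character in the text
                column[i - 1] + 1,  # character missing from the text
            )
            diagonal = previous
        if column[m] <= max_differences:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approximate_occurrences("abc", "xxabcxx", 0))  # [5]
    print(approximate_occurrences("abc", "abd", 1))      # [2, 3]
```

Only one column of the table is kept at a time, so the memory use is linear in the pattern length.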


119 citations

Book Chapter

Fast and Practical Approximate String Matching

Ricardo Baeza-Yates¹, Chris H. Perleberg¹

University of Chile¹

29 Apr 1992

TL;DR: This work presents an algorithm for string matching with mismatches, based on arithmetical operations, that runs in linear worst-case time for most practical cases, and describes it as a new approach to string searching.

Abstract: We present new algorithms for approximate string matching based on simple but efficient ideas. First, we present an algorithm for string matching with mismatches based on arithmetical operations that runs in linear worst-case time for most practical cases. This is a new approach to string searching. Second, we present an algorithm for string matching with errors based on partitioning the pattern that requires linear expected time for typical inputs.
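The partitioning idea rests on a pigeonhole argument: if a pattern is split into k + 1 pieces and an occurrence contains at most k errors, at least one piece must appear unchanged. The sketch below illustrates that filter in a deliberately simplified setting where only replacements (mismatches) are allowed, so each exact piece hit fixes a single candidate alignment to verify. It is an illustration of the general technique under that assumption, not a reproduction of the chapter's algorithm; the name `k_mismatch_search` is hypothetical.

```python
def k_mismatch_search(pattern, text, k):
    """Start positions where pattern matches text with at most k mismatches,
    found by searching k + 1 pattern pieces exactly and verifying each hit."""
    m, n = len(pattern), len(text)
    if m <= k:
        raise ValueError("pattern must be longer than k so every piece is non-empty")
    # Cut the pattern into k + 1 non-empty, contiguous pieces.
    bounds = [i * m // (k + 1) for i in range(k + 2)]
    starts = set()
    for lo, hi in zip(bounds, bounds[1:]):
        piece = pattern[lo:hi]
        pos = text.find(piece)
        while pos != -1:
            start = pos - lo  # alignment implied by this piece hit
            if 0 <= start <= n - m and start not in starts:
                window = text[start:start + m]
                mismatches = sum(a != b for a, b in zip(pattern, window))
                if mismatches <= k:
                    starts.add(start)
            pos = text.find(piece, pos + 1)  # allow overlapping hits
    return sorted(starts)


if __name__ == "__main__":
    print(k_mismatch_search("abcd", "xxabxdxxabcd", 1))  # [2, 8]
```

Extending the filter to insertions and deletions requires verifying a window of text around each hit with an edit-distance computation instead of a single alignment.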


118 citations

Journal Article

Block edit models for approximate string matching

Daniel P. Lopresti¹, Andrew Tomkins¹

Princeton University¹

15 Jul 1997-Theoretical Computer Science

TL;DR: This paper examines string block edit distance, in which two strings A and B are compared by extracting collections of substrings and placing them into correspondence. It shows that several variants are NP-complete and gives polynomial-time algorithms for solving the remainder.

118 citations


Network Information

Performance

Metrics

Papers: 1,942

Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
