Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers published on a yearly basis

[Chart: number of papers per year, 1973 to 2023]

Papers

Book Chapter

Complexities of the centre and median string problems

François Nicolas¹, Eric Rivals¹

Centre national de la recherche scientifique¹

25 Jun 2003

TL;DR: This work answers the question of whether the MEDIAN STRING problem is NP-complete for finite and even binary alphabets, and gives the complexity of the related CENTRE STRING problem.

Abstract: Given a finite set of strings, the MEDIAN STRING problem consists in finding a string that minimizes the sum of the distances to the strings in the set. Approximations of the median string are used in a very broad range of applications where one needs a representative string that summarizes the information common to the strings of the set. This is the case in Classification, in Speech and Pattern Recognition, and in Computational Biology. In the latter, MEDIAN STRING is related to the key problem of Multiple Alignment. In the recent literature, one finds a theorem stating the NP-completeness of MEDIAN STRING for unbounded alphabets. However, in the above-mentioned areas, the alphabet is often finite. Thus, it remains a crucial question whether the MEDIAN STRING problem is NP-complete for finite and even binary alphabets. In this work, we provide an answer to this question and also give the complexity of the related CENTRE STRING problem. Moreover, we study the parametrized complexity of both problems with respect to the number of input strings.

35 citations
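
The MEDIAN STRING objective in the abstract above (minimize the sum of distances to the input strings) can be illustrated with a small sketch. Two assumptions that are not in the listing: the distance is taken to be Levenshtein edit distance, and candidates are restricted to the input strings themselves (often called the set median), which is an easier problem than the unrestricted one the paper studies. The function names are hypothetical.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def sum_of_distances(candidate: str, strings: Sequence[str]) -> int:
    """The objective minimized by a median string."""
    return sum(edit_distance(candidate, s) for s in strings)


def set_median(strings: Sequence[str]) -> str:
    """Assumption: only the input strings are tried as candidates."""
    return min(strings, key=lambda c: sum_of_distances(c, strings))
```

For example, `set_median(["abc", "abd", "xbc"])` selects `"abc"`, whose distance sum is 2. A true median string may lie outside the input set, so this is only a rough approximation of the problem defined in the abstract.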

Book Chapter

Efficient Randomized Dictionary Matching Algorithms (Extended Abstract)

Amihood Amir¹, Martin Farach², Yossi Matias³

Georgia Institute of Technology¹, Rutgers University², University of Maryland, College Park³

29 Apr 1992

TL;DR: The standard string matching problem involves finding all occurrences of a single pattern in a single text, while there are some domains in which it is more appropriate to deal with dictionaries of patterns.

Abstract: The standard string matching problem involves finding all occurrences of a single pattern in a single text. While this approach works well in many application areas, there are some domains in which it is more appropriate to deal with dictionaries of patterns. A dictionary is a set of patterns; the goal of dictionary matching is to find all dictionary patterns in a given text, simultaneously.

34 citations
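
The dictionary matching problem stated in the abstract above (report every occurrence of every dictionary pattern in a text) can be made concrete with a deliberately naive baseline. This is not the randomized algorithm of the paper: it scans the text once per pattern instead of handling all patterns simultaneously. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def dictionary_match(text: str, patterns: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (start_index, pattern) for every occurrence, sorted by position.

    Naive baseline: one scan of the text per pattern.
    """
    hits = []
    for pattern in patterns:
        if not pattern:
            continue  # skip empty patterns, which would match everywhere
        start = text.find(pattern)
        while start != -1:
            hits.append((start, pattern))
            # Advance by one so overlapping occurrences are also reported.
            start = text.find(pattern, start + 1)
    return sorted(hits)
```

For the text `"abracadabra"` and the dictionary `["abra", "cad", "ra"]`, this reports `abra` at positions 0 and 7, `ra` at 2 and 9, and `cad` at 4.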

Patent

Incremental maintenance of inverted indexes for approximate string matching

Marios Hadjieleftheriou¹, Nick Koudas¹, Divesh Srivastava¹

AT&T¹

10 Jun 2009

TL;DR: In this patent, inverted indexes are updated only as necessary to guarantee answer precision within predefined thresholds, which are determined at little cost compared with updating the indexes themselves.

Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds, which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than the few hours needed to rebuild an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.

34 citations

Book Chapter

Fast-search: a new efficient variant of the Boyer-Moore string matching algorithm

Domenico Cantone¹, Simone Faro¹

University of Catania¹

26 May 2003, Lecture Notes in Computer Science

TL;DR: This new variant of the Boyer-Moore string matching algorithm achieves very good results in terms of both time efficiency and number of character inspections, especially when the patterns are very short.

Abstract: We present a new variant of the Boyer-Moore string matching algorithm which, though not linear, is very fast in practice. We compare our algorithm with the Horspool, Quick Search, Tuned Boyer-Moore, and Reverse Factor algorithms, which are among the fastest string matching algorithms for practical uses. It turns out that our algorithm achieves very good results in terms of both time efficiency and number of character inspections, especially in the cases in which the patterns are very short.

34 citations

Proceedings Article

MF-Join: Efficient Fuzzy String Similarity Join with Multi-level Filtering

Jin Wang¹, Chunbin Lin², Carlo Zaniolo¹

University of California, Los Angeles¹, Amazon.com²

08 Apr 2019

TL;DR: MF-Join is proposed, a multi-level filtering approach for fuzzy string similarity join that provides a flexible framework supporting multiple similarity functions at both levels and clearly outperforms state-of-the-art methods.

Abstract: As an essential operation in data integration and data cleaning, similarity join has attracted considerable attention from the database community. In many application scenarios, it is essential to support fuzzy matching, which allows approximate matching between elements and thereby improves the effectiveness of string similarity join. To describe the fuzzy matching between strings, we consider two levels of similarity, i.e., element-level and record-level similarity. Then the problem of calculating fuzzy matching similarity can be transformed into finding the weighted maximal matching in a bipartite graph. In this paper, we propose MF-Join, a multi-level filtering approach for fuzzy string similarity join. MF-Join provides a flexible framework that can support multiple similarity functions at both levels. To improve performance, we devise and implement several techniques to enhance the filter power. Specifically, we utilize a partition-based signature at the element level and propose a frequency-aware partition strategy to improve the quality of signatures. We also devise a count filter at the record level to further prune dissimilar pairs. Moreover, we deduce an effective upper bound for the record-level similarity to reduce the computational overhead of verification. Experimental results on two popular datasets show that our proposed method clearly outperforms state-of-the-art methods.

34 citations


Performance Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
