Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have been published within this topic, receiving 62352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers published on a yearly basis (chart, 1973 to 2023)

Papers

Journal Article

Parallel construction of a suffix tree with applications

Alberto Apostolico¹, Costas S. Iliopoulos¹, Gad M. Landau², Baruch Schieber², Uzi Vishkin³

Purdue University¹, Tel Aviv University², Courant Institute of Mathematical Sciences³

01 Nov 1988-Algorithmica

TL;DR: This paper presents a CRCW parallel RAM algorithm that constructs the suffix tree associated with a string of n symbols in O(log n) time with n processors; the algorithm requires Θ(n²) space.

Abstract: Many string manipulations can be performed efficiently on suffix trees. In this paper a CRCW parallel RAM algorithm is presented that constructs the suffix tree associated with a string of n symbols in O(log n) time with n processors. The algorithm requires Θ(n²) space. However, the space needed can be reduced to O(n^(1+ε)) for any 0 < ε ≤ 1, with a corresponding slow-down proportional to 1/ε. Efficient parallel procedures are also given for some string problems that can be solved with suffix trees.

152 citations

Proceedings Article

Bed-tree: an all-purpose index structure for string similarity search based on edit distance

Zhenjie Zhang¹, Marios Hadjieleftheriou², Beng Chin Ooi¹, Divesh Srivastava²

National University of Singapore¹, AT&T Labs²

06 Jun 2010

TL;DR: The Bed-tree is a complete solution that meets the requirements of all applications, providing high scalability and fast response time, and identifies the necessary properties of a mapping from the string space to the integer space for supporting searching and pruning for these queries.

Abstract: Strings are ubiquitous in computer systems and hence string processing has attracted extensive research effort from computer scientists in diverse areas. One of the most important problems in string processing is to efficiently evaluate the similarity between two strings based on a specified similarity measure. String similarity search is a fundamental problem in information retrieval, database cleaning, biological sequence analysis, and more. While a large number of dissimilarity measures on strings have been proposed, edit distance is the most popular choice in a wide spectrum of applications. Existing indexing techniques for similarity search queries based on edit distance, e.g., approximate selection and join queries, rely mostly on n-gram signatures coupled with inverted list structures. These techniques are tailored for specific query types only, and their performance remains unsatisfactory especially in scenarios with strict memory constraints or frequent data updates. In this paper we propose the Bed-tree, a B+-tree based index structure for evaluating all types of similarity queries on edit distance and normalized edit distance. We identify the necessary properties of a mapping from the string space to the integer space for supporting searching and pruning for these queries. Three transformations are proposed that capture different aspects of information inherent in strings, enabling efficient pruning during the search process on the tree. Compared to state-of-the-art methods on string similarity search, the Bed-tree is a complete solution that meets the requirements of all applications, providing high scalability and fast response time.

141 citations

Journal Article

Topology of strings: median string is NP-complete

C. de la Higuera, Francisco Casacuberta

06 Dec 1999-Theoretical Computer Science

TL;DR: It is proved that computing the median string corresponds to an NP-complete decision problem, which shows that the problem is NP-hard.

137 citations

Proceedings Article

Fast-join: An efficient method for fuzzy token matching based string similarity join

Jiannan Wang¹, Guoliang Li¹, Jianhua Feng¹

Tsinghua University¹

11 Apr 2011

TL;DR: This paper proposes a new similarity metric, called “fuzzy token matching based similarity”, which extends token-based similarity functions by allowing fuzzy matches between tokens. The authors report that the approach achieves high efficiency and result quality and significantly outperforms state-of-the-art methods.

Abstract: String similarity join, which finds similar string pairs between two string sets, is an essential operation in many applications and has attracted significant attention recently in the database community. A significant challenge in similarity join is to implement an effective fuzzy match operation to find all similar string pairs which may not match exactly. In this paper, we propose a new similarity metric, called “fuzzy token matching based similarity”, which extends token-based similarity functions (e.g., Jaccard similarity and Cosine similarity) by allowing fuzzy match between two tokens. We study the problem of similarity join using this new similarity metric and present a signature-based method to address this problem. We propose new signature schemes and develop effective pruning techniques to improve the performance. Experimental results show that our approach achieves high efficiency and result quality, and significantly outperforms state-of-the-art methods.

137 citations
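
To make the idea of fuzzy token matching in the Fast-join abstract above more concrete, here is a minimal sketch of a Jaccard-style score in which two tokens count as matching when their edit similarity reaches a threshold. It is an illustration only, not the authors' method: the names `edit_similarity` and `fuzzy_jaccard`, the threshold `delta`, whitespace tokenization, and the greedy pairing of tokens are all assumptions made here, and the sketch omits the signature schemes and pruning the paper describes.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via a row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            change = prev[j - 1] + (0 if ca == cb else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, change))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed token similarity: 1 - distance / length of the longer token."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def fuzzy_jaccard(s: str, t: str, delta: float = 0.8) -> float:
    """Jaccard-style score where tokens may match approximately.

    Token pairs with edit similarity >= delta are candidate matches.
    Pairs are chosen greedily, best first, each token used at most once
    (a simplification assumed here; greedy pairing is not guaranteed
    to find the best overall pairing).
    """
    tokens_s, tokens_t = s.split(), t.split()
    candidates = []
    for i, x in enumerate(tokens_s):
        for j, y in enumerate(tokens_t):
            sim = edit_similarity(x, y)
            if sim >= delta:
                candidates.append((sim, i, j))
    candidates.sort(reverse=True)

    used_s, used_t = set(), set()
    overlap = 0.0
    for sim, i, j in candidates:
        if i not in used_s and j not in used_t:
            used_s.add(i)
            used_t.add(j)
            overlap += sim

    union = len(tokens_s) + len(tokens_t) - overlap
    return overlap / union if union else 1.0
```

With exact token matching, "nba mcgrady" and "nba macgrady" share only one of three distinct tokens; in this sketch the near-identical second tokens also contribute, so the score is considerably higher.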

Journal Article

Bounds for the String Editing Problem

C. K. Wong¹, Ashok K. Chandra¹

IBM¹

01 Jan 1976-Journal of the ACM

TL;DR: It is shown that if the operations on symbols of the strings are restricted to tests of equality, then O(nm) operations are necessary (and sufficient) to compute the distance between two strings.

Abstract: The string editing problem is to determine the distance between two strings as measured by the minimal cost sequence of deletions, insertions, and changes of symbols needed to transform one string into the other. The longest common subsequence problem can be viewed as a special case. Wagner and Fischer proposed an algorithm that runs in time O(nm), where n, m are the lengths of the two strings. In the present paper, it is shown that if the operations on symbols of the strings are restricted to tests of equality, then O(nm) operations are necessary (and sufficient) to compute the distance.

137 citations
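
The O(nm) dynamic programme that the abstract above attributes to Wagner and Fischer can be sketched as follows. This is a minimal illustration rather than code from either paper; the function name `edit_distance` and the default unit costs for insertion, deletion, and change are assumptions made here.

```python
def edit_distance(a: str, b: str,
                  ins_cost: int = 1, del_cost: int = 1, sub_cost: int = 1) -> int:
    """Minimal cost of turning `a` into `b` with insertions, deletions, changes.

    Runs in O(len(a) * len(b)) time and keeps only two rows of the table.
    Symbols are compared with equality tests only.
    """
    # Row 0: turning the empty prefix of `a` into each prefix of `b`.
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Column 0: deleting the first i symbols of `a`.
        curr = [i * del_cost]
        for j, cb in enumerate(b, start=1):
            change = prev[j - 1] + (0 if ca == cb else sub_cost)
            delete = prev[j] + del_cost
            insert = curr[j - 1] + ins_cost
            curr.append(min(change, delete, insert))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Each of the (n+1)(m+1) table cells is filled with a constant number of operations, which is where the O(nm) running time comes from.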

Performance

Metrics

Papers: 1,942

Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
