Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers published on a yearly basis (chart covering 1973 to 2023)

Papers

Proceedings Article

Incorporating string transformations in record matching

Arvind Arasu¹, Surajit Chaudhuri¹, Kris Ganjam¹, Raghav Kaushik¹

Microsoft¹

09 Jun 2008

TL;DR: This work expands the problem of record matching to take user-defined string transformations, such as synonyms and abbreviations, as input, and demonstrates improved record matching quality and efficient retrieval based on an index structure that is cognizant of transformations.

Abstract: Today's record matching infrastructure does not allow a flexible way to account for synonyms such as "Robert" and "Bob", which refer to the same name, and more general forms of string transformations such as abbreviations. We expand the problem of record matching to take such user-defined string transformations as input. These transformations, coupled with an underlying similarity function, are used to define the similarity between two strings. We demonstrate the effectiveness of this approach via a fuzzy match operation that is used to look up an input record against a table of records, where we have an additional table of transformations as input. We demonstrate an improvement in record matching quality and efficient retrieval based on our index structure that is cognizant of transformations.

16 citations
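
The idea in the abstract above, a base similarity function combined with a table of user-defined transformations, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `TRANSFORMATIONS` table, the choice of token-level Jaccard as the underlying similarity, and the brute-force enumeration of variants. It does not reproduce the paper's index structure or its exact similarity definition.

```python
from itertools import product

# Hypothetical transformation table: token -> alternative tokens.
TRANSFORMATIONS = {"bob": ["robert"], "robert": ["bob"]}


def jaccard(a: set, b: set) -> float:
    """Token-set Jaccard similarity (the assumed underlying similarity)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def variants(tokens, transformations):
    """Enumerate every token set obtainable by optionally rewriting each token."""
    options = [[tok] + transformations.get(tok, []) for tok in tokens]
    return [set(choice) for choice in product(*options)]


def transformed_similarity(s: str, t: str, transformations) -> float:
    """Best base similarity over all transformed variants of both strings."""
    s_variants = variants(s.lower().split(), transformations)
    t_variants = variants(t.lower().split(), transformations)
    return max(jaccard(a, b) for a in s_variants for b in t_variants)


with_rules = transformed_similarity("Bob Smith", "Robert Smith", TRANSFORMATIONS)
without_rules = transformed_similarity("Bob Smith", "Robert Smith", {})
```

With the table, the two names match exactly; without it, only the shared surname counts. The enumeration is exponential in the number of rewritable tokens, which is one reason a transformation-aware index, as the paper proposes, matters in practice.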

Patent

Method of comparing version strings

David Youd¹

University of California¹

01 Aug 2002

TL;DR: A method of comparing version strings in a computing environment, for use in version-specific computing tasks, is presented. It divides each of a first and a second version string at each one of a set of predetermined delimiters to produce respective first and second sets of sequentially ordered string chunks.

Abstract: A method of comparing version strings in a computing environment for use in version-specific computing tasks. In one embodiment, the method divides each of a first and a second version string at each one of a set of predetermined delimiters to produce respective first and second sets of sequentially ordered string chunks. Next, string chunks of the same order from the first and second chunk sets are iteratively compared to determine matching of same-order string chunks, with the comparison continuing until a non-matching same-order string chunk pair is encountered. From the matching/non-matching comparisons, a determination may be made whether a specified quality relationship exists between the first and second version strings, where the quality relationship determines the propriety of a version-specific computing task.

16 citations
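
The chunk-by-chunk comparison described in the abstract above can be sketched roughly as follows. The delimiter set, the numeric comparison of all-digit chunks, and the tie-break on chunk count are assumptions added for illustration; the abstract specifies none of them.

```python
import re


def split_version(version: str, delimiters: str = ".-_"):
    """Divide a version string at each delimiter into ordered chunks."""
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, version)


def compare_chunk(a: str, b: str) -> int:
    """Return -1, 0 or 1; all-digit chunks compare numerically (an assumption)."""
    if a.isdigit() and b.isdigit():
        x, y = int(a), int(b)
    else:
        x, y = a, b
    return (x > y) - (x < y)


def compare_versions(v1: str, v2: str, delimiters: str = ".-_") -> int:
    """Compare same-order chunks until the first non-matching pair."""
    chunks1 = split_version(v1, delimiters)
    chunks2 = split_version(v2, delimiters)
    for a, b in zip(chunks1, chunks2):
        result = compare_chunk(a, b)
        if result != 0:
            return result
    # All shared chunks match: treat the longer version as newer (an assumption).
    return (len(chunks1) > len(chunks2)) - (len(chunks1) < len(chunks2))


def satisfies_minimum(installed: str, required: str) -> bool:
    """Example relationship check gating a version-specific task."""
    return compare_versions(installed, required) >= 0
```

For instance, `compare_versions("1.10.0", "1.9.3")` returns 1 because the second chunks compare as the integers 10 and 9, not as strings.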

Journal Article

Efficient Algorithms for Approximate String Matching with Swaps

Dong Kyue Kim¹, Jee-Soo Lee², Kunsoo Park¹, Yookun Cho¹

Seoul National University¹, Korea National Open University²

01 Mar 1999, Journal of Complexity

TL;DR: This paper adds the swap operation, which interchanges two adjacent characters, to the set of allowable edit operations, and presents an O(t min(m, n))-time algorithm for the extended edit distance problem, where t is the edit distance between the given strings, together with an algorithm for the extended k-differences problem.

16 citations
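
To make the swap operation in the entry above concrete, here is a minimal sketch of the textbook quadratic-time dynamic program for edit distance with adjacent swaps (often called optimal string alignment distance). It is not the O(t min(m, n)) algorithm from the paper, and whether the paper's cost model matches this variant exactly is an assumption.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance with insertions, deletions, changes and adjacent swaps."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the prefixes a[:i] and b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or change
            )
            # Swap: the last two characters of each prefix are interchanged.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

Under this model `osa_distance("ab", "ba")` is 1, where plain edit distance without swaps would give 2.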

Patent

Deflate compression algorithm

Yingquan Wu

09 May 2014

TL;DR: In this paper, a compression algorithm replaces duplicative strings with a copy pair indicating a location and length of a preceding identical string that is within a window from the duplicative string.

Abstract: A compression algorithm replaces duplicative strings with a copy pair indicating a location and length of a preceding identical string that is within a window from the duplicative string. Rather than replacing a longest matching string within a window from a given point with a copy pair, the longest matching string may be used provided it is at least two bytes larger than the next longest matching string or is at a distance that is less than some multiple of a distance to the next longest matching string. In another aspect, the length of the window in which a matching string may be found is dependent on a length of the matching string. In yet another aspect, rather than labeling each literal and copy pair to indicate what it is, strings of non-duplicative literals are represented by a label and a length of the string.

16 citations
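
The copy-pair idea in the abstract above can be illustrated with a small LZ77-style sketch. It uses plain greedy longest-match selection, so it shows only the baseline that the patent modifies, not the patent's selection rule, and it emits Python tokens rather than a real Deflate bitstream. The `window` and `min_match` values are arbitrary assumptions.

```python
def compress(data: bytes, window: int = 32, min_match: int = 3):
    """Encode data as literals (ints) and copy pairs (distance, length)."""
    out = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search the preceding window for the longest match starting at i.
        for start in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[start + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
        if best_len >= min_match:
            out.append((best_dist, best_len))  # copy pair
            i += best_len
        else:
            out.append(data[i])                # literal byte
            i += 1
    return out


def decompress(tokens) -> bytes:
    """Rebuild the bytes; copies may overlap the output being written."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(tok)
    return bytes(out)


sample = b"abcabcabcabd"
tokens = compress(sample)
```

Here the repeated `abc` run collapses into a single copy pair that points three bytes back, and decoding byte by byte lets that copy overlap its own output.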

Proceedings Article

DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching

Kasra Hosseini¹, Federico Nanni², Mariona Coll Ardanuy²

University of Oxford¹, The Turing Institute²

01 Oct 2020

TL;DR: DeezyMatch is presented, a free, open-source software library written in Python for fuzzy string matching and candidate ranking that supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching.

Abstract: We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach is especially useful where only limited training examples are available. The learned DeezyMatch models can be used to generate rich vector representations from string inputs. The candidate ranker component in DeezyMatch uses these vector representations to find, for a given query, the best matching candidates in a knowledge base. It uses an adaptive searching algorithm applicable to large knowledge bases and query sets. We describe DeezyMatch’s functionality, design and implementation, accompanied by a use case in toponym matching and candidate ranking in realistic noisy datasets.

16 citations
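
The candidate-ranking step described in the abstract above, turning strings into vectors and retrieving the closest knowledge-base entries for a query, can be sketched without the library itself. This sketch does not use the DeezyMatch API: the `embed` function below is a hypothetical stand-in that counts character bigrams instead of using learned neural representations, and the ranking is an exhaustive cosine-similarity scan rather than the adaptive search the authors describe.

```python
import math
from collections import Counter


def embed(text: str, n: int = 2) -> Counter:
    """Stand-in vector: character n-gram counts (not a learned embedding)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(query: str, knowledge_base, top_k: int = 3):
    """Return the top_k knowledge-base strings most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


# Toy toponym knowledge base (made up for illustration).
places = ["London", "Londonderry", "Paris", "Leeds"]
best = rank_candidates("Lundon", places, top_k=1)
```

On this toy gazetteer the misspelled query `Lundon` ranks `London` first. A learned model would replace `embed`, which is where training on pairs of matching and non-matching strings would come in.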

Network Information

Performance

Metrics

Papers: 1,942

Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
