Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers published on a yearly basis

[Chart: number of papers per year, 1973 to 2023]

Papers

Proceedings Article

Approximate String Matching for Geographic Names and Personal Names

Clodoveu A. Davis, Emerson de Salles

01 Jan 2007

TL;DR: A novel method for approximate string matching, developed for the recognition of geographic and personal names, deals with abbreviations, name inversions, stopwords, and omission of parts.

Abstract: The problem of matching strings allowing errors has recently gained importance, considering the increasing volume of online textual data. In geotechnologies, approximate string matching algorithms find many applications, such as gazetteers, address matching, and geographic information retrieval. This paper presents a novel method for approximate string matching, developed for the recognition of geographic and personal names. The method deals with abbreviations, name inversions, stopwords, and omission of parts. Three similarity measures and a method to match individual words considering accent marks and other multilingual aspects were developed. Test results show high precision-recall rates and good overall matching efficiency.

14 citations
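
The abstract above lists the phenomena the method handles (abbreviations, name inversions, stopwords, omitted parts, accent marks) without giving the algorithm. The following Python sketch is not the authors' method; it only illustrates one simple way such cases might be handled. The stopword list and all function names are assumptions made for this example.

```python
import unicodedata

# Hypothetical stopword list, for illustration only; the paper's list is not given.
STOPWORDS = {"de", "da", "do", "dos", "das", "of", "the"}


def strip_accents(text: str) -> str:
    """Remove combining accent marks, e.g. 'São' becomes 'Sao'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize_name(name: str) -> list[str]:
    """Lowercase, drop accents and punctuation, and remove stopwords."""
    cleaned = strip_accents(name).lower().replace(",", " ").replace(".", " ")
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def tokens_match(a: str, b: str) -> bool:
    """A single letter is treated as an abbreviation of any word with that initial."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def names_match(left: str, right: str) -> bool:
    """Order-insensitive match: every token of the shorter name must pair
    with a distinct token of the longer one (greedy pairing)."""
    short, long_ = sorted((normalize_name(left), normalize_name(right)), key=len)
    if not short:
        return False
    remaining = list(long_)
    for tok in short:
        for cand in remaining:
            if tokens_match(tok, cand):
                remaining.remove(cand)
                break
        else:
            return False
    return True
```

Under these assumptions, `names_match("Clodoveu A. Davis", "Davis, Clodoveu Augusto")` treats the inverted, abbreviated form as a match. The greedy pairing is a simplification, and exact token equality stands in for the similarity measures that the paper develops.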

Proceedings Article

Fast Approximate String Matching with Suffix Arrays and A* Parsing

Philipp Koehn¹, Jean Senellart

University of Edinburgh¹

01 Jan 2010

TL;DR: This article uses suffix arrays to detect exact n-gram matches, A* search heuristics to discard matches, and A* parsing to validate candidate segments. The method outperforms the canonical baseline by a factor of 100, with average lookup times of 4.3–247 ms for a segment in a realistic scenario.

Abstract: We present a novel exact solution to the approximate string matching problem in the context of translation memories, where a text segment has to be matched against a large corpus, while allowing for errors. We use suffix arrays to detect exact n-gram matches, A* search heuristics to discard matches and A* parsing to validate candidate segments. The method outperforms the canonical baseline by a factor of 100, with average lookup times of 4.3–247ms for a segment in a realistic scenario.

14 citations
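
The first step named in the abstract above, detecting exact n-gram matches with a suffix array, can be sketched in Python as follows. This is a minimal illustration over a token list, assuming a naive sort-based construction; it does not include the A* filtering or parsing stages, and the function names are invented for this example rather than taken from the paper.

```python
def build_suffix_array(tokens: list[str]) -> list[int]:
    """Return suffix start positions sorted by the suffix they begin.
    Naive construction, adequate for a sketch but not for a large corpus."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_ngram(tokens: list[str], suffix_array: list[int], ngram) -> list[int]:
    """Return the sorted corpus positions where `ngram` occurs exactly."""
    ngram = list(ngram)
    n = len(ngram)

    def prefix(rank: int) -> list[str]:
        start = suffix_array[rank]
        return tokens[start:start + n]

    # Binary search for the first suffix whose n-token prefix is >= ngram.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < ngram:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Binary search for the first suffix whose n-token prefix is > ngram.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= ngram:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])
```

Because the suffixes are sorted, all suffixes starting with a given n-gram occupy one contiguous block of the array, so two binary searches locate every occurrence.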

Book Chapter

On musical performances identification, entropy and string matching

Antonio Camarena-Ibarrola¹, Edgar Chávez¹

Universidad Michoacana de San Nicolás de Hidalgo¹

13 Nov 2006

TL;DR: An entropy-based audio fingerprint (AFP) with a framed, small footprint reduces the task to a string matching problem and correctly identifies different renditions of masterpieces as well as pop music in less than a second per comparison.

Abstract: In this paper we address the problem of matching musical renditions of the same piece of music, also known as performances. We use an entropy-based Audio-Fingerprint (AFP) delivering a framed, small-footprint AFP, which reduces the problem to a string matching problem. The Entropy AFP has very low resolution (750 ms per symbol), making it suitable for flexible string matching. We show experimental results using dynamic time warping (DTW), Levenshtein or edit distance, and the Longest Common Subsequence (LCS) distance. We are able to correctly (100%) identify different renditions of masterpieces as well as pop music in less than a second per comparison. The three approaches are 100% effective, but LCS and Levenshtein can be computed online, making them suitable for monitoring applications (unlike DTW), and since they are distances, a metric index could be used to speed up the recognition process.

14 citations
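
Two of the measures named in the abstract above, Levenshtein distance and an LCS-based distance, can be sketched in Python with the standard dynamic programs. The abstract does not define its LCS distance, so the formula used here (total length minus twice the LCS length) is an assumption, and the fingerprint symbols are represented simply as characters of a string.

```python
def levenshtein(a, b) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def lcs_length(a, b) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b) -> int:
    """Assumed definition: symbols of either sequence left outside the LCS."""
    return len(a) + len(b) - 2 * lcs_length(a, b)
```

Both functions keep only one previous row of the table and consume the first sequence one symbol at a time, which is consistent with the abstract's remark that these two measures can be computed online.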

Proceedings Article

A windowed weighted approach for approximate cyclic string matching

Ramón Alberto Mollineda¹, Enrique Vidal¹, Francisco Casacuberta¹

Polytechnic University of Valencia¹

01 Aug 2002

TL;DR: A method for measuring dissimilarities between cyclic strings is introduced, which computes a weighted mean between two (lower and upper) bounds of the exact cyclic edit distance, which are founded on a window-constrained edit graph related to the strings involved.

Abstract: A method for measuring dissimilarities between cyclic strings is introduced. It computes a weighted mean between two (lower and upper) bounds of the exact cyclic edit distance, which are founded on a window-constrained edit graph related to the strings involved. Weights are the ones which minimize the sum of squared relative errors of the weighted solution with respect to exact values, on a training set of string pairs. This method takes O(n^2) time. Experiments on both artificial and real data show the highly accurate solutions achieved by this technique, which is clearly faster than the most efficient exact algorithms.

14 citations
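
For context on what the method above approximates, here is a Python sketch of an exact cyclic edit distance computed by brute force: the ordinary edit distance is evaluated against every rotation of one string and the minimum is kept. This is a baseline for illustration only, not the windowed weighted method of the paper; it assumes unit edit costs and assumes that rotating only one of the two strings suffices for this definition.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cyclic_edit_distance(a: str, b: str) -> int:
    """Minimum edit distance between a and any rotation of b.
    Costs O(n^3) time for strings of length n, versus O(n^2) for the
    approximate method described in the abstract."""
    if not b:
        return len(a)
    return min(levenshtein(a, b[k:] + b[:k]) for k in range(len(b)))
```

The cubic cost of this baseline is the motivation for bounding the exact value rather than computing it directly.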

Patent

Modified Levenshtein distance algorithm for coding

Kurt P. Kopchik, Oren I. Oxman¹, Timothy O. Withum²

Leidos¹, Lockheed Martin Corporation²

23 Jan 2007

TL;DR: In this patent, the Levenshtein Distance Algorithm (LDA) is augmented with additional information in the form of adjustments based on particular character substitutions, insertions and deletions together with weighting based on multiple alternatives for the OCR text string.

Abstract: Methods and systems of mapping of an optical character recognition (OCR) text string to a code included in a coding dictionary by supplementing the Levenshtein Distance Algorithm (LDA) with additional information in the form of adjustments based on particular character substitutions, insertions and deletions together with weighting based on multiple alternatives for the OCR text string. In one embodiment, an OCR text string mapping method (100) includes receiving (110) an OCR text string, comparing (120) it with selected text strings from a coding dictionary, computing (130) modified Levenshtein distances associated with the comparisons by determining (140) substitution penalties, determining (150) insertion penalties, determining (160) deletion penalties and combining (170) the penalties, selecting (180) the best matching text string from the coding dictionary based on the modified Levenshtein distances, determining (190) whether a maximum threshold distance is met, and assigning (200) a code associated with the best matching text string to the OCR text string when met, and assigning (210) a null or no code when not met.

14 citations
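
The flow described in the abstract above (compare the OCR string with dictionary entries, combine substitution, insertion and deletion penalties, pick the best match, apply a maximum threshold) can be sketched in Python as below. This is an illustrative sketch, not the patented method: the confusion costs, default penalties, threshold, and example codes are all hypothetical, and the weighting over multiple OCR alternatives is omitted.

```python
# Hypothetical penalties for visually similar characters; real values would
# have to come from measured OCR confusion statistics.
SUBSTITUTION_COST = {("0", "O"): 0.2, ("1", "l"): 0.2, ("5", "S"): 0.3}


def substitution_penalty(a: str, b: str) -> float:
    """Zero for identical characters, reduced for known confusions, else 1."""
    if a == b:
        return 0.0
    return SUBSTITUTION_COST.get((a, b), SUBSTITUTION_COST.get((b, a), 1.0))


def modified_distance(ocr: str, candidate: str,
                      insert_cost: float = 1.0, delete_cost: float = 1.0) -> float:
    """Levenshtein-style dynamic program with per-operation penalties."""
    prev = [j * insert_cost for j in range(len(candidate) + 1)]
    for i, co in enumerate(ocr, 1):
        curr = [i * delete_cost]
        for j, cc in enumerate(candidate, 1):
            curr.append(min(
                prev[j] + delete_cost,                      # drop an OCR character
                curr[j - 1] + insert_cost,                  # add a missing character
                prev[j - 1] + substitution_penalty(co, cc), # replace or match
            ))
        prev = curr
    return prev[-1]


def assign_code(ocr_text: str, coding_dictionary: dict, max_distance: float = 1.0):
    """Return the code of the closest dictionary entry, or None when the
    dictionary is empty or the best distance exceeds the threshold."""
    if not coding_dictionary:
        return None
    best = min(coding_dictionary, key=lambda text: modified_distance(ocr_text, text))
    if modified_distance(ocr_text, best) <= max_distance:
        return coding_dictionary[best]
    return None
```

With a made-up dictionary such as `{"SOLDIER": "A1", "SAILOR": "B2"}`, the OCR string `"S0LDIER"` maps to `"A1"` because the `0`/`O` confusion carries a small penalty, while an unrelated string returns `None`.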

Performance

Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
