Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers published on a yearly basis (chart, x-axis: year, 1973 to 2023)

Papers

Journal Article

A string matching computer-assisted system for dolphin photoidentification.

Babak Nadjar Araabi¹, Nasser Kehtarnavaz¹, T. McKinney¹, Gilbert R. Hillman², Bernd Würsig¹

Texas A&M University¹, University of Texas Medical Branch²

01 Jan 2000-Annals of Biomedical Engineering

TL;DR: The developed computer-assisted system can help marine mammalogists in their identification of dolphins, since it allows them to examine only a handful of candidate images instead of the currently used manual searching of the entire database.

Abstract: This paper presents a syntactic/semantic string representation scheme as well as a string matching method as part of a computer-assisted system to identify dolphins from photographs of their dorsal fins. A low-level string representation is constructed from the curvature function of a dolphin's fin trailing edge, consisting of positive and negative curvature primitives. A high-level string representation is then built over the low-level string via merging appropriate groupings of primitives in order to have a less sensitive representation to curvature fluctuations or noise. A family of syntactic/semantic distance measures between two strings is introduced. A composite distance measure is then defined and used as a dissimilarity measure for database search, highlighting both the syntax (structure or sequence) and semantic (attribute or feature) differences. The syntax consists of an ordered sequence of significant protrusions and intrusions on the edge, while the semantics consist of seven attributes extracted from the edge and its curvature function. The matching results are reported for a database of 624 images corresponding to 164 individual dolphins. The identification results indicate that the developed string matching method performs better than the previous matching methods including dorsal ratio, curvature, and curve matching. The developed computer-assisted system can help marine mammalogists in their identification of dolphins, since it allows them to examine only a handful of candidate images instead of the currently used manual searching of the entire database. © 2000 Biomedical Engineering Society. PAC00: 8780Tq, 4230Sy, 0705Pj

55 citations

Journal Article

A Randomized Algorithm for Approximate String Matching

Mikhail J. Atallah¹, Frédéric Chyzak², Philippe Dumas²

Purdue University¹, French Institute for Research in Computer Science and Automation²

01 Mar 2001-Algorithmica

TL;DR: A randomized algorithm in deterministic time O(N log M) for estimating the score vector of matches between a text string of length N and a pattern string of length M, i.e., the vector obtained when the pattern is slid along the text and the number of matches is counted for each position.

Abstract: We give a randomized algorithm in deterministic time O(N log M) for estimating the score vector of matches between a text string of length N and a pattern string of length M, i.e., the vector obtained when the pattern is slid along the text, and the number of matches is counted for each position. A direct application is approximate string matching. The randomized algorithm uses convolution to find an estimator of the scores; the variance of the estimator is particularly small for scores that are close to M, i.e., for approximate occurrences of the pattern in the text. No assumption is made about the probabilistic characteristics of the input, or about the size of the alphabet. The solution extends to string matching with classes, class complements, "never match" and "always match" symbols, to the weighted case and to higher dimensions.
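
The score vector described in this abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it computes the exact scores by brute force and then a randomized estimate that maps each symbol to a random root of unity, which is one common way to build such an estimator and is assumed here for illustration. The function names, the `trials` parameter, and the choice of modulus are hypothetical, and the correlation is computed with direct O(NM) loops rather than the convolution (FFT) step that the stated time bound relies on.

```python
import cmath
import random


def exact_scores(text, pattern):
    """Number of matching positions for each alignment of pattern in text."""
    n, m = len(text), len(pattern)
    return [sum(1 for j in range(m) if text[i + j] == pattern[j])
            for i in range(n - m + 1)]


def estimated_scores(text, pattern, trials=32, rng=None):
    """Randomized estimate of the score vector (illustrative assumption).

    Each trial assigns every symbol a random q-th root of unity. A pair of
    equal symbols contributes exactly 1 to the correlation; a pair of
    different symbols contributes 0 in expectation.
    """
    rng = rng or random.Random()
    alphabet = sorted(set(text) | set(pattern))
    q = max(2, len(alphabet))  # any modulus >= 2 keeps the estimate unbiased
    n, m = len(text), len(pattern)
    totals = [0.0] * (n - m + 1)
    for _ in range(trials):
        phase = {c: cmath.exp(2j * cmath.pi * rng.randrange(q) / q)
                 for c in alphabet}
        t = [phase[c] for c in text]
        p = [phase[c].conjugate() for c in pattern]
        # Direct correlation; an FFT-based convolution would replace this loop.
        for i in range(n - m + 1):
            totals[i] += sum(t[i + j] * p[j] for j in range(m)).real
    return [s / trials for s in totals]
```

In this sketch, an alignment where the pattern matches exactly yields the value M in every trial, which is consistent with the abstract's remark that the variance is small for scores close to M.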

55 citations

Book Chapter

Approximate string matching with arbitrary costs for text and hypertext

Udi Manber, Sun Wu

01 Feb 1993

55 citations

Book Chapter

A Metric Index for Approximate String Matching

Edgar Chávez¹, Gonzalo Navarro²

Universidad Michoacana de San Nicolás de Hidalgo¹, University of Chile²

03 Apr 2002

TL;DR: A radically new indexing approach for approximate string matching where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space.

Abstract: We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space. This permits us finding the R occurrences of a pattern of length m in a text of length n in average time O(m log² n + m² + R), using O(n log n) space and O(n log² n) index construction time. This complexity improves by far over all other previous methods. We also show a simpler scheme needing O(n) space.
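
To make the idea of a proximity query under the edit distance concrete, here is a minimal sketch. It is not the suffix-tree index from the paper: it uses a plain word list as the set of sites and a single pivot, both of which are assumptions made for illustration. What it does show is how the triangle inequality of a metric lets a range query discard candidates without computing their distance to the query. The names `build_pivot_table` and `range_query` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def build_pivot_table(words, pivot):
    """Precompute the distance from every site to one pivot string."""
    return [(w, edit_distance(w, pivot)) for w in words]


def range_query(table, pivot, query, radius):
    """Return all sites within `radius` edits of `query`."""
    dq = edit_distance(query, pivot)
    hits = []
    for w, dw in table:
        # Triangle inequality: d(query, w) >= |d(w, pivot) - d(query, pivot)|,
        # so a large gap rules w out without computing d(query, w).
        if abs(dw - dq) > radius:
            continue
        if edit_distance(query, w) <= radius:
            hits.append(w)
    return hits
```

A typical call would build the table once, for example `table = build_pivot_table(words, words[0])`, and then run `range_query(table, words[0], "surbey", 1)` for each query.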

55 citations

Journal Article

On Approximate Jumbled Pattern Matching in Strings

Péter Burcsi¹, Ferdinando Cicalese², Gabriele Fici³, Zsuzsanna Lipták⁴

Eötvös Loránd University¹, University of Salerno², University of Nice Sophia Antipolis³, Bielefeld University⁴

01 Jan 2012

TL;DR: This work presents an algorithm which solves the decision version of the Approximate Jumbled Pattern Matching problem in constant time, by indexing the string in subquadratic time.

Abstract: Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of a Parikh vector q in the text s requires finding a substring t of s with p(t)=q. This can be viewed as the task of finding a jumbled (permuted) version of a query pattern, hence the term Jumbled Pattern Matching. We present several algorithms for the approximate version of the problem: Given a string s and two Parikh vectors u,v (the query bounds), find all maximal occurrences in s of some Parikh vector q such that u≤q≤v. This definition encompasses several natural versions of approximate Parikh vector search. We present an algorithm solving this problem in sub-linear expected time using a wavelet tree of s, which can be computed in time O(n) in a preprocessing phase. We then discuss a Scrabble-like variation of the problem, in which a weight function on the letters of s is given and one has to find all occurrences in s of a substring t with maximum weight having Parikh vector p(t)≤v. For the case of a binary alphabet, we present an algorithm which solves the decision version of the Approximate Jumbled Pattern Matching problem in constant time, by indexing the string in subquadratic time.
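
The definitions in this abstract can be sketched in a few lines. The code below is a brute-force illustration, not the wavelet-tree algorithm from the paper: it computes Parikh vectors with `collections.Counter` and lists every substring t of s with u ≤ p(t) ≤ v, without restricting the output to maximal occurrences. The function names and the half-open (start, end) index convention are assumptions made for this example.

```python
from collections import Counter


def parikh(s):
    """Parikh vector of s: the multiplicity of each character."""
    return Counter(s)


def within_bounds(q, u, v):
    """True if u <= q <= v holds for every character."""
    letters = set(q) | set(u) | set(v)
    return all(u.get(c, 0) <= q.get(c, 0) <= v.get(c, 0) for c in letters)


def approx_jumbled_occurrences(s, u, v):
    """All (start, end) with u <= p(s[start:end]) <= v, by brute force."""
    lo, hi = sum(u.values()), sum(v.values())
    hits = []
    for i in range(len(s)):
        window = Counter()
        # A window longer than sum(v) cannot fit under the upper bound.
        for j in range(i, min(len(s), i + hi)):
            window[s[j]] += 1
            if window[s[j]] > v.get(s[j], 0):
                break  # extending further can only keep violating v
            if j - i + 1 >= lo and within_bounds(window, u, v):
                hits.append((i, j + 1))
    return hits
```

With u = {"a": 1, "b": 1} and v = {"a": 1, "b": 1, "c": 1}, the query asks for substrings containing exactly one a, exactly one b, and at most one c.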

55 citations

Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
