Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications in this topic have received 62,352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.
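As an illustration of the basic problem, the sketch below reports every text position where some substring ends within edit distance k of a pattern. It uses the classic column-wise dynamic program usually attributed to Sellers. This is a generic O(nm) baseline, not a method from any paper listed on this page, and the function name `approximate_occurrences` is hypothetical.

```python
def approximate_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Return end positions j such that some substring text[i..j] is within
    edit distance k of pattern (insertions, deletions, substitutions)."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and a substring of
    # text ending at the current position; starts as the empty-text column.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text):
        diag = col[0]   # value of col[i-1] from the previous column
        col[0] = 0      # an occurrence may start at any text position
        for i in range(1, m + 1):
            prev = col[i]
            col[i] = min(prev + 1,                          # text char c unmatched
                         col[i - 1] + 1,                    # pattern char unmatched
                         diag + (pattern[i - 1] != c))      # match or substitute
            diag = prev
        if col[m] <= k:
            hits.append(j)
    return hits
```

For example, `approximate_occurrences("abcdef", "bxd", 1)` returns `[3]`, because "bcd" ends at index 3 and differs from "bxd" by one substitution.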

Papers published on a yearly basis (chart, 1973 to 2023)

Papers

Posted Content

JAROWINKLER: Stata module to calculate the Jaro-Winkler distance between strings

James Feigenbaum

13 Oct 2016, Research Papers in Economics

TL;DR: This module calculates the distance between two string variables using the Jaro-Winkler distance metric, which is used in record linkage to compare first or last names across different sources.

Abstract: jarowinkler calculates the distance between two string variables using the Jaro-Winkler distance metric. The distance metric is often used in record linkage to compare first or last names in different sources.
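The Jaro-Winkler metric can be sketched in Python as follows. This is a textbook formulation (match window of max(len)/2 - 1, transpositions counted as half the out-of-order matched characters, prefix scale p = 0.1 over at most 4 characters). It is an assumption about the metric in general, not a copy of the Stata module's code, and the function names are hypothetical. The sketch returns a similarity in [0, 1]; a distance is often taken as 1 minus that value, and some implementations apply the prefix bonus only above a Jaro threshold such as 0.7, which this sketch omits.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    window = max(max(n1, n2) // 2 - 1, 0)
    matched1, matched2 = [False] * n1, [False] * n2
    matches = 0
    # A character matches if an equal, unused character lies within the window.
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in s2.
    out_of_order, j = 0, 0
    for i in range(n1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2
    return (matches / n1 + matches / n2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro similarity by the length of the common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)
```

For instance, `jaro_winkler("MARTHA", "MARHTA")` returns about 0.961 (Jaro similarity about 0.944, plus the bonus for the shared prefix "MAR").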

8 citations

Journal Article • DOI

Bit-Parallel Approximate Matching of Circular Strings with k Mismatches

Tommi Hirvola¹, Jorma Tarhio¹

Aalto University¹

18 Sep 2017, ACM Journal of Experimental Algorithms

TL;DR: This work derives a sublinear-time algorithm for searching a noncircular pattern with k allowed mismatches and extends it to approximate circular pattern matching with k mismatches, giving the first nonfiltering method for approximate circular string matching in sublinear average time.

Abstract: We consider approximate string matching of a circular pattern consisting of the rotations of a pattern of length m. From SBNDM and Tuned Shift-Add, we derive a sublinear-time algorithm for searching a noncircular pattern with k allowed mismatches, which is extended to the problem of approximate circular pattern matching with k mismatches. We prove that the presented algorithms are average-optimal for m⌈log₂(k+1)+1⌉ = O(w), where w is the size of the computer word in bits. Experiments conducted under the aforementioned condition show that the new k-mismatches algorithm for circular strings outperforms previous solutions in practice. In particular, our algorithm is the first nonfiltering method for approximate circular string matching in sublinear average time, which makes it more suitable than earlier filtering methods for high error levels k/m and small alphabets.
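The counter-based Shift-Add idea that this work builds on can be sketched as below. Each pattern position gets a field of ⌈log₂(k+1)⌉ + 1 bits (the same per-position width as in the m⌈log₂(k+1)+1⌉ = O(w) condition), the whole state is shifted by one field per text character, and the top bit of each field flags counters that have grown too large, which a sticky overflow mask then remembers. Python's unbounded integers stand in for the machine word. The circular variant simply searches once per rotation; it is a naive baseline, not the sublinear SBNDM-based algorithm of the paper, and both function names are hypothetical.

```python
def shift_add_k_mismatches(text: str, pattern: str, k: int) -> list[int]:
    """End positions j where pattern matches text[j-m+1 .. j] with <= k mismatches."""
    m = len(pattern)
    if m == 0 or k < 0:
        return []
    v = k.bit_length()              # counter bits: ceil(log2(k+1)), so 2**v > k
    b = v + 1                       # field width: counter bits + one overflow bit
    full = (1 << (m * b)) - 1       # mask covering all m fields
    high = sum(1 << (i * b + v) for i in range(m))   # top bit of each field
    ones = sum(1 << (i * b) for i in range(m))       # value 1 in every field
    # table[c] has a 1 in field i exactly when pattern[i] != c.
    table = {c: sum(1 << (i * b) for i, pc in enumerate(pattern) if pc != c)
             for c in set(pattern)}
    counter_mask = (1 << v) - 1
    top = (m - 1) * b               # bit offset of the last field
    state = overflow = 0
    hits = []
    for j, c in enumerate(text):
        # Move every field up one position and add this character's mismatches.
        state = ((state << b) & full) + table.get(c, ones)
        # Remember fields whose counter reached 2**v (> k), then clear that bit.
        overflow = ((overflow << b) & full) | (state & high)
        state &= ~high
        if j >= m - 1:
            overflowed = (overflow >> (top + v)) & 1
            count = (state >> top) & counter_mask
            if not overflowed and count <= k:
                hits.append(j)
    return hits


def circular_k_mismatches(text: str, pattern: str, k: int) -> list[int]:
    """Naive circular variant: search every rotation and merge the end positions."""
    hits = set()
    for r in range(len(pattern)):
        hits.update(shift_add_k_mismatches(text, pattern[r:] + pattern[:r], k))
    return sorted(hits)
```

For example, `shift_add_k_mismatches("abcabd", "abd", 1)` returns `[2, 5]`: "abc" ends at index 2 with one mismatch and "abd" ends at index 5 with none.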

8 citations

Proceedings Article

A Parallel Algorithm for Approximate String Matching

Kathleen Kaplan, Legand Burge, Moses Garuba

01 Jan 2003

8 citations

Book Chapter • DOI

Efficient computations of ℓ1 and ℓ∞ rearrangement distances

Amihood Amir¹, Yonatan Aumann², Piotr Indyk³, Avivit Levy², Ely Porat²

Johns Hopkins University¹, Bar-Ilan University², Massachusetts Institute of Technology³

29 Oct 2007

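TL;DR: Within the recently proposed paradigm of pattern matching with address errors, where the pattern is transformed through a sequence of rearrangement operations, each with an associated cost, this work shows that the ℓ1 rearrangement distance for general strings can be approximated in linear time and gives efficient solutions for the ℓ∞ rearrangement distance.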

Abstract: Recently, a new pattern matching paradigm was proposed: pattern matching with address errors. In this paradigm, approximate string matching problems are studied in which the content is unaltered and only the locations of the different entries may change. Specifically, a broad class of problems in this new paradigm was defined: the class of rearrangement errors. In this type of error, the pattern is transformed through a sequence of rearrangement operations, each with an associated cost. The natural ℓ1 and ℓ2 rearrangement systems were considered. A variant of the ℓ1 rearrangement distance problem, in which the pattern is a general string that may have repeating symbols, seems more difficult; the best algorithm presented for this general case is O(nm). In this paper, we show that even for general strings the problem can be approximated in linear time! This paper also considers another natural rearrangement system, the ℓ∞ rearrangement distance. For this new rearrangement system we provide efficient exact solutions for different variants of the problem, as well as a faster approximation.
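For intuition about the two cost functions, the sketch below computes exact ℓ1 and ℓ∞ rearrangement distances between two equal-length strings by pairing the i-th occurrence of each symbol in one string with the i-th occurrence in the other; on a line, this order-preserving pairing minimizes both the sum and the maximum of the displacements. It treats strings with different symbol counts as infinitely far apart, which is an assumption about how that case is defined. Applied to every text window it gives a simple O(nm) baseline, not the paper's linear-time approximation or its ℓ∞ algorithms, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def rearrangement_distances(s: str, p: str) -> tuple[float, float]:
    """Return the (l1, l_inf) rearrangement distance for turning p into s."""
    if len(s) != len(p):
        raise ValueError("strings must have equal length")
    pos_s, pos_p = defaultdict(list), defaultdict(list)
    for i, c in enumerate(s):
        pos_s[c].append(i)
    for i, c in enumerate(p):
        pos_p[c].append(i)
    # Different symbol counts: no rearrangement works (treated as infinite).
    if any(len(pos_s[c]) != len(pos_p[c]) for c in set(pos_s) | set(pos_p)):
        return math.inf, math.inf
    l1, linf = 0, 0
    for c, positions in pos_p.items():
        # Pair the i-th occurrence of c in p with the i-th occurrence in s.
        for i, j in zip(positions, pos_s[c]):
            d = abs(i - j)
            l1 += d
            linf = max(linf, d)
    return l1, linf
```

For example, `rearrangement_distances("abc", "bca")` returns `(4, 2)`: "b" and "c" each move one position and "a" moves two.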

8 citations

Book Chapter • DOI

Average Optimal String Matching in Packed Strings

Djamal Belazzougui¹, Mathieu Raffinot²

University of Helsinki¹, Paris Diderot University²

22 May 2013

TL;DR: This paper presents a data structure that requires O(m) words of space and can be compressed to only O(m log σ) bits while achieving query time O(n (log_σ m)^ε / y), shows two other direct applications, and gives a slightly improved worst-case efficient multiple pattern matching algorithm.

Abstract: In this paper we are concerned with the basic problem of string pattern matching: preprocess one or multiple fixed strings over an alphabet of size σ so as to be able to efficiently search for all occurrences of the string(s) in a given text T of length n. In our model, we assume that text and patterns are tightly packed so that any single character occupies log σ bits and thus any sequence of k consecutive characters in the text or the pattern occupies exactly k log σ bits. We first show a data structure that requires O(m) words of space (more precisely O(m log m) bits of space), where m is the total size of the patterns, and answers search queries in average-optimal O(n/y) time, where y is the length of the shortest pattern (y = m in the case of a single pattern). This first data structure, while optimal in time, still requires O(m log m) bits of space, which might be too much considering that the patterns occupy only m log σ bits of space. We then show that our data structure can be compressed to use only O(m log σ) bits of space while achieving query time O(n (log_σ m)^ε / y), with ε any constant such that 0 < ε < 1. We finally show two other direct applications: average-optimal pattern matching with worst-case guarantees and average-optimal pattern matching with k differences. In the meantime we also show a slightly improved worst-case efficient multiple pattern matching algorithm.
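The packed-text model can be illustrated with a short sketch: each character is encoded in ⌈log₂ σ⌉ bits, so any k consecutive characters form one integer of k⌈log₂ σ⌉ bits that can be extracted with a shift and a mask and compared in a single operation. The example uses Python integers in place of machine words, assumes a small explicit alphabet, and scans every alignment, so it only shows the representation; it does not implement the paper's average-optimal O(n/y) data structure. All names are hypothetical.

```python
def pack(s: str, alphabet: str) -> tuple[int, int]:
    """Pack s into one integer, ceil(log2 sigma) bits per character, first char lowest."""
    bits = max(1, (len(alphabet) - 1).bit_length())
    code = {c: i for i, c in enumerate(alphabet)}
    packed = 0
    for i, c in enumerate(s):
        packed |= code[c] << (i * bits)
    return packed, bits


def window(packed: int, bits: int, start: int, k: int) -> int:
    """Extract k consecutive packed characters beginning at position start."""
    return (packed >> (start * bits)) & ((1 << (k * bits)) - 1)


def packed_occurrences(text: str, pattern: str, alphabet: str) -> list[int]:
    """Start positions of exact occurrences, comparing whole packed windows."""
    t, bits = pack(text, alphabet)
    p, _ = pack(pattern, alphabet)
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if window(t, bits, i, m) == p]
```

For DNA over "ACGT", each character takes 2 bits, and `packed_occurrences("ACGTACGA", "CG", "ACGT")` returns `[1, 5]`.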

8 citations


Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
