Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers published on a yearly basis (chart covering 1973 to 2023)

Papers

Journal Article

Invariant handwritten Chinese character recognition using fuzzy ring data

Din Chang Tseng, Hung Pin Chiu¹, Jen Chieh Cheng¹

National Central University¹

01 Oct 1996-Image and Vision Computing

TL;DR: An invariant handwritten Chinese character recognition system is proposed, and its fuzzy matching is verified through extensive experiments with the character set, which show that the performance of the proposed invariant features is clearly superior to that of moment invariants.

11 citations

Journal Article

Locating Maximal Multirepeats in Multiple Strings Under Various Constraints

† A preliminary version of the results of this paper was presented in CPM 2002.

A. Bakalis¹, Costas S. Iliopoulos¹, Christos Makris², Spyros Sioutas³, Evangelos Theodoridis³, Athanasios K. Tsakalidis³, Kostas Tsichlas³

King's College London¹, University of Patras², Research Academic Computer Technology Institute³

01 Mar 2007-The Computer Journal

TL;DR: Algorithms are presented for two different versions of the problem of finding maximal multirepeats in a set of strings: one for arbitrary gaps and one for gaps bounded in a small range c.

Abstract: A multirepeat in a string is a substring (factor) that appears a predefined number of times. A multirepeat is maximal if it cannot be extended either to the right or to the left and produce a multirepeat. In this paper, we present algorithms for two different versions of the problem of finding maximal multirepeats in a set of strings. In the case of arbitrary gaps, we propose an algorithm with O(σN²n + α) time complexity. When the gap is bounded in a small range c, we propose an algorithm with O((c² + σ²)mN²n log(Nn) + α) time complexity. Here, N is the number of strings, n the mean length of each string, m the multiplicity of the multirepeat and α the number of reported occurrences. Our results extend previous work by considering sets of strings as well as by generalizing pairs to multirepeats.

11 citations

Journal Article

Bit-parallel approximate string matching algorithms with transposition

Heikki Hyyrö¹

University of Tampere¹

01 Jan 2003-Lecture Notes in Computer Science

TL;DR: This article discusses a uniform way of modifying bit-parallel approximate string matching algorithms to permit a fourth type of edit operation, transposing two adjacent characters in the pattern. The resulting edit distance is also known as the Damerau edit distance.

Abstract: Using bit-parallelism has resulted in fast and practical algorithms for approximate string matching under the Levenshtein edit distance, which permits a single edit operation to insert, delete or substitute a character. Depending on the parameters of the search, currently the fastest non-filtering algorithms in practice are the O(kn⌈m/w⌉) algorithm of Wu & Manber, the O(⌈km/w⌉n) algorithm of Baeza-Yates & Navarro, and the O(⌈m/w⌉n) algorithm of Myers, where m is the pattern length, n is the text length, k is the error threshold and w is the computer word size. In this paper we discuss a uniform way of modifying each of these algorithms to permit also a fourth type of edit operation: transposing two adjacent characters in the pattern. This type of edit distance is also known as the Damerau edit distance. In the end we also present an experimental comparison of the resulting algorithms.
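
The bit-parallel idea behind these algorithms can be illustrated with a minimal Python sketch in the style of the Wu & Manber shift-and approach, restricted to the three Levenshtein operations. It is an illustration under stated assumptions, not the code of any of the cited papers: the function name `wu_manber_search` is hypothetical, Python's unbounded integers stand in for machine words (so the word size is not modeled), and the transposition extension discussed in the abstract is not implemented.

```python
def wu_manber_search(pattern, text, k):
    """Return the end positions (0-based) in text where pattern matches
    with at most k insertions, deletions or substitutions.

    Hypothetical helper for illustration; Levenshtein operations only.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # masks[c] has bit j set iff pattern[j] == c.
    masks = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << j)

    full = (1 << m) - 1      # keeps only the m pattern bits
    accept = 1 << (m - 1)    # bit of the whole pattern

    # state[d] has bit j set iff pattern[:j+1] matches some suffix of the
    # text read so far with at most d errors. Before any text is read,
    # a prefix of length <= d matches the empty text by d deletions.
    state = [((1 << d) - 1) & full for d in range(k + 1)]

    ends = []
    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = state[0]                      # row d-1 before this step
        state[0] = ((state[0] << 1) | 1) & b     # exact shift-and row
        for d in range(1, k + 1):
            cur_old = state[d]
            state[d] = (
                (((cur_old << 1) | 1) & b)       # match, no new error
                | prev_old                       # extra character in the text
                | (prev_old << 1) | 1            # substitution
                | (state[d - 1] << 1)            # pattern character skipped
            ) & full
            prev_old = cur_old
        if state[k] & accept:
            ends.append(pos)
    return ends


if __name__ == "__main__":
    # End positions in "surgery" where "survey" matches with <= 2 errors.
    print(wu_manber_search("survey", "surgery", 2))
```

Each text character costs k + 1 constant-size updates of an m-bit state here, which mirrors the shape of the O(kn) word operations when the pattern fits in one machine word.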

11 citations

Proceedings Article

Implementation and performance evaluation of fuzzy file block matching

Bo Han¹, Peter J. Keleher¹

University of Maryland, College Park¹

17 Jun 2007

TL;DR: It is shown that fuzzy matching can recover new versions of GNU Emacs source from older versions, and can improve the performance of underlying distributed file storage systems by potentially saving significant network bandwidth and reducing file transmission costs.

Abstract: The fuzzy file block matching technique (fuzzy matching for short) was first proposed for opportunistic use of Content Addressable Storage. Fuzzy matching aims to increase the hit ratio in content-addressable storage providers, and thus can improve the performance of underlying distributed file storage systems by potentially saving significant network bandwidth and reducing file transmission costs. Fuzzy matching employs shingling to represent the fuzzy hashing of file blocks for similarity detection, and error-correcting information to reconstruct the canonical content of a file block from some similar blocks. In this paper, we present the implementation details of fuzzy matching and a very basic evaluation of its performance. In particular, we show that fuzzy matching can recover new versions of GNU Emacs source from older versions.
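
Shingling for similarity detection can be sketched in a few lines of Python. This is a generic illustration, not the scheme implemented in the paper: the function names `shingles` and `resemblance`, the window width `w`, and the use of the Jaccard coefficient on raw shingle sets are all assumptions, and the error-correcting reconstruction step is not shown.

```python
def shingles(block: bytes, w: int = 4) -> set:
    """Return the set of all w-byte windows (shingles) of a block.

    Hypothetical helper; blocks shorter than w yield an empty set.
    """
    return {block[i:i + w] for i in range(len(block) - w + 1)}


def resemblance(a: bytes, b: bytes, w: int = 4) -> float:
    """Jaccard coefficient of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two blocks without shingles are treated as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    old_block = b"int main(void) { return 0; }"
    new_block = b"int main(void) { return 1; }"
    # A high score suggests the old block is a useful base for the new one.
    print(resemblance(old_block, new_block))
```

In practice a compact sketch of the shingle set (for example a few minimum hash values) would usually be compared instead of the full sets; the full-set version is used here only because it is the simplest correct form.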

11 citations

Book Chapter

Nested Counters in Bit-Parallel String Matching

Kimmo Fredriksson¹, Szymon Grabowski²

University of Eastern Finland¹, University of Łódź²

31 Mar 2009

TL;DR: This work presents several non-trivial applications of Matryoshka counters in string matching algorithms, improving their worst- or average-case time complexities.

Abstract: Many algorithms, e.g. in the field of string matching, are based on handling many counters, which can be performed in parallel, even on a sequential machine, using bit-parallelism. The recently presented technique of nested counters (Matryoshka counters) [1] is to handle small counters most of the time, and refer to larger counters periodically, when the small counters may get full, to prevent overflow. In this work, we present several non-trivial applications of Matryoshka counters in string matching algorithms, improving their worst- or average-case time complexities. The set of problems comprises (Δ, α)-matching, matching with k insertions, episode matching, and matching under Levenshtein distance.
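
The core idea of nested counters can be sketched as a two-level structure in Python. This is a simplified illustration and not the construction from the paper: the class name `NestedCounters`, its methods, and the field width `small_bits` are hypothetical, only two levels are used, and the flush is a plain loop rather than a bit-parallel operation.

```python
class NestedCounters:
    """n counters kept as small packed fields, flushed into wide counters.

    Each call to add() may increase every counter by at most 1, so after
    2**small_bits - 1 calls no small field can have overflowed yet.
    """

    def __init__(self, n: int, small_bits: int = 3):
        self.n = n
        self.bits = small_bits
        self.limit = (1 << small_bits) - 1  # largest value a field can hold
        self.packed = 0                     # all small counters in one integer
        self.wide = [0] * n                 # large counters, touched on flush
        self.steps = 0                      # additions since the last flush

    def mask(self, indices) -> int:
        """Build a packed word with a 1 in the field of each given index."""
        word = 0
        for i in indices:
            word |= 1 << (i * self.bits)
        return word

    def add(self, incr_word: int) -> None:
        """Increment many counters with a single integer addition."""
        self.packed += incr_word
        self.steps += 1
        if self.steps == self.limit:        # fields may be full: flush now
            self.flush()

    def flush(self) -> None:
        """Move the small counts into the wide counters and reset them."""
        for i in range(self.n):
            self.wide[i] += (self.packed >> (i * self.bits)) & self.limit
        self.packed = 0
        self.steps = 0

    def value(self, i: int) -> int:
        """Current total of counter i (wide part plus pending small part)."""
        return self.wide[i] + ((self.packed >> (i * self.bits)) & self.limit)


if __name__ == "__main__":
    counters = NestedCounters(4, small_bits=2)
    for _ in range(10):
        counters.add(counters.mask([0, 2]))
    print([counters.value(i) for i in range(4)])
```

The expensive flush runs only once every `2**small_bits - 1` additions, which is the amortization the abstract refers to when it says the small counters are handled most of the time.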

11 citations

Performance Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
