Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have been published within this topic, receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.

Papers published on a yearly basis (chart; year axis 1973 to 2023)

Papers

Journal Article

Fast Algorithms for Approximate Circular String Matching

Carl Barton¹, Costas S. Iliopoulos¹,²,³, Solon P. Pissis¹

King's College London¹, University of Western Australia², Curtin University³

22 Mar 2014-Algorithms for Molecular Biology

TL;DR: A suboptimal average-case algorithm for exact circular string matching requiring time O(n), extended to approximate circular string matching for moderate values of k, that is k = O(m/log m), with a note on how the same results can be easily obtained under the edit distance model.

Abstract: Background: Circular string matching is a problem which naturally arises in many biological contexts. It consists of finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area.

34 citations
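
The problem statement in the abstract above (find all occurrences of the rotations of a pattern of length m in a text of length n) can be made concrete with a naive sketch. This is not the paper's average-case O(n) algorithm; it is a brute-force baseline, and the function name `circular_matches` and the optional mismatch budget `k` (Hamming distance) are assumptions introduced here for illustration.

```python
def circular_matches(pattern: str, text: str, k: int = 0) -> list[int]:
    """Start positions in text where some rotation of pattern occurs
    with at most k mismatches (Hamming distance). Naive baseline."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    # Every rotation of pattern is a length-m window of pattern + pattern.
    doubled = pattern + pattern
    rotations = {doubled[i:i + m] for i in range(m)}
    hits = []
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        # Accept the window if any rotation is within k mismatches of it.
        if any(sum(a != b for a, b in zip(rot, window)) <= k
               for rot in rotations):
            hits.append(start)
    return hits


print(circular_matches("abc", "xbcaxcabx"))       # exact rotations only
print(circular_matches("abc", "xbcaxcabx", k=1))  # allow one mismatch
```

The sketch compares every window against every rotation, so it takes O(n·m²) character comparisons in the worst case; it is only meant to pin down what counts as a match.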

Journal Article

Increased bit-parallelism for approximate and multiple string matching

Heikki Hyyrö¹, Kimmo Fredriksson², Gonzalo Navarro³

University of Tampere¹, University of Eastern Finland², University of Chile³

31 Dec 2005-ACM Journal of Experimental Algorithms

TL;DR: This paper shows how multiple patterns can be packed into a single computer word so as to search for all of them simultaneously, and how the ideas can be applied to other problems such as multiple exact string matching and one-against-all computation of edit distance and longest common subsequences.

Abstract: Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O(⌈m/w⌉n), where w is the number of bits in the computer word. Although this is asymptotically the optimal bit-parallel speedup over the basic O(mn) time algorithm, it wastes bit-parallelism's power in the common case where m is much smaller than w, since w−m bits in the computer words are unused. In this paper, we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed into a single computer word so as to search for all of them simultaneously. Instead of spending O(rn) time to search for r patterns of length m≤w/2, we need O(⌈rm/w⌉n) time. Second, we show how the mechanism permits boosting the search for a single pattern of length m≤w/2, which can be searched for in O(⌈n/⌊w/m⌋⌉) bit-parallel steps instead of O(n). Third, we show how to extend these algorithms so that the time bounds essentially depend on k instead of m, where k is the maximum number of differences permitted. Finally, we show how the ideas can be applied to other problems such as multiple exact string matching and one-against-all computation of edit distance and longest common subsequences. Our experimental results show that the new algorithms work well in practice, obtaining significant speedups over the best existing alternatives, especially on short patterns and a moderate number of differences allowed. This work fills an important gap in the field, where little work has focused on very short patterns.

34 citations
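
The idea of packing several short patterns into one word, mentioned in the abstract above, can be illustrated with the classic Shift-And automaton for exact matching. This is a minimal sketch under stated assumptions, not the paper's approximate-matching algorithm: the function name `packed_shift_and` is hypothetical, and Python's unbounded integers stand in for a machine word (on real hardware the combined pattern length would have to fit in w bits).

```python
def packed_shift_and(patterns: list[str], text: str) -> list[tuple[int, str]]:
    """Exact multi-pattern search: one Shift-And state holds all patterns.

    Returns (start_position, pattern) pairs in order of match end."""
    char_mask: dict[str, int] = {}  # bit i set if packed position i holds that char
    init = 0                        # first bit of each pattern's bit field
    final: dict[int, str] = {}      # last bit of each field -> its pattern
    offset = 0
    for p in patterns:
        if not p:
            raise ValueError("patterns must be non-empty")
        for i, c in enumerate(p):
            char_mask[c] = char_mask.get(c, 0) | (1 << (offset + i))
        init |= 1 << offset
        final[offset + len(p) - 1] = p
        offset += len(p)

    state = 0
    hits = []
    for j, c in enumerate(text):
        # One shift/or/and advances every pattern's automaton at once.
        # A bit shifted across a field boundary lands on the next field's
        # first bit, which `init` sets anyway, so it does no harm.
        state = ((state << 1) | init) & char_mask.get(c, 0)
        for bit, p in final.items():
            if (state >> bit) & 1:
                hits.append((j - len(p) + 1, p))
    return hits


print(packed_shift_and(["ab", "bc"], "abc"))
```

Each text character costs a constant number of word operations regardless of how many patterns share the word, which is the source of the speedup when patterns are much shorter than w.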

Proceedings Article

A Very Fast String Matching Algorithm for Small Alphabets and Long Patterns (Extended Abstract)

Christian Charras, Thierry Lecroq, Joseph Daniel Pehoushek

20 Jul 1998

TL;DR: In this paper, a small amount of germanium or gallium was added to the ferrite and an atmosphere, such as air, was used during the sintering and cooling steps.

Abstract: Desirable properties of manganese zinc ferrites are obtained without the need for controlling or changing the oxygen partial pressure during the sintering and cooling steps by adding a small amount of germanium or gallium to the ferrite and using an atmosphere, such as air, during the sintering and cooling steps, that has at least 1 percent oxygen by volume.

34 citations

Proceedings Article

Faster filters for approximate string matching

Juha Kärkkäinen¹, Joong Chae Na¹

University of Helsinki¹

06 Jan 2007

TL;DR: This work introduces a new filtering method for approximate string matching called the suffix filter, which has some similarity with well-known filtration algorithms (factor filters), and demonstrates experimentally that suffix filters are faster in practice than factor filters.

Abstract: We introduce a new filtering method for approximate string matching called the suffix filter. It has some similarity with well-known filtration algorithms, which we call factor filters, and which are among the best practical algorithms for approximate string matching using a text index. Suffix filters are stronger, i.e., produce fewer false matches than factor filters. We demonstrate experimentally that suffix filters are faster in practice, too.

33 citations
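
To give a sense of what a filtration algorithm does, the sketch below shows the classic pigeonhole partition filter: split the pattern into k+1 pieces, find exact occurrences of any piece, and verify only the text windows around them. Treating this as representative of the "factor filters" named in the abstract above is an assumption; it is not the suffix filter, and it scans the text with `str.find` rather than using a text index. The function names `sellers_ends` and `partition_filter_search` are hypothetical.

```python
def sellers_ends(pattern: str, text: str, k: int) -> set[int]:
    """Indices j such that some substring of text ending at j (inclusive)
    is within edit distance k of pattern (Sellers' dynamic programming)."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = set()
    for j, c in enumerate(text):
        prev_diag = col[0]    # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cur = min(col[i] + 1,          # extra text character
                      col[i - 1] + 1,      # skipped pattern character
                      prev_diag + (pattern[i - 1] != c))  # match or substitute
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            ends.add(j)
    return ends


def partition_filter_search(pattern: str, text: str, k: int) -> list[int]:
    """Same result as sellers_ends, but verifying only candidate windows."""
    m = len(pattern)
    if not 0 <= k < m:
        raise ValueError("need 0 <= k < len(pattern)")
    # With at most k edits, one of k+1 disjoint pieces must occur unchanged.
    bounds = [m * i // (k + 1) for i in range(k + 2)]
    ends: set[int] = set()
    for lo, hi in zip(bounds, bounds[1:]):
        piece = pattern[lo:hi]
        pos = text.find(piece)
        while pos != -1:
            # Any occurrence containing this piece lies inside this window.
            start = max(0, pos - lo - k)
            stop = min(len(text), pos - lo + m + k)
            ends |= {start + e
                     for e in sellers_ends(pattern, text[start:stop], k)}
            pos = text.find(piece, pos + 1)
    return sorted(ends)


print(partition_filter_search("survey", "surgery and surveys", 2))
```

A stronger filter produces fewer candidate windows, so less time is spent in the verification step; that is the sense in which the abstract calls suffix filters "stronger" than factor filters.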

Proceedings Article

A closer look at the closest string and closest substring problem

Markus Chimani¹, Matthias Woste¹, Sebastian Böcker¹

University of Jena¹

22 Jan 2011

TL;DR: For the CSSP, a new formulation is given that is polytope-wise stronger than a straightforward extension of the CSP formulation and a strengthening constraint class is proposed that speeds up the running time.

Abstract: Let S be a set of k strings over an alphabet Σ, where each string has a length between e and n. The Closest Substring Problem (CSSP) is to find a minimal integer d (and a corresponding string t of length e) such that each string s ∈ S has a substring of length e with Hamming distance at most d to t. We say t is the closest substring to S. For e = n, this problem is known as the Closest String Problem (CSP). Particularly in computational biology, the CSP and CSSP have found numerous practical applications such as identifying regulatory motifs and approximate gene clusters, and in degenerate primer design. We study ILP formulations for both problems. Our experiments show that a position-based formulation for the CSP performs very well on real-world instances emerging from biology. Even on randomly generated instances that are hard to solve to optimality, solving the root relaxation leads to solutions very close to the optimum. For the CSSP we give a new formulation that is polytope-wise stronger than a straightforward extension of the CSP formulation. Furthermore, we propose a strengthening constraint class that speeds up the running time.

33 citations
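
The CSSP definition in the abstract above can be checked on toy inputs with an exhaustive search. This is a brute-force sketch, not the ILP formulations the paper studies; it enumerates all |Σ|^e candidate strings, so it is only usable for very small e. The function names `hamming` and `closest_substring` are assumptions introduced here.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_substring(strings: list[str], e: int, alphabet: str) -> tuple[int, str]:
    """Return (d, t): the minimal d and a string t of length e such that
    every input string has a length-e substring within Hamming distance d
    of t. With e equal to the common string length this is the CSP."""
    if any(len(s) < e for s in strings):
        raise ValueError("every string must have length at least e")
    best_d, best_t = e + 1, ""
    for cand in product(alphabet, repeat=e):
        t = "".join(cand)
        # For each string take its best window; the worst string decides d.
        d = max(min(hamming(t, s[i:i + e]) for i in range(len(s) - e + 1))
                for s in strings)
        if d < best_d:
            best_d, best_t = d, t
    return best_d, best_t


print(closest_substring(["abcaa", "bbabc", "cabcb"], 3, "abc"))  # CSSP, e = 3
print(closest_substring(["aaa", "aab", "abb"], 3, "ab"))         # CSP, e = n
```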

Performance

Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
