Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have appeared within this topic, receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.

Papers published on a yearly basis (chart covering 1973 to 2023)

Papers

Proceedings Article

A factor-searching-based multiple string matching algorithm for intrusion detection

Yanbing Liu, Qingyun Liu, Ping Liu, Jianlong Tan, Li Guo

10 Jun 2014

TL;DR: Proposes BVM, a space-efficient multiple string matching algorithm that uses a bit-vector and a succinct hash table to replace the automata used in factor-searching-based algorithms.

Abstract: Multiple string matching plays a fundamental role in network intrusion detection systems. Automata-based multiple string matching algorithms like AC, SBDM and SBOM are widely used in practice, but the huge memory usage of automata prevents them from being applied to a large-scale pattern set. Meanwhile, the poor cache locality of huge automata degrades the matching speed of these algorithms. Here we propose a space-efficient multiple string matching algorithm, BVM, which makes use of a bit-vector and a succinct hash table to replace the automata used in factor-searching-based algorithms. The space complexity of the proposed algorithm is \(O(rm^2 + \sum_{p \in P} |p|)\), which is more space-efficient than the classic automata-based algorithms. Experiments on datasets including Snort, ClamAV, URL blacklist and synthetic rules show that the proposed algorithm significantly reduces memory usage and still runs at a fast matching speed. Above all, BVM costs less than 0.75% of the memory usage of AC, and is capable of matching millions of patterns efficiently.

6 citations
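
The abstract above does not give enough detail to reproduce BVM itself, so the following is only a loose illustration of the general idea of putting a bit-vector filter and a hash table in front of verification instead of walking an automaton. Everything here is an assumption made for illustration: the names `build_filter` and `find_all`, the choice of hashing fixed-length pattern prefixes, and the filter size `num_bits`.

```python
from collections import defaultdict

def build_filter(patterns, num_bits=1 << 16):
    """Hash every length-m pattern prefix into a bit-vector (m = shortest pattern)."""
    m = min(len(p) for p in patterns)
    bits = bytearray(num_bits // 8)          # the bit-vector, 8 bits per byte
    table = defaultdict(list)                # prefix -> full patterns, for verification
    for p in patterns:
        prefix = p[:m]
        h = hash(prefix) % num_bits
        bits[h >> 3] |= 1 << (h & 7)         # set bit h
        table[prefix].append(p)
    return m, bits, table

def find_all(text, patterns, num_bits=1 << 16):
    """Return sorted (position, pattern) pairs for every occurrence in text."""
    m, bits, table = build_filter(patterns, num_bits)
    hits = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        h = hash(window) % num_bits
        if not bits[h >> 3] & (1 << (h & 7)):
            continue                         # bit clear: no pattern can start here
        for p in table.get(window, ()):      # bit set: verify against real patterns
            if text.startswith(p, i):
                hits.append((i, p))
    return sorted(hits)
```

Most text positions are rejected by a single bit test, and the hash table is consulted only when the filter bit is set. This sketch makes no claim about matching the space bound quoted in the abstract.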

Journal Article

Approximate String Matching for Searching DNA Sequences

Jolanta Kawulok

01 Jan 2013 - International Journal of Bioscience, Biochemistry and Bioinformatics

TL;DR: Experimental results indicate that the algorithm is highly effective and it outperforms a popular Basic Local Alignment Search Tool (BLAST) in case of searching for short sequences.

Abstract: This paper presents a new algorithm for searching short fragments of sequences in long DNA sequences. A short sequence (pattern) is searched in both DNA strands with a given maximal number of errors. Each DNA sequence (T) is preprocessed by compressing it using the Burrows-Wheeler transform and a wavelet tree. First, the pattern is divided into short, mutually overlapping words, and then their positions in T are determined using the FM-index. Connections between the words are then searched for under the given maximal allowed error. Experimental results indicate that the algorithm is highly effective and that it outperforms the popular Basic Local Alignment Search Tool (BLAST) when searching for short sequences.

6 citations
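
The step in which word positions in T are determined with an FM-index can be sketched as a backward search over the Burrows-Wheeler transform. This is a minimal sketch under stated assumptions, not the paper's implementation: the index is built naively by sorting suffixes, a plain occurrence table stands in for the wavelet tree, the full suffix array is kept for locating, and the names `bwt_index` and `locate` are hypothetical. Only exact word lookup is shown; the error-tolerant joining of words is not.

```python
def bwt_index(text):
    """Build a suffix array, C table and occurrence table for text (naive construction)."""
    text += "$"                                  # sentinel, sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)       # character preceding each sorted suffix
    alphabet = sorted(set(text))
    # C[c] = number of characters in text that are strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ

def locate(pattern, index):
    """Return the sorted start positions of exact occurrences of pattern."""
    sa, C, occ = index
    lo, hi = 0, len(sa)                          # half-open range of suffix-array rows
    for ch in reversed(pattern):                 # backward search: last character first
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

For example, `locate("TTA", bwt_index("GATTACATTA"))` returns `[2, 7]`. Each pattern character costs a constant number of table lookups, independent of the length of T, which is what makes the index attractive for many short queries against one long sequence.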

A Fast Algorithm for the Inexact Characteristic String Problem

Moritz G. Maass

01 Jan 2003

TL;DR: Presents a new algorithm that solves the Inexact Characteristic String Problem using Hamming distance instead of Levenshtein distance, together with an improved algorithm for the Common Substring Problem that is simpler and faster in practice by a constant factor than the previous algorithm.

Abstract: We present a new algorithm to solve the Inexact Characteristic String Problem (ICSP) using Hamming distance instead of Levenshtein distance as a measure. We embed our new algorithm and the previously known algorithm for Levenshtein distance in a common framework, which reveals an additional improvement to the Levenshtein distance algorithm. The ICSP can thus be solved in time \(O(\|T\| + l \cdot \|S - T\|)\) for Hamming distance and in time \(O(\|T\| + k \cdot l \cdot \|S - T\|)\) for Levenshtein distance, where S is a set of strings, T is a non-empty subset of S (the target set), and l is the length of a shortest string in T. The ICSP has applications in probe and primer design. Both algorithms need to solve the Common Substring Problem for more than two strings. We present an improved algorithm for this problem that is simpler and faster in practice by a constant factor than the previous algorithm.

6 citations
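
The two measures contrasted in the abstract above can be sketched as follows. These are the textbook definitions, not the ICSP algorithms themselves; the function names are chosen here for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))            # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                             # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # match or substitute
        prev = cur
    return prev[-1]
```

Hamming distance allows substitutions only, so it needs a single linear pass, while Levenshtein distance also allows insertions and deletions and needs the quadratic dynamic program. That difference in what counts as an error is one intuition for why the Levenshtein bound quoted above carries an extra factor.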

Patent

Matching target strings to known strings

Enyuan Wu (Microsoft)

09 Sep 2011

TL;DR: A target string is broken into one or more target terms, which are matched to known terms in an index tree whose terms are associated with known string IDs.

Abstract: One or more techniques and/or systems are disclosed for matching a target string to a known string. A target string is broken into one or more target terms, and the one or more target terms are matched to known terms in an index tree. The index tree comprises one or more known terms from a plurality of known strings, where the respective known terms in the index tree are associated with one or more known string IDs. A known term that is associated with a known string ID (in the index tree, and to which a target term is matched), is comprised in a known string, which corresponds to the known string ID. The target string can be matched to the known string using the known string's corresponding known string ID that is associated with a desired number of occurrences in the matching of the one or more target terms.

6 citations
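
The matching scheme described in the abstract above can be illustrated roughly as follows. This sketch makes several assumptions that the abstract does not confirm: a plain dictionary stands in for the index tree, terms are produced by lowercasing and splitting on whitespace, and the names `build_index`, `match` and `min_hits` (the "desired number of occurrences") are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(known):
    """Map each known term to the IDs of the known strings that contain it."""
    index = defaultdict(set)
    for string_id, text in known.items():
        for term in text.lower().split():
            index[term].add(string_id)
    return index

def match(target, index, min_hits=2):
    """Return IDs of known strings matched by at least min_hits target terms."""
    votes = Counter()
    for term in set(target.lower().split()):      # each distinct target term votes once
        for string_id in index.get(term, ()):
            votes[string_id] += 1
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [string_id for string_id, hits in ranked if hits >= min_hits]
```

With `known = {1: "Microsoft Office Word", 2: "Microsoft Visual Studio", 3: "Open Office"}`, the call `match("office word microsoft", build_index(known))` returns `[1]`, because only known string 1 collects at least two term matches.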

Book Chapter

Metric Indexes for Approximate String Matching in a Dictionary

Kimmo Fredriksson (University of Eastern Finland)

05 Oct 2004

TL;DR: This work considers the problem of finding all approximate occurrences of a given string q, with at most k differences, in a finite database or dictionary of strings, relying on the “triangular inequality” of a metric distance function as the most important property.

Abstract: We consider the problem of finding all approximate occurrences of a given string q, with at most k differences, in a finite database or dictionary of strings. The strings can be, e.g., natural language words, such as the vocabulary of some document or set of documents. This has many important applications in both off-line (indexed) and on-line string matching. More precisely, we have a universe \({\mathbb U}\) of strings, and a non-negative distance function \(d: {\mathbb U} \times {\mathbb U} \rightarrow {\mathbb N}\). The distance function is a metric if it satisfies (i) \(d(x,y) = 0 ~ \Leftrightarrow ~ x = y\); (ii) \(d(x,y) = d(y,x)\); (iii) \(d(x,y) \le d(x,z) + d(z,y)\). The last item is called the “triangular inequality”, and is the most important property in our case. Many useful distance functions are known to be metrics; in particular, edit (Levenshtein) distance is a metric, which we will use for d.

6 citations
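
To show how the triangular inequality lets a dictionary index skip candidates, here is a minimal sketch using a BK-tree over edit distance. The BK-tree is a well-known metric index chosen here for illustration only; the abstract does not say which index structures the chapter studies, and the class and method names are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance via a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

class BKTree:
    """Each node is (word, children), with children keyed by distance to that word."""

    def __init__(self, words):
        it = iter(words)
        self.root = (next(it), {})
        for w in it:
            self.add(w)

    def add(self, word):
        node, children = self.root
        while True:
            d = edit_distance(word, node)
            if d == 0:
                return                          # word already stored
            if d not in children:
                children[d] = (word, {})
                return
            node, children = children[d]

    def search(self, q, k):
        """Return all stored words within edit distance k of q, sorted."""
        out, stack = [], [self.root]
        while stack:
            node, children = stack.pop()
            d = edit_distance(q, node)
            if d <= k:
                out.append(node)
            # Triangular inequality: a word x in the subtree at distance `dist`
            # from node can have d(q, x) <= k only if |d - dist| <= k.
            for dist, child in children.items():
                if d - k <= dist <= d + k:
                    stack.append(child)
        return sorted(out)
```

Subtrees whose edge distance falls outside the range from d - k to d + k are never visited, so a query usually computes far fewer distances than a scan of the whole dictionary would.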

Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
