Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers published on a yearly basis

[Chart: papers published per year, 1973 to 2023]

Papers

Patent

Method and system for searching indexed string containing a search string

Claude Roux¹, Bernard Jacquemin•Institutions (1)

Xerox¹

19 Dec 2003

TL;DR: This patent describes using bit vectors to index the inner grammatical structure of a string over a language having a vocabulary and a grammar; the string is indexed on different levels by disregarding some of the grammatical relationships of component levels.

Abstract: Systems and methods for indexing and searching the inner structure of a string over a language having a vocabulary and a grammar using bit vectors. The index preserves the inner grammatical structure of the string while allowing for a fast search. A single search provides immediate access to every level of a document, without having to re-search a single string to determine which sub-parts of that string match the search string. When a string is indexed, the index maintains a compositional representation and the grammatical relationship between the elements of the vocabulary according to the language. The string is then indexed on different levels by disregarding some of the grammatical relationships of component levels.

12 citations

Journal Article

A randomized Numerical Aligner (rNA)

Alberto Policriti, Alexandru I. Tomescu¹, Francesco Vezzi•Institutions (1)

University of Bucharest¹

01 Nov 2012-Journal of Computer and System Sciences

TL;DR: The paper presents a generalization of the classical Rabin-Karp string matching algorithm that solves the k-mismatch problem with average complexity O(n+m), where n and m are the text and pattern lengths, respectively, and reports that it is in general faster and more accurate than other available tools such as SOAP2, BWA, and BOWTIE.

12 citations
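
For orientation, the k-mismatch problem named in the TL;DR above asks for every text position where the pattern aligns with at most k mismatching characters. The sketch below is an illustration under stated assumptions, not the rNA algorithm itself: `rabin_karp` is the classical exact-matching rolling hash that the paper is said to generalize, and `k_mismatch` is a brute-force O(nm) baseline that only defines the expected output. The function names and the `base` and `mod` parameters are illustrative choices.

```python
def rabin_karp(text: str, pattern: str,
               base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Classical Rabin-Karp: exact matches only (the k = 0 case)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Compare characters only when hashes agree, to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits


def k_mismatch(text: str, pattern: str, k: int) -> list[int]:
    """Brute-force baseline: positions with Hamming distance <= k."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this window is already over budget
        else:
            hits.append(i)
    return hits
```

With k = 0 the two functions should return the same positions, which makes the baseline a convenient check for any faster k-mismatch implementation.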

Journal Article

On the feasibility of character n-grams pseudo-translation for Cross-Language Information Retrieval tasks

Jesús Vilares, Manuel Vilares¹, Miguel A. Alonso, Michael Oakes²•Institutions (2)

University of Vigo¹, University of Wolverhampton²

01 Mar 2016-Computer Speech & Language

TL;DR: The results obtained not only confirm the consistency across languages of this kind of character n-gram based approaches, but also constitute a further proof of their validity and applicability, these not being tied to a given implementation.

12 citations

Book Chapter

On the closest string via rank distance

Liviu P. Dinu¹, Alexandru Popa²•Institutions (2)

University of Bucharest¹, Aalto University²

03 Jul 2012

TL;DR: The CSP and CSSP via rank distance are shown to be NP-hard; a polynomial time k-approximation algorithm for the CSP is presented, along with a parametrized algorithm for the CSP when the alphabet is binary and each string has the same number of 0's and 1's.

Abstract: Given a set S of k strings of maximum length n, the goal of the closest substring problem (CSSP) is to find the smallest integer d (and a corresponding string t of length l≤n) such that each string s∈S has a substring of length l of "distance" at most d to t. The closest string problem (CSP) is a special case of CSSP where l=n. CSP and CSSP arise in many applications in bioinformatics and are extensively studied in the context of Hamming and edit distance. In this paper we consider a recently introduced distance measure, namely the rank distance. First, we show that the CSP and CSSP via rank distance are NP-hard. Then, we present a polynomial time k-approximation algorithm for the CSP. Finally, we give a parametrized algorithm for the CSP (the parameter is the number of input strings) if the alphabet is binary and each string has the same number of 0's and 1's.

12 citations
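
The CSSP objective defined in the abstract above can be made concrete with a small sketch. It assumes Hamming distance as a stand-in for the "distance" (the paper itself studies rank distance, which is not implemented here), and the names `substring_radius` and `closest_substring_bruteforce` are hypothetical. The exhaustive search is exponential in l and is only meant to illustrate the definition on tiny inputs.

```python
from itertools import product
from typing import Callable, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def substring_radius(t: str, strings: Sequence[str],
                     dist: Callable[[str, str], int] = hamming) -> int:
    """Smallest d such that every s has a length-len(t) substring within d of t."""
    l = len(t)
    worst = 0
    for s in strings:
        if len(s) < l:
            raise ValueError("every input string must have length >= len(t)")
        # Best alignment of t against any window of s.
        best = min(dist(t, s[i:i + l]) for i in range(len(s) - l + 1))
        worst = max(worst, best)
    return worst


def closest_substring_bruteforce(strings: Sequence[str], l: int,
                                 alphabet: str) -> tuple[int, str]:
    """Try every string of length l over the alphabet; keep the smallest radius."""
    best = None
    for cand in product(alphabet, repeat=l):
        t = "".join(cand)
        r = substring_radius(t, strings)
        if best is None or r < best[0]:
            best = (r, t)
    return best
```

Setting l equal to the common length of the input strings reduces this to the CSP special case mentioned in the abstract.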

Journal Article

Swiftly Computing Center Strings

Franziska Hufsky¹,², Leon Kuchenbecker³, Katharina Jahn³, Jens Stoye³, Sebastian Böcker¹•Institutions (3)

University of Jena¹, Max Planck Society², Bielefeld University³

19 Apr 2011-BMC Bioinformatics

TL;DR: This paper introduces data reduction techniques that allow us to infer that certain instances have no solution, or that a center string must satisfy certain conditions, and describes a novel iterative search strategy that is efficient in practice, where some of the reduction techniques can be applied.

Abstract: The center string (or closest string) problem is a classic computer science problem with important applications in computational biology. Given k input strings and a distance threshold d, we search for a string within Hamming distance at most d to each input string. This problem is NP-complete. In this paper, we focus on exact methods for the problem that are also swift in application. We first introduce data reduction techniques that allow us to infer that certain instances have no solution, or that a center string must satisfy certain conditions. We describe how to use this information to speed up two previously published search tree algorithms. Then, we describe a novel iterative search strategy that is efficient in practice, where some of our reduction techniques can also be applied. Finally, we present results of an evaluation study for two different data sets from a biological application. We find that the running time for computing the optimal center string is dominated by the subroutine calls for d = d_opt - 1 and d = d_opt. Our data reduction is very effective for both, either rejecting unsolvable instances or solving trivial positions. We find that this speeds up computations considerably.

12 citations
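
The decision problem in the abstract above (is there a string within Hamming distance d of every input?) can be sketched with a bounded search tree. This is a generic textbook-style search tree written under the assumption that all inputs have equal length; it is not claimed to be either of the two published algorithms the paper accelerates, and it includes none of the paper's data reduction techniques. The names `center_string` and `optimal_center` are hypothetical.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def center_string(strings: Sequence[str], d: int) -> Optional[str]:
    """Return a string within Hamming distance d of every input, or None."""
    if not strings:
        return None
    if len({len(s) for s in strings}) != 1:
        raise ValueError("all input strings must have the same length")

    def search(cand: str, budget: int) -> Optional[str]:
        far = None
        for s in strings:
            dist = hamming(cand, s)
            # Even spending the whole remaining budget cannot fix this branch.
            if dist > d + budget:
                return None
            if dist > d and far is None:
                far = s
        if far is None:
            return cand  # cand is within d of every input
        # Any solution must agree with `far` on at least one of d + 1
        # positions where cand and far currently differ.
        diff = [p for p in range(len(cand)) if cand[p] != far[p]][: d + 1]
        for p in diff:
            found = search(cand[:p] + far[p] + cand[p + 1:], budget - 1)
            if found is not None:
                return found
        return None

    # Start from the first input; at most d positions may be changed.
    return search(strings[0], d)


def optimal_center(strings: Sequence[str]) -> tuple[int, str]:
    """Increase d from 0 until the decision procedure succeeds."""
    for d in range(len(strings[0]) + 1):
        cand = center_string(strings, d)
        if cand is not None:
            return d, cand
    raise AssertionError("unreachable: d = len(s) always admits a solution")
```

Iterating d upward as in `optimal_center` means the last failing call uses d_opt - 1 and the first succeeding call uses d_opt, which are the two calls the abstract reports as dominating the running time.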

Network Information

Performance Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
