Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have been published within this topic, receiving 62352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers published on a yearly basis (chart; x-axis: years 1973 to 2023)

Papers

Proceedings Article

Two-dimensional substring indexing

Paolo Ferragina¹, Nick Koudas², Divesh Srivastava², S. Muthukrishnan²

University of Pisa¹, AT&T Labs²

01 May 2001

TL;DR: A technique for two-dimensional substring indexing, based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points, is presented; it can be practically realized using a combination of string B-trees and R-trees.

Abstract: As databases have expanded in scope to storing string data (XML documents, product catalogs), it has become increasingly important to search databases based on matching substrings, often on multiple, correlated dimensions. While string B-trees are I/O optimal in one dimension, no index structure with non-trivial query bounds is known for two-dimensional substring indexing. In this paper, we present a technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points. We develop an I/O efficient algorithm for solving the common colors problem, and use it to obtain an I/O efficient (poly-logarithmic query time) algorithm for the two-dimensional substring indexing problem. Our techniques result in a family of secondary memory index structures that trade space for time, with no loss of accuracy. We show how our technique can be practically realized using a combination of string B-trees and R-trees.

17 citations

Proceedings Article

Text scanning approach for exact string matching

Muhammad Zubair¹, Fazal Wahab¹, Iftikhar Hussain¹, Muhammad Ikram¹

Iqra University¹

11 Jun 2010

TL;DR: This research proposes a new approach to exact string matching, TSPRC (Test Scanning for Pattern Rightmost Character), which scans the text string for the rightmost character of the pattern in the preprocessing phase.

Abstract: Exact string matching algorithms are essential components in practical applications of computer systems. In this research we propose a new concept to solve the problem of exact string matching by scanning the text string for the rightmost character of the pattern in the preprocessing phase. In the matching phase, TSPRC (Test Scanning for Pattern Rightmost Character) compares the pattern with the text window from both directions simultaneously. The proposed algorithm was implemented and compared with existing algorithms. Comparison results demonstrate that TSPRC is more efficient than a number of existing algorithms and takes O(1) time in the best case.

17 citations
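
As a rough illustration of the idea described in the abstract above, the following Python sketch records the text positions that hold the pattern's rightmost character, then checks each candidate window from both ends at once. It is one reading of the abstract, not the authors' TSPRC implementation; the function name `tsprc_like_search` is hypothetical.

```python
def tsprc_like_search(text: str, pattern: str) -> list[int]:
    """Return start indices of exact matches of pattern in text.

    Illustrative only: an assumed reading of the abstract, not the
    published TSPRC algorithm.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Preprocessing: text positions where a window could end, i.e. where
    # the text holds the pattern's rightmost character.
    last = pattern[-1]
    candidate_ends = [i for i in range(m - 1, n) if text[i] == last]

    matches = []
    for end in candidate_ends:
        start = end - m + 1
        lo, hi = 0, m - 1
        # Matching: compare from the left and the right simultaneously,
        # stopping at the first mismatch on either side.
        while (lo <= hi
               and text[start + lo] == pattern[lo]
               and text[start + hi] == pattern[hi]):
            lo += 1
            hi -= 1
        if lo > hi:
            matches.append(start)
    return matches
```

Windows that do not end in the pattern's last character are never inspected, which is where a scheme of this kind would save comparisons.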

Proceedings Article

Accelerating Levenshtein and Damerau edit distance algorithms using GPU with unified memory

Khaled Balhaf¹, Mohammad A. Alsmirat¹, Mahmoud Al-Ayyoub¹, Yaser Jararweh¹, Mohammed A. Shehab¹

Jordan University of Science and Technology¹

04 Apr 2017

TL;DR: This paper uses a CUDA-based Graphics Processing Unit (GPU) and the newly introduced Unified Memory (UM) to speed up the most common algorithms for computing the edit distance between two strings, the Levenshtein and Damerau distance algorithms.

Abstract: String matching problems such as sequence alignment are among the fundamental problems in many computer science fields such as natural language processing (NLP) and bioinformatics. Many algorithms have been proposed in the literature to address this problem. Some of these algorithms compute the edit distance between the two strings to perform the matching. However, these algorithms usually require long execution times. Many researchers use high performance computing to reduce the execution time of string matching algorithms. In this paper, we use a CUDA-based Graphics Processing Unit (GPU) and the newly introduced Unified Memory (UM) to speed up the most common algorithms for computing the edit distance between two strings. These algorithms are the Levenshtein and Damerau distance algorithms. Our results show that using the GPU to implement the Levenshtein and Damerau distance algorithms improves their execution times by about 11X and 12X, respectively, compared to the sequential implementation, and that an improvement of about 61X and 71X, respectively, can be achieved when the GPU is used with unified memory.

17 citations
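
For readers unfamiliar with the two distances named in the abstract above, here is a minimal sequential Python sketch of both. It is a plain CPU baseline of the dynamic-programming recurrences, not the paper's CUDA or unified-memory code. The Damerau variant shown is the restricted "optimal string alignment" form, which is an assumption, since the abstract does not say which variant the authors use; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau distance (optimal string alignment).

    Adds transposition of two adjacent characters to the Levenshtein
    operations; no substring is edited more than once.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "ab" -> "ba".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Both functions fill a table of size (len(a)+1) by (len(b)+1), so the sequential cost is proportional to the product of the string lengths, which is the work a GPU implementation would parallelize.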

Book Chapter

Filtration Algorithms for Approximate Order-Preserving Matching

Tamanna Chhabra¹, Emanuele Giaquinta¹, Jorma Tarhio¹

Aalto University¹

01 Sep 2015

TL;DR: Practical filtration-based solutions are presented for approximate order-preserving matching, in which two strings match if they have the same relative order after removing up to k elements in the same positions in both strings.

Abstract: The exact order-preserving matching problem is to find all the substrings of a text T which have the same length and relative order as a pattern P. Like string matching, order-preserving matching can be generalized by allowing the match to be approximate. In approximate order-preserving matching two strings match if they have the same relative order after removing up to k elements in the same positions in both strings. In this paper we present practical solutions for this problem. The methods are based on filtration, and one of them is the first sublinear solution on average. We show by practical experiments that the new solutions are fast and efficient.

17 citations
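
The matching condition defined in the abstract above can be made concrete with a brute-force Python sketch. It tries every set of up to k positions to remove, so it only illustrates the definition on small inputs; it is not the filtration method the chapter proposes, and all function names are hypothetical.

```python
from itertools import combinations


def same_relative_order(x, y) -> bool:
    """True if x and y have equal length and identical pairwise ordering."""
    if len(x) != len(y):
        return False

    def sign(v):
        return (v > 0) - (v < 0)

    n = len(x)
    return all(sign(x[i] - x[j]) == sign(y[i] - y[j])
               for i in range(n) for j in range(i + 1, n))


def approx_op_match(window, pattern, k: int) -> bool:
    """True if window and pattern have the same relative order after
    removing at most k elements at the same positions in both."""
    m = len(pattern)
    if len(window) != m:
        return False
    for r in range(min(k, m) + 1):
        for removed in combinations(range(m), r):
            keep = [i for i in range(m) if i not in removed]
            if same_relative_order([window[i] for i in keep],
                                   [pattern[i] for i in keep]):
                return True
    return False


def find_approx_op_matches(text, pattern, k: int) -> list[int]:
    """Start indices of all length-m windows of text that match pattern."""
    m = len(pattern)
    return [s for s in range(len(text) - m + 1)
            if approx_op_match(text[s:s + m], pattern, k)]
```

With k = 0 this reduces to exact order-preserving matching; for example, the pattern [1, 2, 3] matches any strictly increasing window of length three.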

Proceedings Article

Efficient algorithms for approximate member extraction using signature-based inverted lists

Jiaheng Lu¹, Jialong Han¹, Xiaofeng Meng¹

Renmin University of China¹

02 Nov 2009

TL;DR: An incremental algorithm is developed that uses signature-based inverted lists to minimize the duplicate list-scan operations of overlapping windows in the text; experiments show that it significantly outperforms existing methods in the literature.

Abstract: We study the problem of approximate membership extraction (AME), i.e., how to efficiently extract substrings in a text document that approximately match some strings in a given dictionary. This problem is important in a variety of applications such as named entity recognition and data cleaning. We solve this problem in two steps. In the first step, for each substring in the text, we filter away the strings in the dictionary that are very different from the substring. In the second step, each candidate string is verified to decide whether the substring should be extracted. We develop an incremental algorithm using signature-based inverted lists to minimize the duplicate list-scan operations of overlapping windows in the text. Our experimental study of the proposed algorithms on real and synthetic datasets showed that our solutions significantly outperform existing methods in the literature.

17 citations
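
The two-step filter-then-verify scheme described in the abstract above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: substrings are taken to be word windows, similarity is edit distance with threshold `tau`, and the filter is the standard length and q-gram count filter rather than the paper's signature-based inverted lists or its incremental window handling. All names and parameters are hypothetical.

```python
from collections import Counter


def qgrams(s: str, q: int = 2) -> Counter:
    """Multiset of all length-q substrings of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def extract_approx_members(text, dictionary, tau=1, q=2):
    """Return (substring, entry) pairs with edit distance <= tau.

    Assumes a non-empty dictionary and whitespace-separated words.
    """
    words = text.split()
    max_words = max(len(d.split()) for d in dictionary)
    entries = [(d, qgrams(d, q)) for d in dictionary]
    results = []
    for start in range(len(words)):
        for n in range(1, max_words + 1):
            if start + n > len(words):
                break
            sub = " ".join(words[start:start + n])
            sub_grams = qgrams(sub, q)
            for d, d_grams in entries:
                # Step 1 (filter): discard entries that cannot be within tau.
                if abs(len(sub) - len(d)) > tau:
                    continue
                need = max(len(sub), len(d)) - q + 1 - q * tau
                shared = sum((sub_grams & d_grams).values())
                if shared < need:
                    continue
                # Step 2 (verify): compute the exact distance.
                if edit_distance(sub, d) <= tau:
                    results.append((sub, d))
    return results
```

The filter is safe in the sense that it never discards a true match: two strings within edit distance tau differ in length by at most tau and share at least max(|s|, |d|) - q + 1 - q*tau q-grams, so only the costlier verification step decides the final answer.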


Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
