Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers published on a yearly basis (chart omitted; years covered: 1973 to 2023)

Papers

Posted Content

String Reconstruction from Substring Compositions

Jayadev Acharya¹, Hirakendu Das², Olgica Milenkovic, Alon Orlitsky³, Shengjun Pan

Massachusetts Institute of Technology¹, Yahoo!², University of California, San Diego³

10 Mar 2014, arXiv: Discrete Mathematics

TL;DR: In this article, the authors consider the problem of reconstructing a string from the multiset of its substring compositions and derive lower and upper bounds on the largest number of strings with given substring composition.

Abstract: Motivated by mass-spectrometry protein sequencing, we consider a simply-stated problem of reconstructing a string from the multiset of its substring compositions. We show that all strings of length 7, one less than a prime, or one less than twice a prime, can be reconstructed uniquely up to reversal. For all other lengths we show that reconstruction is not always possible and provide sometimes-tight bounds on the largest number of strings with given substring compositions. The lower bounds are derived by combinatorial arguments and the upper bounds by algebraic considerations that precisely characterize the set of strings with the same substring compositions in terms of the factorization of bivariate polynomials. The problem can be viewed as a combinatorial simplification of the turnpike problem, and its solution may shed light on this long-standing problem as well. Using well known results on transience of multi-dimensional random walks, we also provide a reconstruction algorithm that reconstructs random strings over alphabets of size $\ge4$ in optimal near-quadratic time.

8 citations
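
To make the central object of the abstract above concrete, here is a minimal Python sketch of the multiset of substring compositions. It is an illustration only, not the authors' reconstruction algorithm; the function names `substring_compositions` and `same_compositions` are hypothetical, and a composition is assumed to mean the per-symbol counts of a contiguous substring, with order ignored.

```python
from collections import Counter


def substring_compositions(s: str) -> Counter:
    """Return the multiset of compositions of all contiguous substrings of s.

    A composition records how many times each symbol occurs in a substring,
    ignoring order. The result maps each composition (stored as a sorted
    tuple of (symbol, count) pairs) to the number of substrings that have it.
    """
    multiset = Counter()
    for start in range(len(s)):
        running = Counter()  # symbol counts of s[start:end + 1]
        for end in range(start, len(s)):
            running[s[end]] += 1
            multiset[tuple(sorted(running.items()))] += 1
    return multiset


def same_compositions(a: str, b: str) -> bool:
    """True if a and b cannot be told apart by their substring compositions."""
    return substring_compositions(a) == substring_compositions(b)


if __name__ == "__main__":
    # A string and its reversal always share the same multiset.
    print(same_compositions("0010111", "1110100"))
    # Two strings with the same symbols overall can still differ.
    print(same_compositions("001", "010"))
```

A string and its reversal always produce the same multiset, which is why the abstract speaks of reconstruction "uniquely up to reversal". The sketch enumerates all n(n+1)/2 substrings directly and is only meant for short strings.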

Patent

Method and system for counting machine translation based on phrases

Liu Zhanyi, Haifeng Wang

10 Mar 2010

TL;DR: A method and a system for counting machine translation based on phrases are presented; the method comprises a step of performing fuzzy matching of the phrases in an input sentence against a preset phrase list.

Abstract: The invention provides a method and a system for counting machine translation based on phrases. The method comprises a step of performing fuzzy matching of the phrases in an input sentence against a preset phrase list. By fuzzy-matching the phrases, the method and the system can generate high-quality translations for longer phrases in the input sentence, and can effectively improve translation quality compared with a phrase-based machine translation system that relies on exact matching.

8 citations

Journal Article

Efficient computations of $\ell_1$ and $\ell_\infty$ rearrangement distances

Amihood Amir¹, Yonatan Aumann², Piotr Indyk³, Avivit Levy⁴, Ely Porat²

Johns Hopkins University¹, Bar-Ilan University², Massachusetts Institute of Technology³, University of Haifa⁴

01 Oct 2009, Theoretical Computer Science

TL;DR: It is shown that the problem can be approximated in linear time for general patterns, and efficient exact solutions for different variants of the problem are provided, as well as a faster approximation.

8 citations

Proceedings Article

Fast plagiarism detection based on simple document similarity

Kensuke Baba¹

Fujitsu¹

28 Jun 2017

TL;DR: A plagiarism detection algorithm based on approximate string matching and specialized for “copy and paste”-type plagiarism is proposed, along with a speed improvement to an implementation of the algorithm.

Abstract: Plagiarism detection in a large number of documents requires efficient methods. This paper proposes a plagiarism detection algorithm based on approximate string matching to be specified in “copy and paste”-type plagiarisms, and a speed improvement to an implementation of the algorithm. Most of the computations required in the algorithm are omitted by two kinds of approximations of the output used for plagiarism detection, while the decrease of accuracy caused by the approximations is acceptable. The effect of the improvement on the processing time and accuracy of the algorithm is evaluated by conducting experiments with a data set. The experimental results show that the improvement can reduce the processing time to approximately one-twentieth for a 6.4% decrease of the accuracy from those for the normal implementation of the algorithm.

8 citations

Proceedings Article

Filtering Strategies for Inexact Subgraph Matching on Noisy Multiplex Networks

Alexei Kopylov¹, Jiejun Xu¹

HRL Laboratories¹

01 Dec 2019

TL;DR: This work extends existing filtering-based subgraph matching algorithms and proposes a new set of filters leveraging the monotone function properties in the multiplex setting that enables effective pruning of irrelevant subgraph regions and expedites the overall matching process.

Abstract: We study the problem of detecting matching subgraphs in a large multiplex background network based on predefined subgraph templates. Our approach extends existing filtering-based subgraph matching algorithms and proposes a new set of filters leveraging the monotone function properties in the multiplex setting. This enables effective pruning of irrelevant subgraph regions and expedites the overall matching process. In addition, our approach proposes a new strategy based on maximum likelihood estimate to identify “closely matched” subgraphs that are not isomorphic to the given templates from a noisy background network. This allows us to generalize this approach to real-world networks, which are often noisy, incomplete and ambiguous. We demonstrate the effectiveness of the proposed method on a real-world multiplex network provided by the DARPA Modeling Adversarial Activity (MAA) program. Our approach obtains highly accurate subgraph matching results for both the clean and noisy versions of the network, which significantly outperforms the baseline filtering methods. Furthermore, our proposed approach is parallelizable such that it can scale up to handle large input networks.

8 citations

Performance

Metrics

Papers: 1,942

Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
