Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have appeared within this topic, receiving 71,491 citations.

Papers published on a yearly basis

[Chart: papers per year. Years on the axis: 1974 to 1976, 1980, 1981, and 1983 to 2023. Per-year values are not recoverable from the chart.]

Papers

Journal Article

Transposition invariant string matching

Veli Mäkinen¹, Gonzalo Navarro², Esko Ukkonen¹

University of Helsinki¹, University of Chile²

01 Aug 2005-Journal of Algorithms

TL;DR: It is shown how sparse dynamic programming can be used to solve transposition invariant problems, and its connection with multidimensional range-minimum search.

64 citations
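
To make the notion of transposition invariance concrete, here is a minimal sketch, assuming the usual definition for integer sequences: the distance is the smallest unit-cost edit distance obtained after adding one constant shift to every element of the first sequence. This is a brute-force illustration, not the sparse dynamic programming approach the paper describes; the function names and the example values are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Unit-cost edit distance between two sequences (row-by-row DP)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # delete x
                               current[j - 1] + 1,       # insert y
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def transposition_invariant_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Minimum edit distance over all constant shifts t applied to a.

    Only shifts of the form y - x (x in a, y in b) can create a match,
    so trying those candidates is enough. Any other shift leaves no
    element in common and cannot beat the best candidate.
    """
    if not a or not b:
        return max(len(a), len(b))
    shifts = {y - x for x in a for y in b}
    return min(edit_distance([x + t for x in a], b) for t in shifts)


# Hypothetical example: the second sequence is the first shifted by 7.
melody = [60, 62, 64, 65]
shifted = [67, 69, 71, 72]
plain = edit_distance(melody, shifted)
invariant = transposition_invariant_distance(melody, shifted)
```

With up to m·n candidate shifts and one O(m·n) table per shift, this sketch runs in O(m²n²) time, which is the kind of cost that faster techniques aim to avoid.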

Journal Article

Fine Classification & Recognition of Hand Written Devnagari Characters with Regular Expressions & Minimum Edit Distance Method

P. S. Deshpande, Latesh Malik, Sandhya Arora

05 Jan 2008-Journal of Computers

TL;DR: This paper describes a use of regular expressions for character recognition, a problem scenario in sequence analysis that is ideally suited to regular expression algorithms.

Abstract: Regular expressions are extremely useful because they allow us to work with text in terms of patterns. They are considered the most sophisticated means of performing operations such as string searching, manipulation, validation, and formatting in all applications that deal with text data. Character recognition is a problem scenario in sequence analysis that is ideally suited to regular expression algorithms. This paper describes a use of regular expressions in this problem domain and demonstrates how their effective use can facilitate more efficient and more effective character recognition.

64 citations

Proceedings Article

Subgroup Discovery Meets Bayesian Networks -- An Exceptional Model Mining Approach

Wouter Duivesteijn¹, Arno Knobbe¹, Ad Feelders², Matthijs van Leeuwen²

Leiden University¹, Utrecht University²

13 Dec 2010

TL;DR: This work proposes to use the interdependencies between multiple discrete target variables to quantify the quality of subgroups, by integrating Bayesian networks with the Exceptional Model Mining framework, and shows interesting subgroups found with this method on datasets from music theory, semantic scene classification, biology and zoogeography.

Abstract: Whenever a dataset has multiple discrete target variables, we want our algorithms to consider not only the variables themselves, but also the interdependencies between them. We propose to use these interdependencies to quantify the quality of subgroups, by integrating Bayesian networks with the Exceptional Model Mining framework. Within this framework, candidate subgroups are generated. For each candidate, we fit a Bayesian network on the target variables. Then we compare the network’s structure to the structure of the Bayesian network fitted on the whole dataset. To perform this comparison, we define an edit distance-based distance metric that is appropriate for Bayesian networks. We show interesting subgroups that we experimentally found with our method on datasets from music theory, semantic scene classification, biology and zoogeography.

64 citations

Proceedings Article

Machine Translation System Combination using ITG-based Alignments

Damianos Karakos¹, Jason Eisner¹, Sanjeev Khudanpur¹, Markus Dreyer¹

Johns Hopkins University¹

16 Jun 2008

TL;DR: Given several systems' automatic translations of the same sentence, it is shown how to combine them into a confusion network, whose various paths represent composite translations that could be considered in a subsequent rescoring step.

Abstract: Given several systems' automatic translations of the same sentence, we show how to combine them into a confusion network, whose various paths represent composite translations that could be considered in a subsequent rescoring step. We build our confusion networks using the method of Rosti et al. (2007), but, instead of forming alignments using the tercom script (Snover et al., 2006), we create alignments that minimize invWER (Leusch et al., 2003), a form of edit distance that permits properly nested block movements of substrings. Oracle experiments with Chinese newswire and weblog translations show that our confusion networks contain paths which are significantly better (in terms of BLEU and TER) than those in tercom-based confusion networks.

64 citations

Journal Article

Real world performance of approximate string comparators for use in patient matching

Shaun J. Grannis, J. Marc Overhage¹, Clement J. McDonald¹

Indiana University – Purdue University Indianapolis¹

01 Jan 2004-Studies in health technology and informatics

TL;DR: Approximate string comparators increase deterministic linkage sensitivity by up to 10% compared to exact match comparisons and represent an accurate method of linking to vital statistics data.

Abstract: Medical record linkage is becoming increasingly important as clinical data is distributed across independent sources. To improve linkage accuracy we studied different name comparison methods that establish agreement or disagreement between corresponding names. In addition to exact raw name matching and exact phonetic name matching, we tested three approximate string comparators. The approximate comparators included the modified Jaro-Winkler method, the longest common substring, and the Levenshtein edit distance. We also calculated the combined root-mean square of all three. We tested each name comparison method using a deterministic record linkage algorithm. Results were consistent across both hospitals. At a threshold comparator score of 0.8, the Jaro-Winkler comparator achieved the highest linkage sensitivities of 97.4% and 97.7%. The combined root-mean square method achieved sensitivities higher than the Levenshtein edit distance or longest common substring while sustaining high linkage specificity. Approximate string comparators increase deterministic linkage sensitivity by up to 10% compared to exact match comparisons and represent an accurate method of linking to vital statistics data.

64 citations
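
As an illustration of how a Levenshtein-based name comparator with a score threshold might work, here is a minimal sketch. The normalization (one minus distance divided by the longer name's length), the case folding, and the function names are assumptions made for this example; the abstract does not state how the study scaled its comparator scores, so only the 0.8 threshold is taken from it.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def names_agree(a: str, b: str, threshold: float = 0.8) -> bool:
    """Declare agreement when the score reaches the threshold."""
    return similarity(a.lower(), b.lower()) >= threshold


# Hypothetical name pairs: one near-match and one clear mismatch.
near_match = names_agree("Jonathan", "Jonathon")
mismatch = names_agree("Smith", "Jones")
```

An exact-match comparator would reject "Jonathan" versus "Jonathon", while this thresholded score accepts it, which is the kind of sensitivity gain the abstract attributes to approximate comparators.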

Performance

Metrics

Papers: 3,030
Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
