Home
/
Topics
/
String metric

Topic

String metric

About: String metric is a research topic. Over the lifetime, 1519 publications have been published within this topic receiving 53853 citations. The topic is also known as: string similarity metric & string distance function.

...read moreread less

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989
1988
1987
1986
1985
1984
1983
1982
1981
1980
1979
1978
1977
1976
1975
1974
1972
1967
1965
1942

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Efficient string matching: an aid to bibliographic search

[...]

Alfred V. Aho¹, Margaret J. Corasick¹•Institutions (1)

Bell Labs¹

01 Jun 1975-Communications of The ACM

TL;DR: A simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text that has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

...read moreread less

Abstract: This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

...read moreread less

3,270 citations

Journal Article•DOI•

The String-to-String Correction Problem

[...]

Robert A. Wagner¹, Michael J. Fischer²•Institutions (2)

Vanderbilt University¹, Massachusetts Institute of Technology²

01 Jan 1974-Journal of the ACM

TL;DR: An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.

...read moreread less

Abstract: The string-to-string correction problem is to determine the distance between two strings as measured by the minimum cost sequence of “edit operations” needed to change the one string into the other. The edit operations investigated allow changing one symbol of a string into another single symbol, deleting one symbol from a string, or inserting a single symbol into a string. An algorithm is presented which solves this problem in time proportional to the product of the lengths of the two strings. Possible applications are to the problems of automatic spelling correction and determining the longest subsequence of characters common to two strings.

...read moreread less

3,252 citations

Proceedings Article•

Distance Metric Learning with Application to Clustering with Side-Information

[...]

Eric P. Xing¹, Michael I. Jordan¹, Stuart Russell¹, Andrew Y. Ng¹•Institutions (1)

University of California, Berkeley¹

01 Jan 2002

TL;DR: This paper presents an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in �”n, learns a distance metric over ℝn that respects these relationships.

...read moreread less

Abstract: Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider "similar." For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in ℝn, learns a distance metric over ℝn that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance.

...read moreread less

3,176 citations

Journal Article•DOI•

A guided tour to approximate string matching

[...]

Gonzalo Navarro¹•Institutions (1)

University of Chile¹

01 Mar 2001-ACM Computing Surveys

TL;DR: This work surveys the current techniques to cope with the problem of string matching that allows errors, and focuses on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms.

...read moreread less

Abstract: We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices. We conclude with some directions for future work and open problems.

...read moreread less

2,723 citations

Proceedings Article•

A comparison of string distance metrics for name-matching tasks

[...]

William W. Cohen, Pradeep Ravikumar¹, Stephen E. Fienberg¹•Institutions (1)

Carnegie Mellon University¹

09 Aug 2003

TL;DR: Using an open-source, Java toolkit of name-matching methods, the authors experimentally compare string distance metrics on the task of matching entity names and find that the best performing method is a hybrid scheme combining a TFIDF weighting scheme, which is widely used in information retrieval, with the Jaro-Winkler string-distance scheme.

...read moreread less

Abstract: Using an open-source, Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names We investigate a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods Overall, the best-performing method is a hybrid scheme combining a TFIDF weighting scheme, which is widely used in information retrieval, with the Jaro-Winkler string-distance scheme, which was developed in the probabilistic record linkage community

...read moreread less

1,355 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

Network Information

Performance

Metrics

1,561

Papers

56,742

Citations

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	32
2021	18
2020	28
2019	29
2018	37

String metric

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics