A Comparison of String Metrics for Matching Names and Records
Citations
[...]
2,579 citations
1,285 citations
Cites background from "A Comparison of String Metrics for ..."
...A comparison of different string matching techniques, from distance like functions to token-based distance functions can be found in [9]....
[...]
465 citations
Cites background or methods from "A Comparison of String Metrics for ..."
...The second one is performed with classical benchmarks found in literature for data integration and retrieval [20]....
[...]
...Thus we could not resist but to evaluate it with classical benchmarks found in literature like the ones in [7,8,24,20]....
[...]
...In order to evaluate our metric against these datasets we used the SecondString open-source library [20]....
[...]
422 citations
Cites background or methods from "A Comparison of String Metrics for ..."
...Cohen et al. [6] found such hybrid measures to outperform pure word-based and pure string-based ones for entity resolution....
[...]
...The second step involves learning an MLN(B+C+T) model on the words inferred by the first stage. This model implements a hybrid similarity measure as proposed by Cohen et al. [6]....
[...]
...Several authors have devised, compared and learned similarity measures for use in entity resolution (e.g., [6, 45, 3])....
[...]
357 citations
References
3,985 citations
"A Comparison of String Metrics for ..." refers methods in this paper
...SecondString supports a range of metrics based on edit distance, including Levenshtein distance, which assigns a unit cost to all edit operations; and the Monge-Elkan distance function (Monge & Elkan 1996), a well-tuned affine variant of the Smith-Waterman distance function (Durbin et al. 1998)....
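As an illustration of the unit-cost edit distance mentioned in the excerpt above, here is a minimal Python sketch. It is not SecondString's API (SecondString is a separate open-source library); the function name `levenshtein` is chosen here purely for illustration, and the sketch assumes insertions, deletions, and substitutions each cost 1.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1.

    Illustrative sketch only; not taken from any particular library.
    """
    # previous[j] = distance between the already-processed prefix of s and t[:j]
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        # Turning s[:i] into the empty string takes i deletions
        current = [i]
        for j, ct in enumerate(t, start=1):
            substitution = previous[j - 1] + (cs != ct)  # free if characters match
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

For name matching, a raw count like this is usually normalized (for example by the longer string's length) before comparing pairs of different lengths; that normalization is a common convention, not something the excerpt specifies.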
[...]
2,306 citations
"A Comparison of String Metrics for ..." refers methods in this paper
...We have also implemented token-based distance metrics based on Jensen-Shannon distance (Dagan, Lee, & Pereira 1999) with various smoothing methods, and a simplified form of Fellegi and Sunter’s method (Fellegi & Sunter 1969), called SFS below....
[...]
...In statistics, a long line of research has been conducted in probabilistic record linkage, largely based on the seminal paper by Fellegi and Sunter (1969)....
[...]
1,355 citations
1,347 citations
"A Comparison of String Metrics for ..." refers background or methods in this paper
...These proposals have been, by and large, adopted by subsequent researchers, often with elaborations of the underlying statistical model (Jaro 1989; 1995; Winkler 1999; Larsen 1999; Belin & Rubin 1997)....
[...]
...It also supports the Jaro metric (Jaro 1995; 1989), a metric widely used in the record-linkage community, with and without a variation due to Winkler (1999)....
[...]
1,197 citations