A Theory for Record Linkage

doi:10.1080/01621459.1969.10501049

Journal Article

Ivan P. Fellegi, +1 more

01 Dec 1969

Journal of the American Statistical Asso...

Vol. 64, Iss. 328, pp. 1183-1210

Abstract:

A mathematical model is developed to provide a theoretical framework for a computer-oriented solution to the problem of recognizing those records in two files which represent identical persons, objects or events (said to be matched). A comparison is to be made between the recorded characteristics and values in two records (one from each file), and a decision made as to whether or not the members of the comparison pair represent the same person or event, or whether there is insufficient evidence to justify either of these decisions at stipulated levels of error. These three decisions are referred to as a link (A₁), a non-link (A₃), and a possible link (A₂). The first two decisions are called positive dispositions. The two types of error are defined as the error of the decision A₁ when the members of the comparison pair are in fact unmatched, and the error of the decision A₃ when the members of the comparison pair are in fact matched. The probabilities of these errors are defined as μ and λ respecti...
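The three-way decision described in the abstract can be illustrated with a small Python sketch. It assumes binary agree/disagree comparisons on three hypothetical fields, conditional independence of the fields given match status, and made-up agreement probabilities: m for matched pairs and u for unmatched pairs. None of these values come from the paper. In this sketch each comparison vector γ is scored by log2(m(γ)/u(γ)) and compared against two hypothetical thresholds; the two error probabilities of the resulting rule are obtained by summing u(γ) over the vectors assigned A₁ (giving μ) and m(γ) over those assigned A₃ (giving λ).

```python
from itertools import product
from math import log2, prod

# Hypothetical fields and agreement probabilities (illustrative, not from the paper).
# M_PROB[i] = P(field i agrees | pair is matched)
# U_PROB[i] = P(field i agrees | pair is unmatched)
FIELDS = ["surname", "given_name", "birth_year"]
M_PROB = [0.95, 0.90, 0.85]
U_PROB = [0.01, 0.05, 0.10]


def comparison_vector(a: dict, b: dict) -> tuple:
    """gamma: 1 where the two records agree on a field, 0 where they disagree."""
    return tuple(int(a[f] == b[f]) for f in FIELDS)


def m_of(gamma: tuple) -> float:
    """m(gamma) = P(gamma | matched), assuming fields are independent given match status."""
    return prod(p if g else 1 - p for g, p in zip(gamma, M_PROB))


def u_of(gamma: tuple) -> float:
    """u(gamma) = P(gamma | unmatched), under the same independence assumption."""
    return prod(p if g else 1 - p for g, p in zip(gamma, U_PROB))


def decide(gamma: tuple, upper: float, lower: float) -> str:
    """Return 'A1' (link), 'A2' (possible link) or 'A3' (non-link)."""
    weight = log2(m_of(gamma) / u_of(gamma))
    if weight >= upper:
        return "A1"
    if weight <= lower:
        return "A3"
    return "A2"


def error_rates(upper: float, lower: float) -> tuple:
    """mu = P(A1 | unmatched) and lambda = P(A3 | matched) for this rule."""
    mu = lam = 0.0
    for gamma in product((0, 1), repeat=len(FIELDS)):
        decision = decide(gamma, upper, lower)
        if decision == "A1":
            mu += u_of(gamma)
        elif decision == "A3":
            lam += m_of(gamma)
    return mu, lam


r1 = {"surname": "Smith", "given_name": "John", "birth_year": 1931}
r2 = {"surname": "Smith", "given_name": "Jon", "birth_year": 1931}
gamma = comparison_vector(r1, r2)        # (1, 0, 1)
print(decide(gamma, upper=6, lower=0))   # A1 (weight = log2(85), about 6.41)
print(error_rates(upper=6, lower=0))     # mu about 0.00145, lambda about 0.01175
```

Raising the upper threshold or lowering the lower one shrinks μ and λ respectively, at the price of sending more pairs to the possible-link decision A₂.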

Citations

Journal Article

Duplicate Record Detection: A Survey

Ahmed K. Elmagarmid, +2 more

01 Jan 2007

IEEE Transactions on Knowledge and Data ...

TL;DR: This paper presents an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database and covers similarity metrics that are commonly used to detect similar field entries.

Proceedings Article

A comparison of string distance metrics for name-matching tasks

William W. Cohen, +2 more

TL;DR: Using an open-source, Java toolkit of name-matching methods, the authors experimentally compare string distance metrics on the task of matching entity names and find that the best performing method is a hybrid scheme combining a TFIDF weighting scheme, which is widely used in information retrieval, with the Jaro-Winkler string-distance scheme.
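The Jaro-Winkler component named in this TL;DR can be sketched as follows. This is a minimal Python version of the usual textbook definition (a matching window of floor(max length / 2) - 1, transpositions counted as half the out-of-order matched characters, and a prefix bonus p = 0.1 for up to four shared leading characters). It is not the authors' toolkit, it omits the TFIDF half of the hybrid, and, unlike some implementations, it applies the prefix bonus without a minimum-similarity threshold.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    # Count characters that agree within the sliding window.
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are half-transpositions.
    k = half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings that share a common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```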

Journal Article

Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida

Matthew A. Jaro

01 Jun 1989

Journal of the American Statistical Asso...

TL;DR: The theoretical and practical issues encountered in conducting the matching operation and the results of that operation are discussed.

Proceedings Article

Efficient clustering of high-dimensional data sets with application to reference matching

Andrew McCallum, +2 more

TL;DR: This work presents a new technique for clustering large datasets, using a cheap, approximate distance measure to efficiently divide the data into overlapping subsets the authors call canopies, and presents experimental results on grouping bibliographic citations from the reference sections of research papers.
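The canopy idea in this TL;DR can be sketched in Python as below. The cheap distance used here is Jaccard distance over lowercase word tokens, and the thresholds and citation strings are made up; all of these are assumptions for illustration. Following the usual description of the method, a loose threshold t_loose decides canopy membership and a tighter threshold t_tight (no larger than t_loose) decides which records stop being candidate centers. This sketch takes the first remaining record as the next center, rather than a random one, so that its output is deterministic.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Cheap distance: 1 - |intersection| / |union| of two token sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def canopies(records: list, t_loose: float, t_tight: float) -> list:
    """Group record indices into possibly overlapping canopies."""
    if t_tight > t_loose:
        raise ValueError("t_tight must not exceed t_loose")
    tokens = [set(r.lower().split()) for r in records]
    remaining = list(range(len(records)))
    result = []
    while remaining:
        center = remaining[0]
        # Every remaining record within the loose threshold joins this canopy.
        canopy = {center} | {
            i for i in remaining
            if jaccard_distance(tokens[center], tokens[i]) < t_loose
        }
        result.append(canopy)
        # Records within the tight threshold can no longer start a canopy.
        remaining = [
            i for i in remaining
            if i != center and jaccard_distance(tokens[center], tokens[i]) >= t_tight
        ]
    return result


citations = [
    "Andrew McCallum canopies clustering",
    "A McCallum canopies clustering",
    "Bilenko adaptive duplicate detection",
    "M Bilenko adaptive duplicate detection",
]
print(canopies(citations, t_loose=0.6, t_tight=0.5))  # [{0, 1}, {2, 3}]
```

An expensive comparison (for example a full string-distance or record-linkage score) would then be applied only to pairs that share a canopy, instead of to all pairs.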

Proceedings Article

Adaptive duplicate detection using learnable string similarity measures

Mikhail Bilenko, +1 more

TL;DR: This paper proposes to employ learnable text distance functions for each database field, and shows that such measures are capable of adapting to the specific notion of similarity that is appropriate for the field's domain.

References

Journal Article

Automatic linkage of vital records.

H.B. Newcombe, +3 more

16 Oct 1959

Science

TL;DR: The authors' special interest in the techniques of record linkage relates to their possible use for keeping track of large groups of individuals who have been exposed to low levels of radiation, in order to determine the causes of their eventual deaths.

Journal Article

Record linkage: making maximum use of the discriminating power of identifying information

Howard B. Newcombe, +1 more

01 Nov 1962

Communications of The ACM

TL;DR: Rules that can be applied generally to name retrieval systems have been developed in a methodological study of the linkage of vital and health records into family groupings for demographic research purposes.

Journal Article

A Model for Optimum Linkage of Records

Benjamin J. Tepping

01 Dec 1968

Journal of the American Statistical Asso...

TL;DR: In this article, a model for the frequently recurring problem of linking records from two lists is presented, and the criterion for an optimum decision rule is taken to be the minimization of the expected total costs associated with the various actions that may be taken for each pair of records.
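The expected-cost criterion in this TL;DR can be illustrated with a toy Python sketch. It assumes that each record pair already has an estimated probability of being a true match, and it uses a made-up cost table for three actions (link, non-link, send to clerical review). It is not Tepping's own formulation. Because the total expected cost in this sketch is simply the sum of per-pair expected costs, choosing the cheapest action for each pair separately also minimizes the total.

```python
# Hypothetical cost table: action -> (cost if the pair is matched, cost if unmatched).
COSTS = {
    "link": (0.0, 10.0),      # linking an unmatched pair is costly
    "non-link": (8.0, 0.0),   # failing to link a matched pair is costly
    "review": (1.0, 1.0),     # clerical review costs the same either way
}


def expected_cost(action: str, p_match: float) -> float:
    """Expected cost of taking `action` on a pair matched with probability p_match."""
    cost_matched, cost_unmatched = COSTS[action]
    return p_match * cost_matched + (1.0 - p_match) * cost_unmatched


def best_action(p_match: float) -> str:
    """Action with the smallest expected cost for one pair."""
    return min(COSTS, key=lambda action: expected_cost(action, p_match))


def total_expected_cost(p_matches: list) -> float:
    """Total expected cost when every pair receives its own cheapest action."""
    return sum(expected_cost(best_action(p), p) for p in p_matches)


for p in (0.99, 0.5, 0.01):
    print(p, best_action(p))
# 0.99 link
# 0.5 review
# 0.01 non-link
```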

Journal Article

A Solution to the Problem of Linking Multivariate Documents

N. S. D'Andrea Du Bois

01 Mar 1969

Journal of the American Statistical Asso...

TL;DR: Some aspects of classifying pairs of documents into one of two populations when their items are identifying information, where each item of information can take on three distinct values correct, incorrect or missing, are considered.

Journal Article

Outcome Probabilities for a Record Matching Process with Complete Invariant Information

Gad Nathan

01 Jun 1967

Journal of the American Statistical Asso...

TL;DR: It is shown that the outcome probabilities can be derived for a simple model which assumes that the information used for matching is complete and invariant but possibly insufficient to distinguish between all population items, by considering only the class-size probability distributions.

Related Papers (5)

Duplicate Record Detection: A Survey

Ahmed K. Elmagarmid, +2 more

01 Jan 2007

IEEE Transactions on Knowledge and Data ...

Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida

Matthew A. Jaro

01 Jun 1989

Journal of the American Statistical Asso...

The merge/purge problem for large databases

Efficient clustering of high-dimensional data sets with application to reference matching

Adaptive duplicate detection using learnable string similarity measures