Proceedings Article
Record linkage: similarity measures and algorithms
Nick Koudas, Sunita Sarawagi, Divesh Srivastava
- pp 802-803
Abstract:
This tutorial provides a comprehensive and cohesive overview of the key research results in the area of record linkage methodologies and algorithms for identifying approximate duplicate records, and available tools for this purpose. It encompasses techniques introduced in several communities including databases, information retrieval, statistics and machine learning. It aims to identify similarities and differences across the techniques as well as their merits and limitations.
Citations
Journal Article
Evaluation of entity resolution approaches on real-world match problems
TL;DR: It is found that some challenging resolution tasks such as matching product entities from online shops are not sufficiently solved with conventional approaches based on the similarity of attribute values.
Journal Article
Frameworks for entity matching: A comparison
Hanna Köpcke, Erhard Rahm
TL;DR: This paper comparatively analyzes 11 proposed frameworks for entity matching, covering both frameworks that use training data to semi-automatically find an entity matching strategy for a given match task and frameworks that do not.
Journal Article
HoloClean: holistic data repairs with probabilistic inference
TL;DR: A series of optimizations is introduced to ensure that inference over HoloClean's probabilistic model scales to instances with millions of tuples; HoloClean yields an average F1 improvement of more than 2× against state-of-the-art methods.
Journal Article
Entity resolution: theory, practice & open challenges
TL;DR: This tutorial brings together perspectives on ER from a variety of fields, including databases, machine learning, natural language processing and information retrieval, to provide, in one setting, a survey of a large body of work.
References
Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill
Journal Article
A Theory for Record Linkage
Ivan P. Fellegi, Alan B. Sunter
TL;DR: A mathematical model is developed to provide a theoretical framework for a computer-oriented solution to the problem of recognizing those records in two files which represent identical persons, objects or events.
Algorithms on strings, trees, and sequences
TL;DR: Ukkonen’s method is the method of choice for most problems requiring the construction of a suffix tree, and it will be presented first because it is easier to understand.