Proceedings Article

Record linkage: similarity measures and algorithms

Abstract
This tutorial provides a comprehensive and cohesive overview of the key research results in the area of record linkage methodologies and algorithms for identifying approximate duplicate records, and available tools for this purpose. It encompasses techniques introduced in several communities including databases, information retrieval, statistics and machine learning. It aims to identify similarities and differences across the techniques as well as their merits and limitations.
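
To make the notion of a similarity measure for approximate duplicates concrete, here is a minimal sketch in Python of two commonly used string measures: edit (Levenshtein) similarity and Jaccard similarity over padded q-grams. The function names, the padding character `#`, and the 0.8 threshold are illustrative assumptions and are not taken from the tutorial itself.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def qgrams(s: str, q: int = 2) -> set:
    """Set of lowercase q-grams, padded with '#' (an assumed padding symbol)."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard_similarity(a: str, b: str, q: int = 2) -> float:
    """Jaccard coefficient of the two q-gram sets."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    union = ga | gb
    return 1.0 if not union else len(ga & gb) / len(union)


def is_approximate_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag a pair as a likely duplicate; the threshold is an arbitrary example."""
    return edit_similarity(a.lower(), b.lower()) >= threshold
```

In practice, a measure like these would be applied per attribute and the resulting scores combined by a decision rule. Both the choice of measure and the threshold depend on the data.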


Citations
Journal Article

Evaluation of entity resolution approaches on real-world match problems

TL;DR: It is found that some challenging resolution tasks such as matching product entities from online shops are not sufficiently solved with conventional approaches based on the similarity of attribute values.
Journal Article

Frameworks for entity matching: A comparison

TL;DR: This paper comparatively analyzes 11 proposed frameworks for entity matching, considering both frameworks that use training data and frameworks that do not to semi-automatically find an entity matching strategy for a given match task.
Journal Article

HoloClean: holistic data repairs with probabilistic inference

TL;DR: A series of optimizations are introduced which ensure that inference over HoloClean's probabilistic model scales to instances with millions of tuples, and yields an average F1 improvement of more than 2× against state-of-the-art methods.
Journal Article

Entity resolution: theory, practice & open challenges

TL;DR: This tutorial brings together perspectives on ER from a variety of fields, including databases, machine learning, natural language processing and information retrieval, to provide, in one setting, a survey of a large body of work.
References
Book

Introduction to Modern Information Retrieval

Journal Article

A Theory for Record Linkage

TL;DR: A mathematical model is developed to provide a theoretical framework for a computer-oriented solution to the problem of recognizing those records in two files which represent identical persons, objects or events.

Algorithms on strings, trees, and sequences

Dan Gusfield
TL;DR: Ukkonen’s method is the method of choice for most problems requiring the construction of a suffix tree, and it will be presented first because it is easier to understand.