A probabilistic model for entity disambiguation using relationships

Open Access

TLDR

The paper argues that a better solution exists, one that analyzes not only features but also relationships, in contrast to the standard feature-based data cleaning approaches that can be employed.

Abstract:

Graphs representing relationships among sets of entities are an increasing focus of interest in data analysis applications. These graphs are typically constructed from existing datasets from which entities and relationships are extracted. For some of the entities, values in certain attributes refer to other entities – such references determine relationships. In certain datasets, such references are often given in the form of (string) descriptions. Each such description alone may not uniquely identify one entity as it is supposed to, but rather can match the descriptions of multiple entities. Such cases are especially common if the datasets are collected not from one but from multiple heterogeneous sources. Thus the correct linking of entities via relationships can be a nontrivial challenge, and linking done incorrectly can in turn impede further graph-based analyses. To overcome this problem, standard feature-based data cleaning approaches can be employed. In this paper we argue that a better solution exists, one which analyzes not only features but also relationships.
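The problem described in the abstract can be illustrated with a small sketch. This is not the paper's probabilistic model, which the abstract does not specify; it is a toy illustration under stated assumptions. The entity ids, names, the `edges` graph, and the functions `candidates`, `relationship_score`, and `disambiguate` are all hypothetical. Feature matching (here, a normalized name comparison) leaves two candidates for one string description, and a simple count of existing relationships to entities already known from the context breaks the tie.

```python
# Toy illustration only: NOT the model from the paper.
# All ids, names, and edges below are made-up example data.

# Known entities and their name feature (hypothetical data).
names = {
    "p1": "J. Smith",
    "p2": "J. Smith",
    "p3": "A. Jones",
    "p4": "B. Lee",
}

# Existing relationships between entities (undirected, stored both ways).
edges = {
    "p1": {"p3"},
    "p2": {"p4"},
    "p3": {"p1"},
    "p4": {"p2"},
}


def candidates(description, names):
    """Feature-based step: ids whose normalized name equals the description."""
    key = description.strip().lower()
    return sorted(eid for eid, name in names.items()
                  if name.strip().lower() == key)


def relationship_score(candidate, context, edges):
    """Number of context entities the candidate is already related to."""
    return len(edges.get(candidate, set()) & context)


def disambiguate(description, context, names, edges):
    """Pick the feature-matching candidate best connected to the context.

    Returns None when no entity matches the description at all.
    Ties are resolved in favor of the first candidate in sorted order.
    """
    cands = candidates(description, names)
    if not cands:
        return None
    return max(cands, key=lambda c: relationship_score(c, context, edges))


# The description "J. Smith" matches two entities by features alone.
ambiguous = candidates("J. Smith", names)

# The reference occurs together with entity p3, which is related to p1.
resolved = disambiguate("J. Smith", {"p3"}, names, edges)
```

A raw count of shared relationships is the simplest possible connectivity measure, chosen here only to keep the sketch short; a probabilistic treatment would weigh such evidence rather than count it.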

Citations

Patent

Method and apparatus for automatic entity disambiguation

Matthias Blume, +4 more

TL;DR: The authors use multiple search keys to efficiently find pairs of mentions that correspond to the same entity, performing within-document and cross-document entity disambiguation while skipping billions of unnecessary comparisons, yielding a system with very high throughput that can be applied to truly massive data.

Unsupervised Name Disambiguation via Social Network Similarity

Bradley A. Malin

TL;DR: Unsupervised methods which simultaneously learn the number of entities represented by a particular name and which observations correspond to the same entity are investigated, suggesting methods which measure similarity based on community, rather than exact, similarity provide more robust disambiguation capability.

Patent

System and method for creating and maintaining a database of disambiguated entity mentions and relations from a corpus of electronic documents

Michael A. Woytowitz, +1 more

TL;DR: The authors present a method for creating an electronic database of disambiguated entity mentions and relations from a corpus of electronic documents. The method automatically extracts mentions of entities from the corpus (e.g., references to people, organizations or places), parses the entity mentions into "mention objects," and executes a series of grouping, comparison and hierarchical fuzzy object clustering algorithms to cluster together all of the mention objects referring to the same entity and all of the mention objects associated with each other by a relationship.

Patent

Fast accurate fuzzy matching

Uwe F. Mayer, +2 more

TL;DR: A computer-implemented technique for fuzzy matching is described, which works quickly yet accurately to determine whether a given computer-readable record is represented, by exact match or pretty close match, in a large collection of computer-readable records.

Book ChapterDOI

Semantic Relatedness Approach for Named Entity Disambiguation

Anna Lisa Gentile, +3 more

TL;DR: This work addresses the problem of giving a sense to proper names in a text, that is, automatically associating words representing Named Entities with their referents, based on Semantic Relatedness Scores obtained with a graph based model over Wikipedia.

References

Journal ArticleDOI

Data mining and knowledge discovery: making sense out of data

U.M. Fayyad

- 01 Oct 1996 -

IEEE Intelligent Systems

TL;DR: Without a concerted effort to develop knowledge discovery techniques, organizations stand to forfeit much of the value from the data they currently collect and store.

Journal ArticleDOI

A Theory for Record Linkage

Ivan P. Fellegi, +1 more

- 01 Dec 1969 -

Journal of the American Statistical Asso...

TL;DR: A mathematical model is developed to provide a theoretical framework for a computer-oriented solution to the problem of recognizing those records in two files which represent identical persons, objects or events.

Proceedings Article

A comparison of string distance metrics for name-matching tasks

William W. Cohen, +2 more

TL;DR: Using an open-source, Java toolkit of name-matching methods, the authors experimentally compare string distance metrics on the task of matching entity names and find that the best performing method is a hybrid scheme combining a TFIDF weighting scheme, which is widely used in information retrieval, with the Jaro-Winkler string-distance scheme.

Journal ArticleDOI

Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida

Matthew A. Jaro

- 01 Jun 1989 -

Journal of the American Statistical Asso...

TL;DR: The theoretical and practical issues encountered in conducting the matching operation and the results of that operation are discussed.

Proceedings ArticleDOI

Efficient clustering of high-dimensional data sets with application to reference matching

Andrew McCallum, +2 more

TL;DR: This work presents a new technique for clustering large datasets, using a cheap, approximate distance measure to efficiently divide the data into overlapping subsets the authors call canopies, and presents experimental results on grouping bibliographic citations from the reference sections of research papers.

Procedia Computer Science

Network analysis of named entity interactions in written texts

Diego R. Amancio

- 17 Sep 2015 -

arXiv: Computation and Language

Related Papers (5)

Entity-Based Cross-Document Coreferencing Using the Vector Space Model

From Entities to Geometry: Towards exploiting Multiple Sources to Predict Relevance

SAEA: Self-Attentive Heterogeneous Sequence Learning Model for Entity Alignment

Vector Representation for Sub-Graph Encoding to Resolve Entities

Network analysis of named entity interactions in written texts