An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations

doi:10.1002/ASI.V61:9

Ricardo G. Cota, +4 more

- 01 Sep 2010 -

Journal of the Association for Informati...

- Vol. 61, Iss: 9, pp 1853-1870

TLDR

The results show that the unsupervised method, when using all attributes, performs competitively against all other methods under both metrics, losing in only one case to a supervised method whose result was very close to the authors'.

Abstract:

Name ambiguity in the context of bibliographic citations is a difficult problem which, despite the many efforts of the research community, still has a lot of room for improvement. In this article, we present a heuristic-based hierarchical clustering method to deal with this problem. The method successively fuses clusters of citations of similar author names based on several heuristics and similarity measures on the components of the citations (e.g., coauthor names, work title, and publication venue title). During the disambiguation task, the information about fused clusters is aggregated, providing more information for the next round of fusion. In order to demonstrate the effectiveness of our method, we ran a series of experiments on two different collections extracted from real-world digital libraries and compared it, under two metrics, with four representative methods described in the literature. We present comparisons of results using each considered attribute separately (i.e., coauthor names, work title, and publication venue title) with the author name attribute, and using all attributes together. These results show that our unsupervised method, when using all attributes, performs competitively against all other methods under both metrics, losing in only one case to a supervised method whose result was very close to ours. Moreover, such results are achieved without the burden of any training and without using any privileged information, such as knowing a priori the correct number of clusters. © 2010 Wiley Periodicals, Inc.
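The fusion loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the name-compatibility rule (same last name and first initial), the Jaccard word overlap standing in for the similarity measures, the `threshold` value, and all identifiers (`Cluster`, `should_fuse`, `disambiguate`) are assumptions made for illustration. What it does try to show is the aggregation idea: once two clusters are fused, their combined coauthors and words are available to the next round of fusion.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A group of citations assumed to belong to one author."""
    names: set = field(default_factory=set)      # author-name variants seen so far
    coauthors: set = field(default_factory=set)  # aggregated coauthor names
    words: set = field(default_factory=set)      # aggregated title and venue words
    citations: list = field(default_factory=list)


def names_compatible(a, b):
    """Hypothetical heuristic: same last name and same first initial."""
    pa, pb = a.lower().split(), b.lower().split()
    return pa[-1] == pb[-1] and pa[0][0] == pb[0][0]


def jaccard(x, y):
    """Word-set overlap, used here as a stand-in similarity measure."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def should_fuse(c1, c2, threshold):
    """Fuse only clusters with compatible names that share a coauthor
    or whose title/venue words overlap enough (assumed rule)."""
    if not any(names_compatible(a, b) for a in c1.names for b in c2.names):
        return False
    if c1.coauthors & c2.coauthors:
        return True
    return jaccard(c1.words, c2.words) >= threshold


def fuse(c1, c2):
    """Merge two clusters, aggregating everything known about both."""
    return Cluster(c1.names | c2.names, c1.coauthors | c2.coauthors,
                   c1.words | c2.words, c1.citations + c2.citations)


def disambiguate(citations, threshold=0.3):
    """Start with one cluster per citation and fuse until nothing changes."""
    clusters = [
        Cluster({c["author"]}, set(c["coauthors"]),
                set((c["title"] + " " + c["venue"]).lower().split()), [c["id"]])
        for c in citations
    ]
    fused = True
    while fused:
        fused = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if should_fuse(clusters[i], clusters[j], threshold):
                    clusters[i] = fuse(clusters[i], clusters[j])
                    del clusters[j]
                    fused = True
                    break
            if fused:
                break  # restart the scan so the aggregated cluster is reconsidered
    return clusters


# Made-up example records, not data from the article.
sample = [
    {"id": 1, "author": "A. Gupta", "coauthors": ["B. Li"],
     "title": "Query optimization in databases", "venue": "VLDB"},
    {"id": 2, "author": "Anil Gupta", "coauthors": ["B. Li", "C. Wong"],
     "title": "Index structures", "venue": "SIGMOD"},
    {"id": 3, "author": "A. Gupta", "coauthors": ["C. Wong"],
     "title": "Graph mining", "venue": "KDD"},
    {"id": 4, "author": "A. Gupta", "coauthors": ["D. Roy"],
     "title": "Protein folding simulation", "venue": "Bioinformatics"},
]
result = [c.citations for c in disambiguate(sample)]
```

In this made-up sample, citations 1 and 3 share no coauthor and no words, so they would not be fused directly; they end up together only because citation 2 is fused with citation 1 first and contributes the coauthor "C. Wong" to the aggregated cluster. Note that no target number of clusters is passed in, in line with the abstract's point about not relying on that information.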
