Journal Issue

An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations

TLDR
The results show that the unsupervised method, when using all attributes, performs competitively against all other methods under both metrics, losing in only one case to a supervised method whose result was very close to the authors'.
Abstract
Name ambiguity in the context of bibliographic citations is a difficult problem which, despite the many efforts from the research community, still has a lot of room for improvement. In this article, we present a heuristic-based hierarchical clustering method to deal with this problem. The method successively fuses clusters of citations of similar author names based on several heuristics and similarity measures on the components of the citations (e.g., coauthor names, work title, and publication venue title). During the disambiguation task, the information about fused clusters is aggregated, providing more information for the next round of fusion. In order to demonstrate the effectiveness of our method, we ran a series of experiments in two different collections extracted from real-world digital libraries and compared it, under two metrics, with four representative methods described in the literature. We present comparisons of results using each considered attribute separately (i.e., coauthor names, work title, and publication venue title) with the author name attribute and using all attributes together. These results show that our unsupervised method, when using all attributes, performs competitively against all other methods under both metrics, losing in only one case to a supervised method whose result was very close to ours. Moreover, such results are achieved without the burden of any training and without using any privileged information such as knowing a priori the correct number of clusters. © 2010 Wiley Periodicals, Inc.
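The fusion loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the actual heuristics, so the name-compatibility rule (same last name and first initial), the Jaccard similarity on title and venue terms, the 0.5 thresholds, and every function and field name below are assumptions chosen only for illustration. The point it shows is the aggregation step: once two clusters are fused, their combined coauthors, title terms, and venue terms are available in the next round of fusion.

```python
from dataclasses import dataclass

# Hypothetical stopword list; the paper's preprocessing is not given in the abstract.
STOPWORDS = {"a", "an", "the", "of", "in", "for", "and", "on"}


def name_key(name):
    """Reduce a name to (last name, first initial). Assumed compatibility rule."""
    parts = name.lower().replace(".", " ").split()
    return (parts[-1], parts[0][0])


def terms(text):
    """Lowercased word set without stopwords."""
    return {t for t in text.lower().split() if t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either is empty."""
    return len(a & b) / len(a | b) if a and b else 0.0


@dataclass
class Cluster:
    ids: list
    author_keys: set
    coauthors: set
    title_terms: set
    venue_terms: set


def make_cluster(cid, author, coauthors, title, venue):
    """Start with one cluster per citation."""
    return Cluster([cid], {name_key(author)},
                   {name_key(c) for c in coauthors}, terms(title), terms(venue))


def should_fuse(a, b, title_thr=0.5, venue_thr=0.5):
    """Assumed heuristic: compatible author names plus at least one piece of evidence."""
    if not (a.author_keys & b.author_keys):
        return False
    return (bool(a.coauthors & b.coauthors)
            or jaccard(a.title_terms, b.title_terms) >= title_thr
            or jaccard(a.venue_terms, b.venue_terms) >= venue_thr)


def fuse(a, b):
    """Merge two clusters, aggregating all of their attribute sets."""
    return Cluster(a.ids + b.ids, a.author_keys | b.author_keys,
                   a.coauthors | b.coauthors, a.title_terms | b.title_terms,
                   a.venue_terms | b.venue_terms)


def disambiguate(citations):
    """Fuse clusters repeatedly until a full pass finds nothing to fuse."""
    clusters = [make_cluster(*c) for c in citations]
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if should_fuse(clusters[i], clusters[j]):
                    clusters[i] = fuse(clusters[i], clusters[j])
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break
    return [sorted(c.ids) for c in clusters]


# Made-up toy records: (id, author, coauthors, title, venue).
citations = [
    (1, "A. Gupta", ["S. Lee"], "Name disambiguation in digital libraries", "JCDL"),
    (2, "Ajay Gupta", ["S. Lee", "M. Chen"], "Clustering citations", "SIGIR"),
    (3, "A. Gupta", ["M. Chen"], "Graph partitioning heuristics", "VLDB"),
    (4, "A. Gupta", ["R. Silva"], "Protein folding simulation", "Bioinformatics"),
]
result = disambiguate(citations)
print(result)
```

In this toy input, citations 1 and 3 share no evidence on their own, but citation 3 joins the cluster once citations 1 and 2 have been fused and the cluster has inherited the coauthor "M. Chen". Note that the number of clusters is never supplied; it falls out of the stopping condition.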


Citations
Journal Article

The DBLP Computer Science bibliography: Evolution, research issues, perspectives

TL;DR: The DBLP Computer Science Bibliography of the University of Trier is a large collection of bibliographic information that thousands of computer scientists use for scientific communication.
Journal Article

A brief survey of automatic methods for author name disambiguation

TL;DR: A taxonomy for characterizing the current author name disambiguation methods described in the literature is proposed, a brief survey of the most representative ones is presented and several open challenges are discussed.
Journal Article

Author name disambiguation: What difference does it make in author-based citation analysis?

TL;DR: It is found that the traditional approach leads to extremely distorted rankings and substantially distorted mappings of authors in this field when based on first- or all-author citation counting, whereas last-author-based citation ranking and cocitation mapping both appear relatively immune to the author name ambiguity problem.
Journal Article

Academic social networks: Modeling, analysis, mining and applications

TL;DR: This study investigates the background, current status, and trends of academic social networks, and systematically reviews representative research tasks in this domain at three levels: actor, relationship, and network.
Journal Article

Accuracy of simple, initials-based methods for author name disambiguation

TL;DR: In this paper, the authors derived realistic estimates for the accuracy of simple, initials-based methods using simulated bibliographic datasets in which the true identities of authors are known, and proposed a new name-based method that combines the features of first initial and all initials methods by implicitly taking into account the last name frequency and the size of the dataset.