Proceedings ArticleDOI

Understanding evolution of research themes: a probabilistic generative model for citations

TLDR
This paper proposes a novel way of analyzing literature citations to explore research topics and theme evolution by modeling article citation relations with a probabilistic generative model, and demonstrates that Citation-LDA can effectively discover the evolution of research themes, with better-formed topics than (conventional) Content-LDA.
Abstract
Understanding how research themes evolve over time in a research community is useful in many ways (e.g., revealing important milestones and discovering emerging major research trends). In this paper, we propose a novel way of analyzing literature citations to explore research topics and theme evolution by modeling article citation relations with a probabilistic generative model. The key idea is to represent a research paper by a "bag of citations" and model such a "citation document" with a probabilistic topic model. We explore the extension of a particular topic model, i.e., Latent Dirichlet Allocation (LDA), for citation analysis, and show that such a Citation-LDA can facilitate the discovery of individual research topics as well as the theme evolution from multiple related topics, both of which in turn lead to the construction of evolution graphs for characterizing research themes. We test the proposed Citation-LDA on two datasets: the ACL Anthology Network (AAN) of natural language research literature and the PubMed Central (PMC) archive of biomedical and life sciences literature, and demonstrate that Citation-LDA can effectively discover the evolution of research themes, with better-formed topics than (conventional) Content-LDA.
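
The "bag of citations" idea can be illustrated with a minimal sketch: each citing paper becomes a document whose "words" are the papers it cites, and a standard LDA is fitted to those documents. The code below is an assumption-laden illustration, not the authors' implementation. It uses plain LDA with collapsed Gibbs sampling, and the function names (`bag_of_citations`, `fit_citation_lda`), the hyperparameter values, and the toy paper IDs are all hypothetical.

```python
import random
from collections import Counter


def bag_of_citations(citations):
    """Map each citing paper to a Counter over the papers it cites."""
    return {paper: Counter(cited) for paper, cited in citations.items()}


def fit_citation_lda(citations, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit plain LDA to citation documents with collapsed Gibbs sampling.

    Returns one dict per topic mapping cited-paper ID -> probability.
    alpha and beta are symmetric Dirichlet priors (illustrative values).
    """
    rng = random.Random(seed)
    vocab = sorted({c for cited in citations.values() for c in cited})
    index = {c: i for i, c in enumerate(vocab)}
    docs = [[index[c] for c in cited] for cited in citations.values()]
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # counts per (doc, topic)
    topic_cited = [[0] * vocab_size for _ in range(num_topics)]  # counts per (topic, cited paper)
    topic_total = [0] * num_topics                               # counts per topic
    assign = []

    # Random initial topic assignment for every citation token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_cited[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment before resampling it.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_cited[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_cited[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_cited[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic -> cited-paper distributions.
    return [
        {
            vocab[w]: (topic_cited[k][w] + beta) / (topic_total[k] + vocab_size * beta)
            for w in range(vocab_size)
        }
        for k in range(num_topics)
    ]


# Toy citation graph with made-up paper IDs: citing paper -> list of cited papers.
toy_citations = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["X", "Y", "Z"],
    "P4": ["Y", "Z"],
}
topics = fit_citation_lda(toy_citations, num_topics=2)
```

Under this reading, the highest-probability cited papers in each topic would act as that topic's representative works. How related topics are then linked into an evolution graph is not specified in the abstract, so it is left out of the sketch.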

Citations
Proceedings ArticleDOI

Modeling Concept Dependencies in a Scientific Corpus

TL;DR: An information-theoretic view of concept dependency is formulated and methods to construct a “concept graph” automatically from a text corpus are presented to support search capabilities that may be tuned to help users learn a subject rather than retrieve documents based on a single query.
Proceedings ArticleDOI

FEMA: flexible evolutionary multi-faceted analysis for dynamic behavioral pattern discovery

TL;DR: A Flexible Evolutionary Multi-faceted Analysis framework for both behavior prediction and pattern mining that utilizes a flexible and dynamic factorization scheme for analyzing human behavioral data sequences, which can incorporate various knowledge embedded in different object domains to alleviate the sparsity problem.
Journal ArticleDOI

SocoTraveler: Travel-package recommendations leveraging social influence of different relationship types

TL;DR: A probabilistic topic model is developed leveraging individual travel history and social influence of co-travelers to capture personal interests and a recommendation method is proposed to utilize the proposed model.
Journal ArticleDOI

Topic evolution based on the probabilistic topic model: a review

TL;DR: This paper reviews notable research on topic evolution based on the probabilistic topic model from multiple aspects over the past decade and describes applications of the topic evolution model and attempts to summarize model generalization performance evaluation and topic evolution evaluation methods.
Journal ArticleDOI

Understanding hierarchical structural evolution in a scientific discipline: A case study of artificial intelligence

TL;DR: It is concluded that different topics have different development patterns and that the recent artificial intelligence revolution stems from the interactions among the different topics.
References
Journal ArticleDOI

Latent Dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal ArticleDOI

An index to quantify an individual's scientific research output

TL;DR: The index h, defined as the number of papers with citation number ≥h, is proposed as a useful index to characterize the scientific output of a researcher.
Journal ArticleDOI

Finding scientific topics

TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.
Journal ArticleDOI

Unsupervised Learning by Probabilistic Latent Semantic Analysis

TL;DR: This paper proposes to make use of a temperature controlled version of the Expectation Maximization algorithm for model fitting, which has shown excellent performance in practice, and results in a more principled approach with a solid foundation in statistical inference.