Proceedings Article
Detecting topic evolution in scientific literature: how can citations help?
Qi He, Bi Chen, Jian Pei, Baojun Qiu, Prasenjit Mitra, C. Lee Giles
- pp 957-966
TL;DR: An iterative topic evolution learning framework is proposed by adapting the Latent Dirichlet Allocation model to the citation network, and a novel inheritance topic model is developed; the results clearly show that citations help in understanding topic evolution.
Abstract:
Understanding how topics in scientific literature evolve is an interesting and important problem. Previous work simply models each paper as a bag of words and also considers the impact of authors. However, the impact of one document on another as captured by citations, an important inherent element of scientific literature, has not been considered. In this paper, we address the problem of understanding topic evolution by leveraging citations, and develop citation-aware approaches. We propose an iterative topic evolution learning framework by adapting the Latent Dirichlet Allocation model to the citation network, and develop a novel inheritance topic model. We evaluate the effectiveness and efficiency of our approaches and compare them with state-of-the-art approaches on a large collection of more than 650,000 research papers from the last 16 years and the citation network enabled by CiteSeerX. The results clearly show that citations can help to understand topic evolution better.
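As background for the abstract above, the generative process of plain Latent Dirichlet Allocation, the model that the framework adapts, can be sketched roughly as follows. This is only an illustration of standard LDA, not the paper's citation-aware framework or its inheritance topic model; the function names, the toy topic-word table, and the prior values are all hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, n_words, rng):
    """Generate one document under plain LDA (illustrative helper, not from the paper).

    topic_word: K word distributions, each a list of V probabilities.
    alpha:      Dirichlet prior over the K topics (hypothetical values).
    Returns the document's topic mixture and a list of (topic, word_id) pairs.
    """
    theta = sample_dirichlet(alpha, rng)        # per-document topic mixture
    topic_ids = list(range(len(topic_word)))
    word_ids = list(range(len(topic_word[0])))
    tokens = []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]         # pick a topic
        w = rng.choices(word_ids, weights=topic_word[z])[0]  # pick a word from it
        tokens.append((z, w))
    return theta, tokens


# Toy setup: 2 topics over a 4-word vocabulary with disjoint support.
TOPIC_WORD = [[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]]
theta, tokens = generate_document(TOPIC_WORD, [0.5, 0.5], 20, random.Random(0))
```

In this plain form each document is generated independently of every other document; the abstract's point is that citation links between documents are a further signal that such a model leaves unused.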
Citations
Journal Article
Topic discovery and evolution in scientific literature based on content and citations
Houkui Zhou, Huimin Yu, Roland Hu
TL;DR: This paper proposes a citation-content-latent Dirichlet allocation (LDA) topic discovery method that accounts for both document citation relations and the content of the document itself via a probabilistic generative model, and tests the algorithm on two online datasets to demonstrate that it effectively discovers important topics and reflects the topic evolution of important research themes.
Journal Article
Discovering hierarchical topic evolution in time-stamped documents
TL;DR: Experimental results on two popular real‐world data sets verify that the proposed HTEM can capture coherent topics and discover their hierarchical evolutions and outperforms the baseline model in terms of likelihood on held‐out data.
Journal Article
Understanding the evolution of multiple scientific research domains using a content and network approach
TL;DR: Experimental results on DBLP data related to IR, DB, and W3 domains showed that the W3 domain was getting closer to both IR and DB whereas the distance between IR and DB remained relatively constant.
Report
Tracking Topic Birth and Death in LDA
TL;DR: An algorithm to model the birth and death of topics within an LDA-like framework is proposed, which selects an initial number of topics, after which new topics are created and retired without further supervision.
Proceedings Article
Topic sentiment trend model: Modeling facets and sentiment dynamics
TL;DR: A novel probabilistic model called topic sentiment trend model (TSTM) is proposed that can integrate the topic with sentiment, and analyze the temporal trend of the sentiment-topic.
References
Journal Article
Latent Dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal Article
Term Weighting Approaches in Automatic Text Retrieval
Gerard Salton, Chris Buckley
TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
Journal Article
Finding scientific topics
TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.
Journal Article
Hierarchical Dirichlet Processes
TL;DR: This work considers problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups, and considers a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process.