Journal ArticleDOI

Topic discovery and evolution in scientific literature based on content and citations

01 Oct 2017-Journal of Zhejiang University Science C (Zhejiang University Press)-Vol. 18, Iss: 10, pp 1511-1524
TL;DR: This paper proposes a citation-content-latent Dirichlet allocation (LDA) topic discovery method that accounts for both document citation relations and the content of the document itself via a probabilistic generative model, and tests the algorithm on two online datasets to demonstrate that it effectively discovers important topics and reflects the topic evolution of important research themes.
Abstract: Researchers across the globe have been increasingly interested in the manner in which important research topics evolve over time within the corpus of scientific literature. In a dataset of scientific articles, each document can be considered to comprise both the words of the document itself and its citations of other documents. In this paper, we propose a citation-content-latent Dirichlet allocation (LDA) topic discovery method that accounts for both document citation relations and the content of the document itself via a probabilistic generative model. The citation-content-LDA topic model exploits a two-level topic model that includes the citation information for ‘father’ topics and text information for sub-topics. The model parameters are estimated by a collapsed Gibbs sampling algorithm. We also propose a topic evolution algorithm that runs in two steps: topic segmentation and topic dependency relation calculation. We have tested the proposed citation-content-LDA model and topic evolution algorithm on two online datasets, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and IEEE Computer Society (CS), to demonstrate that our algorithm effectively discovers important topics and reflects the topic evolution of important research themes. According to our evaluation metrics, citation-content-LDA outperforms both content-LDA and citation-LDA.
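The abstract describes a two-step topic-evolution procedure (topic segmentation followed by topic dependency relation calculation) built on top of a topic model. The sketch below illustrates that general idea with plain per-window LDA and a simple top-word overlap as the dependency measure; the model, similarity measure, and parameter values are illustrative assumptions, not the paper's citation-content-LDA.

```python
# Sketch of a two-step topic-evolution analysis in the spirit of the paper:
# (1) segment the corpus into time windows and fit a topic model per window,
# (2) link topics across adjacent windows by the similarity of their word
#     distributions. This uses plain LDA, not citation-content-LDA.
from gensim import corpora
from gensim.models import LdaModel

def fit_window_models(windows, num_topics=10):
    """windows: list of lists of tokenized documents, one list per time period."""
    models = []
    for docs in windows:
        dictionary = corpora.Dictionary(docs)
        corpus = [dictionary.doc2bow(d) for d in docs]
        lda = LdaModel(corpus, id2word=dictionary,
                       num_topics=num_topics, passes=10, random_state=0)
        models.append(lda)
    return models

def topic_dependencies(models, top_n=50, threshold=0.2):
    """Link topic z in window t to topic z' in window t+1 when their
    top-word sets overlap strongly (Jaccard similarity as a simple proxy)."""
    links = []
    for t in range(len(models) - 1):
        lda_a, lda_b = models[t], models[t + 1]
        for za in range(lda_a.num_topics):
            words_a = {w for w, _ in lda_a.show_topic(za, topn=top_n)}
            for zb in range(lda_b.num_topics):
                words_b = {w for w, _ in lda_b.show_topic(zb, topn=top_n)}
                jac = len(words_a & words_b) / len(words_a | words_b)
                if jac >= threshold:
                    links.append((t, za, t + 1, zb, jac))
    return links
```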
Citations
Journal ArticleDOI
TL;DR: By analyzing the concept of Smart PSS, this paper questions the convergence between digital and service orientations and considers how digital technologies are used to enable decisions along the PSS lifecycle and/or at different planning levels.

95 citations

Posted ContentDOI
10 Apr 2020-medRxiv
TL;DR: It was observed that current COVID-19 research places more emphasis on clinical characterization, epidemiological study, and virus transmission than research on other CoV infections does, while topics about diagnostics, therapeutics, vaccines, genomics, and pathogenesis account for less than 10%, or even less than 4%, of all COVID-19 publications, much lower than for other CoV infections.
Abstract: Background Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a virus that causes severe respiratory illness in humans and has resulted in the current global outbreak of novel coronavirus disease (COVID-19). This study aimed to evaluate the characteristics of publications involving coronaviruses as well as COVID-19 by using topic modeling. Methods We extracted all abstracts and retained the most informative words from the COVID-19 Open Research Dataset, which contains 35,092 pieces of coronavirus-related literature published up to March 20, 2020. Using Latent Dirichlet Allocation modeling, we trained a topic model from the corpus, analyzed the semantic relationships between topics, and compared the topic distribution between COVID-19 and other CoV infections. Results Eight topics emerged overall: clinical characterization, pathogenesis research, therapeutics research, epidemiological study, virus transmission, vaccines research, virus diagnostics, and viral genomics. It was observed that current COVID-19 research puts more emphasis on clinical characterization, epidemiological study, and virus transmission. In contrast, topics about diagnostics, therapeutics, vaccines, genomics, and pathogenesis account for less than 10%, or even less than 4%, of all COVID-19 publications, much lower than for other CoV infections. Conclusions These results identified knowledge gaps in the area of COVID-19 and offered directions for future research.
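As a rough illustration of the workflow described above (not the authors' exact pipeline), the sketch below trains an LDA model on tokenized abstracts with gensim and compares the average topic distribution of two document subsets, such as COVID-19 papers versus other coronavirus papers; the preprocessing choices and the index variables for the two groups are assumptions.

```python
# Minimal sketch: train LDA on preprocessed abstracts, then compare average
# topic proportions between two subsets of the corpus.
import numpy as np
from gensim import corpora
from gensim.models import LdaModel

def train_lda(tokenized_abstracts, num_topics=8):
    dictionary = corpora.Dictionary(tokenized_abstracts)
    dictionary.filter_extremes(no_below=5, no_above=0.5)
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_abstracts]
    lda = LdaModel(corpus, id2word=dictionary, num_topics=num_topics,
                   passes=10, random_state=0)
    return lda, dictionary, corpus

def mean_topic_share(lda, corpus, doc_indices):
    """Average topic distribution over a subset of documents."""
    shares = np.zeros(lda.num_topics)
    for i in doc_indices:
        for topic_id, prob in lda.get_document_topics(corpus[i],
                                                      minimum_probability=0.0):
            shares[topic_id] += prob
    return shares / len(doc_indices)

# covid_idx and other_cov_idx would come from the dataset's metadata
# (hypothetical names here):
# diff = mean_topic_share(lda, corpus, covid_idx) - mean_topic_share(lda, corpus, other_cov_idx)
```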

22 citations

Book ChapterDOI
29 May 2017
TL;DR: This work analyzes bioinformatics literature using topic modeling to identify the “hot” topics in that area in order to make informed choices about research topics.
Abstract: Scientists exploring a new area of research are interested in knowing the “hot” topics in that area in order to make informed choices. With exponential growth in scientific literature, identifying such trends manually is not easy. Topic modeling has emerged as an effective approach to analyze large volumes of text. While this approach has been applied to literature in other scientific areas, there has been no formal analysis of bioinformatics literature.

10 citations


Cites background from "Topic discovery and evolution in sc..."

  • ...For instance, this kind of analytical data-driven insight can benefit researchers as they delve into new areas by providing knowledge of current popular topics and how the focus on different topics has shifted through time [1,24]....


  • ...While there are several topic modeling algorithms [6,10,11], Latent Dirichlet Allocation (LDA) [6] is one of the most widely used approaches and has been shown to be effective at finding distinct topics from a corpus [7,24]....


  • ...Within a particular domain, researchers are increasingly interested in exploring scientific literature to gain insights on how research develops and evolves over time [24]....


Posted Content
TL;DR: This work uses a method to discover latent topics in tweets from Colombian Twitter news accounts in order to identify the most prominent events in the country, with an emphasis on security, violence and crime-related tweets.
Abstract: Cultural and social dynamics are important concepts that must be understood in order to grasp what a community cares about. To that end, an excellent source of information on what occurs in a community is the news, especially in recent years, when mass media giants use social networks to communicate and interact with their audience. In this work, we use a method to discover latent topics in tweets from Colombian Twitter news accounts in order to identify the most prominent events in the country. We pay particular attention to security, violence and crime-related tweets because of the violent environment that surrounds Colombian society. The latent topic discovery method that we use builds vector representations of the tweets by using FastText and finds clusters of tweets through the K-means clustering algorithm. The number of clusters is found by measuring the $C_V$ coherence for a range of number of topics of the Latent Dirichlet Allocation (LDA) model. We finally use Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction to visualise the tweets vectors. Once the clusters related to security, violence and crime are identified, we proceed to apply the same method within each cluster to perform a fine-grained analysis in which specific events mentioned in the news are grouped together. Our method is able to discover event-specific sets of news, which is the baseline to perform an extensive analysis of how people engage in Twitter threads on the different types of news, with an emphasis on security, violence and crime-related tweets.
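A compact sketch of the pipeline described above, assuming gensim for FastText and LDA coherence, scikit-learn for K-means, and umap-learn for the projection; all parameter values are illustrative rather than the authors' settings.

```python
# Pipeline sketch: average FastText word vectors per tweet, pick the number of
# clusters from the C_V coherence of LDA models, cluster with K-means, and
# project the tweet vectors with UMAP for visualization.
import numpy as np
from gensim import corpora
from gensim.models import FastText, LdaModel, CoherenceModel
from sklearn.cluster import KMeans
import umap

def tweet_vectors(tokenized_tweets, dim=100):
    ft = FastText(sentences=tokenized_tweets, vector_size=dim,
                  epochs=10, min_count=2)
    return np.array([
        np.mean([ft.wv[w] for w in tw if w in ft.wv] or [np.zeros(dim)], axis=0)
        for tw in tokenized_tweets
    ])

def best_k_by_coherence(tokenized_tweets, k_range=range(5, 21)):
    dictionary = corpora.Dictionary(tokenized_tweets)
    corpus = [dictionary.doc2bow(t) for t in tokenized_tweets]
    scores = {}
    for k in k_range:
        lda = LdaModel(corpus, id2word=dictionary, num_topics=k, random_state=0)
        cm = CoherenceModel(model=lda, texts=tokenized_tweets,
                            dictionary=dictionary, coherence='c_v')
        scores[k] = cm.get_coherence()
    return max(scores, key=scores.get)

def cluster_and_project(vectors, k):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(vectors)
    embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(vectors)
    return labels, embedding
```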

6 citations


Cites background from "Topic discovery and evolution in sc..."

  • ...[30] presents an LDA-based model that relates the topic of a scientific paper with the content of the documents that it cites....


Proceedings ArticleDOI
01 May 2018
TL;DR: An analysis of bioinformatics scholarly literature comprising 143,000 research papers published between 1987 and 2018 is conducted, examining research trends through temporal analysis to identify exciting areas of research and predict future trends.
Abstract: Bioinformatics is an emerging field that is constantly evolving as technology progresses and new biomedical discoveries are made. Bioinformatics research has led to several scientific breakthroughs in the past two decades and remains an active driver of scientific progress and technological advances. In this work, we conduct an analysis of bioinformatics scholarly literature consisting of 143,000 research papers between 1987 and 2018. We apply topic modeling to identify the salient themes in bioinformatics research. We examine the research trends by performing temporal analysis to determine exciting areas of research and predict future trends. In addition, we evaluate the impact of bioinformatics research on the industry by cross-linking the literature with patent databases. We also survey the author backgrounds and the publishing journals, both of which were found to have changed significantly within the past decade. This study provides valuable insight on the progress and current state of bioinformatics research.
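As a hedged sketch of the temporal-analysis step described above: given a fitted document-topic matrix and each paper's publication year, one can compute yearly topic shares and rank topics by their growth. The fitted matrix and year metadata are assumed inputs; this is not the authors' exact analysis.

```python
# Temporal trend sketch: yearly topic prevalence and a simple growth ranking.
import numpy as np

def yearly_topic_shares(doc_topic, years):
    """doc_topic: (n_docs, n_topics) array of topic proportions;
    years: array of publication years, one per document."""
    uniq = np.sort(np.unique(years))
    shares = np.array([doc_topic[years == y].mean(axis=0) for y in uniq])
    return uniq, shares  # shares[t, z] = average weight of topic z in year t

def growth_slopes(uniq_years, shares):
    """Least-squares slope of each topic's yearly share ("rising" topics)."""
    return np.array([np.polyfit(uniq_years, shares[:, z], 1)[0]
                     for z in range(shares.shape[1])])
```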

4 citations


Cites methods from "Topic discovery and evolution in sc..."

  • ...Topic modeling is a frequently used method to categorize research areas in different fields [7][10][20]....


  • ...LDA is the most commonly used topic modeling algorithm due to its effectiveness in identifying topics within a corpus [7][20]....


References
Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations


"Topic discovery and evolution in sc..." refers methods in this paper

  • ...…content-LDA and citation-LDA models as our baseline, using the title to represent the papers in both PAMI and CS. Perplexity, proposed by Blei et al. (2003), is an important criterion used to show the generalization power of a model on unseen data and the number of topics....

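The abstract above describes LDA with variational inference, and the excerpt cites perplexity (Blei et al., 2003) as the criterion for generalization power and for choosing the number of topics. Below is a minimal sketch using scikit-learn's variational-Bayes LDA and a hypothetical train/held-out split; it is not the paper's own evaluation code.

```python
# Minimal sketch: fit LDA for several topic counts and compare held-out
# perplexity (lower is better). X_train / X_heldout are assumed to be
# document-term matrices, e.g. from CountVectorizer.
from sklearn.decomposition import LatentDirichletAllocation

def heldout_perplexity(X_train, X_heldout, n_topics):
    lda = LatentDirichletAllocation(n_components=n_topics,
                                    learning_method="batch",
                                    random_state=0).fit(X_train)
    return lda.perplexity(X_heldout)

# Typical model-selection loop over candidate topic counts:
# for k in (5, 10, 20, 40):
#     print(k, heldout_perplexity(X_train, X_heldout, k))
```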

Proceedings Article
03 Jan 2001
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations

Journal ArticleDOI
01 Apr 1998
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

14,696 citations


"Topic discovery and evolution in sc..." refers methods in this paper

  • ...Among these methods, PageRank, which employs the random walk concept (Brin and Page, 1998), is applied most often to link analysis in web page ranking applications....

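The excerpt above refers to random-walk PageRank (Brin and Page, 1998). A small power-iteration sketch on a toy adjacency matrix is shown below; it illustrates only the ranking idea, not the search engine's implementation or the paper's citation graph.

```python
# Power-iteration PageRank sketch with damping and uniform handling of
# dangling nodes; the adjacency matrix is a toy example.
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-8, max_iter=100):
    """adj[i][j] = 1 if node i links to node j."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_degree = adj.sum(axis=1)
    # Dangling nodes (no out-links) distribute their rank uniformly.
    transition = np.where(out_degree[:, None] > 0,
                          adj / np.maximum(out_degree[:, None], 1),
                          1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1 - damping) / n + damping * transition.T @ rank
        if np.abs(new_rank - rank).sum() < tol:
            break
        rank = new_rank
    return rank

print(pagerank([[0, 1, 1],
                [0, 0, 1],
                [1, 0, 0]]))
```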

Journal Article
TL;DR: Google, as discussed by the authors, is a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.

13,327 citations

Journal ArticleDOI
TL;DR: The generative model for documents introduced by Blei, Ng, and Jordan is described, and a Markov chain Monte Carlo algorithm for inference in this model is presented and used to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.
Abstract: A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying “hot topics” by examining temporal dynamics and tagging abstracts to illustrate semantic content.

5,680 citations


"Topic discovery and evolution in sc..." refers methods in this paper

  • ...An inference is necessary for obtaining model parameters θd and φz via the collapsed Gibbs sampling algorithm (Griffiths and Steyvers, 2004)....

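The excerpt above estimates θd and φz with collapsed Gibbs sampling (Griffiths and Steyvers, 2004). Below is a compact sketch of the standard collapsed Gibbs sampler for plain LDA, with illustrative hyperparameters; the paper's citation-content-LDA builds on this machinery with citation information, which the sketch does not include.

```python
# Compact collapsed Gibbs sampler for plain LDA (Griffiths and Steyvers, 2004).
# Hyperparameters and iteration count are illustrative.
import numpy as np

def gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """docs: list of lists of word ids in [0, V). Returns (theta, phi)."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))          # topic counts per document
    nkw = np.zeros((K, V))          # word counts per topic
    nk = np.zeros(K)                # total words per topic
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):  # initialize counts from random assignments
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed conditional p(z_i = k | z_-i, w)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + K * alpha)
    phi = (nkw + beta) / (nk[:, None] + V * beta)
    return theta, phi
```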