Proceedings Article

Detecting topic evolution in scientific literature: how can citations help?

TLDR
An iterative topic evolution learning framework is proposed by adapting the Latent Dirichlet Allocation model to the citation network, and a novel inheritance topic model is developed. The results show that citations help to understand topic evolution better.
Abstract
Understanding how topics in scientific literature evolve is an interesting and important problem. Previous work simply models each paper as a bag of words and also considers the impact of authors. However, the impact of one document on another as captured by citations, an important inherent element of scientific literature, has not been considered. In this paper, we address the problem of understanding topic evolution by leveraging citations, and develop citation-aware approaches. We propose an iterative topic evolution learning framework by adapting the Latent Dirichlet Allocation model to the citation network, and we develop a novel inheritance topic model. We evaluate the effectiveness and efficiency of our approaches and compare them with state-of-the-art approaches on a large collection of more than 650,000 research papers from the last 16 years and the citation network enabled by CiteSeerX. The results clearly show that citations can help to understand topic evolution better.
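
The abstract does not spell out the inheritance topic model, so the following is only a toy illustration of the general idea that a paper's topics can be informed by the papers it cites. It is not the authors' model. It assumes per-paper topic mixtures have already been estimated (for example by LDA), and the function name `blend_with_citations` and the weight `lam` are hypothetical names introduced here for illustration.

```python
from typing import Dict, List


def blend_with_citations(
    own_topics: Dict[str, List[float]],
    citations: Dict[str, List[str]],
    lam: float = 0.7,
) -> Dict[str, List[float]]:
    """Blend each paper's topic mixture with the mean mixture of its cited papers.

    lam is a hypothetical weight kept on the paper's own mixture;
    (1 - lam) goes to the average mixture of the papers it cites.
    This is an illustrative assumption, not the paper's inheritance topic model.
    """
    blended: Dict[str, List[float]] = {}
    for paper, theta in own_topics.items():
        # Only use cited papers for which a topic mixture is available.
        cited = [own_topics[c] for c in citations.get(paper, []) if c in own_topics]
        if not cited:
            # No usable citations: keep the paper's own mixture unchanged.
            blended[paper] = list(theta)
            continue
        k = len(theta)
        mean_cited = [sum(t[i] for t in cited) / len(cited) for i in range(k)]
        blended[paper] = [lam * theta[i] + (1 - lam) * mean_cited[i] for i in range(k)]
    return blended


# Toy data: three papers, two topics; paper C cites A and B.
own_topics = {
    "A": [0.9, 0.1],
    "B": [0.2, 0.8],
    "C": [0.5, 0.5],
}
citations = {"C": ["A", "B"]}
blended = blend_with_citations(own_topics, citations, lam=0.5)
print(blended["C"])
```

Because each input mixture sums to one and the blend is a convex combination, each blended mixture also sums to one. Papers without citations keep their original mixture.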


Citations
Proceedings Article

Extraction of topic evolutions from references in scientific articles and its GPU acceleration

TL;DR: This paper provides a topic model for extracting topic evolutions as a corpus-wide transition matrix among latent topics, and compares the effectiveness of TERESA with that of LDA by introducing a new measure called diversity plus focusedness (D+F).
Proceedings Article

D-VITA: A Visual Interactive Text Analysis System Using Dynamic Topic Mining.

TL;DR: D-VITA is presented, an interactive text analysis system that exploits dynamic topic mining to detect the latent topic structure and topic dynamics in a collection of documents and supports end-users in understanding and exploiting the topic mining results.
Journal Article

Tracking Knowledge Evolution Based on the Terminology Dynamics in 4P-Medicine.

TL;DR: The article aims to identify trends in the development of terminology in 4P-medicine by building a collection of terms extracted from the PubMed database, and it proposes special linguistic constructs such as megatokens for combining cross-lingual terms into a common semantic field.
Journal Article

Measuring the innovation of method knowledge elements in scientific literature

TL;DR: The proposed approach can measure the innovation of MKEs in scientific literature effectively and is useful for both reviewers and funding agencies to assess the quality of academic papers.
Dissertation

An Integrated Framework for Patent Analysis and Mining

Longhui Zhang
TL;DR: Three interleaved aspects of patent mining techniques are explored, including PatSearch, a framework for automatically generating a search query from a given patent application and retrieving relevant patents for the user; and PatCom, a framework for investigating the commonalities and differences between pairs of patent documents.
References
Journal Article

Latent Dirichlet Allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal Article

Term Weighting Approaches in Automatic Text Retrieval

TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
Journal Article

Finding scientific topics

TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.
Journal Article

Hierarchical Dirichlet Processes

TL;DR: This work considers problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups, and considers a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process.