Journal ArticleDOI

Topic evolution based on the probabilistic topic model: a review

26 Sep 2017-Frontiers of Computer Science (Higher Education Press)-Vol. 11, Iss: 5, pp 786-802
TL;DR: This paper reviews notable research on topic evolution based on the probabilistic topic model from multiple aspects over the past decade and describes applications of the topic evolution model and attempts to summarize model generalization performance evaluation and topic evolution evaluation methods.
Abstract: Accurately representing the quantity and characteristics of users’ interest in certain topics is an important problem facing topic evolution researchers, particularly as it applies to modern online environments. Search engines can provide information retrieval for a specified topic from archived data, but fail to reflect changes in interest toward the topic over time in a structured way. This paper reviews notable research on topic evolution based on the probabilistic topic model from multiple aspects over the past decade. First, we introduce notations, terminology, and the basic topic model explored in the survey, then we summarize three categories of topic evolution based on the probabilistic topic model: the discrete time topic evolution model, the continuous time topic evolution model, and the online topic evolution model. Next, we describe applications of the topic evolution model and attempt to summarize model generalization performance evaluation and topic evolution evaluation methods, as well as providing comparative experimental results for different models. To conclude the review, we pose some open questions and discuss possible future research directions.
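The survey's evaluation discussion centers on model generalization measures, the standard one for topic models being held-out perplexity. A minimal sketch of that measure (the function name and the sample numbers below are illustrative, not from the paper):

```python
import numpy as np

def perplexity(heldout_loglik, doc_lengths):
    """Perplexity = exp(-(total held-out log-likelihood) / (total word count)).
    Lower values indicate better generalization to unseen documents."""
    return float(np.exp(-np.sum(heldout_loglik) / np.sum(doc_lengths)))

# Hypothetical per-document held-out log-likelihoods and word counts
score = perplexity([-120.0, -300.0, -80.0], [40, 100, 30])
```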
Citations
01 Jun 2018
TL;DR: Using text mining and topic modeling on a large document corpus, the authors analyze trends in artificial intelligence (A.I.) research since 2000 and suggest directions for core technologies and promising future research.
Abstract: Interest in artificial intelligence (A.I.) has been growing alongside recent advances in the technology, and the related market is expanding rapidly. Although the field is still at an early stage, it is now important to reduce uncertainty about research directions and investment areas in AI, which has continued to expand since 2000. Reflecting these technological changes and the demands of the times, this study applies two big data analysis methods, text mining and topic modeling, to examine technology trends and to suggest future directions for core technologies and research areas with growth potential. Based on this understanding of AI technology trends, the results are expected to yield new insights into future research directions.

173 citations

01 Jan 2007
TL;DR: In this article, a Markov Chain Monte Carlo (MCMC) sampling algorithm is used to estimate model parameters and demonstrate the method by applying it to the problem of modeling spatial brain activation patterns across multiple images collected via functional magnetic resonance imaging (fMRI).
Abstract: Data sets involving multiple groups with shared characteristics frequently arise in practice. In this paper we extend hierarchical Dirichlet processes to model such data. Each group is assumed to be generated from a template mixture model with group level variability in both the mixing proportions and the component parameters. Variabilities in mixing proportions across groups are handled using hierarchical Dirichlet processes, also allowing for automatic determination of the number of components. In addition, each group is allowed to have its own component parameters coming from a prior described by a template mixture model. This group-level variability in the component parameters is handled using a random effects model. We present a Markov Chain Monte Carlo (MCMC) sampling algorithm to estimate model parameters and demonstrate the method by applying it to the problem of modeling spatial brain activation patterns across multiple images collected via functional magnetic resonance imaging (fMRI).

41 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel topic model, referred as weighted Conditional random field regularized Correlated Topic Model (CCTM), which leverages semantic correlations to discover meaningful topics and topic correlations.

31 citations

Journal ArticleDOI
TL;DR: The main contribution of this review is in its construction of an evolution map that can be used to visualize and integrate the extant studies on topic modeling specifically in regards to cross-media research.
Abstract: Rapid advancements in internet and social media technologies have made information overload a rampant and widespread problem. Complex subjects, histories, or issues break down into branches, side stories, and intertwining narratives; a topic evolution map can assist in joining together and clarifying these disparate parts of an unfamiliar territory. This paper reviews the extant research on topic evolution map based on text and cross-media corpora over the past decade. We first define a series of necessary terms, then go on to describe the traditional topic evolution map per 1) topic evolution over time, based on the probabilistic generative model, and 2) topic evolution from a non-probabilistic perspective. Next, we discuss the current state of research on topic evolution map based on the cross-media corpus, including some open questions and possible future research directions. The main contribution of this review is in its construction of an evolution map that can be used to visualize and integrate the extant studies on topic modeling specifically in regards to cross-media research.

23 citations

Journal ArticleDOI
TL;DR: A method called TopicNet is proposed that applies latent Dirichlet allocation to extract functional topics for a collection of genes regulated by a given TF, and a rewiring score is defined to quantify regulatory-network changes in terms of the topic changes for this TF.
Abstract: Motivation Recently, many chromatin immunoprecipitation sequencing experiments have been carried out for a diverse group of transcription factors (TFs) in many different types of human cells. These experiments manifest large-scale and dynamic changes in regulatory network connectivity (i.e. network 'rewiring'), highlighting the different regulatory programs operating in disparate cellular states. However, due to the dense and noisy nature of current regulatory networks, directly comparing the gains and losses of targets of key TFs across cell states is often not informative. Thus, here, we seek an abstracted, low-dimensional representation to understand the main features of network change. Results We propose a method called TopicNet that applies latent Dirichlet allocation to extract functional topics for a collection of genes regulated by a given TF. We then define a rewiring score to quantify regulatory-network changes in terms of the topic changes for this TF. Using this framework, we can pinpoint particular TFs that change greatly in network connectivity between different cellular states (such as observed in oncogenesis). Also, incorporating gene expression data, we define a topic activity score that measures the degree to which a given topic is active in a particular cellular state. And we show how activity differences can indicate differential survival in various cancers. Availability and implementation The TopicNet framework and related analysis were implemented using R and all codes are available at https://github.com/gersteinlab/topicnet. Supplementary information Supplementary data are available at Bioinformatics online.
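TopicNet's rewiring score is defined in terms of topic changes for a TF across cellular states. As a hedged illustration, one common way to compare two topic distributions is the Hellinger distance; the paper's actual score may be defined differently:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two probability distributions over topics.
    Ranges from 0 (identical) to 1 (disjoint support); one plausible way to
    quantify how much a TF's topic profile changes between cellular states."""
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# Hypothetical topic profiles for one TF before and after a state change
before = np.array([0.7, 0.2, 0.1])
after = np.array([0.1, 0.2, 0.7])
score = hellinger(before, after)
```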

22 citations

References
Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
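The three-level generative process described above can be sketched directly: draw topic-word distributions, then per document a topic mixture, then per word a topic and a word. All sizes and hyperparameters below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: topics, vocabulary, documents, words per document
K, V, D, N = 3, 10, 5, 20
alpha, eta = 0.1, 0.01  # Dirichlet hyperparameters (illustrative)

# Each topic is a distribution over the vocabulary
beta = rng.dirichlet(np.full(V, eta), size=K)  # shape (K, V)

docs = []
for _ in range(D):
    theta = rng.dirichlet(np.full(K, alpha))          # per-document topic mixture
    z = rng.choice(K, size=N, p=theta)                # topic assignment per word
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # word draws
    docs.append(w)
```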

30,570 citations

Journal ArticleDOI
TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.
Abstract: A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying “hot topics” by examining temporal dynamics and tagging abstracts to illustrate semantic content.
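The Markov chain Monte Carlo inference described above is usually realized as collapsed Gibbs sampling, resampling each word's topic assignment from count tables. A toy sketch on synthetic data (corpus, sizes, and hyperparameters are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny synthetic corpus: each document is an array of word ids
docs = [rng.integers(0, 8, size=15) for _ in range(4)]
K, V = 2, 8
alpha, eta = 0.5, 0.1  # illustrative hyperparameters

# Random initial topic assignments and the three count tables
z = [rng.integers(0, K, size=len(d)) for d in docs]
ndk = np.zeros((len(docs), K))  # document-topic counts
nkw = np.zeros((K, V))          # topic-word counts
nk = np.zeros(K)                # topic totals
for d, (ws, zs) in enumerate(zip(docs, z)):
    for w, k in zip(ws, zs):
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

def gibbs_sweep():
    """One collapsed Gibbs sweep: remove each word from the counts,
    sample a new topic proportional to (ndk+alpha)*(nkw+eta)/(nk+V*eta),
    and add the word back under the new topic."""
    for d, (ws, zs) in enumerate(zip(docs, z)):
        for i, w in enumerate(ws):
            k = zs[i]
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + eta) / (nk + V * eta)
            k = rng.choice(K, p=p / p.sum())
            zs[i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(20):
    gibbs_sweep()
```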

5,680 citations


"Topic evolution based on the probab..." refers background in this paper

  • ...Griffiths and Steyvers [38] and Hall et al. [39], for example, applied topic evolution models to science paper mining — specifically, they applied the LDA model to extract topics and determine changes in the strength of each topic over time, but neglected auxiliary information....


  • ...Griffiths and Steyvers [38] and Hall et al....


  • ...[36,38], which consists of the titles and abstracts...


Journal ArticleDOI
TL;DR: A tutorial reviewing the state-of-the-art in probabilistic topic models for managing large document archives, covering modeling assumptions, inference algorithms, and applications.
Abstract: Probabilistic topic modeling provides a suite of tools for the unsupervised analysis of large collections of documents. Topic modeling algorithms can uncover the underlying themes of a collection and decompose its documents according to those themes. This analysis can be used for corpus exploration, document search, and a variety of prediction problems. In this tutorial, I will review the state-of-the-art in probabilistic topic models. I will describe the three components of topic modeling: (1) topic modeling assumptions, (2) algorithms for computing with topic models, and (3) applications of topic models.
In (1), I will describe latent Dirichlet allocation (LDA), which is one of the simplest topic models, and then describe a variety of ways that we can build on it. These include dynamic topic models, correlated topic models, supervised topic models, author-topic models, bursty topic models, Bayesian nonparametric topic models, and others. I will also discuss some of the fundamental statistical ideas that are used in building topic models, such as distributions on the simplex, hierarchical Bayesian modeling, and models of mixed-membership.
In (2), I will review how we compute with topic models. I will describe approximate posterior inference for directed graphical models using both sampling and variational inference, and I will discuss the practical issues and pitfalls in developing these algorithms for topic models. Finally, I will describe some of our most recent work on building algorithms that can scale to millions of documents and documents arriving in a stream.
In (3), I will discuss applications of topic models. These include applications to images, music, social networks, and other data in which we hope to uncover hidden patterns. I will describe some of our recent work on adapting topic modeling algorithms to collaborative filtering, legislative modeling, and bibliometrics without citations. Finally, I will discuss some future directions and open research problems in topic models.

4,529 citations

Proceedings ArticleDOI
20 Jun 2005
TL;DR: This work proposes a novel approach to learn and recognize natural scene categories by representing the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning.
Abstract: We propose a novel approach to learn and recognize natural scene categories. Unlike previous work, it does not require experts to annotate the training set. We represent the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning. Each region is represented as part of a "theme". In previous work, such themes were learnt from hand-annotations of experts, while our method learns the theme distributions as well as the codewords distribution over the themes without supervision. We report satisfactory categorization performances on a large set of 13 categories of complex scenes.
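The codeword representation above (quantizing local region descriptors against a learned codebook into a bag-of-codewords histogram) can be sketched as follows; the descriptors and codebook here are random stand-ins, whereas in practice the codebook comes from unsupervised learning such as k-means:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical: 100 local region descriptors, a codebook of 8 codewords
descriptors = rng.normal(size=(100, 16))
codebook = rng.normal(size=(8, 16))  # in practice learned by k-means

# Assign each descriptor to its nearest codeword, then build the
# bag-of-codewords histogram that a topic model would consume
dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
assignments = dists.argmin(axis=1)
histogram = np.bincount(assignments, minlength=8)
```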

3,920 citations


"Topic evolution based on the probab..." refers methods in this paper

  • ...Apart from modeling text documents, topic models have also been utilized to model images [30] — this characteristic can make topic models based on npTOT useful in future topic evolution research on multimedia corpora formed by text, speech, images, and videos metadata....


Journal ArticleDOI
TL;DR: This work considers problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups, and considers a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process.
Abstract: We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes ...
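The clustering property of the Dirichlet process mentioned above, by which the number of mixture components is inferred rather than fixed, is often explained via the Chinese restaurant process. A small sketch (function name and parameters are illustrative):

```python
import numpy as np

def chinese_restaurant_process(n, alpha, rng):
    """Sample a partition of n items from a CRP with concentration alpha:
    item i joins an existing table with probability proportional to its
    occupancy, or starts a new table with probability proportional to alpha."""
    tables = []  # occupancy count per table
    labels = []  # table assignment per item
    for i in range(n):
        probs = np.array(tables + [alpha], dtype=float) / (i + alpha)
        t = rng.choice(len(probs), p=probs)
        if t == len(tables):
            tables.append(1)   # open a new table (new mixture component)
        else:
            tables[t] += 1
        labels.append(int(t))
    return labels, tables

rng = np.random.default_rng(2)
labels, tables = chinese_restaurant_process(50, 1.0, rng)
```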

3,755 citations