Dynamic topic model

About: Dynamic topic model is a research topic. Over its lifetime, 480 publications have been published within this topic, receiving 94,697 citations.

Papers

Journal Article (DOI)
David M. Blei, Andrew Y. Ng, Michael I. Jordan
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

27,392 citations
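
The abstract above spells out LDA's generative story: each document draws topic proportions from a Dirichlet, and each word is drawn from a topic selected by those proportions. The NumPy sketch below simulates that process; the vocabulary size, topic count, document count, and hyperparameter values are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of LDA's generative process (assumed toy dimensions).
import numpy as np

rng = np.random.default_rng(0)

V, K, D = 1000, 10, 5        # vocabulary size, topics, documents (assumed)
alpha, eta = 0.1, 0.01       # Dirichlet hyperparameters (assumed)

# Per-topic word distributions: K draws from a symmetric Dirichlet over V words.
beta = rng.dirichlet(np.full(V, eta), size=K)

docs = []
for _ in range(D):
    theta = rng.dirichlet(np.full(K, alpha))   # per-document topic mixture
    n_words = rng.poisson(50)                  # document length
    z = rng.choice(K, size=n_words, p=theta)   # topic assignment per word
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # word draws
    docs.append(w)
```

Inference in the paper runs in the other direction: given only the observed words, variational EM recovers approximate posteriors over theta and beta.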


Proceedings Article
David M. Blei, Andrew Y. Ng, Michael I. Jordan
03 Jan 2001
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations
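
As a complement to the generative sketch above, the variational fitting this abstract mentions can be run end to end with scikit-learn's LatentDirichletAllocation, which implements variational Bayes inference for this model; the toy corpus and parameter choices below are assumptions for illustration.

```python
# Fitting LDA by (batch) variational inference with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell on inflation news",
    "investors bought bonds and stocks",
]

X = CountVectorizer().fit_transform(corpus)   # document-term count matrix
lda = LatentDirichletAllocation(n_components=2, learning_method="batch",
                                random_state=0)
doc_topics = lda.fit_transform(X)             # per-document topic mixtures
print(doc_topics.round(2))
```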


Journal Article (DOI)
Thomas L. Griffiths, Mark Steyvers
TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.

Abstract: A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying “hot topics” by examining temporal dynamics and tagging abstracts to illustrate semantic content.

5,165 citations
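
The Markov chain Monte Carlo algorithm in this paper is collapsed Gibbs sampling: topic assignments are resampled one word at a time from their full conditional, with the multinomial parameters integrated out. A minimal sketch on a toy corpus follows; the hyperparameters, corpus, and number of sweeps are assumptions, and this is not the authors' implementation.

```python
# Collapsed Gibbs sampling for LDA on a toy corpus (assumed settings).
import numpy as np

rng = np.random.default_rng(0)

docs = [[0, 1, 2, 1], [2, 3, 3, 0], [4, 5, 4, 5]]   # word ids (toy corpus)
V, K = 6, 2
alpha, beta = 0.5, 0.1

# Count tables: doc-topic, topic-word, and topic totals.
ndk = np.zeros((len(docs), K))
nkw = np.zeros((K, V))
nk = np.zeros(K)

# Random initialization of per-word topic assignments.
z = [[rng.integers(K) for _ in d] for d in docs]
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(200):                      # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                   # remove current assignment
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # Full conditional p(z_i = k | rest), up to a constant.
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k                   # add back the new assignment
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
```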


Journal Article (DOI)
Thomas Hofmann
01 Aug 1999
TL;DR: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data.

Abstract: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.

4,451 citations
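
The aspect model factors each document's word distribution as P(w|d) = Σ_z P(z|d) P(w|z) and is fitted by EM. The sketch below implements plain EM on a toy count matrix; note the paper's actual procedure is a tempered generalization of EM, and the matrix and aspect count here are assumptions.

```python
# Plain EM for the pLSI aspect model on an assumed toy count matrix.
import numpy as np

rng = np.random.default_rng(0)

N = np.array([[3, 1, 0, 0],          # n(d, w): document-term counts (toy)
              [2, 2, 0, 1],
              [0, 0, 4, 2],
              [0, 1, 3, 3]], dtype=float)
D, W = N.shape
K = 2                                # number of latent aspects (assumed)

p_z_d = rng.dirichlet(np.ones(K), size=D)   # P(z|d)
p_w_z = rng.dirichlet(np.ones(W), size=K)   # P(w|z)

for _ in range(100):
    # E-step: responsibilities P(z|d,w), shape (D, W, K).
    joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
    resp = joint / joint.sum(axis=2, keepdims=True)
    # M-step: reestimate P(w|z) and P(z|d) from expected counts.
    expected = N[:, :, None] * resp
    p_w_z = (expected.sum(axis=0) / expected.sum(axis=(0, 1))).T
    p_z_d = expected.sum(axis=1)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
```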


Journal Article (DOI)
David M. Blei
TL;DR: This tutorial surveys the state of the art in probabilistic topic models: LDA and its extensions, approximate posterior inference algorithms that scale to millions of documents, and applications ranging from corpus exploration to collaborative filtering.

Abstract: Probabilistic topic modeling provides a suite of tools for the unsupervised analysis of large collections of documents. Topic modeling algorithms can uncover the underlying themes of a collection and decompose its documents according to those themes. This analysis can be used for corpus exploration, document search, and a variety of prediction problems.

In this tutorial, I will review the state of the art in probabilistic topic models. I will describe the three components of topic modeling:

(1) Topic modeling assumptions
(2) Algorithms for computing with topic models
(3) Applications of topic models

In (1), I will describe latent Dirichlet allocation (LDA), which is one of the simplest topic models, and then describe a variety of ways that we can build on it. These include dynamic topic models, correlated topic models, supervised topic models, author-topic models, bursty topic models, Bayesian nonparametric topic models, and others. I will also discuss some of the fundamental statistical ideas that are used in building topic models, such as distributions on the simplex, hierarchical Bayesian modeling, and models of mixed membership.

In (2), I will review how we compute with topic models. I will describe approximate posterior inference for directed graphical models using both sampling and variational inference, and I will discuss the practical issues and pitfalls in developing these algorithms for topic models. Finally, I will describe some of our most recent work on building algorithms that can scale to millions of documents and documents arriving in a stream.

In (3), I will discuss applications of topic models. These include applications to images, music, social networks, and other data in which we hope to uncover hidden patterns. I will describe some of our recent work on adapting topic modeling algorithms to collaborative filtering, legislative modeling, and bibliometrics without citations.

Finally, I will discuss some future directions and open research problems in topic models.

3,979 citations
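
Among the extensions listed in this tutorial are dynamic topic models, the subject of this page, which let the per-topic word distributions drift across time slices. A hedged sketch of fitting one with gensim's LdaSeqModel follows, assuming a gensim version that ships it; the toy corpus and time slices are invented for illustration.

```python
# Fitting a dynamic topic model over two time slices with gensim (assumed
# toy corpus; gensim must be installed and provide LdaSeqModel).
from gensim.corpora import Dictionary
from gensim.models import LdaSeqModel

texts = [["stocks", "fell", "markets"],        # epoch 1 documents
         ["markets", "rose", "bonds"],
         ["neural", "networks", "learn"],      # epoch 2 documents
         ["deep", "networks", "train"]]

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# time_slice gives the number of documents per epoch, in chronological order.
model = LdaSeqModel(corpus=corpus, id2word=dictionary,
                    time_slice=[2, 2], num_topics=2)
print(model.print_topics(time=1))              # topics at the second epoch
```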


Network Information
Related Topics (5)
Question answering

14K papers, 375.4K citations

83% related
Bitext word alignment

9 papers, 853 citations

82% related
Latent Dirichlet allocation

5.3K papers, 212.5K citations

82% related
Semi-supervised learning

12.1K papers, 611.2K citations

81% related
Constrained clustering

5.3K papers, 213.2K citations

81% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2021    7
2020    12
2019    11
2018    22
2017    24
2016    56

Top Attributes

Topic's top 5 most impactful authors

David M. Blei

13 papers, 59.1K citations

Koji Eguchi

5 papers, 50 citations

John Lafferty

4 papers, 4.6K citations

Hiroshi Nakagawa

4 papers, 110 citations

Haizhou Li

4 papers, 8 citations