Topic
Latent Dirichlet allocation
About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications within this topic have received 212,555 citations. The topic is also known as: LDA.
Papers
TL;DR: This study proposes a feature grouping method based on the Latent Dirichlet Allocation (LDA) topic model for distinguishing the effects of various online news topics, and finds that the proposed topic-sentiment synthesis forecasting models perform better than older benchmark models.
128 citations
01 Jan 2007
TL;DR: In this article, the authors propose a collapsed variational Bayesian inference algorithm for LDA and show that it is computationally efficient, easy to implement, and significantly more accurate than standard variational Bayesian inference.
Abstract: Latent Dirichlet allocation (LDA) is a Bayesian network that has recently gained much popularity in applications ranging from document modeling to computer vision. Due to the large scale nature of these applications, current inference procedures like variational Bayes and Gibbs sampling have been found lacking. In this paper we propose the collapsed variational Bayesian inference algorithm for LDA, and show that it is computationally efficient, easy to implement and significantly more accurate than standard variational Bayesian inference for LDA.
127 citations
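The abstract above contrasts collapsed variational Bayes with Gibbs sampling for LDA. For context, the collapsed Gibbs sampler it compares against can be sketched in a few lines: marginalize out the topic and word distributions, then resample each token's topic from its conditional given all other assignments. The sketch below is a minimal illustrative implementation; the function name, hyperparameter defaults, and toy inputs are assumptions, not taken from the paper.

```python
import numpy as np

def lda_collapsed_gibbs(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Minimal collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns document-topic and topic-word count matrices.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # document-topic counts
    n_kw = np.zeros((n_topics, vocab_size))  # topic-word counts
    n_k = np.zeros(n_topics)                 # per-topic token totals
    assignments = []

    # Random initialization of per-token topic assignments.
    for d, doc in enumerate(docs):
        z = rng.integers(n_topics, size=len(doc))
        assignments.append(z)
        for w, k in zip(doc, z):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    # Gibbs sweeps: remove a token's count, sample a new topic from the
    # collapsed conditional p(z=k | rest), then add the count back.
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                assignments[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw
```

The resampling probability is the standard collapsed conditional, proportional to (document-topic count + alpha) times (topic-word count + beta) over (topic total + V*beta); collapsed variational Bayes approximates the same marginalized posterior deterministically rather than by sampling.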
05 Jul 2008
TL;DR: The dynamic hierarchical Dirichlet process (dHDP) is developed to model the time-evolving statistical properties of sequential data sets, together with a relatively simple Markov chain Monte Carlo sampler.
Abstract: The dynamic hierarchical Dirichlet process (dHDP) is developed to model the time-evolving statistical properties of sequential data sets. The data collected at any time point are represented via a mixture associated with an appropriate underlying model, in the framework of HDP. The statistical properties of data collected at consecutive time points are linked via a random parameter that controls their probabilistic similarity. The sharing mechanisms of the time-evolving data are derived, and a relatively simple Markov Chain Monte Carlo sampler is developed. Experimental results are presented to demonstrate the model.
126 citations
25 Aug 2011
TL;DR: Gensim was created for large digital libraries, but its underlying algorithms for large-scale, distributed, online SVD and LDA are like the Swiss Army knife of data analysis---also useful on their own, outside of the domain of Natural Language Processing.
Abstract: Gensim is a pure Python library that fights on two fronts: 1) digital document indexing and similarity search; and 2) fast, memory-efficient, scalable algorithms for Singular Value Decomposition and Latent Dirichlet Allocation. The connection between the two is unsupervised, semantic analysis of plain text in digital collections. Gensim was created for large digital libraries, but its underlying algorithms for large-scale, distributed, online SVD and LDA are like the Swiss Army knife of data analysis, also useful on their own outside the domain of Natural Language Processing.
126 citations
12 Dec 2011
TL;DR: It is shown that leveraging the structure of hierarchical labels substantially improves out-of-sample label prediction compared to models that do not, and improved lower-dimensional representations of the bag-of-word data are also of interest.
Abstract: We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bag-of-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not.
126 citations