scispace - formally typeset
Topic: Latent Dirichlet allocation

About: Latent Dirichlet allocation is a generative probabilistic topic model and an active research topic. Over the lifetime of the topic, 5,351 publications have been published, receiving 212,555 citations. The topic is also known as: LDA.


Papers
Journal ArticleDOI
TL;DR: This study proposes a feature-grouping method based on the Latent Dirichlet Allocation (LDA) topic model to distinguish the effects of various online news topics, and suggests that the proposed topic-sentiment synthesis forecasting models perform better than the older benchmark models.

128 citations

01 Jan 2007
TL;DR: In this article, a collapsed variational Bayesian inference algorithm is proposed for LDA and shown to be computationally efficient, easy to implement, and significantly more accurate than standard variational Bayesian inference.
Abstract: Latent Dirichlet allocation (LDA) is a Bayesian network that has recently gained much popularity in applications ranging from document modeling to computer vision. Due to the large scale nature of these applications, current inference procedures like variational Bayes and Gibbs sampling have been found lacking. In this paper we propose the collapsed variational Bayesian inference algorithm for LDA, and show that it is computationally efficient, easy to implement and significantly more accurate than standard variational Bayesian inference for LDA.

127 citations
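The "collapsed" inference above refers to analytically integrating out the document-topic and topic-word distributions and working only with topic assignments. The same collapsing trick underlies collapsed Gibbs sampling for LDA, a related (but distinct) inference scheme. Below is a minimal NumPy sketch of a collapsed Gibbs sampler; all function and variable names are illustrative, and this is not the paper's variational algorithm:

```python
import numpy as np

def collapsed_gibbs_lda(docs, n_topics, n_vocab, n_iter=200,
                        alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of integer word ids.
    Returns posterior-mean estimates (phi: topic-word, theta: doc-topic).
    """
    rng = np.random.default_rng(seed)
    nkw = np.zeros((n_topics, n_vocab))   # topic-word counts
    ndk = np.zeros((len(docs), n_topics)) # doc-topic counts
    nk = np.zeros(n_topics)               # tokens per topic
    z = []                                # topic assignment per token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            nkw[k, w] += 1; ndk[d, k] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token's assignment from all counts.
                nkw[k, w] -= 1; ndk[d, k] -= 1; nk[k] -= 1
                # Conditional over topics given all other assignments.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                nkw[k, w] += 1; ndk[d, k] += 1; nk[k] += 1
    # Smoothed posterior-mean estimates.
    phi = (nkw + beta) / (nkw.sum(1, keepdims=True) + n_vocab * beta)
    theta = (ndk + alpha) / (ndk.sum(1, keepdims=True) + n_topics * alpha)
    return phi, theta
```

The collapsed conditional couples all tokens, which is exactly why the paper's variational counterpart needs careful approximation; the sampler version trades that for MCMC iteration cost.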

Proceedings ArticleDOI
05 Jul 2008
TL;DR: The dynamic hierarchical Dirichlet process (dHDP) is developed to model the time-evolving statistical properties of sequential data sets, and a relatively simple Markov Chain Monte Carlo sampler is developed.
Abstract: The dynamic hierarchical Dirichlet process (dHDP) is developed to model the time-evolving statistical properties of sequential data sets. The data collected at any time point are represented via a mixture associated with an appropriate underlying model, in the framework of HDP. The statistical properties of data collected at consecutive time points are linked via a random parameter that controls their probabilistic similarity. The sharing mechanisms of the time-evolving data are derived, and a relatively simple Markov Chain Monte Carlo sampler is developed. Experimental results are presented to demonstrate the model.

126 citations

25 Aug 2011
TL;DR: Gensim was created for large digital libraries, but its underlying algorithms for large-scale, distributed, online SVD and LDA are like the Swiss Army knife of data analysis---also useful on their own, outside of the domain of Natural Language Processing.
Abstract: Gensim is a pure Python library that fights on two fronts: 1) digital document indexing and similarity search; and 2) fast, memory-efficient, scalable algorithms for Singular Value Decomposition and Latent Dirichlet Allocation. The connection between the two is unsupervised, semantic analysis of plain text in digital collections. Gensim was created for large digital libraries, but its underlying algorithms for large-scale, distributed, online SVD and LDA are like the Swiss Army knife of data analysis, also useful on their own, outside of the domain of Natural Language Processing.

126 citations
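The SVD side of Gensim's toolkit is latent semantic analysis: factor a term-document matrix and compare documents in a low-rank "semantic" space. A toy illustration of that idea in plain NumPy on a hypothetical 5-term, 4-document corpus (this deliberately does not use Gensim's own API, which adds streaming and incremental updates on top):

```python
import numpy as np

# Toy term-document count matrix: 5 terms x 4 documents.
# Docs 0-1 use "space" vocabulary, docs 2-3 use "cooking" vocabulary.
X = np.array([
    [2, 1, 0, 0],   # "orbit"
    [1, 2, 0, 0],   # "rocket"
    [0, 1, 0, 0],   # "launch"
    [0, 0, 2, 1],   # "recipe"
    [0, 0, 1, 2],   # "oven"
], dtype=float)

# Truncated SVD: keep k=2 latent semantic dimensions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # each row: one document in latent space

def cos(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Documents about the same theme end up close together in the latent space,
# even when they share no terms verbatim.
```

Gensim's contribution is doing this factorization online and out-of-core, so the term-document matrix never has to fit in memory.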

Proceedings Article
12 Dec 2011
TL;DR: It is shown that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not, and improved lower-dimensional representations of the bag-of-word data are also of interest.
Abstract: We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bag-of-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not.

126 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations (86% related)
Support vector machine: 73.6K papers, 1.7M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Feature extraction: 111.8K papers, 2.1M citations (84% related)
Convolutional neural network: 74.7K papers, 2M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446