scispace - formally typeset

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.
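For readers new to the topic, a minimal LDA fit can be sketched with scikit-learn (assumed available); the toy corpus, topic count, and random seed below are illustrative choices, not drawn from any paper on this page:

```python
# Minimal LDA sketch: fit a 2-topic model to a toy bag-of-words corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "topic models learn latent themes from text",
    "dirichlet priors control topic sparsity",
    "gibbs sampling infers topic assignments",
    "variational inference approximates the posterior",
]

# LDA operates on word-count matrices, so vectorize to counts first.
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # one topic-mixture row per document
```

Each row of `doc_topics` is a normalized distribution over the two topics, which is the "documents as mixtures of topics" view that the papers below build on.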


Papers
Patent
19 Oct 2010
TL;DR: In this patent, a topic model defining a set of topics is inferred by performing latent Dirichlet allocation (LDA) with an Indian Buffet Process (IBP) compound Dirichlet prior probability distribution.
Abstract: In an inference system for organizing a corpus of objects, feature representations are generated comprising distributions over a set of features corresponding to the objects. A topic model defining a set of topics is inferred by performing latent Dirichlet allocation (LDA) with an Indian Buffet Process (IBP) compound Dirichlet prior probability distribution. The inference is performed using a collapsed Gibbs sampling algorithm by iteratively sampling (1) topic allocation variables of the LDA and (2) binary activation variables of the IBP compound Dirichlet prior. In some embodiments the inference is configured such that each inferred topic model is a clean topic model with topics defined as distributions over sub-sets of the set of features selected by the prior. In some embodiments the inference is configured such that the inferred topic model associates a focused sub-set of the set of topics to each object of the training corpus.

46 citations
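Step (1) of the patent's sampler, iteratively resampling topic-allocation variables, can be illustrated with a toy collapsed Gibbs sampler for *standard* LDA; the IBP compound Dirichlet prior and its binary activation variables (step 2) are not reproduced here, and the corpus, `alpha`, and `beta` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 2, 3]]  # word ids per document
V, K, alpha, beta = 4, 2, 0.1, 0.01                # vocab size, topics, priors

ndk = np.zeros((len(docs), K))    # document-topic counts
nkw = np.zeros((K, V))            # topic-word counts
nk = np.zeros(K)                  # tokens per topic
z = [[0] * len(d) for d in docs]  # topic assignment per token

for d, doc in enumerate(docs):    # random initialization
    for i, w in enumerate(doc):
        k = int(rng.integers(K))
        z[d][i] = k
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(50):               # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]           # remove the token's current assignment
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # collapsed conditional p(z = k | rest), theta and phi integrated out
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = int(rng.choice(K, p=p / p.sum()))
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
```

The count arrays stay consistent with the assignments throughout, which is what makes the collapsed conditional cheap to evaluate at every token.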

Proceedings Article
16 Jun 2013
TL;DR: This work extends LDA by drawing topics from a Dirichlet process whose base distribution is a distribution over all strings, rather than from a finite Dirichlet, and proposes heuristics to dynamically order, expand, and contract the set of words considered in the vocabulary.
Abstract: Topic models based on latent Dirichlet allocation (LDA) assume a predefined vocabulary. This is reasonable in batch settings but not reasonable for streaming and online settings. To address this lacuna, we extend LDA by drawing topics from a Dirichlet process whose base distribution is a distribution over all strings rather than from a finite Dirichlet. We develop inference using online variational inference and -- to only consider a finite number of words for each topic -- propose heuristics to dynamically order, expand, and contract the set of words we consider in our vocabulary. We show our model can successfully incorporate new words and that it performs better than topic models with finite vocabularies in evaluations of topic quality and classification performance.

46 citations
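The online variational inference this paper builds on can be sketched with scikit-learn's `partial_fit` for plain LDA; note the contrast with the paper, since here the vocabulary must be fixed up front, which is exactly the limitation the infinite-vocabulary model removes. The stream and parameters are illustrative:

```python
# Online variational updates for fixed-vocabulary LDA, one minibatch at a time.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

stream = [
    ["new words arrive in streaming text", "online updates process minibatches"],
    ["topic models adapt to incoming documents", "streaming text needs online inference"],
]

# Vocabulary fixed in advance -- the step the paper's model avoids.
vocab = sorted({w for batch in stream for doc in batch for w in doc.split()})
vec = CountVectorizer(vocabulary=vocab)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
for batch in stream:                  # one stochastic variational update per batch
    lda.partial_fit(vec.transform(batch))

topic_word = lda.components_          # unnormalized topic-word weights
```

Each `partial_fit` call performs a stochastic variational update from that minibatch alone, so documents never need to be revisited, matching the streaming setting the abstract describes.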

Journal ArticleDOI
TL;DR: The proposed unsupervised framework provides an effective and efficient data mining solution for developing a deep and comprehensive understanding of drivers’ behavioral characteristics, which will benefit the development of AVs and ADASs.

46 citations

Journal ArticleDOI
TL;DR: This work explores the application of probabilistic latent variable models to microbiome data, with a focus on Latent Dirichlet allocation, Non-negative matrix factorization, and Dynamic Unigram models and develops guidelines for when different methods are appropriate.
Abstract: The human microbiome is a complex ecological system, and describing its structure and function under different environmental conditions is important from both basic scientific and medical perspectives. Viewed through a biostatistical lens, many microbiome analysis goals can be formulated as latent variable modeling problems. However, although probabilistic latent variable models are a cornerstone of modern unsupervised learning, they are rarely applied in the context of microbiome data analysis, in spite of the evolutionary, temporal, and count structure that could be directly incorporated through such models. We explore the application of probabilistic latent variable models to microbiome data, with a focus on Latent Dirichlet allocation, Non-negative matrix factorization, and Dynamic Unigram models. To develop guidelines for when different methods are appropriate, we perform a simulation study. We further illustrate and compare these techniques using the data of Dethlefsen and Relman (2011, Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation. Proceedings of the National Academy of Sciences, 108, 4554-4561), a study on the effects of antibiotics on bacterial community composition. Code and data for all simulations and case studies are available publicly.

46 citations
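The paper's LDA-versus-NMF comparison on count data can be sketched as follows; the synthetic taxon-count matrix and factor count are illustrative, and the Dynamic Unigram model is omitted since it has no off-the-shelf scikit-learn analogue:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation, NMF

# Synthetic stand-in for a microbiome count table: samples x taxa.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=3.0, size=(20, 10))

lda = LatentDirichletAllocation(n_components=3, random_state=0)
nmf = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=500)

lda_scores = lda.fit_transform(counts)  # per-sample topic mixtures (rows sum to 1)
nmf_scores = nmf.fit_transform(counts)  # nonnegative loadings, not normalized
```

The contrast visible here motivates the paper's guidelines: LDA returns probability distributions over latent communities, while NMF returns unconstrained nonnegative loadings, so the two factorizations answer subtly different questions about the same count table.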

Journal ArticleDOI
TL;DR: Two supervised topic models for multi-label classification problems are developed that outperform state-of-the-art approaches; they extend Latent Dirichlet Allocation (LDA) via two observations: the frequencies of the labels and the dependencies among different labels.

45 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
86% related
Support vector machine
73.6K papers, 1.7M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Feature extraction
111.8K papers, 2.1M citations
84% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446