Latent Dirichlet allocation

doi:10.5555/944919.944937

Journal Article

David M. Blei, +2 more

- 01 Mar 2003 -

Journal of Machine Learning Research

- Vol. 3, pp 993-1022

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Abstract:

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
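The three levels named in the abstract (corpus-level parameters, per-document topic proportions, per-word topic assignments) can be illustrated with a minimal sampling sketch. This is not the authors' code. The function name `generate_corpus`, the fixed document length `doc_len`, and the toy values of `alpha` and `beta` are assumptions made for illustration, and the sketch covers only the generative direction, not the variational inference or EM estimation the abstract mentions.

```python
import numpy as np


def generate_corpus(n_docs, doc_len, alpha, beta, seed=0):
    """Sample a toy corpus from an LDA-style generative process.

    alpha: length-K Dirichlet parameter over topics (corpus level).
    beta:  K x V array; row k is topic k's distribution over words.
    doc_len is fixed here for simplicity (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    n_topics, vocab_size = beta.shape
    docs, thetas = [], []
    for _ in range(n_docs):
        # Document level: draw this document's topic proportions.
        theta = rng.dirichlet(alpha)
        # Word level: draw a topic for each word position...
        z = rng.choice(n_topics, size=doc_len, p=theta)
        # ...then draw each word from its assigned topic's distribution.
        words = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        docs.append(words)
        thetas.append(theta)
    return docs, np.array(thetas)


# Hypothetical toy setup: 2 topics over a 4-word vocabulary.
toy_alpha = [0.5, 0.5]
toy_beta = [[0.5, 0.5, 0.0, 0.0],
            [0.0, 0.0, 0.5, 0.5]]
toy_docs, toy_thetas = generate_corpus(3, 10, toy_alpha, toy_beta)
```

Each row of `toy_thetas` is the vector of topic probabilities for one document, which is the quantity the abstract describes as an explicit representation of a document.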

Citations

Proceedings Article

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

Svetlana Lazebnik, +2 more

TL;DR: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence that exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories.

Book

Opinion Mining and Sentiment Analysis

Bo Pang, +1 more

TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.

Journal Article

Data clustering: 50 years beyond K-means

Anil K. Jain

TL;DR: A brief overview of clustering is provided, well-known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.

Journal Article

Probabilistic topic models

David M. Blei

- 01 Apr 2012 -

Communications of The ACM

TL;DR: This article surveys a suite of algorithms that offer a solution to managing large document archives and suggests they are well suited to handling large amounts of data.

Book

Sentiment Analysis and Opinion Mining

Bing Liu

TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.

References

Estimating a Dirichlet Distribution

Tom Minka

TL;DR: The Dirichlet distribution and its compound variant, the Dirichlet-multinomial, are two of the most basic models for proportional data, such as the mix of vocabulary words in a text document. The maximum likelihood estimates for these distributions are not available in closed form.

Proceedings Article

Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments

Alexandrin Popescul, +3 more

TL;DR: It is shown that secondary content information can often be used to overcome sparsity and appropriate mixture models incorporating secondary data produce significantly better quality recommenders than k-nearest neighbors (k-NN).

Journal Article

Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models)

Robert E. Kass, +1 more

- 01 Sep 1989 -

Journal of the American Statistical Asso...

TL;DR: In this paper, conditionally independent hierarchical models of the kind used in parametric empirical Bayes (PEB) methodology are considered, where the observation vectors Y_i for units i = 1, …, k are independently distributed with densities p(y_i | θ_i, λ).

Proceedings Article

Expectation-propagation for the generative aspect model

Tom Minka, +1 more

TL;DR: This paper shows that the simple variational methods of Blei et al. (2001) can lead to inaccurate inferences and biased learning for the generative aspect model, and develops an alternative approach that leads to higher accuracy at comparable cost.

BookDOI