
Showing papers on "Latent Dirichlet allocation published in 2003"


Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations
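The three-level generative process the abstract describes can be sketched in a few lines of NumPy. All sizes and hyperparameters below are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: K topics, V-word vocabulary, document length.
K, V, doc_len = 4, 50, 20
alpha = np.full(K, 0.1)                         # Dirichlet prior over topic mixtures
beta = rng.dirichlet(np.full(V, 0.1), size=K)   # K topic-word distributions

def generate_document(n_words):
    """Sample one document from the three-level LDA generative process."""
    theta = rng.dirichlet(alpha)                # per-document topic mixture
    z = rng.choice(K, size=n_words, p=theta)    # topic assignment for each word
    words = np.array([rng.choice(V, p=beta[k]) for k in z])
    return theta, z, words

theta, z, words = generate_document(doc_len)
```

Inference (recovering `theta` and `beta` from `words` alone) is the hard part the paper addresses with variational EM; the sketch above only shows the forward model.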


Journal ArticleDOI
TL;DR: A new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text, is presented, and a number of models for the joint distribution of image regions and words are developed, including several which explicitly learn the correspondence between regions and words.
Abstract: We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation) and corresponding to particular image regions (region naming). Auto-annotation might help organize and access large collections of images. Region naming is a model of object recognition as a process of translating image regions to words, much as one might translate from one language to another. Learning the relationships between image regions and semantic correlates (words) is an interesting example of multi-modal data mining, particularly because it is typically hard to apply data mining techniques to collections of images. We develop a number of models for the joint distribution of image regions and words, including several which explicitly learn the correspondence between regions and words. We study multi-modal and correspondence extensions to Hofmann's hierarchical clustering/aspect model, a translation model adapted from statistical machine translation (Brown et al.), and a multi-modal extension to mixture of latent Dirichlet allocation (MoM-LDA). All models are assessed using a large collection of annotated images of real scenes. We study in depth the difficult problem of measuring performance. For the annotation task, we look at prediction performance on held out data. We present three alternative measures, oriented toward different types of task. Measuring the performance of correspondence methods is harder, because one must determine whether a word has been placed on the right region of an image. We can use annotation performance as a proxy measure, but accurate measurement requires hand labeled data, and thus must occur on a smaller scale. We show results using both an annotation proxy, and manually labeled data.

1,726 citations


Proceedings ArticleDOI
28 Jul 2003
TL;DR: Three hierarchical probabilistic mixture models which aim to describe annotated data with multiple types, culminating in correspondence latent Dirichlet allocation, a latent variable model that is effective at modeling the joint distribution of both types and the conditional distribution of the annotation given the primary type.
Abstract: We consider the problem of modeling annotated data---data with multiple types where the instance of one type (such as a caption) serves as a description of the other type (such as an image). We describe three hierarchical probabilistic mixture models which aim to describe such data, culminating in correspondence latent Dirichlet allocation, a latent variable model that is effective at modeling the joint distribution of both types and the conditional distribution of the annotation given the primary type. We conduct experiments on the Corel database of images and captions, assessing performance in terms of held-out likelihood, automatic annotation, and text-based image retrieval.

1,199 citations
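The correspondence idea can be sketched generatively: topics are assigned to image regions first, and each caption word then picks a region and draws from that region's topic. This is a simplified illustration of the structure (region features are omitted; all sizes and distributions are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: K topics, V caption-word vocabulary.
K, V = 3, 10
n_regions, n_caption = 5, 4
alpha = np.full(K, 1.0)
topic_word = rng.dirichlet(np.full(V, 0.5), size=K)  # word dist. per topic

theta = rng.dirichlet(alpha)                 # image-level topic mixture
z = rng.choice(K, size=n_regions, p=theta)   # one topic per image region
# Correspondence step: each caption word selects a region, then draws a
# word from that region's topic -- so every word annotates a region.
y = rng.integers(0, n_regions, size=n_caption)
caption = np.array([rng.choice(V, p=topic_word[z[r]]) for r in y])
```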


Proceedings Article
09 Dec 2003
TL;DR: A Bayesian approach is taken to generate an appropriate prior via a distribution on partitions that allows arbitrarily large branching factors and readily accommodates growing data collections.
Abstract: We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts.

1,055 citations
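The nested Chinese restaurant process prior can be sketched directly: each document walks down the tree, at each level either following an existing branch (with probability proportional to how many earlier documents took it) or opening a new one. A minimal sketch, with an illustrative concentration parameter:

```python
import numpy as np

rng = np.random.default_rng(2)

def crp_choice(counts, gamma):
    """Pick an existing branch w.p. prop. to its count, a new one w.p. prop. to gamma."""
    probs = np.append(counts, gamma).astype(float)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

def nested_crp_paths(n_docs, depth, gamma=1.0):
    """Sample a depth-level tree path per document; branching is unbounded."""
    tree = {}   # node (path tuple) -> list of child visit counts
    paths = []
    for _ in range(n_docs):
        node, path = (), []
        for _ in range(depth):
            counts = tree.setdefault(node, [])
            choice = crp_choice(np.array(counts, dtype=float), gamma)
            if choice == len(counts):
                counts.append(0)    # open a brand-new branch
            counts[choice] += 1
            path.append(choice)
            node = node + (choice,)
        paths.append(tuple(path))
    return paths

paths = nested_crp_paths(n_docs=20, depth=3)
```

In the full model each tree node carries a topic, and a document's words are drawn from the topics along its path; the sketch shows only the prior over paths.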


Proceedings ArticleDOI
28 Jul 2003
TL;DR: PLSI is a maximum a posteriori estimated LDA model under a uniform Dirichlet prior, so the perceived shortcomings of PLSI can be resolved and elucidated within the LDA framework.
Abstract: Latent Dirichlet Allocation (LDA) is a fully generative approach to language modelling which overcomes the inconsistent generative semantics of Probabilistic Latent Semantic Indexing (PLSI). This paper shows that PLSI is a maximum a posteriori estimated LDA model under a uniform Dirichlet prior; the perceived shortcomings of PLSI can therefore be resolved and elucidated within the LDA framework.

230 citations
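The paper's equivalence rests on a standard fact that is easy to check numerically: under a Dirichlet(α) prior, the MAP estimate of a multinomial with counts n is (nᵢ + αᵢ − 1)/(N + Σα − K), which collapses to the relative-frequency (maximum-likelihood) estimate when the prior is uniform (all αᵢ = 1). The counts below are illustrative:

```python
import numpy as np

counts = np.array([5, 3, 2])      # illustrative observation counts
alpha = np.ones_like(counts)      # uniform Dirichlet prior

# MAP estimate of a multinomial under a Dirichlet(alpha) prior:
map_est = (counts + alpha - 1) / (counts.sum() + alpha.sum() - len(counts))
mle = counts / counts.sum()       # maximum-likelihood estimate

assert np.allclose(map_est, mle)  # uniform prior => MAP coincides with MLE
```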


Journal ArticleDOI
TL;DR: The Dirichlet assumption implies that lazy methods can perform as well as eager discretization methods, and the lazy method is extended to classify set-valued and multi-interval data with a naive Bayesian classifier.
Abstract: In a naive Bayesian classifier, discrete variables as well as discretized continuous variables are assumed to have Dirichlet priors. This paper describes the implications and applications of this model selection choice. We start by reviewing key properties of Dirichlet distributions. Among these properties, the most important one is “perfect aggregation,” which allows us to explain why discretization works for a naive Bayesian classifier. Since perfect aggregation holds for Dirichlets, we can explain that in general, discretization can outperform parameter estimation assuming a normal distribution. In addition, we can explain why a wide variety of well-known discretization methods, such as entropy-based, ten-bin, and bin-log l, can perform well with insignificant difference. We designed experiments to verify our explanation using synthesized and real data sets and showed that in addition to well-known methods, a wide variety of discretization methods all perform similarly. Our analysis leads to a lazy discretization method, which discretizes continuous variables according to test data. The Dirichlet assumption implies that lazy methods can perform as well as eager discretization methods. We empirically confirmed this implication and extended the lazy method to classify set-valued and multi-interval data with a naive Bayesian classifier.

42 citations
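The "perfect aggregation" property the abstract leans on says that summing disjoint cells of a Dirichlet vector yields another Dirichlet vector whose parameters are the corresponding sums. A Monte Carlo check of the implied marginal moments, with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
a = np.array([2.0, 3.0, 5.0])            # illustrative Dirichlet parameters

samples = rng.dirichlet(a, size=200_000)
merged = samples[:, 0] + samples[:, 1]   # aggregate the first two cells

# Perfect aggregation: the merged cell behaves as a single Dirichlet cell
# with parameter a[0] + a[1], i.e. marginally Beta(a0 + a1, a2).
expected_mean = (a[0] + a[1]) / a.sum()
expected_var = expected_mean * (1 - expected_mean) / (a.sum() + 1)

assert abs(merged.mean() - expected_mean) < 1e-2
assert abs(merged.var() - expected_var) < 1e-2
```

This is why merging discretization bins does not change the form of the posterior, which underlies the paper's explanation of why many binning schemes perform similarly.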


Proceedings Article
01 Jan 2003
TL;DR: This work considers the statistical problem of analyzing the association between two categorical variables from cross-classified data and proposes measures which enable one to study the dependencies at a local level and to assess whether the data support some more or less strong association model.
Abstract: We consider the statistical problem of analyzing the association between two categorical variables from cross-classified data. The focus is put on measures which enable one to study the dependencies at a local level and to assess whether the data support some more or less strong association model. Statistical inference is envisaged using an imprecise Dirichlet model.

30 citations


Journal ArticleDOI
TL;DR: A latent class logit model with parameter constraints is considered and a method for determining an appropriate number of latent classes within a Bayesian framework is proposed.
Abstract: Latent class models have recently drawn considerable attention among many researchers and practitioners as a class of useful tools for capturing heterogeneity across different segments in a target market or population. In this paper, we consider a latent class logit model with parameter constraints and deal with two important issues in the latent class models--parameter estimation and selection of an appropriate number of classes--within a Bayesian framework. A simple Gibbs sampling algorithm is proposed for sample generation from the posterior distribution of unknown parameters. Using the Gibbs output, we propose a method for determining an appropriate number of latent classes. A real-world marketing example as an application for market segmentation is provided to illustrate the proposed method.

12 citations


Journal ArticleDOI
TL;DR: In this article, a prior distribution for multinomial parameters is constructed by modifying the prior that posits independent Dirichlet distributions for the multinomial parameters across time.
Abstract: Studies producing longitudinal multinomial data arise in several subject areas. This article suggests a Bayesian approach to the analysis of such data. Rather than infusing a latent model structure, we develop a prior distribution for the multinomial parameters which reflects the longitudinal nature of the observations. This distribution is constructed by modifying the prior that posits independent Dirichlet distributions for the multinomial parameters across time. Posterior analysis, which is implemented using Monte Carlo methods, can then be used to assess the temporal behaviour of the multinomial parameters underlying the observed data. We test this methodology on simulated data, opinion polling data, and data from a study concerning the development of moral reasoning.

6 citations
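The baseline the article modifies -- independent Dirichlet priors at each time point -- gives simple conjugate posterior updates, which is worth seeing concretely. Counts and hyperparameters below are illustrative:

```python
import numpy as np

# Observed multinomial counts at three time points (illustrative data).
counts = np.array([[30, 50, 20],
                   [25, 55, 20],
                   [20, 60, 20]])
alpha = np.ones(3)   # symmetric Dirichlet prior at each time point

# Under independent Dirichlet priors, the posterior at each time point is
# again Dirichlet with parameters alpha + counts (conjugacy); no information
# is shared across time -- which is exactly what the article's prior adds.
post_alpha = alpha + counts
post_mean = post_alpha / post_alpha.sum(axis=1, keepdims=True)
```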


Proceedings ArticleDOI
17 Nov 2003
TL;DR: The Bayes optimal solutions for parameter estimation and for selecting the dimension of the hidden latent class in these models are formulated, and their asymptotic properties are analyzed.
Abstract: In this paper, we consider the Bayesian approach for representation of a set of documents. In this field, many models have been proposed, such as latent semantic analysis (LSA), probabilistic latent semantic analysis (PLSA), the semantic aggregate model (SAM), and Bayesian latent semantic analysis (BLSA). In this paper, we formulate the Bayes optimal solutions for parameter estimation and for selecting the dimension of the hidden latent class in these models, and we analyze their asymptotic properties.

5 citations


Journal ArticleDOI
TL;DR: Large sample properties of the posterior distribution with a mixture of Dirichlet process priors are studied and it is shown that the posterior distribution of the survival function is consistent with right censored data.
Abstract: Mixtures of Dirichlet process priors offer a reasonable compromise between purely parametric and purely non-parametric models, and are popularly used in survival analysis and for testing problems with non-parametric alternatives. In this paper, we study large sample properties of the posterior distribution with a mixture of Dirichlet process priors. We show that the posterior distribution of the survival function is consistent with right censored data.
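A draw from a Dirichlet process can be sketched via the stick-breaking construction, truncated here for illustration (the concentration parameter and base measure are arbitrary choices; mixtures of DP priors, as in the paper, additionally place a prior over these):

```python
import numpy as np

rng = np.random.default_rng(5)

def stick_breaking(concentration, n_atoms):
    """Truncated stick-breaking weights of a Dirichlet process draw."""
    betas = rng.beta(1.0, concentration, size=n_atoms)
    # Each weight is the current break times the stick remaining so far.
    remaining = np.concatenate([[1.0], np.cumprod(1 - betas[:-1])])
    return betas * remaining

weights = stick_breaking(concentration=2.0, n_atoms=100)
atoms = rng.normal(size=100)   # atom locations from a base measure, here N(0, 1)
```

The random measure places mass `weights[i]` at `atoms[i]`; truncation leaves a small unassigned remainder, which is why the weights sum to slightly less than one.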

01 Jan 2003
TL;DR: An algorithm for learning HLC models is developed and the feasibility of learning HLC models that are large enough to be of practical interest is demonstrated.
Abstract: Hierarchical latent class (HLC) models generalize latent class models. As models for cluster analysis, they suit more applications than the latter because they relax the often untrue conditional independence assumption. They also facilitate the discovery of latent causal structures and the induction of probabilistic models that capture complex dependencies and yet have low inferential complexity. In this paper, we investigate the problem of inducing HLC models from data. Two fundamental issues of general latent structure discovery are identified and methods to address those issues for HLC models are proposed. Based on the proposals, we develop an algorithm for learning HLC models and demonstrate the feasibility of learning HLC models that are large enough to be of practical interest.

Book ChapterDOI
15 Sep 2003
TL;DR: In this paper, the authors present a distance-based discriminant analysis (DDA) method that defines the design of a basic building block classifier for distinguishing among a selected number of semantic categories and demonstrate how a set of DDA classifiers can be grouped into a hierarchical ensemble for prediction of an arbitrary set of semantic classes.
Abstract: Ever-increasing amount of multimedia available online necessitates the development of new techniques and methods that can overcome the semantic gap problem. The said problem, encountered due to major disparities between inherent representational characteristics of multimedia and its semantic content sought by the user, has been a prominent research direction addressed by a great number of semantic augmentation approaches originating from such areas as machine learning, statistics, natural language processing, etc. In this paper, we review several of these recently developed techniques that bring together low-level representation of multimedia and its semantics in order to improve the efficiency of access and retrieval. We also present a distance-based discriminant analysis (DDA) method that defines the design of a basic building block classifier for distinguishing among a selected number of semantic categories. In addition to that, we demonstrate how a set of DDA classifiers can be grouped into a hierarchical ensemble for prediction of an arbitrary set of semantic classes.