Journal ArticleDOI

Bayesian Density Estimation and Inference Using Mixtures

01 Jun 1995 - Journal of the American Statistical Association (Taylor & Francis Group) - Vol. 90, Iss. 430, pp. 577-588
TL;DR: In this article, the authors describe and illustrate Bayesian inference in models for density estimation using mixtures of Dirichlet processes and show convergence results for a general class of normal mixture models.
Abstract: We describe and illustrate Bayesian inference in models for density estimation using mixtures of Dirichlet processes. These models provide natural settings for density estimation and are exemplified by special cases where data are modeled as a sample from mixtures of normal distributions. Efficient simulation methods are used to approximate various prior, posterior, and predictive distributions. This allows for direct inference on a variety of practical issues, including problems of local versus global smoothing, uncertainty about density estimates, assessment of modality, and inference on the number of components. Also, convergence results are established for a general class of normal mixture models.
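
To make the model class concrete, here is a hedged sketch (not the paper's own posterior simulation scheme) that draws one random density from a Dirichlet process mixture of normals using a truncated stick-breaking construction; the truncation level K and all parameter values are illustrative assumptions.

```python
# Sketch: one random density from a DP mixture of normals (truncated sticks).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha, K = 1.0, 50             # DP precision and truncation level (assumed)
m, tau, sigma = 0.0, 2.0, 0.5  # base measure N(m, tau^2); common component sd

# Stick-breaking: w_k = v_k * prod_{j<k}(1 - v_j), with v_k ~ Beta(1, alpha).
v = rng.beta(1.0, alpha, size=K)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # truncation leftover ignored

mu = rng.normal(m, tau, size=K)  # component means drawn from the base measure G0

# The random density f(y) = sum_k w_k N(y | mu_k, sigma^2), on a grid.
y = np.linspace(-6.0, 6.0, 200)
f = (w[:, None] * norm.pdf(y[None, :], mu[:, None], sigma)).sum(axis=0)
```

Repeating the draw shows the prior's spread over densities, which the paper's simulation methods then update with data.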
Citations
Journal ArticleDOI
TL;DR: This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.
Abstract: Cluster analysis is the automated search for groups of related observations in a dataset. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures, and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled. We review a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology and discuss recent developments...
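
A minimal sketch of the workflow this abstract describes: fit Gaussian mixture models with increasing numbers of components and let BIC answer "how many clusters are there." scikit-learn's GaussianMixture is a stand-in chosen for brevity, not the authors' software, and the two-cluster toy data are invented.

```python
# Sketch: choose the number of mixture components (clusters) by BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),   # toy data: two well-separated
               rng.normal(4.0, 1.0, (100, 2))])  # spherical clusters

models = [GaussianMixture(n_components=k, random_state=0).fit(X)
          for k in range(1, 7)]
bics = [mod.bic(X) for mod in models]
best = models[int(np.argmin(bics))]   # sklearn's BIC is lower-is-better
labels = best.predict(X)              # model-based cluster assignments
```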

4,123 citations


Cites methods from "Bayesian Density Estimation and Inf..."

  • ...Recently, Yeung, Fraley, Murua, Raftery, and Ruzzo (2001) applied the model-based method of Section 5....


  • ...This approach was proposed for one-dimensional density estimation by Escobar and West (1995) and MacEachern and Müller (1998), and extended to the multivariate case by Müller, Erkanli, and West (1996)....


Journal ArticleDOI
TL;DR: This work considers problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups, and considers a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process.
Abstract: We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes ...
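
The sharing mechanism in that construction can be sketched with a truncated stick-breaking approximation: a global weight vector over a common set of atoms is drawn once, and each group re-weights those same atoms. Everything below (truncation level, concentrations, a normal base measure) is an illustrative assumption, not the paper's inference algorithm.

```python
# Sketch: truncated hierarchical DP draw; all groups share the same atoms phi.
import numpy as np

rng = np.random.default_rng(2)
gamma, alpha0, K, J = 1.0, 1.0, 30, 3   # concentrations, truncation, group count

# Global stick-breaking weights beta ~ GEM(gamma), renormalized after truncation.
v = rng.beta(1.0, gamma, size=K)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
beta /= beta.sum()

phi = rng.normal(0.0, 3.0, size=K)       # shared atoms from the base measure H

# Child DPs re-weight the SAME atoms: pi_j ~ Dirichlet(alpha0 * beta).
pi = rng.dirichlet(alpha0 * beta, size=J)
z = [rng.choice(K, size=50, p=pi[j]) for j in range(J)]  # indicators per group
data = [rng.normal(phi[zj], 0.5) for zj in z]            # group-level samples
```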

3,755 citations


Cites background or methods from "Bayesian Density Estimation and Inf..."

  • ...In the case of a single mixture model (J = 1), Escobar and West (1995) proposed a gamma prior and derived an auxiliary variable update for α0, and Rasmussen (2000) observed that (A.1) is log-concave in log(α0) and proposed using adaptive rejection sampling instead.... (A sketch of this auxiliary variable update appears after these excerpts.)


  • ...(A.7) Again, other variables are independent of γ given m·· and K, hence we may apply the techniques of Escobar and West (1995) or Rasmussen (2000) to sample γ....


  • ...The auxiliary variable method of Escobar and West (1995) requires a slight modification for the case where J > 1....


  • ...A number of authors have studied such DP mixture models (Antoniak 1974; Escobar and West 1995; MacEachern and Müller 1998)....


  • ...Our work is based on a tool from nonparametric Bayesian analysis known as the Dirichlet process (DP) mixture model [2, 3]....

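The auxiliary variable update for α0 mentioned in the first excerpt is compact enough to sketch. Assuming a Gamma(a, b) prior (shape-rate parameterization) on α, n observations, and k occupied components, the Escobar-West update draws a Beta auxiliary variable and then α from a two-component gamma mixture; variable names here are illustrative.

```python
# Sketch: Escobar-West auxiliary variable update for the DP concentration alpha.
import numpy as np

def update_alpha(alpha, n, k, a=1.0, b=1.0, rng=None):
    rng = rng or np.random.default_rng()
    # Step 1: eta | alpha ~ Beta(alpha + 1, n).
    eta = rng.beta(alpha + 1.0, n)
    # Step 2: alpha | eta, k is a mixture of two gammas, with mixing odds
    # pi / (1 - pi) = (a + k - 1) / (n * (b - log eta)).
    rate = b - np.log(eta)
    odds = (a + k - 1.0) / (n * rate)
    pi = odds / (1.0 + odds)
    shape = a + k if rng.random() < pi else a + k - 1.0
    return rng.gamma(shape, 1.0 / rate)   # numpy gamma takes (shape, scale)
```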

Journal ArticleDOI
TL;DR: This introductory paper introduces the Monte Carlo method with emphasis on probabilistic machine learning and reviews the main building blocks of modern Markov chain Monte Carlo simulation.
Abstract: The purpose of this introductory paper is threefold. First, it introduces the Monte Carlo method with emphasis on probabilistic machine learning. Second, it reviews the main building blocks of modern Markov chain Monte Carlo simulation, thereby providing an introduction to the remaining papers of this special issue. Lastly, it discusses new interesting research horizons.
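
Of the building blocks such a review covers, the random-walk Metropolis-Hastings kernel is the simplest to write down. The sketch below targets a standard normal with a symmetric proposal; the target, step size, and chain length are illustrative choices.

```python
# Sketch: random-walk Metropolis-Hastings for a standard normal target.
import numpy as np

def log_target(x):
    return -0.5 * x * x        # log N(0,1) density up to an additive constant

rng = np.random.default_rng(3)
x, chain = 0.0, []
for _ in range(10_000):
    prop = x + rng.normal(0.0, 1.0)      # symmetric random-walk proposal
    # Accept with probability min(1, target(prop) / target(x)).
    if np.log(rng.random()) < log_target(prop) - log_target(x):
        x = prop
    chain.append(x)                      # rejected moves repeat the old state
```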

2,579 citations


Cites methods from "Bayesian Density Estimation and Inf..."

  • ...For simplicity, we avoid the treatment of nonparametric model averaging techniques; see for example (Escobar & West, 1995; Green & Richardson, 2000)....


Journal ArticleDOI
TL;DR: In this article, Markov chain methods for sampling from the posterior distribution of a Dirichlet process mixture model are reviewed and two new classes of methods are presented, both of which can handle general models with non-conjugate priors.
Abstract: This article reviews Markov chain methods for sampling from the posterior distribution of a Dirichlet process mixture model and presents two new classes of methods. One new approach is to make Metropolis-Hastings updates of the indicators specifying which mixture component is associated with each observation, perhaps supplemented with a partial form of Gibbs sampling. The other new approach extends Gibbs sampling for these indicators by using a set of auxiliary parameters. These methods are simple to implement and are more efficient than previous ways of handling general Dirichlet process mixture models with non-conjugate priors.
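
The first of the two new approaches can be sketched schematically: propose a new component indicator for one observation from its conditional prior (the usual Chinese-restaurant probabilities), so the Metropolis-Hastings ratio reduces to a likelihood ratio. The sketch below assumes a normal mixture with known scale and a normal base measure, and omits bookkeeping such as removing emptied components; it is an illustration, not the paper's exact pseudocode.

```python
# Sketch: MH update of one mixture indicator, proposing from the conditional prior.
import numpy as np
from scipy.stats import norm

def mh_indicator_update(i, y, c, phi, alpha, sigma, rng):
    counts = np.bincount(np.delete(c, i), minlength=len(phi)).astype(float)
    probs = np.append(counts, alpha)       # existing components + "new" mass
    probs /= probs.sum()
    cstar = rng.choice(len(probs), p=probs)
    if cstar == len(phi):                  # new component: draw its parameter
        phi = np.append(phi, rng.normal(0.0, 3.0))   # from an assumed G0
    # Proposal = conditional prior, so the MH ratio is a pure likelihood ratio.
    ratio = norm.pdf(y[i], phi[cstar], sigma) / norm.pdf(y[i], phi[c[i]], sigma)
    if rng.random() < min(1.0, ratio):
        c[i] = cstar
    return c, phi
```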

2,320 citations

Journal ArticleDOI
TL;DR: Stochastic variational inference lets us apply complex Bayesian models to massive data sets, and it is shown that the Bayesian nonparametric topic model outperforms its parametric counterpart.
Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
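
The core stochastic step is short: sample a minibatch, compute the global variational parameter the batch would imply if replicated to the full data size, and blend it in with a decaying step size (a noisy natural-gradient update). The function and parameter names below are schematic placeholders, not the paper's notation or any library API.

```python
# Sketch: one stochastic variational inference step on a global parameter lam.
def svi_step(lam, minibatch, t, N, compute_lambda_hat, tau0=1.0, kappa=0.7):
    rho = (t + tau0) ** (-kappa)   # step sizes satisfy Robbins-Monro conditions
    lam_hat = compute_lambda_hat(minibatch, N)  # minibatch-based noisy estimate
    return (1.0 - rho) * lam + rho * lam_hat    # natural-gradient blend
```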

2,291 citations


Cites background or methods from "Bayesian Density Estimation and Inf..."

  • ...In traditional mean-field variational inference, we optimize Equation 8 with coordinate ascent....


  • ...…2006; Salakhutdinov and Mnih, 2008; Paisley and Carin, 2009; Hoffman et al., 2010b), certain Bayesian nonparametric mixture models (Antoniak, 1974; Escobar and West, 1995; Teh et al., 2006a), and others. Analyzing data with one of these models amounts to computing the posterior distribution of…...


  • ...Approximate posterior inference for BNP models in general is an active field of research (Escobar and West, 1995; Neal, 2000; Blei and Jordan, 2006; Teh et al., 2007)....


References
Journal ArticleDOI
TL;DR: In this article, a modified Monte Carlo integration over configuration space is used to investigate the equation of state of a two-dimensional rigid-sphere system of interacting individual molecules, and the results are compared to the free volume equation of state and a four-term virial coefficient expansion.
Abstract: A general method, suitable for fast computing machines, for investigating such properties as equations of state for substances consisting of interacting individual molecules is described. The method consists of a modified Monte Carlo integration over configuration space. Results for the two‐dimensional rigid‐sphere system have been obtained on the Los Alamos MANIAC and are presented here. These results are compared to the free volume equation of state and to a four‐term virial coefficient expansion.
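
For a hard-sphere system the Boltzmann factor is 0 or 1, so the "modified Monte Carlo integration" reduces to: displace one disk at random and accept unless it overlaps another. The sketch below is a toy version under assumed box size, disk diameter, and move width, with periodic boundaries and a proper non-overlapping start omitted for brevity.

```python
# Sketch: Metropolis moves for hard disks (accept iff no overlap, stay in box).
import numpy as np

rng = np.random.default_rng(4)
L, d, n = 10.0, 1.0, 20                    # box side, disk diameter, disk count
pos = rng.uniform(0.0, L, size=(n, 2))     # a real run would start overlap-free

def overlaps(i, trial):
    others = np.delete(pos, i, axis=0)
    return np.any(np.sum((others - trial) ** 2, axis=1) < d * d)

for _ in range(10_000):
    i = rng.integers(n)
    trial = pos[i] + rng.uniform(-0.3, 0.3, size=2)   # small random displacement
    if np.all((trial >= 0.0) & (trial <= L)) and not overlaps(i, trial):
        pos[i] = trial        # rejected moves leave the configuration unchanged
```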

35,161 citations


"Bayesian Density Estimation and Inf..." refers methods in this paper

  • ...Some of the earlier references on Markov Chain Monte Carlo methods include work of Geman and Geman (1984), Hastings (1970), Metropolis et al. (1953), and Tanner and Wong (1987)....


BookDOI
01 Jan 1986
TL;DR: The Kernel Method for Multivariate Data: Three Important Methods and Density Estimation in Action.
Abstract: Introduction. Survey of Existing Methods. The Kernel Method for Univariate Data. The Kernel Method for Multivariate Data. Three Important Methods. Density Estimation in Action.
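
The kernel method for univariate data fits in a few lines: f_hat(y) = (1/(n h)) * sum_i K((y - x_i)/h) with a Gaussian kernel K. The sketch below uses Silverman's rule-of-thumb bandwidth as a default; it is an illustration of the book's subject, not code from it.

```python
# Sketch: Gaussian kernel density estimate with a rule-of-thumb bandwidth.
import numpy as np

def kde(y_grid, x, h=None):
    x = np.asarray(x, dtype=float)
    if h is None:                      # Silverman's rule: 1.06 * sd * n^(-1/5)
        h = 1.06 * x.std(ddof=1) * len(x) ** (-0.2)
    u = (y_grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u * u).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))
```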

15,499 citations


"Bayesian Density Estimation and Inf..." refers background or methods in this paper

  • ...…based on mixtures of standard components, such as normal mixtures, underlie mainstream approaches to density estimation, including kernel techniques (Silverman 1986), nonparametric maximum likelihood (Lindsay 1983), and Bayesian approaches using mixtures of Dirichlet processes (Ferguson 1983)....


  • ...…is the Student-t distribution with s degrees of freedom, mode m, and scale factor M^(1/2), where M = (1 + τ)S/s. Equivalently, using the reduced form (3), we have […]. As discussed by Ferguson (1983), there are strong relationships between (4) and standard kernel density estimates (Silverman 1986)....


Journal ArticleDOI
TL;DR: A generalization of the sampling method introduced by Metropolis et al. (1953) is presented, along with an exposition of the relevant theory, techniques of application, and methods and difficulties of assessing the error in Monte Carlo estimates.
Abstract: A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and difficulties of assessing the error in Monte Carlo estimates. Examples of the methods, including the generation of random orthogonal matrices and potential applications of the methods to numerical problems arising in statistics, are discussed. For numerical problems in a large number of dimensions, Monte Carlo methods are often more efficient than conventional numerical methods. However, implementation of the Monte Carlo methods requires sampling from high dimensional probability distributions and this may be very difficult and expensive in analysis and computer time. General methods for sampling from, or estimating expectations with respect to, such distributions are as follows. (i) If possible, factorize the distribution into the product of one-dimensional conditional distributions from which samples may be obtained. (ii) Use importance sampling, which may also be used for variance reduction. That is, in order to evaluate the integral J = ∫ f(x)p(x)dx = E_p(f), where p(x) is a probability density function, instead of obtaining independent samples x_1, ..., x_N from p(x) and using the estimate J_1 = Σ f(x_i)/N, we instead obtain the sample from a distribution with density q(x) and use the estimate J_2 = Σ {f(x_i)p(x_i)}/{q(x_i)N}. This may be advantageous if it is easier to sample from q(x) than p(x), but it is a difficult method to use in a large number of dimensions, since the values of the weights w(x_i) = p(x_i)/q(x_i) for reasonable values of N may all be extremely small, or a few may be extremely large. In estimating the probability of an event A, however, these difficulties may not be as serious since the only values of w(x) which are important are those for which x ∈ A. Since the methods proposed by Trotter & Tukey (1956) for the estimation of conditional expectations require the use of importance sampling, the same difficulties may be encountered in their use. (iii) Use a simulation technique; that is, if it is difficult to sample directly from p(x) or if p(x) is unknown, sample from some distribution q(y) and obtain the sample x values as some function of the corresponding y values. If we want samples from the conditional dis…
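
The importance sampling estimator described under (ii) is easy to exercise numerically. The sketch below estimates E_p[X^2] = 1 for a standard normal target p using a heavier-tailed Student-t proposal q; the particular target, proposal, and integrand are illustrative choices, not from the paper.

```python
# Sketch: importance sampling, J2 = mean of f(x_i) * p(x_i)/q(x_i), x_i ~ q.
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(5)
N = 100_000
x = t.rvs(df=5, size=N, random_state=rng)   # draw from the proposal q
w = norm.pdf(x) / t.pdf(x, df=5)            # importance weights p(x)/q(x)
J2 = np.mean(x ** 2 * w)                    # estimates E_p[X^2] = 1
```

A heavy-tailed q keeps the weights p/q bounded, which addresses exactly the failure mode the abstract warns about in high dimensions.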

14,965 citations