Posted Content

Bayesian finite mixtures: a note on prior specification and posterior computation

TL;DR: In this paper, a new method is presented for computing the posterior distribution of the number k of components in a finite mixture of normals. Two aspects of prior specification are also studied: an argument is made for the use of a Poisson(1) distribution as the prior for the number of components, and methods are given for the selection of hyperparameter values with natural conjugate priors on the component parameters.
Abstract: A new method for the computation of the posterior distribution of the number k of components in a finite mixture is presented. Two aspects of prior specification are also studied: an argument is made for the use of a Poisson(1) distribution as the prior for k; and methods are given for the selection of hyperparameter values in the mixture of normals model, with natural conjugate priors on the component parameters.
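To make the prior-to-posterior step concrete, here is a minimal sketch, not taken from the paper, of combining a Poisson(1) prior on k (truncated to k ≥ 1) with per-k log marginal likelihoods; the `log_marglik` values are toy placeholders standing in for the estimates a method such as the paper's would produce.

```python
import numpy as np
from scipy.stats import poisson

# Candidate numbers of components.
ks = np.arange(1, 11)

# Poisson(1) prior on k, truncated to k >= 1 and renormalized.
prior = poisson.pmf(ks, mu=1.0)
prior /= prior.sum()

# Toy placeholders for log p(x | k); in practice these would be marginal
# likelihood estimates, e.g. from the method proposed in the paper.
log_marglik = -100.0 - 0.5 * (ks - 3.0) ** 2

# Posterior on k: prior times marginal likelihood, normalized in log space.
log_post = np.log(prior) + log_marglik
log_post -= log_post.max()
post = np.exp(log_post) / np.exp(log_post).sum()

for k, p in zip(ks, post):
    print(f"k = {k:2d}: posterior mass {p:.3f}")
```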
Citations
Journal ArticleDOI
TL;DR: A new Markov chain Monte Carlo method for the Bayesian analysis of finite mixture distributions with an unknown number of components is presented and can be used for mixtures of components from any parametric family, under the assumption that the component parameters can be integrated out of the model analytically.
Abstract: A new Markov chain Monte Carlo method for the Bayesian analysis of finite mixture distributions with an unknown number of components is presented. The sampler is characterized by a state space consisting only of the number of components and the latent allocation variables. Its main advantage is that it can be used, with minimal changes, for mixtures of components from any parametric family, under the assumption that the component parameters can be integrated out of the model analytically. Artificial and real data sets are used to illustrate the method and mixtures of univariate and of multivariate normals are explicitly considered. The problem of label switching, when parameter inference is of interest, is addressed in a post-processing stage.
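One ingredient of such a collapsed sampler can be written down directly: with the mixture weights integrated out under a symmetric Dirichlet(γ) prior, the allocation vector has a closed-form prior. Below is a minimal sketch (our own code, with a hypothetical function name), which a full sampler would pair with the collapsed likelihood p(x | z, k) of the chosen conjugate family.

```python
import numpy as np
from scipy.special import gammaln

def log_alloc_prior(z, k, gamma=1.0):
    """log p(z | k) with the mixture weights integrated out under a
    symmetric Dirichlet(gamma, ..., gamma) prior:
    p(z | k) = Gamma(k*g)/Gamma(k*g + n) * prod_j Gamma(g + n_j)/Gamma(g).
    """
    z = np.asarray(z)
    n = z.size
    counts = np.bincount(z, minlength=k)  # occupancy n_j of each component
    return (gammaln(k * gamma) - gammaln(k * gamma + n)
            + np.sum(gammaln(gamma + counts) - gammaln(gamma)))

# Six observations allocated among k = 3 components (one left empty).
print(log_alloc_prior([0, 0, 1, 1, 1, 0], k=3))
```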

159 citations


Cites background from "Bayesian finite mixtures: a note on..."

  • ...As for the hyperparameters, we assume a symmetric prior and follow the approach of Nobile (2005)....

  • ...Very little posterior mass is given to numbers of components outside the range from three to seven, in agreement with the estimate based on marginal likelihoods given in Nobile (2005)....

  • ...Nobile (2005) contains an example of marked sensitivity of the posterior of k to changes in φ, for mixtures of univariate normal distributions....

  • ...For a justification of the Poi(1) prior on k, see Nobile (2005)....

Journal ArticleDOI
TL;DR: As the authors discuss, the most commonly used method of inference for MFMs is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces.
Abstract: A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with symmetric Dirichlet weights, and put a prior on the number of components—that is, to use a mixture of finite mixtures (MFM). The most commonly used method of inference for MFMs is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces. Meanwhile, there are samplers for Dirichlet process mixture (DPM) models that are relatively simple and are easily adapted to new applications. It turns out that, in fact, many of the essential properties of DPMs are also exhibited by MFMs—an exchangeable partition distribution, restaurant process, random measure representation, and stick-breaking representation—and crucially, the MFM analogues are simple enough that they can be used much like the corresponding DPM properties. Consequently, many of the powerful methods developed for inference in DPMs can be directly applied to MFMs as well.

156 citations

Posted Content
TL;DR: It turns out that many of the essential properties of DPMs are also exhibited by MFMs, and the MFM analogues are simple enough that they can be used much like the corresponding DPM properties; this simplifies the implementation of MFMs and can substantially improve mixing.
Abstract: A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with Dirichlet weights, and put a prior on the number of components---that is, to use a mixture of finite mixtures (MFM). While inference in MFMs can be done with methods such as reversible jump Markov chain Monte Carlo, it is much more common to use Dirichlet process mixture (DPM) models because of the relative ease and generality with which DPM samplers can be applied. In this paper, we show that, in fact, many of the attractive mathematical properties of DPMs are also exhibited by MFMs---a simple exchangeable partition distribution, restaurant process, random measure representation, and in certain cases, a stick-breaking representation. Consequently, the powerful methods developed for inference in DPMs can be directly applied to MFMs as well. We illustrate with simulated and real data, including high-dimensional gene expression data.
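A concrete piece of this machinery: in the MFM construction, the exchangeable partition distribution involves a coefficient V_n(t) = Σ_k p(k) k_(t) / (γk)^(n), a falling factorial over a rising factorial, weighted by the prior on k. The sketch below (ours, assuming for illustration a prior with k − 1 ~ Poisson(1) and truncating the infinite sum) evaluates it in log space.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson

def log_V(n, t, gamma=1.0, k_max=500):
    """log V_n(t) = log sum_k p(k) * k_(t) / (gamma*k)^(n), where k_(t) is a
    falling factorial and (gamma*k)^(n) a rising factorial; the infinite sum
    is truncated at k_max. Assumes the prior k - 1 ~ Poisson(1)."""
    ks = np.arange(t, k_max + 1)              # terms with k < t vanish
    log_pk = poisson.logpmf(ks - 1, mu=1.0)   # prior p(k)
    log_falling = gammaln(ks + 1) - gammaln(ks - t + 1)
    log_rising = gammaln(gamma * ks + n) - gammaln(gamma * ks)
    log_terms = log_pk + log_falling - log_rising
    m = log_terms.max()                       # log-sum-exp for stability
    return m + np.log(np.exp(log_terms - m).sum())

print(log_V(n=50, t=3))   # e.g. for a partition of 50 points into 3 blocks
```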

151 citations


Cites background or methods from "Bayesian finite mixtures: a note on..."

  • ...Several inference methods have been proposed for this type of model (Nobile 1994; Phillips and Smith 1996; Richardson and Green 1997; Stephens 2000; Nobile and Fearnside 2007; McCullagh and Yang 2008), the most commonly used method being reversible jump Markov chain Monte Carlo (Green 1995; …)....

  • ...Although the MFM enables consistent inference for k in principle (Nobile 1994), there are several issues that need to be carefully considered in practice; see Section A.3....

  • ...As seen in Figure 7, the tendency of DPM samples to have tiny extra clusters causes the number of clusters t to be somewhat inflated, apparently making the DPM posterior on t fail to concentrate, while the MFM posterior on t concentrates at the true value (by Section 5.2 and Nobile 1994)....

  • ...Nobile (1994) approximates the marginal likelihood p(x_{1:n} | k) of each k to compute the posterior on k, and uses standard methods given k. Phillips and Smith (1996) and Stephens (2000) use jump diffusion and point process approaches, respectively, to sample from p(k, π, θ | x_{1:n})....

  • ...Assuming the same γ for all k is a genuine restriction, albeit a fairly natural one, often made in such models even when not strictly necessary (Nobile 1994; Phillips and Smith 1996; Richardson and Green 1997; Stephens 2000; Green and Richardson 2001; Nobile and Fearnside 2007)....

Posted Content
TL;DR: In this article, the authors consider approximate Bayesian model choice for model selection problems that involve models whose Fisher-information matrices may fail to be invertible along other competing submodels.
Abstract: We consider approximate Bayesian model choice for model selection problems that involve models whose Fisher-information matrices may fail to be invertible along other competing submodels. Such singular models do not obey the regularity conditions underlying the derivation of Schwarz's Bayesian information criterion (BIC) and the penalty structure in BIC generally does not reflect the frequentist large-sample behavior of their marginal likelihood. While large-sample theory for the marginal likelihood of singular models has been developed recently, the resulting approximations depend on the true parameter value and lead to a paradox of circular reasoning. Guided by examples such as determining the number of components of mixture models, the number of factors in latent factor models or the rank in reduced-rank regression, we propose a resolution to this paradox and give a practical extension of BIC for singular model selection problems.
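For contrast with the singular case the abstract describes, here is the regular BIC in action: a minimal sketch (ours) that scores candidate numbers of mixture components with scikit-learn's standard BIC, i.e. the penalty whose large-sample justification the paper argues breaks down for singular models such as mixtures.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: a two-component univariate normal mixture.
x = np.concatenate([rng.normal(-2, 1, 150),
                    rng.normal(2, 1, 150)]).reshape(-1, 1)

# Regular BIC = -2 log L + d log n for each candidate k (lower is better).
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(x)
    print(f"k = {k}: BIC = {gm.bic(x):.1f}")
```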

84 citations


Cites methods from "Bayesian finite mixtures: a note on..."

  • ...Although we envision that the use of a uniform prior on models in definition 1 is reasonable for many applications, deviations from this default can be of interest; compare, for instance, Nobile (2005) who discussed priors for the number of components in mixture models....

Journal ArticleDOI
TL;DR: A probabilistic approach based on Bayesian networks is proposed for modelling non-homogeneous and non-linear gene regulatory processes; its inference results are in excellent agreement with biological findings, predicting dichotomies that one would expect to find in the studied systems.
Abstract: Method: The objective of the present article is to propose and evaluate a probabilistic approach based on Bayesian networks for modelling non-homogeneous and non-linear gene regulatory processes. The method is based on a mixture model, using latent variables to assign individual measurements to different classes. The practical inference follows the Bayesian paradigm and samples the network structure, the number of classes and the assignment of latent variables from the posterior distribution with Markov Chain Monte Carlo (MCMC), using the recently proposed allocation sampler as an alternative to RJMCMC. Results: We have evaluated the method using three criteria: network reconstruction, statistical significance and biological plausibility. In terms of network reconstruction, we found improved results both for a synthetic network of known structure and for a small real regulatory network derived from the literature. We have assessed the statistical significance of the improvement on gene expression time series for two different systems (viral challenge of macrophages, and circadian rhythms in plants), where the proposed new scheme tends to outperform the classical BGe score. Regarding biological plausibility, we found that the inference results obtained with the proposed method were in excellent agreement with biological findings, predicting dichotomies that one would expect to find in the studied systems. Availability: Two supplementary papers on theoretical (T) and experimental (E) aspects and the datasets used in our study are available from http://www.bioss.ac.uk/associates/marco/supplement/ Contact: [email protected], [email protected]

75 citations


Additional excerpts

  • ...This prior is known to be suitable for finite mixture models (Nobile, 2005)....

References
Journal ArticleDOI
TL;DR: In this article, the authors compare several methods of estimating Bayes factors when it is possible to simulate observations from the posterior distributions, via Markov chain Monte Carlo or other techniques, provided that each posterior distribution is well behaved in the sense of having a single dominant mode.
Abstract: The Bayes factor is a ratio of two posterior normalizing constants, which may be difficult to compute. We compare several methods of estimating Bayes factors when it is possible to simulate observations from the posterior distributions, via Markov chain Monte Carlo or other techniques. The methods that we study are all easily applied without consideration of special features of the problem, provided that each posterior distribution is well behaved in the sense of having a single dominant mode. We consider a simulated version of Laplace's method, a simulated version of Bartlett correction, importance sampling, and a reciprocal importance sampling technique. We also introduce local volume corrections for each of these. In addition, we apply the bridge sampling method of Meng and Wong. We find that a simulated version of Laplace's method, with local volume correction, furnishes an accurate approximation that is especially useful when likelihood function evaluations are costly. A simple bridge sampling…
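As a small illustration of the flavor of these estimators, the following sketch (ours, on a toy conjugate normal-mean model) computes a simulated-Laplace-style estimate by plugging the mean and variance of posterior draws into the Laplace approximation, and compares it with the exact log marginal likelihood, which is available in closed form for this model.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(1)
x = rng.normal(0.5, 1.0, size=40)        # data: x_i ~ N(theta, 1)
n, s2_0 = len(x), 10.0                   # prior: theta ~ N(0, s2_0)

# Exact posterior is N(post_mean, post_var); draws stand in for MCMC output.
post_var = 1.0 / (n + 1.0 / s2_0)
post_mean = post_var * x.sum()
draws = rng.normal(post_mean, np.sqrt(post_var), size=5000)

# Laplace approximation evaluated at the simulated posterior mean/variance:
# log m(x) ~ log L(tb) + log pi(tb) + (d/2) log 2*pi + (1/2) log |Sigma|.
tb, v = draws.mean(), draws.var()
log_m_laplace = (norm.logpdf(x, tb, 1.0).sum()
                 + norm.logpdf(tb, 0.0, np.sqrt(s2_0))
                 + 0.5 * np.log(2 * np.pi) + 0.5 * np.log(v))

# Exact marginal: the data vector is N(0, I + s2_0 * 11').
cov = np.eye(n) + s2_0 * np.ones((n, n))
log_m_exact = multivariate_normal.logpdf(x, mean=np.zeros(n), cov=cov)
print(f"{log_m_laplace:.3f} vs exact {log_m_exact:.3f}")
```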

2,191 citations


"Bayesian finite mixtures: a note on..." refers methods in this paper

  • ...…should be noted that estimation of the marginal likelihood from MCMC output is not as simple as other posterior inference using MCMC, and as a consequence several methods have been proposed, see e.g. Chib (1995), Raftery (1996), DiCiccio et al. (1997), Gelman and Meng (1998) and references therein....

Journal ArticleDOI
TL;DR: In this paper, reversible jump MCMC methodology for fully Bayesian mixture analysis is developed, with a hierarchical prior model that deals with weak prior information while avoiding the mathematical pitfalls of using improper priors in the mixture context; the resulting sample from the full joint posterior can be used as a basis for a thorough presentation of many aspects of the posterior distribution.
Abstract: New methodology for fully Bayesian mixture analysis is developed, making use of reversible jump Markov chain Monte Carlo methods that are capable of jumping between the parameter subspaces corresponding to different numbers of components in the mixture. A sample from the full joint distribution of all unknown variables is thereby generated, and this can be used as a basis for a thorough presentation of many aspects of the posterior distribution. The methodology is applied here to the analysis of univariate normal mixtures, using a hierarchical prior model that offers an approach to dealing with weak prior information while avoiding the mathematical pitfalls of using improper priors in the mixture context.

2,018 citations

Journal ArticleDOI
TL;DR: This work exploits the fact that the marginal density can be expressed as the prior times the likelihood function over the posterior density, so that Bayes factors for model comparisons can be routinely computed as a by-product of the simulation.
Abstract: In the context of Bayes estimation via Gibbs sampling, with or without data augmentation, a simple approach is developed for computing the marginal density of the sample data (marginal likelihood) given parameter draws from the posterior distribution. Consequently, Bayes factors for model comparisons can be routinely computed as a by-product of the simulation. Hitherto, this calculation has proved extremely challenging. Our approach exploits the fact that the marginal density can be expressed as the prior times the likelihood function over the posterior density. This simple identity holds for any parameter value. An estimate of the posterior density is shown to be available if all complete conditional densities used in the Gibbs sampler have closed-form expressions. To improve accuracy, the posterior density is estimated at a high density point, and the numerical standard error of resulting estimate is derived. The ideas are applied to probit regression and finite mixture models.
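The identity can be checked in a few lines. A minimal sketch (ours), on a conjugate normal-mean model where the posterior density is known exactly; in the Gibbs-sampling setting of the paper, the posterior ordinate would instead be estimated from the full conditional densities.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, size=30)     # data: x_i ~ N(theta, 1)
n, s2_0 = len(x), 4.0                 # prior: theta ~ N(0, s2_0)

post_var = 1.0 / (n + 1.0 / s2_0)     # conjugate posterior: N(post_mean, post_var)
post_mean = post_var * x.sum()

# Chib's identity m(x) = f(x | t*) pi(t*) / pi(t* | x), at a high-density point t*.
t_star = post_mean
log_m = (norm.logpdf(x, t_star, 1.0).sum()                       # log f(x | t*)
         + norm.logpdf(t_star, 0.0, np.sqrt(s2_0))               # log pi(t*)
         - norm.logpdf(t_star, post_mean, np.sqrt(post_var)))    # log pi(t* | x)

# Direct evaluation of the marginal for comparison: x is N(0, I + s2_0 * 11').
cov = np.eye(n) + s2_0 * np.ones((n, n))
print(f"{log_m:.6f} vs exact {multivariate_normal.logpdf(x, np.zeros(n), cov):.6f}")
```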

1,954 citations


"Bayesian finite mixtures: a note on..." refers background or methods in this paper

  • ...The method exploits a fundamental probability identity already used by Chib (1995), but combines it with the representation of mixture marginal likelihoods given in Nobile (2004)....

  • ...Expression (10) forms the basis of a method of marginal likelihood estimation, see Chib (1995) and Raftery (1996)....

  • ...Chib (1995), Raftery (1996), DiCiccio et al. (1997), Gelman and Meng (1998) and references therein....

  • ...…can be rewritten as f(x) = f(θ) f(x|θ) / f(θ|x) (10) where f(θ) and f(x|θ) are assumed computable, including their normalizing constants, and the formula holds for any parameter value θ. Expression (10) forms the basis of a method of marginal likelihood estimation, see Chib (1995) and Raftery (1996)....

  • ...The method exploits a fundamental probability identity already used by Chib (1995), but combines it with the representation of mixture marginal likelihoods given in Nobile (2004). I also discuss a more specific topic, hyperparameter selection in a finite mixture of univariate normals....

Journal ArticleDOI
TL;DR: It is demonstrated that artificial identifiability constraints fail in general to solve the 'label switching' problem, and an alternative class of approaches, relabelling algorithms, which arise from attempting to minimize the posterior expected loss under a class of loss functions, is described.
Abstract: Summary. In a Bayesian analysis of finite mixture models, parameter estimation and clustering are sometimes less straightforward than might be expected. In particular, the common practice of estimating parameters by their posterior mean, and summarizing joint posterior distributions by marginal distributions, often leads to nonsensical answers. This is due to the so-called 'label switching' problem, which is caused by symmetry in the likelihood of the model parameters. A frequent response to this problem is to remove the symmetry by using artificial identifiability constraints. We demonstrate that this fails in general to solve the problem, and we describe an alternative class of approaches, relabelling algorithms, which arise from attempting to minimize the posterior expected loss under a class of loss functions. We describe in detail one particularly simple and general relabelling algorithm and illustrate its success in dealing with the label switching problem on two examples.
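To see what a relabelling step does, here is one simple pivot-based scheme (ours, not the paper's algorithm): each draw's component labels are permuted, via the Hungarian algorithm, to best match a reference draw. The paper's relabelling algorithms instead minimize a posterior expected loss, but the mechanics of permuting labels are the same.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel_to_pivot(draws, pivot):
    """Permute the component labels of each MCMC draw (rows of `draws`,
    shape (S, k), e.g. component means) to minimize squared distance to
    a pivot draw, undoing label switching."""
    out = np.empty_like(draws)
    for s, d in enumerate(draws):
        cost = (d[:, None] - pivot[None, :]) ** 2   # cost of sending label i to slot j
        rows, cols = linear_sum_assignment(cost)
        out[s, cols] = d[rows]
    return out

# Two draws of k = 3 component means; the second has its labels switched.
draws = np.array([[0.0, 5.0, 10.0],
                  [10.1, 0.2, 5.1]])
print(relabel_to_pivot(draws, pivot=draws[0]))   # second row -> [0.2, 5.1, 10.1]
```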

1,060 citations


"Bayesian finite mixtures: a note on..." refers methods in this paper

  • ...…on the parameters does not always work and other methods have been proposed, see Richardson and Green (1997) and its discussion (especially the contributions of G. Celeux and M. Stephens), Celeux, Hurn and Robert (2000), Stephens (2000b), Frühwirth-Schnatter (2001), Nobile and Fearnside (2005)....

  • ...If a prior distribution π(k) is specified, then one can obtain a sample from the joint posterior of (k, λ, θ) by means of Markov chain Monte Carlo methods, see e.g. Richardson and Green (1997), Phillips and Smith (1996), Stephens (2000a), Nobile and Fearnside (2005)....

Journal ArticleDOI
TL;DR: It is shown that the acceptance ratio method and thermodynamic integration are natural generalizations of importance sampling, which is most familiar to statistical audiences.
Abstract: Computing (ratios of) normalizing constants of probability models is a fundamental computational problem for many statistical and scientific studies. Monte Carlo simulation is an effective technique, especially with complex and high-dimensional models. This paper aims to bring to the attention of general statistical audiences some effective methods originating from theoretical physics and at the same time to explore these methods from a more statistical perspective, through establishing theoretical connections and illustrating their uses with statistical problems. We show that the acceptance ratio method and thermodynamic integration are natural generalizations of importance sampling, which is most familiar to statistical audiences. The former generalizes importance sampling through the use of a single "bridge" density and is thus a case of bridge sampling in the sense of Meng and Wong. Thermodynamic integration, which is also known in the numerical analysis literature as Ogata's method for high-dimensional integration, corresponds to the use of infinitely many and continuously connected bridges (and thus a "path"). Our path sampling formulation offers more flexibility and thus potential efficiency to thermodynamic integration, and the search of optimal paths turns out to have close connections with the Jeffreys prior density and the Rao and Hellinger distances between two densities. We provide an informative theoretical example as well as two empirical examples (involving 17- to 70-dimensional integrations) to illustrate the potential and implementation of path sampling. We also discuss some open problems.
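A compact sketch (ours) of thermodynamic integration on a toy problem: two unnormalized Gaussian densities joined by the geometric path, where every intermediate distribution can be sampled exactly and the true log ratio of normalizing constants is log σ.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 3.0   # q0 ∝ exp(-x²/2), q1 ∝ exp(-x²/(2σ²)); true log(Z1/Z0) = log σ

ts = np.linspace(0.0, 1.0, 21)
u_means = []
for t in ts:
    # Along the geometric path, q_t ∝ q0^(1-t) q1^t is N(0, 1/prec), so we
    # can sample it exactly (real applications run MCMC at each temperature).
    prec = (1.0 - t) + t / sigma**2
    x = rng.normal(0.0, 1.0 / np.sqrt(prec), size=20000)
    u_means.append((0.5 * x**2 * (1.0 - 1.0 / sigma**2)).mean())  # E_t[log q1 - log q0]

# Trapezoidal rule for the path-sampling integral over t.
u_means = np.array(u_means)
estimate = np.sum(0.5 * (u_means[1:] + u_means[:-1]) * np.diff(ts))
print(f"estimate {estimate:.4f} vs log(sigma) {np.log(sigma):.4f}")
```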

1,035 citations


"Bayesian finite mixtures: a note on..." refers methods in this paper

  • ...…should be noted that estimation of the marginal likelihood from MCMC output is not as simple as other posterior inference using MCMC, and as a consequence several methods have been proposed, see e.g. Chib (1995), Raftery (1996), DiCiccio et al. (1997), Gelman and Meng (1998) and references therein....
