
Perfect samplers for mixtures of distributions

TL;DR: In this article, the authors consider the construction of perfect samplers for posterior distributions associated with mixtures of exponential families and conjugate priors, starting with a perfect slice sampler in the spirit of Mira and co-workers.
Abstract: We consider the construction of perfect samplers for posterior distributions associated with mixtures of exponential families and conjugate priors, starting with a perfect slice sampler in the spirit of Mira and co-workers. The methods rely on a marginalization akin to Rao-Blackwellization and illustrate the duality principle of Diebolt and Robert. A first approximation embeds the finite support distribution on the latent variables within a continuous support distribution that is easier to simulate by slice sampling, but we later demonstrate that the approximation can be very poor. We conclude by showing that an alternative perfect sampler based on a single backward chain can be constructed. This alternative can handle much larger sample sizes than the slice sampler first proposed.

Summary (1 min read)

2.2. The slice sampler

  • While, for large values of n, a handwaving argument could justify the switch to a continuous state space, there exists a rigorous argument which validates this continuous embedding.
  • If the authors apply the slice sampler to the continuous state space chain, the monotonicity argument holds.
  • In particular, by moving the lower and upper chains downwards and upwards, the authors simply retard the moment of coalescence but ensure that the chains of interest will have coalesced at that moment.
  • (In fact, this is impossible for large sample sizes.)
  • Any value below (or above) will be acceptable.

3. The general case

  • There is very little of what has been said in Section 2 that does not apply to the general case.
  • The problem with the general case is not in extending the method, which does not intrinsically depend on k, even though Algorithm [1] must be adapted to select the proper number of uniform random variables, but rather in finding a maximal starting value $\tilde{\omega}_1$.
  • The implementation of the slice sampler also gets more difficult as k increases; the authors are then forced to settle for simple accept-reject methods which are correct but may be slow.
  • The authors describe in Sections 3.1 and 3.2 the particular cases of exponential and normal mixtures to show that perfect sampling can also be achieved in such settings.
  • Note that the treatment of the Poisson case also extends to the general case, even if it may imply one numerical maximisation.

4. Conclusion

  • The authors have obtained what they believe to be the first general iid sampling method for mixture posterior distributions.
  • This is of direct practical interest since mixtures are heavily used in statistical modelling and the corresponding inference is delicate (Titterington et al., 1985; Robert, 1996).
  • The authors have also illustrated that perfect sampling can be achieved for realistic statistical models and not only for toy problems.


Perfect Slice Samplers for Mixtures of Distributions
G. Casella†, Cornell University, Ithaca, New York, USA
K. L. Mengersen‡, Queensland University of Technology, Brisbane, Australia
C. P. Robert§, CREST, Insee, Paris, France
D. M. Titterington, University of Glasgow, Glasgow, Scotland
Summary. We propose a perfect sampler for mixtures of distributions, in the spirit of Mira and Roberts (1999), building on Hobert, Robert and Titterington (1999). The method relies on a marginalisation akin to Rao-Blackwellisation which illustrates the Duality Principle of Diebolt and Robert (1994) and utilises an envelope argument which embeds the finite support distribution on the latent variables within a continuous support distribution, easier to simulate by slice sampling. We also provide a number of illustrations in the cases of normal and exponential mixtures which show that the technique does not suffer from severe slow-down when the number of observations or the number of components increases. We thus obtain a general iid sampling method for mixture posterior distributions and illustrate convincingly that perfect sampling can be achieved for realistic statistical models and not only for toy problems.
1. Introduction
Perfect sampling, which originated with Propp and Wilson (1996), has been developed in recent years as a technique for taking advantage of MCMC algorithms, which enable us to simulate from a distribution $\pi$ which may not be explicitly known, without suffering from the drawback of MCMC, namely that the distribution of interest is only the asymptotic distribution of the generated Markov chain. See Fismen (1998) for an excellent introduction, as well as Wilson (1999), whose Website is constantly updated, and Møller and Nicholls (1999) for recent statistical applications.

When considering realistic statistical models like those involving finite mixtures of distributions (Titterington et al., 1985), with densities of the form

$$\sum_{i=1}^{k} p_i\, f(x \mid \theta_i), \qquad \sum_{i=1}^{k} p_i = 1, \qquad (1)$$

† Supported by National Science Foundation Grant DMS-9971586. This is technical report BU-1453-M in the Department of Biometrics, Cornell University, Ithaca NY 14853.
‡ Supported by an ASC Large Grant and by CNRS during a visit to CREST in August 1999.
§ Supported by EU TMR network ERBFMRXCT960095 on "Computational and Statistical Methods for the Analysis of Spatial Data". The author wishes to thank the Statistics Department, University of Glasgow, for its hospitality and support. He is also grateful to Peter Green, Antonietta Mira and Gareth Roberts for helpful discussions.

MCMC algorithms are necessary for processing the posterior distribution of the parameters $(p_i, \theta_i)$ (see, e.g., Celeux et al., 1999). It is however quite a delicate exercise to come up with a perfect sampling version, as shown by the first attempt of Hobert et al. (1999), who can only process a mixture like (1) when the parameters $\theta_i$ are known and when $k \leq 3$. The reasons for this difficulty are that perfect sampling techniques, while not requiring monotonicity structures in the Markov transition, work better under such an assumption, and that exhibiting such monotonicity in the mixture model requires hard work. One of the key features of Hobert et al.'s (1999) solution, along with the specific representation of the Dirichlet distribution in terms of basic exponential random variables, is to exploit the Duality Principle established by Diebolt and Robert (1994) for latent variable models. In set-ups where the chain of interest $(\theta^{(t)})$ is generated conditionally on a second chain $(z^{(t)})$ whose support is finite, the probabilistic properties of the chain of interest can be derived from the properties of the chain $(z^{(t)})$, whose finiteness facilitates theoretical study. While this is not of direct practical relevance, since the support of $(z^{(t)})$ is of size $k^n$ for $k$-component mixtures with $n$ observations, monotonicity structures can often be observed on the $(z^{(t)})$ chain.
This paper extends the result of Hobert et al. (1999) to the case of general finite mixtures of distributions, under conjugate priors, that is, when either the $p_i$'s, the $\theta_i$'s or both are unknown, by proposing a different approach to the problem. The foundation of the technique used here relies on the facts that, under conjugate priors, the marginal posterior distribution of the latent variables $z$ is known in closed form, up to a constant, as exhibited and exploited for importance sampling in Casella et al. (1999), and that, moreover, a slice sampler can be implemented for this distribution. We can thus use the results of Mira and Roberts (1999), who show how a general perfect sampler can be adapted to (univariate) slice samplers, by taking advantage of the fact that the slice sampler is naturally monotone for the order induced by the distribution of interest. Indeed, a naive implementation of the slice sampler in the parameter space is impossible, given the complexity of the posterior distribution. The slice region

$$\left\{ \theta;\ \prod_{i=1}^{n} \left[ \sum_{j=1}^{k} p_j\, f(x_i \mid \theta_j) \right] \geq \varpi \right\}$$

is complex and is usually not connected, which prevents the use of standard techniques such as ray lancing. While it is equally difficult to describe the dual region on the discrete chain, we can take advantage of an envelope argument, as in Kendall (1998), to simulate a continuous version of the discrete chain for which slice sampling is feasible.

The paper is organised as follows. In Section 2, we provide a detailed description of the perfect sampling technique in the special case of a two-component exponential mixture, establishing the foundations which are extended to the general case in Section 3, where we show that the method can be implemented for an arbitrary number of components in the normal and exponential cases, as illustrated in Sections 3.1 and 3.2.
2. A first example
2.1. Marginalisation
Consider a sample $(X_1, \ldots, X_n)$ from a two-component exponential mixture, with density

$$p\, \lambda_0 \exp(-\lambda_0 x) + (1 - p)\, \lambda_1 \exp(-\lambda_1 x). \qquad (2)$$
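(As a small helper for what follows, (2) can be coded directly; this is a minimal sketch, and the function and parameter names are illustrative rather than the paper's.)

```python
import numpy as np

def mixture_pdf(x, p, lam0, lam1):
    """Two-component exponential mixture density (2)."""
    return p * lam0 * np.exp(-lam0 * x) + (1 - p) * lam1 * np.exp(-lam1 * x)
```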

We assume (in this section only) that the $p_i$'s, i.e. here just $p$, are known and that the prior distribution on $\lambda_j$ is a $\mathcal{G}a(\alpha_j, \beta_j)$ distribution. Recall that (2) can be interpreted as the marginal distribution of the joint distribution

$$X, Z \sim p^{(1-z)} (1-p)^{z}\, \lambda_z \exp(-\lambda_z x),$$

where $Z$ can take the values $0$ and $1$.
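As a quick sanity check of this latent variable representation, one can draw $(X, Z)$ jointly and discard $Z$: the retained draws are then marginally distributed from the mixture (2). The sketch below uses hypothetical parameter values, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, lam0, lam1 = 0.3, 1.0, 5.0      # hypothetical parameter values

# Joint draw of (X, Z): Z = 0 with probability p, Z = 1 otherwise,
# then X | Z = z is exponential with rate lambda_z.
z = (rng.uniform(size=10_000) >= p).astype(int)
x = rng.exponential(scale=1.0 / np.where(z == 1, lam1, lam0))
# Ignoring z, the x values follow the mixture density (2).
```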
As shown in Casella et al. (1999), the joint posterior distribution on the $Z_i$'s and the $\lambda_j$'s is proportional to

$$\prod_{i=1}^{n} p^{(1-z_i)} (1-p)^{z_i}\, \lambda_{z_i} \exp(-\lambda_{z_i} x_i) \prod_{j=0}^{1} \lambda_j^{\alpha_j - 1} \exp(-\beta_j \lambda_j),$$
and leads to the following distribution on the $Z_i$'s:

$$Z_1, \ldots, Z_n \mid x_1, \ldots, x_n \sim p^{n_0} (1-p)^{n_1} \int \lambda_0^{\alpha_0+n_0-1} \exp\{-\lambda_0(\beta_0+s_0)\}\, \lambda_1^{\alpha_1+n_1-1} \exp\{-\lambda_1(\beta_1+s_1)\}\, \mathrm{d}\lambda_0\, \mathrm{d}\lambda_1,$$
where $n_j$ denotes the number of $Z_i$'s equal to $j$ and $s_j$ is the sum of the $x_i$'s which have corresponding $Z_i$'s equal to $j$. This means that the marginal posterior distribution on the $Z_i$'s is proportional to
$$Z_1, \ldots, Z_n \mid x_1, \ldots, x_n \sim p^{n_0} (1-p)^{n_1}\, \frac{\Gamma(\alpha_0+n_0)\, \Gamma(\alpha_1+n_1)}{(\beta_0+s_0)^{\alpha_0+n_0}\, (\beta_1+s_1)^{\alpha_1+n_1}}. \qquad (3)$$
Now, this distribution appears not to be useful, given that the main purpose of inference in mixture set-ups is to gather information on the parameters themselves rather than on the latent variables. This is not the case, however, because (a) posterior expectations of functions of these parameters can often be approximated from the distribution of the $Z_i$'s using the Rao-Blackwellisation technique of Gelfand and Smith (1990), and (b) perfect simulation from (3) leads to perfect simulation from the marginal posterior distribution of the parameters by a simple call to the conditional distribution $\pi(\lambda \mid z)$, once coalescence is attained.
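For concreteness, here is a minimal sketch of these two operations: the logarithm of (3), written as a function of the counts and sums $(n_0, s_0)$ (with $n_1 = n - n_0$ and $s_1 = S - s_0$, as exploited in Section 2.2 below), and the conditional draw from $\pi(\lambda \mid z)$, which under the conjugate prior amounts to independent $\mathcal{G}a(\alpha_j + n_j, \beta_j + s_j)$ variates. The function names and argument conventions are illustrative, not the paper's.

```python
import numpy as np
from scipy.special import gammaln

def log_post(n0, s0, n, S, p, a0, b0, a1, b1):
    """Log of (3) as a function of (n0, s0), up to an additive constant;
    a0, b0, a1, b1 stand for alpha_0, beta_0, alpha_1, beta_1."""
    n1, s1 = n - n0, S - s0
    return (n0 * np.log(p) + n1 * np.log(1.0 - p)
            + gammaln(a0 + n0) + gammaln(a1 + n1)
            - (a0 + n0) * np.log(b0 + s0)
            - (a1 + n1) * np.log(b1 + s1))

def draw_lambdas(rng, n0, s0, n, S, a0, b0, a1, b1):
    """pi(lambda | z): independent Ga(alpha_j + n_j, beta_j + s_j) draws
    (numpy's gamma uses a scale parametrisation, hence the reciprocal rates)."""
    lam0 = rng.gamma(a0 + n0, 1.0 / (b0 + s0))
    lam1 = rng.gamma(a1 + (n - n0), 1.0 / (b1 + (S - s0)))
    return lam0, lam1
```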
2.2. The slice sampler
To construct an operational slice sampler for the distribution (3), we first note that the distribution factors through the sufficient statistic $(n_0, s_0)$, since $n_1 = n - n_0$ and $s_1 = S - s_0$, where $S$ denotes the sum of all observations. Moreover, (3) is also the distribution of the pair $(n_0, s_0)$, given that, for a fixed value of $n_0$, the sum $s_0$ is in one-to-one correspondence with the $Z_i$'s (with probability one). This sufficiency property is striking in that it results in a simulation method which integrates out the parameters and does not simulate the latent variables! If we denote (3) by $\pi(n_0, s_0)$, a standard slice sampler (Damien et al., 1999) thus requires sampling alternately

(i) from the uniform distribution on $[0, \pi(n_0, s_0)]$, that is, producing $\varpi = U\, \pi(n_0, s_0)$, where $U \sim \mathcal{U}([0, 1])$, and
(ii) from the uniform distribution on $\{(n_0, s_0);\ \pi(n_0, s_0) \geq \varpi\}$.

The first step is straightforward but the second one can be quite complex, given the finite support of $s_0$ and the number of cases to be considered, namely $\binom{n}{n_0}$.
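In code, one sweep of this two-step scheme has the following generic shape; the problem-specific step (ii) is delegated to an assumed `sample_slice` callable, since that step is precisely the difficulty addressed by the points that follow. Working on the log scale avoids underflow of $\pi$ for large $n$.

```python
import numpy as np

def slice_sweep(rng, omega, log_pi, sample_slice):
    """One slice-sampler update. Step (i): draw the auxiliary level
    varpi = U * pi(omega), stored here as log(varpi). Step (ii):
    sample_slice must return a uniform draw from {w : pi(w) >= varpi}."""
    log_level = log_pi(omega) + np.log(rng.uniform())
    return sample_slice(rng, log_level)
```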
We can however take advantage of the following points to overcome this difficulty.

(i) As pointed out in Mira and Roberts (1999), the natural stochastic ordering associated with a slice sampler is the ordering induced by $\pi(n_0, s_0)$. If $\pi(\omega_1) \leq \pi(\omega_2)$, the corresponding slices satisfy

$$A_2 = \{\omega;\ \pi(\omega) \geq u\, \pi(\omega_2)\} \subset A_1 = \{\omega;\ \pi(\omega) \geq u\, \pi(\omega_1)\},$$

and, therefore, simulation from a uniform distribution on $A_2$ can proceed by acceptance/rejection of a uniform sampling on $A_1$. From a perfect sampling point of view, if $\omega'_1 \sim \mathcal{U}(A_1)$ belongs to $A_2$, it is also acceptable as a simulation from $\mathcal{U}(A_2)$; if it does not belong to $A_2$, the simulated value $\omega'_2$ will preserve the ordering $\pi(\omega'_1) \leq \pi(\omega'_2)$.

(ii) There exist a maximal and a minimal element, $\tilde{\omega}_1$ and $\tilde{\omega}_0$, for this order, which can be identified in this particular case. Therefore, monotone coupling from the past (CFTP) (Propp and Wilson, 1996) applies, that is, it is sufficient to run two chains starting from $\tilde{\omega}_1$ and $\tilde{\omega}_0$, and check if both chains coalesce at time $0$; a generic sketch of this scheme is given after this list. Following a now standard monotonicity argument, all chains in between the extreme chains will have coalesced when those two coalesce. Note here the crucial appeal of running the slice sampler on the latent variable chain rather than on the dual parameter chain. It is nearly impossible to find $\tilde{\omega}_0$ and $\tilde{\omega}_1$ for the latter, since this is equivalent to finding the maximum likelihood estimator (for $\tilde{\omega}_1$) and a minimum likelihood estimator (for $\tilde{\omega}_0$), the second of which does not exist for non-compact cases. Note also that knowledge only of the maximal element $\tilde{\omega}_1$ is necessary in order to run the monotone slice sampler, given that the minimal element $\tilde{\omega}_0$ is never really used. For the chain starting from $\tilde{\omega}_0$, the next value is selected at random from the entire state space of the $\omega$'s, since $\pi(\omega) \geq u\, \pi(\tilde{\omega}_0)$ does not impose any constraint on $\omega$.
(iii) While it is far from obvious how to do perfect sampling from the discrete distribution (3), there exists an envelope argument, in the spirit of Kendall (1998), which embeds (3) in a continuous distribution, for which slice sampling is much easier. Indeed, (3) can then be considered as a density function for $s_0$, conditionally on $n_0$, such that $s_0$ varies on the interval $[\underline{s}_0(n_0), \bar{s}_0(n_0)]$, where

$$\underline{s}_0(n_0) = x_{(1)} + \cdots + x_{(n_0)}, \qquad \bar{s}_0(n_0) = x_{(n)} + \cdots + x_{(n-n_0+1)}$$

are the minimum and maximum possible values for $s_0$, and the $x_{(i)}$'s denote the order statistics of the sample $x_1, \ldots, x_n$, with $x_{(1)} \leq \cdots \leq x_{(n)}$. While, for large values of $n$, a handwaving argument could justify the switch to a continuous state space, there exists a rigorous argument which validates this continuous embedding. In fact, if $\tilde{\omega}_0$ and $\tilde{\omega}_1$ are now defined on the continuous (in $s_0$) state space, they are minorant and majorant, respectively, of the points in the discrete state space (for the order induced by $\pi$). If we apply the slice sampler to the continuous state space chain, the monotonicity argument holds. By moving the images of $\tilde{\omega}_0$ and $\tilde{\omega}_1$ to a lower and a larger value in the finite state space, respectively, we then ensure that the images of all the points in the finite state space are contained in this modified interval. This is a typical envelope argument, as in Kendall (1998). In particular, by moving the lower and upper chains downwards and upwards, we simply retard the moment of coalescence but ensure that the chains of interest will have coalesced at that moment. Note in addition that the envelope modification can be implemented in a rudimentary (or generic) fashion, as there is no need to determine the particular value of $s_0$ that is nearest to the image of $\tilde{\omega}_0$ or to $\tilde{\omega}_1$. (In fact, this is impossible for large sample sizes.) Any value below (or above) will be acceptable. It is therefore sufficient to obtain, in a burn-in stage (that is, before running the CFTP sampler), a collection of values of $s_0$ which will serve as reference values in the envelope step.
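The monotone CFTP scheme of item (ii) can be sketched generically as follows, under the standard doubling-back-in-time schedule. The monotone update $\Phi(\omega, \varpi; u, u')$ constructed in Section 2.3 is abstracted into an assumed `slice_update` callable taking the state and the triple of uniforms attached to a time step; `omega_max` and `omega_min` stand for $\tilde{\omega}_1$ and $\tilde{\omega}_0$.

```python
import numpy as np

def monotone_cftp(seed, omega_max, omega_min, slice_update):
    """Monotone coupling from the past (Propp and Wilson, 1996): run an
    upper and a lower chain from time -T with shared randomness, and double
    T until both chains have coalesced by time 0; the common state is then
    an exact draw from the stationary distribution."""
    def rand_at(t):
        # Randomness attached to time step t < 0; it must be reused
        # unchanged whenever the starting time is pushed further back.
        return np.random.default_rng((seed, -t)).uniform(size=3)

    T = 1
    while True:
        hi, lo = omega_max, omega_min
        for t in range(-T, 0):           # sweep from time -T up to time 0
            u = rand_at(t)
            hi = slice_update(hi, u)     # monotone move keeps hi above lo
            lo = slice_update(lo, u)
        if hi == lo:                     # coalescence at time 0
            return hi
        T *= 2
```

With the envelope modification of item (iii), `slice_update` would act on continuous lower and upper envelope states rather than on the discrete chain itself.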
2.3. More details
To show more clearly how to implement the ideas of Section 2.2, consider distribution (3). To generate from the uniform distribution on

$$\left\{ (n_0, s_0);\ p^{n_0} (1-p)^{n-n_0}\, \frac{\Gamma(\alpha_0+n_0)\, \Gamma(\alpha_1+n-n_0)}{(\beta_0+s_0)^{\alpha_0+n_0}\, (\beta_1+S-s_0)^{\alpha_1+n-n_0}} \geq \varpi \right\},$$
it is sucient to draw a value of
(
n
0
; s
0
)
at random from the set
f
0
n
0
n; s
0
2
[
s
0
(
n
0
)
;
s
0
(
n
0
)]
g
;
that is, to draw
n
0
uniformly b etween
0
and
n
, until
max
s
(
n
0
; s
)
, and then to
draw
s
0
uniformly from the
s
's satisfying
(
n
0
; s
)
. This can b e done by virtue of
the monotonicity in
s
of
(
n
0
; s
)
. This function is decreasing and then increasing, with
minimum at
s
0
=
(
n
0
+
0
)(
1
+
S
)
?
(
n
?
n
0
+
1
)
0
n
+
0
+
1
;
provided this value is in $[\underline{s}_0(n_0), \bar{s}_0(n_0)]$. The maximum is obviously attained at one of the two extremes, $\underline{s}_0(n_0)$ or $\bar{s}_0(n_0)$. Not only does this facilitate checking whether $\max_s \pi(n_0, s) \geq \varpi$, but it also provides easy generation of $s_0$ conditionally on $n_0$. The range of values for which $\pi(n_0, s) \geq \varpi$ can indeed be determined exactly, and is either an interval or the union of two intervals. The joint generation of $(n_0, s_0)$ thus depends on two uniform random variables $u, u'$, and we denote the procedure by $\Phi(\omega, \varpi; u, u')$ if $\omega$ is the current value of $(n_0, s_0)$. The associated CFTP algorithm [1] is then as given in Figure 1.
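The determination of the slice in $s$ for a fixed $n_0$ can be sketched as follows; this is an illustrative reconstruction, not the paper's Algorithm [1]. It reuses the illustrative `log_post` from the sketch at the end of Section 2.1, and exploits the decreasing-then-increasing shape of $\pi(n_0, \cdot)$: there is at most one crossing of the level on each side of the minimum, so a bracketing root finder suffices.

```python
import numpy as np
from scipy.optimize import brentq

def slice_in_s(n0, log_level, n, S, p, a0, b0, a1, b1, x_sorted):
    """Return {s : log_post(n0, s) >= log_level} as a list of intervals,
    either a single interval or the union of two, as noted in the text."""
    lo = x_sorted[:n0].sum()             # underline{s}_0(n0)
    hi = x_sorted[n - n0:].sum()         # bar{s}_0(n0)
    f = lambda s: log_post(n0, s, n, S, p, a0, b0, a1, b1) - log_level
    s_min = ((n0 + a0) * (b1 + S) - (n - n0 + a1) * b0) / (n + a0 + a1)
    s_min = min(max(s_min, lo), hi)      # outside [lo, hi], pi is monotone
    pieces = []
    if f(lo) >= 0:                       # decreasing branch [lo, s_min]
        pieces.append((lo, brentq(f, lo, s_min) if f(s_min) < 0 else s_min))
    if f(hi) >= 0:                       # increasing branch [s_min, hi]
        pieces.append((brentq(f, s_min, hi) if f(s_min) < 0 else s_min, hi))
    return pieces
```

Drawing $s_0$ uniformly over the returned union, weighting the pieces by their lengths, then completes step (ii) for the given $n_0$.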
Note that, once the two chains $\omega_0^{(t)}$ and $\omega_1^{(t)}$ have coalesced, they remain a single unique chain until $t = 0$, since the value $\omega_0^{(t+1)}$ is always accepted at Step 3.
Figures 24 provide some illustrations of the paths of the two chains started at
~
0
and
~
1
for various values of
n
and the parameters. They also provide the corresp onding values
of the log p osteriors
log
(
!
(
t
)
0
)
and
log
(
!
(
t
)
1
)
. As
n
increases, the graph of
log
(
!
(
t
)
1
)
gets atter; this is caused by a scaling eect namely that the dierence b etween
log
(
!
(
t
)
0
)
and
log
(
!
(
t
)
1
)
also increases with
n
. As mentioned ab ove, once Algorithm [1] has been
completed and, within both chains,
!
(0)
i
are equal and thus distributed from the station-
ary distribution
, it is straightforward to generate from the marginal distribution on the
parameters
(
0
;
1
)
through the conditional distribution
(
0
;
1
j
n
0
; s
0
)
.
2.4. Other two-component settings
The above results obviously apply more generally than to distribution (2). If the weight, $p$, is also unknown and distributed as a Beta $\mathcal{B}e(\gamma_0, \gamma_1)$ random variable, for instance, (2)

Citations
Book ChapterDOI
10 May 2011
TL;DR: This volume focuses on perfect sampling or exact sampling algorithms, so named because such algorithms use Markov chains and yet obtain genuine i.i.d. draws—hence perfect or exact—from their limiting distributions within a finite numbers of iterations.
Abstract: Hamiltonian dynamics can be used to produce distant proposals for the Metropolis algorithm, thereby avoiding the slow exploration of the state space that results from the diffusive behaviour of simple random-walk proposals. Though originating in physics, Hamiltonian dynamics can be applied to most problems with continuous state spaces by simply introducing fictitious “momentum” variables. A key to its usefulness is that Hamiltonian dynamics preserves volume, and its trajectories can thus be used to define complex mappings without the need to account for a hard-to-compute Jacobian factor — a property that can be exactly maintained even when the dynamics is approximated by discretizing time. In this review, I discuss theoretical and practical aspects of Hamiltonian Monte Carlo, and present some of its variations, including using windows of states for deciding on acceptance or rejection, computing trajectories using fast approximations, tempering during the course of a trajectory to handle isolated modes, and short-cut methods that prevent useless trajectories from taking much computation time.

1,453 citations

Journal ArticleDOI
TL;DR: The solutions to the label switching problem of Markov chain Monte Carlo methods, such as artificial identifiability constraints, relabelling algorithms and label invariant loss functions are reviewed.
Abstract: In the past ten years there has been a dramatic increase of interest in the Bayesian analysis of finite mixture models. This is primarily because of the emergence of Markov chain Monte Carlo (MCMC) methods. While MCMC provides a convenient way to draw inference from complicated statistical models, there are many, perhaps underappreciated, problems associated with the MCMC analysis of mixtures. The problems are mainly caused by the nonidentifiability of the components under symmetric priors, which leads to so-called label switching in the MCMC output. This means that ergodic averages of component specific quantities will be identical and thus useless for inference. We review the solutions to the label switching problem, such as artificial identifiability constraints, relabelling algorithms and label invariant loss functions. We also review various MCMC sampling schemes that have been suggested for mixture models and discuss posterior sensitivity to prior specification.

679 citations

Book ChapterDOI
TL;DR: This chapter aims to introduce the prior modeling, estimation, and evaluation of mixture distributions in a Bayesian paradigm, and shows that mixture distributions provide a flexible, parametric framework for statistical modeling and analysis.
Abstract: Publisher Summary Mixture distributions comprise a finite or infinite number of components, possibly of different distributional types, that can describe different features of data. The Bayesian paradigm allows for probability statements to be made directly about the unknown parameters, prior or expert opinion to be included in the analysis, and hierarchical descriptions of both local-scale and global features of the model. This chapter aims to introduce the prior modeling, estimation, and evaluation of mixture distributions in a Bayesian paradigm. The chapter shows that mixture distributions provide a flexible, parametric framework for statistical modeling and analysis. Focus is on the methods rather than advanced examples, in the hope that an understanding of the practical aspects of such modeling can be carried into many disciplines. It also points out the fundamental difficulty in doing inference with such objects, along with a discussion about prior modeling, which is more restrictive than usual, and the constructions of estimators, which also is more involved than the standard posterior mean solution. Finally, this chapter gives some pointers to the related models and problems like mixtures of regressions and hidden Markov models as well as Dirichlet priors.

466 citations

Journal ArticleDOI
TL;DR: In this article, a model-based method to cluster units within a panel is proposed, where the underlying model is autoregressive and non-Gaussian, allowing for both skewness and fat tails, and the units are clustered according to their dynamic behavior, equilibrium level and the effect of covariates.
Abstract: We propose a model-based method to cluster units within a panel. The underlying model is autoregressive and non-Gaussian, allowing for both skewness and fat tails, and the units are clustered according to their dynamic behavior, equilibrium level, and the effect of covariates. Inference is addressed from a Bayesian perspective, and model comparison is conducted using Bayes factors. Particular attention is paid to prior elicitation and posterior propriety. We suggest priors that require little subjective input and have hierarchical structures that enhance inference robustness. We apply our methodology to GDP growth of European regions and to employment growth of Spanish firms.

113 citations


Cites background from "Perfect samplers for mixtures of di..."

  • ...Casella et al. (2002) introduced a perfect sampling scheme, which is not easily extended to nonexponential families....


Journal ArticleDOI
TL;DR: An approximation method to evaluate the posterior distribution and Bayes estimators by Gibbs sampling, relying on the missing data structure of the mixture model, is presented.
Abstract: This paper deals with a Bayesian analysis of a finite Beta mixture model. We present approximation method to evaluate the posterior distribution and Bayes estimators by Gibbs sampling, relying on the missing data structure of the mixture model. Experimental results concern contextual and non-contextual evaluations. The non-contextual evaluation is based on synthetic histograms, while the contextual one model the class-conditional densities of pattern-recognition data sets. The Beta mixture is also applied to estimate the parameters of SAR images histograms.

111 citations

References
Book
01 Jan 1986
TL;DR: This course discusses Mathematical Aspects of Mixtures, Sequential Problems and Procedures, and Applications of Finite Mixture Models.
Abstract: Statistical Problems. Applications of Finite Mixture Models. Mathematical Aspects of Mixtures. Learning About the Parameters of a Mixture. Learning About the Components of a Mixture. Sequential Problems and Procedures.

3,464 citations


"Perfect samplers for mixtures of di..." refers background in this paper

  • ...When considering realistic statistical models like those involving finite mixtures of distributions (Titterington et al., 1985), with densities of the form $\sum_{i=1}^{k} p_i\, f(x \mid \theta_i)$, $\sum_{i=1}^{k} p_i = 1$ (1)...

Journal ArticleDOI
TL;DR: In this paper, a hierarchical prior model is proposed to deal with weak prior information while avoiding the mathematical pitfalls of using improper priors in the mixture context, which can be used as a basis for a thorough presentation of many aspects of the posterior distribution.
Abstract: New methodology for fully Bayesian mixture analysis is developed, making use of reversible jump Markov chain Monte Carlo methods that are capable of jumping between the parameter subspaces corresponding to different numbers of components in the mixture A sample from the full joint distribution of all unknown variables is thereby generated, and this can be used as a basis for a thorough presentation of many aspects of the posterior distribution The methodology is applied here to the analysis of univariate normal mixtures, using a hierarchical prior model that offers an approach to dealing with weak prior information while avoiding the mathematical pitfalls of using improper priors in the mixture context

2,018 citations


"Perfect samplers for mixtures of di..." refers result in this paper

  • ...The constraint on this closed form representation is obviously that the prior distributions must be conjugate, but this is often the case in the literature (see Diebolt and Robert (1994) or Richardson and Green (1997))....


Journal ArticleDOI
TL;DR: In this paper, a variant of the ergodic aperiodic Markov chain sampling method is proposed, which runs from a distant point in the past up to the present, where the distance into the past that one needs to go is determined during the running of the algorithm itself.
Abstract: For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has run for M steps, with M sufficiently large, the distribution governing the state of the chain approximates the desired distribution. Unfortunately, it can be difficult to determine how large M needs to be. We describe a simple variant of this method that determines on its own when to stop and that outputs samples in exact accordance with the desired distribution. The method uses couplings which have also played a role in other sampling schemes; however, rather than running the coupled chains from the present into the future, one runs from a distant point in the past up until the present, where the distance into the past that one needs to go is determined during the running of the algorithm itself. If the state space has a partial order that is preserved under the moves of the Markov chain, then the coupling is often particularly efficient. Using our approach, one can sample from the Gibbs distributions associated with various statistical mechanics models (including Ising, random-cluster, ice, and dimer) or choose uniformly at random from the elements of a finite distributive lattice. © 1996 John Wiley & Sons, Inc.

1,235 citations


Journal ArticleDOI
TL;DR: In this paper, Gibbs sampling is used to evaluate the posterior distribution and Bayes estimators by Gibbs sampling, relying on the missing data structure of the mixture model. And the data augmentation method is shown to converge geometrically, since a duality principle transfers properties from the discrete missing data chain to the parameters.
Abstract: SUMMARY A formal Bayesian analysis of a mixture model usually leads to intractable calculations, since the posterior distribution takes into account all the partitions of the sample. We present approximation methods which evaluate the posterior distribution and Bayes estimators by Gibbs sampling, relying on the missing data structure of the mixture model. The data augmentation method is shown to converge geometrically, since a duality principle transfers properties from the discrete missing data chain to the parameters. The fully conditional Gibbs alternative is shown to be ergodic and geometric convergence is established in the normal case. We also consider non-informative approximations associated with improper priors, assuming that the sample corresponds exactly to a k-component mixture.

895 citations


"Perfect samplers for mixtures of di..." refers background or result in this paper

  • ...The constraint on this closed form representation is obviously that the prior distributions must be conjugate, but this is often the case in the literature (see Diebolt and Robert (1994) or Richardson and Green (1997))....


  • ...One of the key features of the solution of Hobert et al. (1999) is to exploit the duality principle established by Diebolt and Robert (1994) for latent variable models....


Frequently Asked Questions (1)
Q1. What contributions have the authors mentioned in the paper "Perfect slice samplers for mixtures of distributions"?

The authors propose a perfect sampler for mixtures of distributions, in the spirit of Mira and Roberts (1999), building on Hobert, Robert and Titterington (1999). The authors also provide a number of illustrations in the cases of normal and exponential mixtures which show that the technique does not suffer from severe slow-down when the number of observations or the number of components increases. The authors thus obtain a general iid sampling method for mixture posterior distributions and illustrate convincingly that perfect sampling can be achieved for realistic statistical models and not only for toy problems.