
A Bayesian Framework for Image Segmentation With Spatially Varying Mixtures

2278 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 9, SEPTEMBER 2010
A Bayesian Framework for Image Segmentation With
Spatially Varying Mixtures
Christophoros Nikou, Member, IEEE, Aristidis C. Likas, Senior Member, IEEE, and
Nikolaos P. Galatsanos, Senior Member, IEEE
Abstract—A new Bayesian model is proposed for image seg-
mentation based upon Gaussian mixture models (GMM) with
spatial smoothness constraints. This model exploits the Dirichlet
compound multinomial (DCM) probability density to model the
mixing proportions (i.e., the probabilities of class labels) and a
Gauss–Markov random field (MRF) on the Dirichlet parameters
to impose smoothness. The main advantages of this model are two.
First, it explicitly models the mixing proportions as probability
vectors and simultaneously imposes spatial smoothness. Second,
it results in closed form parameter updates using a maximum a
posteriori (MAP) expectation-maximization (EM) algorithm. Pre-
vious efforts on this problem used models that did not model the
mixing proportions explicitly as probability vectors or could not
be solved exactly requiring either time consuming Markov Chain
Monte Carlo (MCMC) or inexact variational approximation
methods. Numerical experiments are presented that demonstrate
the superiority of the proposed model for image segmentation
compared to other GMM-based approaches. The model is also
successfully compared to state of the art image segmentation
methods in clustering both natural images and images degraded
by noise.
Index Terms—Bayesian model, Dirichlet compound multinomial
distribution, Gauss–Markov random field prior, Gaussian mixture,
image segmentation, spatially varying finite mixture model.
I. INTRODUCTION
Many approaches have been proposed to solve the
image segmentation problem [1], [2]. Among them,
clustering based methods rely on arranging data into groups
having common characteristics [3], [4]. During the last decade,
the main research directions in the relevant literature have focused
on graph theoretic approaches [5]–[8], methods based
upon the mean shift algorithm [9], [10] and rate distortion
theory techniques [11], [12].
Manuscript received July 17, 2008; revised September 04, 2009; accepted March 09, 2010. First published April 08, 2010; current version published August 18, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Eero P. Simoncelli.
C. Nikou and A. C. Likas are with the Department of Computer Science, University of Ioannina, 45110 Ioannina, Greece (e-mail: cnikou@cs.uoi.gr; arly@cs.uoi.gr).
N. P. Galatsanos is with the Department of Electrical and Computer Engineering, University of Patras, 26500 Rio, Greece (e-mail: ngalatsanos@upatras.gr).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2010.2047903

Modeling the probability density function (pdf) of pixel attributes (e.g., intensity, texture) with finite mixture models (FMM) [13]–[15] is a natural way to cluster data because it automatically provides a grouping based upon the components of the mixture that generated them. Furthermore, the likelihood of an FMM is a rigorous metric for clustering performance [14]. FMM-based pdf modeling has been used successfully in a number of applications ranging from bioinformatics [16] to image retrieval [17]. The parameters of an FMM with Gaussian components can be estimated through maximum likelihood (ML) estimation using the Expectation-Maximization (EM) algorithm [13], [14], [18]. However, it is well known that the EM algorithm finds, in general, only a local maximum of the likelihood. Furthermore, it can be shown that Gaussian components allow efficient representation of a large variety of pdfs. Thus, Gaussian mixture models (GMM) are commonly employed in image segmentation tasks [14].
A drawback of the standard ML approach for image segmen-
tation is that commonality of location is not taken into account
when grouping the data. In other words, the prior knowledge
that adjacent pixels most likely belong to the same cluster is not
used. To overcome this shortcoming, spatial smoothness con-
straints have been imposed.
Imposing spatial smoothness is key to certain image processing applications since it is an important a priori known property of images [19]. Examples of such applications include denoising, restoration, inpainting and segmentation problems.
In a probabilistic framework, smoothness is expressed through
a prior imposed on image features. A common approach is the
use of an MRF. Many MRF variants have been proposed, see
for example [20]. However, automatically determining the amount of
imposed smoothness requires knowledge of the normalization
constant of the MRF. Since this constant is not known
analytically, learning strategies were proposed [21]–[23].
Research efforts in imposing spatial smoothness for image
segmentation can be grouped into two categories. In the
methods of the first category, spatial smoothness is imposed on
the discrete hidden variables of the FMM that represent class
labels, see for example [7], [24]–[26]. These approaches may
be categorized in a more general area involving simultaneous
image recovery and segmentation which is better known as
image modeling [27]–[30]. More specifically, spatial regular-
ization is achieved by imposing a discrete Markov random field
(DMRF) on the classification labels of neighboring pixels that
penalizes solutions where neighboring pixels belong to dif-
ferent classes. Another method in this category is proposed in
[7] which is based upon the optimization of an energy function
having a term for the quality of the clustering and a term for
the spatial tightness. Minimization of the energy function is
accomplished using graph cuts [31].
1057-7149/$26.00 © 2010 IEEE

The Gaussian scale mixtures (GSM) and their extension of
mixtures of projected GSM (MPGSM) and the fields of GSM
(FoGSM) were also used in image denoising in the wavelet do-
main in [32]–[34]. In GSM denoising [32], clusters of wavelet
coefficients are modeled as the product of a Gaussian random
vector and a positive scaling variable. In MPGSM denoising [34],
the model is extended to handle different local image character-
istics and incorporates dimensionality reduction through linear
projections. By these means, the number of model parameters
is reduced and fast model training is obtained. In the case of
FoGSM [33], multiscale subbands are modeled by a product
of an exponentiated homogeneous Gaussian Markov random
field (hGMRF) and a second independent hGMRF. In [33], it
is demonstrated that samples drawn from a FoGSM model have
marginal and joint statistics similar to subband coefficients of
photographic images.
To estimate the smoothness parameters, Woolrich et al. proposed
in [35] and [36] a model based upon a logistic transform
that approximates the previously mentioned DMRF with a con-
tinuous Gaussian Markov random field. However, for this model
inference of the contextual mixing proportions (posterior class
label probabilities) of each pixel cannot be obtained in closed
form. Thus, in [35], inference based upon Markov Chain Monte
Carlo (MCMC) is proposed, while in [36] inference based upon
Variational Bayes (VB) is employed. Although MCMC methods
have been studied in statistics for a long time and several gen-
eral criteria have been proposed to determine their convergence
[37], [38], inference based upon them may be notoriously time
consuming. On the other hand, VB-based inference is approximate
and there is no easy way to assess the tightness of the variational
bound. Moreover, approaches similar in spirit, which aim to avoid
local maxima of the likelihood (a drawback of the ML
solution), rely on the stochastic EM and its variants [39], [40].
In the second category of methods, the MRF-based smooth-
ness constraint is not imposed on the labels but on the contextual
mixing proportions. This model is called spatially variant finite
mixture model (SVFMM) [41] and avoids the inference prob-
lems of DMRFs. In this model maximum a posteriori (MAP) es-
timation of the contextual mixing proportions via the MAP-EM
algorithm is possible. However, the main disadvantage of this
model is that the M-step of the proposed algorithm cannot be
obtained in closed form and is formulated as a constrained opti-
mization problem that requires a projection of the solution onto
the unit simplex (components that are nonnegative and sum to one)
[41], [42]. Consequently, the parameters that control the spatial
smoothness cannot be estimated automatically from the data.
In [43], a new family of smoothness priors was pro-
posed for the contextual mixing proportions based upon the
Gauss–Markov random fields that takes into account cluster
statistics, thus, enforcing different smoothness strength for
each cluster. The model was also refined to capture information
in different spatial directions. Moreover, all the parameters
controlling the degree of smoothness for each cluster, as well
as the label probabilities for the pixels, are estimated in closed
form via the maximum a posteriori (MAP) methodology. The
advantage of this family of models is that inference is obtained
using an EM algorithm with closed form update equations.
However, the implied model still does not take into account
explicitly that the mixing proportions are probabilities, thus,
the constraint that they are positive and must sum to one is not
guaranteed by the update equations. As a result, the M-step
of this EM algorithm also requires a reparatory projection
step which is ad-hoc and not an implicit part of the assumed
Bayesian model. A synergy between this category of priors and
line processes, to account for edge preservation, was presented
in [44].
In this paper, we present a new hierarchical Bayesian model
for mixture model-based image segmentation with spatial con-
straints. This model assumes the contextual mixing proportions
to follow a Dirichlet compound multinomial (DCM) distribu-
tion. More precisely, the class to which a pixel belongs is mod-
eled by a discrete multinomial distribution whose parameters
follow a Dirichlet law [45]. Furthermore, spatial smoothness is
imposed by assuming a Gauss–Markov random field (GMRF)
prior for the parameters of the Dirichlet. The parameters of the
multinomial distribution are integrated out in a fully Bayesian
framework and the updates of the parameters of the Dirichlet
are computed in closed form through the EM algorithm.
The Dirichlet distribution has been previously proposed as
a prior for text categorization [46], [47], object recognition
and detection [48] and scene classification [49]. The difference
of the proposed model with respect to existing methods
is twofold. First, text, scene or object categorization are
supervised learning problems while the proposed segmentation
method is unsupervised. Second, in the existing studies, estimation
of the parameters of the Dirichlet distribution is generally
accomplished by variational inference or by simplified logistic
models. The advantage of the model proposed herein is that
not only can the E-step be expressed in closed form, but
the model also explicitly assumes that the contextual mixing
proportions are probability vectors. Inference through the EM
algorithm leads to a third degree polynomial equation for the
parameters of the Dirichlet distribution. Therefore, the closed
form M-step yields parameter values automatically satisfying
the necessary probability constraints.
Another approach that handles nonstationary images and
relies on MRFs is the triplet Markov field (TMF) model [50]
which was also applied to image segmentation [51], [52]. The
main difference of TMF with respect to our model is that, in
TMF, the random field is imposed jointly on the hidden vari-
ables, the observation and a set of auxiliary variables which de-
termine the type of the stationarity. In contrast, in our model, the
random field is imposed on the contextual mixing proportions.
Numerical experiments are presented to assess the perfor-
mance of the proposed model both with simulated data where
the ground truth is known and real natural images where the
performance is assessed both visually and quantitatively.
The remainder of the manuscript is organized as follows:
background for the spatially variant finite mixture model is
given in Section II. The proposed modeling of probabilities
of the pixel labels with a DCM distribution is presented in
Section III. In Section IV, the MAP-EM algorithm for the
estimation of the proposed model parameters is developed.
Experimental results of the application of our model to natural
and artificial images are presented in Section V and conclusions
and directions for future research are given in Section VI.

II. BACKGROUND ON SPATIALLY VARIANT FINITE MIXTURE MODELING
Let $x^n$ denote the vector of features (e.g., intensity, textural features) representing the $n$th spatial location, $n = 1, \ldots, N$, of a $D$-dimensional vector valued image modeled as independently distributed random variables. The SVFMM [41]–[43] provides a modification of the classical FMM approach [13], [14] for pixel labeling. It assumes a mixture model with $K$ components, each one having a vector of parameters $\theta_j$ defining the density function. Pixel $x^n$ is characterized by its probability vector $\pi^n = (\pi_1^n, \pi_2^n, \ldots, \pi_K^n)^T$, where $K$ is the number of components. We define $\Pi = \{\pi^1, \pi^2, \ldots, \pi^N\}$ as the set of probability vectors and $\Theta = \{\theta_1, \theta_2, \ldots, \theta_K\}$ the set of component parameters. The parameters $\pi_j^n$ are the contextual mixing proportions for each pixel $n$; they represent the probabilities that the $n$th pixel belongs to the $j$th class and must satisfy the constraints

$$0 \le \pi_j^n \le 1, \qquad \sum_{j=1}^{K} \pi_j^n = 1, \qquad j = 1, \ldots, K, \quad n = 1, \ldots, N. \tag{1}$$
The standard finite mixture model [41] assumes that the probability density function of an observation $x^n$ is expressed by

$$f(x^n \mid \Pi; \Theta) = \sum_{j=1}^{K} \pi_j^n \, \phi(x^n; \theta_j) \tag{2}$$

with $\phi(x^n; \theta_j)$ being a Gaussian distribution with parameters $\theta_j = \{\mu_j, \Sigma_j\}$, where $\mu_j$ is the mean vector and $\Sigma_j$ is the covariance matrix of the $D$-dimensional Gaussian distribution. This notation implies that $\Pi$ are considered as random variables and $\Theta$ as parameters.
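The spatially varying finite mixture models use a prior density distribution $p(\Pi)$ for the random variables $\Pi$. Therefore, denoting by $X = \{x^1, x^2, \ldots, x^N\}$ the set of pixel feature vectors $x^n$, with $n = 1, \ldots, N$, which we assume to be statistically independent, and following Bayes' rule, we obtain the posterior probability density function given by

$$p(\Pi \mid X; \Theta) \propto \prod_{n=1}^{N} f(x^n \mid \Pi; \Theta) \, p(\Pi) \tag{3}$$

with the log-density

$$\log p(\Pi \mid X; \Theta) = \sum_{n=1}^{N} \log f(x^n \mid \Pi; \Theta) + \log p(\Pi) + \text{const}. \tag{4}$$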
A typical example of $p(\Pi)$ is the Gauss–Markov random field prior [43], expressed by
(5)
where the parameter $\beta_j^2$ captures the spatial smoothness of cluster $j$ and enforces a different degree of smoothness in each cluster in order to better adapt the model to the data.
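As a minimal sketch of how a class-adaptive Gauss–Markov prior of this kind can be evaluated, the Python function below assumes an unnormalized log-prior of the form $\sum_j \left[ -\tfrac{N}{2}\log\beta_j^2 - \tfrac{1}{2\beta_j^2}\sum_{(n,m)} (\pi_j^n - \pi_j^m)^2 \right]$, with the inner sum taken once over each pair of 4-connected neighboring pixels. This particular form, the neighborhood, the function name `gmrf_log_prior` and the array layout are assumptions made for illustration and need not coincide with the exact expression in (5).

```python
import numpy as np

def gmrf_log_prior(pi, beta2):
    """Unnormalized log of an assumed class-adaptive Gauss-Markov prior.

    pi    : (H, W, K) array of contextual mixing proportions.
    beta2 : (K,) array of per-class variances (smoothness parameters).
    Each horizontally or vertically adjacent pixel pair is counted once
    (an assumption; other neighborhoods and weightings are possible).
    """
    pi = np.asarray(pi, dtype=float)
    beta2 = np.asarray(beta2, dtype=float)
    # Squared differences between each pixel and its right / lower neighbor.
    dh = (pi[:, 1:, :] - pi[:, :-1, :]) ** 2
    dv = (pi[1:, :, :] - pi[:-1, :, :]) ** 2
    energy = dh.sum(axis=(0, 1)) + dv.sum(axis=(0, 1))  # shape (K,)
    n_pixels = pi.shape[0] * pi.shape[1]
    # The -N/2 * log(beta2) term penalizes large variances, the energy
    # term penalizes differences between neighboring proportions.
    return float(np.sum(-0.5 * n_pixels * np.log(beta2) - energy / (2.0 * beta2)))
```

Under this assumed form, a spatially constant field of proportions has zero smoothness energy, while a smaller `beta2[j]` penalizes the same differences in class `j` more strongly.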
The graphical model for the spatially variant version of the finite mixture model (SVFMM) is presented in Fig. 1. In the standard FMM, the feature vector $x^n$ for a given pixel $n$ depends upon the state of the discrete hidden variable $z^n$ denoting the mixture component responsible for generating the observation $x^n$. That is, if $z_j^n = 1$, pixel $n$ belongs to class $j$. In that case, the mixing proportion for a given class is simply the percentage of pixels belonging to that class. In the case of the SVFMM, each pixel $n$ has its own set of mixing proportions $\pi^n$, generally called contextual mixing proportions or probabilities of the pixel labels. These contextual mixing proportions are spatially constrained by a smoothness prior. The strength of this prior could either be unique for the whole set of pixels [42] or could vary based upon the local statistics of each image class, thus making the model less stationary (5).
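The EM algorithm [18] for MAP estimation of the model parameters requires the computation of the conditional expectation values of the hidden variables at the E-step of iteration step $t$

$$z_j^{n(t)} = \frac{\pi_j^{n(t)} \, \phi(x^n; \theta_j^{(t)})}{\sum_{l=1}^{K} \pi_l^{n(t)} \, \phi(x^n; \theta_l^{(t)})}. \tag{6}$$

In the M-step, considering that the complete data log-likelihood is linear in the "hidden" variables [18], the maximization of the complete data log-likelihood

$$Q_{MAP}(\Pi, \Theta \mid \Pi^{(t)}, \Theta^{(t)}) = \sum_{n=1}^{N} \sum_{j=1}^{K} z_j^{n(t)} \left\{ \log \pi_j^n + \log \phi(x^n; \theta_j) \right\} + \log p(\Pi) \tag{7}$$

yields the model parameters. The function $Q_{MAP}$ in (7) can be maximized independently for each parameter, providing the following update equations of the mixture model parameters at step $t+1$
(8)
The probabilities $\pi_j^n$ are computed by setting $\partial Q_{MAP} / \partial \pi_j^n = 0$, which yields a second degree equation with respect to $\pi_j^n$
(9)
where $|\mathcal{N}_n|$ is the number of pixels in the neighborhood of the $n$th pixel.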
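The E-step and the Gaussian part of the M-step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (`gaussian_pdf`, `e_step`, `update_gaussians`) and the array layout are hypothetical, the E-step is the usual posterior of a mixture with per-pixel proportions, and the mean and covariance updates are the standard responsibility-weighted GMM formulas, which may differ in detail from (8). The update of the contextual mixing proportions through (9) is not included.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x (shape (N, D))."""
    d = x.shape[1]
    diff = x - mean
    solved = np.linalg.solve(cov, diff.T).T          # cov^{-1} (x - mean)
    maha = np.einsum("nd,nd->n", diff, solved)        # squared Mahalanobis distance
    log_norm = 0.5 * (d * np.log(2.0 * np.pi) + np.linalg.slogdet(cov)[1])
    return np.exp(-0.5 * maha - log_norm)

def e_step(x, pi, means, covs):
    """Posterior label probabilities z[n, j] using per-pixel proportions pi[n, j]."""
    lik = np.stack([gaussian_pdf(x, m, c) for m, c in zip(means, covs)], axis=1)
    weighted = pi * lik
    return weighted / weighted.sum(axis=1, keepdims=True)

def update_gaussians(x, z):
    """Responsibility-weighted means and covariances (standard GMM M-step)."""
    nk = z.sum(axis=0)                                # effective counts, shape (K,)
    means = (z.T @ x) / nk[:, None]
    covs = []
    for j in range(z.shape[1]):
        diff = x - means[j]
        covs.append((z[:, j, None] * diff).T @ diff / nk[j])
    return means, np.stack(covs)
```

Here `x` has shape `(N, D)`, `pi` and `z` have shape `(N, K)`, `means` has shape `(K, D)` and `covs` has shape `(K, D, D)`. For well-separated clusters the posteriors become nearly one-hot and the updated means approach the per-cluster sample means.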

Fig. 1. Graphical model for the spatially variant finite mixture model
(SVFMM).
It is easily verified that (9) always has a real nonnegative solution for $\pi_j^n$. However, the main drawback of the SVFMM is that it imposes spatial smoothness on $\pi^n$ without explicitly taking into account that it is a probability vector ($0 \le \pi_j^n \le 1$, $\sum_{j=1}^{K} \pi_j^n = 1$, $n = 1, \ldots, N$). For this purpose, reparatory computations were introduced in the M-step to force the variables $\pi_j^n$ to satisfy these constraints. A gradient projection algorithm was used in [41] and quadratic programming was proposed in [42]. This approach was shown to improve both the criterion function (7) and the performance of the model. However, reparatory projections compromise the assumed Bayesian model.
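To make the notion of a reparatory projection concrete, the sketch below projects an arbitrary vector onto the unit simplex in the Euclidean sense using a well-known sort-based procedure. It is a swapped-in illustration: it is neither the gradient projection algorithm of [41] nor the quadratic programming formulation of [42], and the function name `project_to_simplex` is hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a 1-D vector onto {p : p >= 0, sum(p) = 1}."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                       # entries in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]   # last index kept in the support
    tau = css[rho] / (rho + 1.0)               # common shift applied to all entries
    return np.maximum(v - tau, 0.0)
```

A vector that already lies on the simplex is returned unchanged. Any other vector is shifted and clipped, which illustrates why such a step sits outside the probabilistic model: the result is determined by geometry rather than by the prior.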
III. DIRICHLET COMPOUND MULTINOMIAL MODELING OF CONTEXTUAL MIXING PROPORTIONS
To overcome the limitations of the SVFMM, we propose in this section a new Bayesian model for the mixture-based image segmentation problem, based upon a hierarchical prior for the contextual mixing proportions $\pi^n$, which are assumed to follow a DCM distribution. The DCM distribution is a multinomial whose parameters are generated by a Dirichlet distribution [45], thus $\pi^n$ are probability vectors. Priors similar in spirit have been proposed in the totally different context of text modeling [47], where the DCM parameters are estimated through an iterative gradient descent optimization method. Also, in a recent work [53], a new family of exponential distributions is proposed, capable of approximating the DCM probability law in order to make its parameter estimation faster than in [47]. In what follows, we describe how to compute them in closed form. Furthermore, spatial smoothness is imposed on the parameters of the Dirichlet distributions, which are computed in closed form through a cubic equation that always has one real nonnegative solution satisfying the constraints of the Dirichlet parameters.
A. Dirichlet Compound Multinomial Distribution
More precisely, for the $n$th pixel, $n = 1, \ldots, N$, the class label $z^n = (z_1^n, z_2^n, \ldots, z_K^n)^T$ is considered to be a random variable following a multinomial distribution with probability vector $\xi^n = (\xi_1^n, \xi_2^n, \ldots, \xi_K^n)^T$, with $K$ being the number of classes. Let also $\Xi = \{\xi^1, \xi^2, \ldots, \xi^N\}$ be the set of parameters for the whole image. By the multinomial definition it holds that

$$p(z^n \mid \xi^n) = \frac{M!}{\prod_{j=1}^{K} z_j^n!} \prod_{j=1}^{K} (\xi_j^n)^{z_j^n} \tag{10}$$

with
(11)
The model described by (10) represents the probability that pixel $n$ belongs to class $j$, as one of the $K$ possible outcomes of a multinomial process with $M$ realizations. Each of the $K$ outcomes of the process appears with probability $\xi_j^n$, $j = 1, \ldots, K$. Generally speaking, this is a generative model for the image. When the multinomial distribution is used to generate a clustered image, the distribution of the number of emissions (i.e., counts) of an individual class follows a binomial law.
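The DCM distribution assumes that the parameters $\xi^n$ of the multinomial follow a Dirichlet distribution parameterized by $A = \{a^1, a^2, \ldots, a^N\}$, where $a^n = (a_1^n, a_2^n, \ldots, a_K^n)^T$, $n = 1, \ldots, N$, is the vector of Dirichlet parameters for $\xi^n$

$$p(\xi^n \mid a^n) = \frac{\Gamma\left(\sum_{j=1}^{K} a_j^n\right)}{\prod_{j=1}^{K} \Gamma(a_j^n)} \prod_{j=1}^{K} (\xi_j^n)^{a_j^n - 1} \tag{12}$$

where $a_j^n > 0$, $j = 1, \ldots, K$, and $\Gamma(\cdot)$ is the Gamma function.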
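Under the Bayesian framework, the label probability for the $n$th image pixel is obtained by marginalizing the parameters $\xi^n$

$$p(z^n \mid a^n) = \int p(z^n \mid \xi^n) \, p(\xi^n \mid a^n) \, d\xi^n. \tag{13}$$

Substituting (10) and (12) into (13) we obtain, with some easy manipulation, the following expression for the label probabilities:

$$p(z^n \mid a^n) = \frac{M!}{\prod_{j=1}^{K} z_j^n!} \, \frac{\Gamma\left(\sum_{j=1}^{K} a_j^n\right)}{\Gamma\left(M + \sum_{j=1}^{K} a_j^n\right)} \prod_{j=1}^{K} \frac{\Gamma(z_j^n + a_j^n)}{\Gamma(a_j^n)} \tag{14}$$

with $M = \sum_{j=1}^{K} z_j^n$.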
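As a quick numerical check of the marginal in (14), the following Python sketch evaluates the standard Dirichlet compound multinomial log-probability with log-Gamma functions. The function name `dcm_log_pmf` and the argument names are hypothetical; the formula is the textbook DCM probability mass function for a count vector and a vector of positive Dirichlet parameters.

```python
import numpy as np
from math import lgamma

def dcm_log_pmf(counts, alpha):
    """Log-probability of a count vector under the Dirichlet compound multinomial."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    m, a_sum = counts.sum(), alpha.sum()
    # Multinomial coefficient M! / prod_j z_j!
    log_coef = lgamma(m + 1.0) - sum(lgamma(c + 1.0) for c in counts)
    # Gamma(sum a) / Gamma(M + sum a)
    log_ratio = lgamma(a_sum) - lgamma(a_sum + m)
    # prod_j Gamma(z_j + a_j) / Gamma(a_j)
    log_terms = sum(lgamma(c + a) - lgamma(a) for c, a in zip(counts, alpha))
    return log_coef + log_ratio + log_terms
```

For a single realization, that is, a one-hot count vector, the value reduces to the normalized Dirichlet parameter of the selected class: with parameters `[2.0, 3.0, 5.0]` and counts `[0, 1, 0]`, the probability is 3/10.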
B. Hierarchical Image Model
We now assume a generative model for the image where the determination of the component $j$ generating the $n$th pixel is an outcome of a DCM process with only one realization. Consequently, the vector of the hidden variables $z^n$ (6) has the $j$th component equal to one and all the others set to zero. This is also illustrated in the contextual mixing proportions which in that case are the posterior probabilities
(15)
Thus, taking into account that we have one realization of the DCM process ($M = 1$) and that $\Gamma(x + 1) = x \, \Gamma(x)$, the label probabilities for the $n$th pixel in (14) become

$$p(z_j^n = 1 \mid a^n) = \frac{a_j^n}{\sum_{l=1}^{K} a_l^n}. \tag{16}$$
The new model may become spatially variant by introducing
a spatial prior on the parameters $a^n$ of the Dirichlet distribution.
More specifically, we assume a Gauss–Markov type random
field prior for our model since its parameters may be estimated
in closed form [43]
(17)
The main characteristic of this prior is that it enforces smooth-
ness of different degree in each cluster, thus providing better
adaptation to the data [43].
The graphical model for this hierarchical approach is presented in Fig. 2. We will refer to this new model as the Dirichlet Compound Multinomial-based Spatially Variant Finite Mixture Model (DCM-SVFMM). The generative image model works as follows: a sample $\xi^n$ (probability vector) is first drawn from a Dirichlet distribution with parameter $a^n$, thus obtaining a multinomial distribution with parameter $\xi^n$. The "hidden" variable $z^n$, denoting the class of observation $x^n$, is the outcome of a multinomial process parameterized by $\xi^n$. Moreover, the parameters of the Dirichlet distribution are spatially constrained by a smoothness prior, as is also the case for the standard SVFMM.
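The generative process just described can be sketched in Python with NumPy's random generator. The function name `sample_pixels`, the argument layout and the use of Gaussian components for the observations are assumptions for illustration. The sketch takes the Dirichlet parameters as given and does not model the spatial smoothness prior on them.

```python
import numpy as np

def sample_pixels(alpha, means, covs, rng):
    """Draw labels and features for N pixels from the hierarchical model.

    alpha : (N, K) positive Dirichlet parameters, one row per pixel.
    means : (K, D) component means; covs : (K, D, D) component covariances.
    """
    n, k = alpha.shape
    labels = np.empty(n, dtype=int)
    x = np.empty((n, means.shape[1]))
    for i in range(n):
        xi = rng.dirichlet(alpha[i])        # probability vector for pixel i
        labels[i] = rng.choice(k, p=xi)     # one multinomial realization
        x[i] = rng.multivariate_normal(means[labels[i]], covs[labels[i]])
    return labels, x
```

Because each pixel uses a single realization, the marginal probability of drawing class `j` at pixel `i` is `alpha[i, j] / alpha[i].sum()`, so rows of `alpha` dominated by one entry produce labels that fall almost always in that class.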
IV. MAP-EM ESTIMATION
Fig. 2. Graphical model for feature vector $x$ following a spatially variant finite mixture model (SVFMM) with a Dirichlet compound multinomial (DCM) prior.
Employing the proposed model, one can derive the corresponding MAP algorithm using the EM methodology. Applying the Gauss–Markov prior in (17) to the parameters $a^n$ yields the following MAP function to be maximized in the M-step of the EM algorithm, see (18) at the bottom of the page, where we
define
To compute the model parameters at the M-step we have to maximize (18) with respect to $a_j^n$, that is, to compute
its partial derivative and set the result to zero. Considering a
neighborhood with
and setting
gives a third degree polynomial equation with respect to ,
for
and
(19)
where
Based upon polynomial theory, it can be proved that (19) has
only one real nonnegative solution satisfying the constraint in
(12), that is, $a_j^n > 0$. Specifically, the constant term of the third
degree equation (19) is negative, thus the product of the roots should be
positive. This implies that, for three real roots, either three roots
should be positive or two roots should be negative and one pos-
itive. The coefficient of the quadratic term is positive which im-
plies that the sum of the roots should be negative. Therefore, two
roots should be negative and one positive. For a pair of complex
conjugate roots and a real root, the real root is always positive
(18)
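The sign argument for the roots can be checked numerically on any monic cubic whose constant term is negative and whose quadratic coefficient is positive. The sketch below uses `numpy.roots` for this purpose. The helper name `positive_real_roots` and the example coefficients are purely illustrative and are not the coefficients of (19).

```python
import numpy as np

def positive_real_roots(c2, c1, c0, tol=1e-9):
    """Real positive roots of a**3 + c2*a**2 + c1*a + c0 = 0."""
    roots = np.roots([1.0, c2, c1, c0])
    real = roots[np.abs(roots.imag) < tol].real   # keep numerically real roots
    return np.sort(real[real > 0])

# Three real roots: (a - 1)(a + 2)(a + 3) = a^3 + 4a^2 + a - 6
three_real = positive_real_roots(4.0, 1.0, -6.0)

# One real root and a complex pair: (a - 1)(a^2 + 2a + 5) = a^3 + a^2 + 3a - 5
one_real = positive_real_roots(1.0, 3.0, -5.0)
```

In both examples the product of the roots equals minus the constant term and is therefore positive, and exactly one positive real root is found (the root at 1), in line with the two cases distinguished in the text.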
