How many iterations can be achieved in the GBMS?

For the 3-class image of 256 × 256 pixels, using only the intensity as a feature, the algorithm performs one EM iteration per second, and convergence may be achieved in 10–50 iterations, depending upon the amount of noise.

What is the purpose of the comparative experiments?

As one of their purposes is to investigate the behavior of the compared methods under noise without any bias, the authors decided to perform the comparative experiments using the MRF texture features for all the methods.

What are the open questions in a segmentation algorithm?

Important open questions in a segmentation algorithm concern the estimation of the number of image segments as well as the automatic determination of salient features in the case of multidimensional feature vectors [64], [65].

What is the reason why the Ncut and GBMS methods are not robust?

The explanation is that when the added noise is not smoothed out by PCA as it is the case in MRF texture features, the Ncut and GBMS methods are not robust and provide erroneous segmentations.

What is the termination criterion of the EM algorithm?

In the termination criterion of the EM algorithm considered here, convergence was defined as the percentage of change in the log-likelihood (4) between two consecutive iterations being less than 0.001%, that is, a relative change below $10^{-5}$.
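A minimal sketch of such a stopping rule, assuming the log-likelihood values are available as plain floats. The function name `has_converged` and its arguments are illustrative and do not come from the paper.

```python
def has_converged(prev_ll: float, curr_ll: float, tol_percent: float = 0.001) -> bool:
    """Return True when the log-likelihood changed by less than tol_percent percent."""
    if prev_ll == 0.0:
        # Relative change is undefined here; fall back to an absolute comparison.
        return abs(curr_ll - prev_ll) < tol_percent / 100.0
    rel_change_percent = 100.0 * abs(curr_ll - prev_ll) / abs(prev_ll)
    return rel_change_percent < tol_percent
```

In an EM loop this check would be evaluated once per iteration, after the log-likelihood has been recomputed.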

What is the advantage of the DCM-SVFMM method?

A notable advantage of the DCM-SVFMM method is that it does not need any parameter to be fixed before training. This is not the case for graph-based methods or for the mean-shift algorithm, where the result strongly depends upon the selected parameters.

What is the criterion for the termination of the EM algorithm?

Since the EM algorithm is sensitive to initialization, in their experiments, the authors have executed a number of iterations of the EM algorithm with a set of randomly generated initial conditions and kept the one giving the maximum value for the log-likelihood.
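The restart strategy can be sketched as follows. This is an illustration under assumptions: `run_em` is a hypothetical callable that runs EM from one random initialization and returns `(params, log_likelihood)`; neither the name nor the signature is from the paper.

```python
import numpy as np

def best_of_restarts(run_em, data, n_restarts=5, seed=0):
    """Run EM from several random initializations and keep the best log-likelihood."""
    rng = np.random.default_rng(seed)
    best_params, best_ll = None, -np.inf
    for _ in range(n_restarts):
        # Each call is expected to draw its own initial conditions from rng.
        params, ll = run_em(data, rng)
        if ll > best_ll:
            best_params, best_ll = params, ll
    return best_params, best_ll
```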

What is the label probability for the $n$th image pixel?

Under the Bayesian framework, the label probability for the $n$th image pixel is obtained by marginalizing the parameters of the multinomial, which gives (13). Substituting (10) and (12) into (13), the authors obtain, after some manipulation, the expression (14) for the label probabilities.

What is the simplest way to estimate the mixing proportions of the pixel?

The contextual mixing proportions for each pixel are constrained to follow a Dirichlet compound multinomial distribution, thus, avoiding the projection step in the standard EM algorithm [42].

(Open Access) A Bayesian Framework for Image Segmentation With Spatially Varying Mixtures (2010) | Christophoros Nikou

Q: What are the contributions in "A bayesian framework for image segmentation with spatially varying mixtures" ?

In this paper, a new Bayesian model is proposed for image segmentation based upon Gaussian mixture models ( GMM ) with spatial smoothness constraints.

2278 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 9, SEPTEMBER 2010

A Bayesian Framework for Image Segmentation With Spatially Varying Mixtures

Christophoros Nikou, Member, IEEE, Aristidis C. Likas, Senior Member, IEEE, and Nikolaos P. Galatsanos, Senior Member, IEEE

Abstract—A new Bayesian model is proposed for image segmentation based upon Gaussian mixture models (GMM) with spatial smoothness constraints. This model exploits the Dirichlet compound multinomial (DCM) probability density to model the mixing proportions (i.e., the probabilities of class labels) and a Gauss–Markov random field (MRF) on the Dirichlet parameters to impose smoothness. The main advantages of this model are two. First, it explicitly models the mixing proportions as probability vectors and simultaneously imposes spatial smoothness. Second, it results in closed form parameter updates using a maximum a posteriori (MAP) expectation-maximization (EM) algorithm. Previous efforts on this problem used models that did not model the mixing proportions explicitly as probability vectors or could not be solved exactly requiring either time consuming Markov Chain Monte Carlo (MCMC) or inexact variational approximation methods. Numerical experiments are presented that demonstrate the superiority of the proposed model for image segmentation compared to other GMM-based approaches. The model is also successfully compared to state of the art image segmentation methods in clustering both natural images and images degraded by noise.

Index Terms—Bayesian model, Dirichlet compound multinomial distribution, Gauss–Markov random field prior, Gaussian mixture, image segmentation, spatially varying finite mixture model.

I. INTRODUCTION

MANY approaches have been proposed to solve the image segmentation problem [1], [2]. Among them, clustering based methods rely on arranging data into groups having common characteristics [3], [4]. During the last decade, the main research directions in the relevant literature are focused on graph theoretic approaches [5]–[8], methods based upon the mean shift algorithm [9], [10] and rate distortion theory techniques [11], [12].

Manuscript received July 17, 2008; revised September 04, 2009; accepted March 09, 2010. First published April 08, 2010; current version published August 18, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Eero P. Simoncelli.

C. Nikou and A. C. Likas are with the Department of Computer Science, University of Ioannina, 45110 Ioannina, Greece (e-mail: cnikou@cs.uoi.gr; arly@cs.uoi.gr).

N. P. Galatsanos is with the Department of Electrical and Computer Engineering, University of Patras, 26500 Rio, Greece (e-mail: ngalatsanos@upatras.gr).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2010.2047903

Modeling the probability density function (pdf) of pixel attributes (e.g., intensity, texture) with finite mixture models (FMM) [13]–[15] is a natural way to cluster data because it automatically provides a grouping based upon the components of the mixture that generated them. Furthermore, the likelihood of an FMM is a rigorous metric for clustering performance [14]. FMM based pdf modeling has been used successfully in a number of applications ranging from bioinformatics [16] to image retrieval [17]. The parameters of the FMM model with Gaussian components can be estimated through maximum likelihood (ML) estimation using the Expectation-Maximization (EM) algorithm [13], [14], [18]. However, it is well-known that the EM algorithm finds, in general, a local maximum of the likelihood. Furthermore, it can be shown that Gaussian components allow efficient representation of a large variety of pdf. Thus, Gaussian mixture models (GMM) are commonly employed in image segmentation tasks [14].
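As a point of reference, the standard ML-EM fit of a Gaussian mixture to scalar pixel intensities can be sketched as below. This is a generic textbook EM, not the authors' implementation; the function name `gmm_em_1d`, the quantile-based initialization and the variance floor are assumptions made for the illustration.

```python
import numpy as np

def gmm_em_1d(x, k, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, log_likelihoods).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.full(k, 1.0 / k)
    # Assumed initialization: spread the means over the data quantiles.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var() + 1e-6)
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each pixel.
        diff = x[:, None] - means[None, :]
        dens = np.exp(-0.5 * diff**2 / variances) / np.sqrt(2.0 * np.pi * variances)
        joint = weights * dens
        total = joint.sum(axis=1, keepdims=True)
        log_liks.append(float(np.log(total).sum()))
        resp = joint / total
        # M-step: weighted proportions, means and variances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)  # guard against collapse
    return weights, means, variances, log_liks
```

The recorded log-likelihoods are non-decreasing from one iteration to the next, which is the local-maximum behavior mentioned above. The mixing weights here are global, so no spatial information is used.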

A drawback of the standard ML approach for image segmentation is that commonality of location is not taken into account when grouping the data. In other words, the prior knowledge that adjacent pixels most likely belong to the same cluster is not used. To overcome this shortcoming, spatial smoothness constraints have been imposed.

Imposing spatial smoothness is key to certain image processing applications since it is an important a priori known property of images [19]. Examples of such applications include denoising, restoration, inpainting and segmentation problems. In a probabilistic framework, smoothness is expressed through a prior imposed on image features. A common approach is the use of an MRF. Many MRF variants have been proposed, see for example [20]. However, automatically determining the amount of imposed smoothness requires knowledge of the normalization constant of the MRF. Since this is not known analytically, learning strategies were proposed [21]–[23].

Research efforts in imposing spatial smoothness for image segmentation can be grouped into two categories. In the methods of the first category, spatial smoothness is imposed on the discrete hidden variables of the FMM that represent class labels, see for example [7], [24]–[26]. These approaches may be categorized in a more general area involving simultaneous image recovery and segmentation which is better known as image modeling [27]–[30]. More specifically, spatial regularization is achieved by imposing a discrete Markov random field (DMRF) on the classification labels of neighboring pixels that penalizes solutions where neighboring pixels belong to different classes. Another method in this category is proposed in [7] which is based upon the optimization of an energy function having a term for the quality of the clustering and a term for the spatial tightness. Minimization of the energy function is accomplished using graph cuts [31].

NIKOU et al.: A BAYESIAN FRAMEWORK FOR IMAGE SEGMENTATION WITH SPATIALLY VARYING MIXTURES 2279

The Gaussian scale mixtures (GSM) and their extension of mixtures of projected GSM (MPGSM) and the fields of GSM (FoGSM) were also used in image denoising in the wavelet domain in [32]–[34]. In GSM denoising [32], clusters of wavelet coefficients are modeled as the product of a Gaussian random vector and a positive scaling variable. In MPGSM denoising [34], the model is extended to handle different local image characteristics and incorporates dimensionality reduction through linear projections. By these means, the number of model parameters is reduced and fast model training is obtained. In the case of FoGSM [33], multiscale subbands are modeled by a product of an exponentiated homogeneous Gaussian Markov random field (hGMRF) and a second independent hGMRF. In [33], it is demonstrated that samples drawn from a FoGSM model have marginal and joint statistics similar to subband coefficients of photographic images.

To estimate the smoothness parameters, Woolrich et al. proposed in [35] and [36] a model based upon a logistic transform that approximates the previously mentioned DMRF with a continuous Gaussian Markov random field. However, for this model inference of the contextual mixing proportions (posterior class label probabilities) of each pixel cannot be obtained in closed form. Thus, in [35], inference based upon Markov Chain Monte Carlo (MCMC) is proposed, while in [36] inference based upon Variational Bayes (VB) is employed. Although MCMC methods have been studied in statistics for a long time and several general criteria have been proposed to determine their convergence [37], [38], inference based upon them may be notoriously time consuming. On the other hand, VB-based inference is approximate and there is no easy way to assess the tightness of the variational bound. Moreover, similar in spirit approaches to avoid local maxima of the likelihood, which is a drawback of the ML solution, rely on the stochastic EM and its variants [39], [40].

In the second category of methods, the MRF-based smoothness constraint is not imposed on the labels but on the contextual mixing proportions. This model is called spatially variant finite mixture model (SVFMM) [41] and avoids the inference problems of DMRFs. In this model maximum a posteriori (MAP) estimation of the contextual mixing proportions via the MAP-EM algorithm is possible. However, the main disadvantage of this model is that the M-step of the proposed algorithm cannot be obtained in closed form and is formulated as a constrained optimization problem that requires a projection of the solution onto the unit simplex (components that are nonnegative and sum to one) [41], [42]. Consequently, the parameters that control the spatial smoothness cannot be estimated automatically from the data.

In [43], a new family of smoothness priors was proposed for the contextual mixing proportions based upon the Gauss–Markov random fields that takes into account cluster statistics, thus, enforcing different smoothness strength for each cluster. The model was also refined to capture information in different spatial directions. Moreover, all the parameters controlling the degree of smoothness for each cluster, as well as the label probabilities for the pixels, are estimated in closed form via the maximum a posteriori (MAP) methodology. The advantage of this family of models is that inference is obtained using an EM algorithm with closed form update equations. However, the implied model still does not take into account explicitly that the mixing proportions are probabilities, thus, the constraint that they are positive and must sum to one is not guaranteed by the update equations. As a result, the M-step of this EM algorithm also requires a reparatory projection step which is ad-hoc and not an implicit part of the assumed Bayesian model. A synergy between this category of priors and line processes, to account for edge preservation, was presented in [44].

In this paper, we present a new hierarchical Bayesian model for mixture model-based image segmentation with spatial constraints. This model assumes the contextual mixing proportions to follow a Dirichlet compound multinomial (DCM) distribution. More precisely, the class to which a pixel belongs is modeled by a discrete multinomial distribution whose parameters follow a Dirichlet law [45]. Furthermore, spatial smoothness is imposed by assuming a Gauss–Markov random field (GMRF) prior for the parameters of the Dirichlet. The parameters of the multinomial distribution are integrated out in a fully Bayesian framework and the updates of the parameters of the Dirichlet are computed in closed form through the EM algorithm.

The Dirichlet distribution has been previously proposed as a prior for text categorization [46], [47], object recognition and detection [48] and scene classification [49]. The difference of the proposed model with respect to existing methods is twofold. First, text, scene or object categorization are supervised learning problems while the proposed segmentation method is unsupervised. Second, in the existing studies, estimation of the parameters of the Dirichlet distribution is generally accomplished by variational inference or by simplified logistic models. The advantage of the model proposed here is that not only can the E-step be expressed in closed form, but the model also explicitly assumes that the contextual mixing proportions are probability vectors. Inference through the EM algorithm leads to a third degree polynomial equation for the parameters of the Dirichlet distribution. Therefore, the closed form M-step yields parameter values automatically satisfying the necessary probability constraints.

Another approach to handle nonstationary images and relying on MRF is the triplet Markov field (TMF) model [50] which was also applied to image segmentation [51], [52]. The main difference of TMF with respect to our model is that, in TMF, the random field is imposed jointly on the hidden variables, the observation and a set of auxiliary variables which determine the type of the stationarity. In contrast, in our model, the random field is imposed on the contextual mixing proportions.

Numerical experiments are presented to assess the performance of the proposed model both with simulated data where the ground truth is known and real natural images where the performance is assessed both visually and quantitatively.

The remainder of the manuscript is organized as follows: background for the spatially variant finite mixture model is given in Section II. The proposed modeling of probabilities of the pixel labels with a DCM distribution is presented in Section III. In Section IV, the MAP-EM algorithm for the estimation of the proposed model parameters is developed. Experimental results of the application of our model to natural and artificial images are presented in Section V and conclusions and directions for future research are given in Section VI.

II. BACKGROUND ON SPATIALLY VARIANT FINITE MIXTURE MODELING

Let $x^n$ denote the vector of features (e.g., intensity, textural features) representing the $n$th spatial location, $n = 1, \ldots, N$, of a $D$-dimensional vector valued image modeled as independently distributed random variables. The SVFMM [41]–[43] provides a modification of the classical FMM approach [13], [14] for pixel labeling. It assumes a mixture model with $K$ components each one having a vector of parameters $\theta_j$ defining the density function. Pixel $x^n$ is characterized by its probability vector $\pi^n = (\pi^n_1, \pi^n_2, \ldots, \pi^n_K)^T$ where $K$ is the number of components. We define $\Pi = \{\pi^1, \pi^2, \ldots, \pi^N\}$ as the set of probability vectors and $\Theta = \{\theta_1, \theta_2, \ldots, \theta_K\}$ the set of component parameters. The parameters $\pi^n_j$, $j = 1, \ldots, K$, are the contextual mixing proportions for each pixel $x^n$ and represent the probabilities of the $n$th pixel to belong to the $j$th class and must satisfy the constraints

$$0 \le \pi^n_j \le 1, \qquad \sum_{j=1}^{K} \pi^n_j = 1. \tag{1}$$

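The standard finite mixture model [41] assumes that the probability density function of an observation $x^n$ is expressed by

$$f(x^n \mid \Pi; \Theta) = \sum_{j=1}^{K} \pi^n_j \, \phi(x^n \mid \theta_j) \tag{2}$$

with $\phi(x^n \mid \theta_j)$ being a Gaussian distribution with parameters $\theta_j = \{\mu_j, \Sigma_j\}$, where $\mu_j$ is the mean vector and $\Sigma_j$ is the covariance matrix of the $D$-dimensional Gaussian distribution. This notation implies that $\Pi$ are considered as random variables and $\Theta$ as parameters.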

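The spatially varying finite mixture models use a prior density distribution $p(\Pi)$ for the random variables $\Pi$. Therefore, denoting by $X = \{x^1, x^2, \ldots, x^N\}$ the set of pixel feature vectors $x^n$, with $n = 1, \ldots, N$, which we assume to be statistically independent, and following Bayes' rule, we obtain the posterior probability density function given by

$$p(\Pi \mid X; \Theta) \propto \prod_{n=1}^{N} f(x^n \mid \Pi; \Theta) \, p(\Pi) \tag{3}$$

with the log-density

$$L(\Pi \mid X; \Theta) = \sum_{n=1}^{N} \log f(x^n \mid \Pi; \Theta) + \log p(\Pi). \tag{4}$$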

A typical example of $p(\Pi)$ is the Gauss–Markov random field prior [43], expressed by

(5)

where the parameter $\beta_j$ captures the spatial smoothness of cluster $j$ and enforces a different degree of smoothness in each cluster in order to better adapt the model to the data.
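To make the role of the per-cluster smoothness parameter concrete, the sketch below evaluates an unnormalized log-prior of Gauss–Markov type on a 4-connected pixel grid. The functional form is an assumption for illustration, $\log p(\Pi) = \sum_j \big[-N \log \beta_j - \sum_n \sum_{m \in \mathcal{N}_n} (\pi^n_j - \pi^m_j)^2 / (2\beta_j^2)\big]$ up to a constant, and may differ from the exact expression of (5). The names `gmrf_log_prior`, `pi_maps` and `betas` are hypothetical.

```python
import numpy as np

def gmrf_log_prior(pi_maps, betas):
    """Unnormalized Gauss-Markov log-prior (assumed form) on a 4-connected grid.

    pi_maps: array of shape (K, H, W), one mixing-proportion map per cluster.
    betas:   array of shape (K,), one smoothness parameter per cluster.
    """
    pi_maps = np.asarray(pi_maps, dtype=float)
    betas = np.asarray(betas, dtype=float)
    _, h, w = pi_maps.shape
    n = h * w
    # Squared differences between horizontal and vertical neighbors.
    dh = (pi_maps[:, :, 1:] - pi_maps[:, :, :-1]) ** 2
    dv = (pi_maps[:, 1:, :] - pi_maps[:, :-1, :]) ** 2
    # Each unordered neighbor pair appears twice in the double sum over n and m.
    energy = 2.0 * (dh.sum(axis=(1, 2)) + dv.sum(axis=(1, 2)))
    return float(np.sum(-n * np.log(betas) - energy / (2.0 * betas**2)))
```

A spatially constant map has zero energy, while a rapidly varying map is penalized more strongly for clusters with a small $\beta_j$.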

The graphical model for the spatially variant version of the finite mixture model (SVFMM) is presented in Fig. 1. In the standard FMM, the feature vector $x^n$ for a given pixel depends upon the state of the discrete hidden variable $z^n$ denoting the mixture component responsible for generating the observation $x^n$. That is, if $z^n_j = 1$, pixel $x^n$ belongs to class $j$. In that case, the mixing proportion for a given class is simply the percentage of pixels belonging to that class. In the case of the SVFMM, each pixel $x^n$ has its own set of mixing proportions $\pi^n$, generally called contextual mixing proportions or probabilities of the pixel labels. These contextual mixing proportions are spatially constrained by a smoothness prior. The strength of this prior could either be unique for the whole set of pixels [42] or could vary based upon the local statistics of each image class, thus, making the model less stationary (5).

The EM algorithm [18] for MAP estimation of the model parameters requires the computation of the conditional expectation values of the hidden variables at the E-step of iteration step $t$

(6)
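For scalar features, an E-step of this kind can be sketched as follows. The sketch assumes the usual posterior form $z^n_j = \pi^n_j \phi(x^n \mid \theta_j) / \sum_l \pi^n_l \phi(x^n \mid \theta_l)$ with per-pixel mixing proportions; the function name `e_step` and the array layout are illustrative choices, not the paper's.

```python
import numpy as np

def e_step(x, pi, means, variances):
    """Posterior label probabilities with per-pixel mixing proportions.

    x: (N,) intensities; pi: (N, K) contextual mixing proportions;
    means, variances: (K,) Gaussian parameters. Returns z of shape (N, K).
    """
    x = np.asarray(x, dtype=float)[:, None]
    pi = np.asarray(pi, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    dens = np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(2.0 * np.pi * variances)
    joint = pi * dens
    # Normalize over the K components for every pixel.
    return joint / joint.sum(axis=1, keepdims=True)
```

The only difference from the standard FMM E-step is that `pi` has one row per pixel instead of a single global weight vector.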

In the M-step, considering that the complete data log-likelihood is linear in the “hidden” variables [18], the maximization of the complete data log-likelihood

(7)

yields the model parameters. The function $Q$ in (7) can be maximized independently for each parameter providing the following update equations of the mixture model parameters at step $t+1$

(8)
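For scalar features, the Gaussian parameter updates take the familiar responsibility-weighted form. The sketch below assumes that standard form (weighted mean and weighted variance); the name `m_step_gaussians` is illustrative.

```python
import numpy as np

def m_step_gaussians(x, z):
    """Responsibility-weighted mean and variance for each component.

    x: (N,) intensities; z: (N, K) posterior label probabilities.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    nk = z.sum(axis=0)  # effective number of pixels per component
    means = (z * x[:, None]).sum(axis=0) / nk
    variances = (z * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return means, variances
```

With hard (0/1) responsibilities these updates reduce to the per-class sample mean and variance.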

The probabilities $\pi^n_j$ are computed by setting the derivative of $Q$ with respect to $\pi^n_j$ to zero, which yields a second degree equation with respect to $\pi^n_j$

(9)

where $|\mathcal{N}_n|$ is the number of pixels in the neighborhood of the $n$th pixel.

Fig. 1. Graphical model for the spatially variant finite mixture model (SVFMM).

It is easily verified that (9) always has a real nonnegative solution for $\pi^n_j$. However, the main drawback of the SVFMM is that it imposes spatial smoothness on $\pi^n$ without explicitly taking into account that it is a probability vector ($0 \le \pi^n_j \le 1$, $\sum_{j=1}^{K} \pi^n_j = 1$). For this purpose, reparatory computations were introduced in the M-step to enforce the variables $\pi^n_j$ to satisfy these constraints. A gradient projection algorithm was used in [41] and quadratic programming was proposed in [42]. This approach was shown to improve both the criterion function (7) and the performance of the model. However, reparatory projections compromise the assumed Bayesian model.
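To illustrate what a reparatory projection does, the sketch below computes the Euclidean projection of an arbitrary vector onto the probability simplex with a generic sort-based procedure. This is a stand-in for illustration only: it is neither the gradient projection algorithm of [41] nor the quadratic programming scheme of [42], and the name `project_to_simplex` is hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_j >= 0, sum_j p_j = 1}."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                 # entries in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, v.size + 1)
    cond = u - (css - 1.0) / ks > 0      # entries that stay positive
    rho = ks[cond][-1]
    tau = (css[cond][-1] - 1.0) / rho    # common shift applied to all entries
    return np.maximum(v - tau, 0.0)
```

The projected vector satisfies the probability constraints by construction, but the correction is applied after the update rather than being part of the probabilistic model, which is the limitation noted above.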

III. DIRICHLET COMPOUND MULTINOMIAL MODELING OF CONTEXTUAL MIXING PROPORTIONS

To overcome the limitations of SVFMM, we propose in this section a new Bayesian model for the mixture-based image segmentation problem, based upon a hierarchical prior for the contextual mixing proportions $\pi^n$, which are assumed to follow a DCM distribution. The DCM distribution is a multinomial whose parameters are generated by a Dirichlet distribution [45], thus $\pi^n$ are probability vectors. Similar in spirit priors have been proposed, in the totally different context of text modeling [47] where the DCM parameters are estimated through an iterative gradient descent optimization method. Also in a recent work [53], a new family of exponential distributions is proposed capable of approximating the DCM probability law in order to make its parameter estimation faster than [47]. In what follows, we describe how to compute them in closed form. Furthermore, spatial smoothness is imposed on the parameters of the Dirichlet distributions which are computed in closed form through a cubic equation that always has one real nonnegative solution satisfying the constraints of the Dirichlet parameters.

A. Dirichlet Compound Multinomial Distribution

More precisely, for the $n$th pixel, $n = 1, \ldots, N$, the class label $z^n = (z^n_1, \ldots, z^n_K)^T$ is considered to be a random variable following a multinomial distribution with probability vector $\xi^n = (\xi^n_1, \ldots, \xi^n_K)^T$ with $K$ being the number of classes. Let also $\Xi = \{\xi^1, \xi^2, \ldots, \xi^N\}$ be the set of parameters for the whole image. By the multinomial definition it holds that

(10)

with

(11)

The model described by (10) represents the probability that pixel $x^n$ belongs to class $j$, as one of the $K$ possible outcomes of a multinomial process with $M$ realizations. Each of the $K$ outcomes of the process appears with probability $\xi^n_j$, $j = 1, \ldots, K$. Generally speaking, this is a generative model for the image. When the multinomial distribution is used to generate a clustered image, the distribution of the number of emissions (i.e., counts) of an individual class follows a binomial law.

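The DCM distribution assumes that the parameters $\xi^n$ of the multinomial follow a Dirichlet distribution parameterized by $\alpha^n$, where $\alpha^n = (\alpha^n_1, \ldots, \alpha^n_K)^T$, $n = 1, \ldots, N$, is the vector of Dirichlet parameters for $\xi^n$

$$p(\xi^n \mid \alpha^n) = \frac{\Gamma\left(\sum_{j=1}^{K} \alpha^n_j\right)}{\prod_{j=1}^{K} \Gamma(\alpha^n_j)} \prod_{j=1}^{K} (\xi^n_j)^{\alpha^n_j - 1} \tag{12}$$

where $\alpha^n_j > 0$, $j = 1, \ldots, K$, and $\Gamma(\cdot)$ is the Gamma function.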

Under the Bayesian framework, the label probability for the $n$th image pixel is obtained by marginalizing the parameters $\xi^n$

(13)

Substituting (10) and (12) into (13) we obtain, after some straightforward manipulation, the following expression for the label probabilities:

(14)

with

B. Hierarchical Image Model

We now assume a generative model for the image where the determination of component $j$ generating the $n$th pixel is an outcome of a DCM process with only one realization. Consequently, the vector of the hidden variables $z^n$ (6) has the $j$th component equal to one and all the others set to zero. This is also illustrated in the contextual mixing proportions which in that case are the posterior probabilities

(15)

Thus, taking into account that we have one realization of the DCM process and that $\Gamma(x + 1) = x\,\Gamma(x)$, the label probabilities for the $n$th pixel in (14) become

(16)
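The simplification for a single realization can be checked directly from the standard DCM probability mass function. The derivation below is a sketch in assumed notation: $z^n_j$ for the class counts, $\alpha^n_j$ for the Dirichlet parameters, $M = \sum_j z^n_j$ for the number of realizations and $A^n = \sum_k \alpha^n_k$; these symbols may differ from those of (14) and (16).

```latex
% Standard DCM pmf for M realizations (assumed notation):
p(z^n \mid \alpha^n)
  = \frac{M!}{\prod_{k=1}^{K} z^n_k!}\,
    \frac{\Gamma(A^n)}{\Gamma(A^n + M)}\,
    \prod_{k=1}^{K} \frac{\Gamma(z^n_k + \alpha^n_k)}{\Gamma(\alpha^n_k)},
  \qquad A^n = \sum_{k=1}^{K} \alpha^n_k .

% One realization: M = 1, z^n_j = 1 and z^n_k = 0 for k \neq j.
% All factors with k \neq j equal 1, and \Gamma(x+1) = x\,\Gamma(x) gives
p(z^n_j = 1 \mid \alpha^n)
  = \frac{\Gamma(A^n)}{\Gamma(A^n + 1)}\,
    \frac{\Gamma(\alpha^n_j + 1)}{\Gamma(\alpha^n_j)}
  = \frac{\alpha^n_j}{\sum_{k=1}^{K} \alpha^n_k}.
```

The resulting label probabilities are positive and sum to one over $j$ whenever all $\alpha^n_k > 0$, so no projection step is needed.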

The new model may become spatially variant by introducing a spatial prior on the parameters $\alpha^n$ of the Dirichlet distribution. More specifically, we assume a Gauss–Markov type random field prior for our model since its parameters may be estimated in closed form [43]

(17)

The main characteristic of this prior is that it enforces smoothness of different degree in each cluster, thus providing better adaptation to the data [43].

The graphical model for this hierarchical approach is presented in Fig. 2. We will refer to this new model as the Dirichlet Compound Multinomial-based Spatially Variant Finite Mixture Model (DCM-SVFMM). The generative image model works as follows: a sample $\xi^n$ (probability vector) is first drawn from a Dirichlet distribution with parameter $\alpha^n$, thus obtaining a multinomial distribution with parameter $\xi^n$. The “hidden” variable $z^n$, denoting the class of observation $x^n$, is the outcome of a multinomial process parameterized by $\xi^n$. Moreover, the parameters of the Dirichlet distribution are spatially constrained by a smoothness prior as it is also the case for the standard SVFMM.
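The generative chain described above (Dirichlet draw, then a single multinomial draw, then a Gaussian observation) can be simulated as in the sketch below. It is a simplified illustration: the same Dirichlet parameter vector is used for every sample, so the spatial smoothness prior is omitted, features are scalar, and the name `sample_pixels` and its arguments are hypothetical.

```python
import numpy as np

def sample_pixels(alpha, means, sigmas, n_samples, seed=0):
    """Draw labels and intensities from Dirichlet -> multinomial -> Gaussian."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Step 1: one probability vector per sample from the Dirichlet.
    xi = rng.dirichlet(alpha, size=n_samples)
    # Step 2: one multinomial realization per sample (a categorical draw),
    # implemented by inverting the cumulative distribution of each row.
    cum = np.cumsum(xi, axis=1)
    u = rng.random(n_samples)[:, None]
    labels = np.minimum((u > cum).sum(axis=1), alpha.size - 1)
    # Step 3: Gaussian observation from the selected component.
    x = rng.normal(means[labels], sigmas[labels])
    return labels, x
```

With many samples, the empirical label frequencies approach $\alpha_j / \sum_k \alpha_k$, the value obtained by integrating out the multinomial parameters for a single realization.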

IV. MAP-EM ESTIMATION

Fig. 2. Graphical model for feature vector $x^n$ following a spatially variant finite mixture model (SVFMM) with a Dirichlet compound multinomial (DCM) prior.

Employing the proposed model one can derive the corresponding MAP algorithm using the EM methodology. Applying the Gauss–Markov prior in (17) to parameters $\alpha^n$ yields the following MAP function to be maximized in the M-step of the

EM algorithm, see (18) at the bottom of the page, where we

deﬁne

To compute the model parameters at the M-step we have to maximize (18) with respect to $\alpha^n_j$, that is, to compute its partial derivative and set the result to zero. Considering a neighborhood $\mathcal{N}_n$ of the $n$th pixel and setting this derivative to zero gives a third degree polynomial equation with respect to $\alpha^n_j$, for $n = 1, \ldots, N$ and $j = 1, \ldots, K$

(19)

where

Based upon polynomial theory, it can be proved that (19) has only one real nonnegative solution satisfying the constraint in (12), that is $\alpha^n_j > 0$. Specifically, the constant term of the third degree polynomial in (19) is negative, thus the product of the roots should be positive. This implies that, for three real roots, either three roots should be positive or two roots should be negative and one positive. The coefficient of the quadratic term is positive which implies that the sum of the roots should be negative. Therefore, two roots should be negative and one positive. For a pair of complex conjugate roots and a real root, the real root is always positive, since the product of a complex conjugate pair is positive and the product of all three roots is positive.

(18)
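The sign argument for the roots can be checked numerically. The sketch below does not use the actual coefficients of (19), which are not reproduced here; it takes a generic monic cubic $x^3 + b x^2 + c x + d$ with the stated sign pattern ($b > 0$, $d < 0$, arbitrary $c$) and counts its positive real roots. The function name `positive_real_roots` is illustrative.

```python
import numpy as np

def positive_real_roots(b, c, d, tol=1e-9):
    """Positive real roots of x**3 + b*x**2 + c*x + d, sorted ascending."""
    roots = np.roots([1.0, b, c, d])
    # Treat roots with a negligible imaginary part as real.
    is_real = np.abs(roots.imag) <= tol * (1.0 + np.abs(roots))
    real = roots[is_real].real
    return np.sort(real[real > 0])

# Example: x^3 + 2x^2 - x - 2 = (x - 1)(x + 1)(x + 2) has one positive root.
print(positive_real_roots(2.0, -1.0, -2.0))
```

For any coefficients with $b > 0$ and $d < 0$ the function returns exactly one value, in line with the argument above, so an M-step based on such a cubic can select that root without ambiguity.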
