Recursive Unsupervised Learning of
Finite Mixture Models
Zoran Zivkovic, Member, IEEE Computer Society,
and Ferdinand van der Heijden, Member,
IEEE Computer Society
Abstract—There are two open problems when finite mixture densities are used to
model multivariate data: the selection of the number of components and the
initialization. In this paper, we propose an online (recursive) algorithm that
estimates the parameters of the mixture and that simultaneously selects the
number of components. The new algorithm starts with a large number of randomly
initialized components. A prior is used as a bias for maximally structured models.
A stochastic approximation recursive learning algorithm is proposed to search for
the maximum a posteriori (MAP) solution and to discard the irrelevant
components.
Index Terms—Online (recursive) estimation, unsupervised learning, finite
mixtures, model selection, EM-algorithm.
1 INTRODUCTION
FINITE mixture probability density models have been analyzed
many times and used extensively for modeling multivariate data
[16], [8]. In [3] and [6], an efficient heuristic was used to
simultaneously estimate the parameters of a mixture and select
the appropriate number of its components. The idea is to start with
a large number of components and introduce a prior to express our
preference for compact models. During some iterative search
procedure for the MAP solution, the prior drives the irrelevant
components to extinction. The “entropic-prior” from [3] leads to a
MAP estimate that minimizes the entropy and, hence, leads to a
compact model. The Dirichlet prior from [6] gives a solution that is
related to model selection using the “Minimum Message Length”
(MML) criterion [20].
This paper is inspired by the aforementioned papers [3], [6].
Our contribution is in developing an online version which is
potentially very useful in many situations since it is highly
memory and time efficient. We use a stochastic approximation
procedure to estimate the parameters of the mixture recursively.
More on the behavior of approximate recursive equations can be
found in [13], [5], [15]. We propose a way to include the suggested
prior from [6] in the recursive equations. This enables the online
selection of the number of components of the mixture. We show
that the new algorithm can reach solutions similar to those
obtained by batch algorithms.
In Sections 2 and 3 of the paper, we introduce the notation and
discuss some standard problems associated with finite mixture
fitting. In Section 4, we describe the mentioned heuristic that
enables us to estimate the parameters of the mixture and to
simultaneously select the number of its components. Further, in
Section 5, we develop an online version. The final practical
algorithm we used in our experiments is described in Section 6. In
Section 7, we demonstrate how the new algorithm performs for a
number of standard problems and compare it to some batch
algorithms.
2 PARAMETER ESTIMATION
A mixture density with M components for a d-dimensional random variable $\vec{x}$ is given by:

$$p(\vec{x};\vec{\theta}) = \sum_{m=1}^{M} \pi_m\, p_m(\vec{x};\vec{\theta}_m), \quad \text{with} \quad \sum_{m=1}^{M} \pi_m = 1, \qquad (1)$$

where $\vec{\theta} = \{\pi_1, \ldots, \pi_M, \vec{\theta}_1, \ldots, \vec{\theta}_M\}$ are the parameters. The number of parameters depends on the number of components M, and the notation $\vec{\theta}(M)$ will be used to stress this when needed. The mth component of the mixture is denoted by $p_m(\vec{x};\vec{\theta}_m)$ and $\vec{\theta}_m$ are its parameters. The mixing weights, denoted by $\pi_m$, are nonnegative and add up to one.
Given a set of t data samples $\{\vec{x}^{(1)}, \ldots, \vec{x}^{(t)}\}$, the maximum likelihood (ML) estimate of the parameter values is:

$$\hat{\vec{\theta}} = \arg\max_{\vec{\theta}} \big( \log p(\mathcal{X};\vec{\theta}) \big).$$
The Expectation Maximization (EM) algorithm [4] is commonly
used to search for the solution. The EM algorithm is an iterative
procedure that searches for a local maximum of the log-likelihood
function. In order to apply the EM algorithm, we need to introduce
for each $\vec{x}$ a discrete unobserved indicator vector $\vec{y} = [y_1 \ldots y_M]^T$. The indicator vector specifies (by means of position coding) the mixture component from which the observation $\vec{x}$ is drawn. The new joint density function can be written as a product:

$$p(\vec{x},\vec{y};\vec{\theta}) = p(\vec{y};\pi_1, \ldots, \pi_M)\, p(\vec{x}|\vec{y};\vec{\theta}_1, \ldots, \vec{\theta}_M) = \prod_{m=1}^{M} \pi_m^{y_m}\, p_m(\vec{x};\vec{\theta}_m)^{y_m},$$

where exactly one of the $y_m$ from $\vec{y}$ can be equal to 1 and the others are zero. The indicators $\vec{y}$ have a multinomial distribution defined by the mixing weights $\pi_1, \ldots, \pi_M$. The EM algorithm starts with some initial parameter estimate $\hat{\vec{\theta}}^{(0)}$. If we denote the set of unobserved data by $\mathcal{Y} = \{\vec{y}^{(1)}, \ldots, \vec{y}^{(t)}\}$, the estimate $\hat{\vec{\theta}}^{(k)}$ from the kth iteration of the EM algorithm is obtained using the previous estimate $\hat{\vec{\theta}}^{(k-1)}$:

$$\text{E step:}\quad Q(\vec{\theta};\hat{\vec{\theta}}^{(k-1)}) = E_{\mathcal{Y}}\big(\log p(\mathcal{X},\mathcal{Y};\vec{\theta})\,\big|\,\mathcal{X},\hat{\vec{\theta}}^{(k-1)}\big) = \sum_{\text{all possible }\mathcal{Y}} p(\mathcal{Y}|\mathcal{X};\hat{\vec{\theta}}^{(k-1)})\, \log p(\mathcal{X},\mathcal{Y};\vec{\theta})$$

$$\text{M step:}\quad \hat{\vec{\theta}}^{(k)} = \arg\max_{\vec{\theta}} \big( Q(\vec{\theta};\hat{\vec{\theta}}^{(k-1)}) \big). \qquad (2)$$
The attractiveness of the EM algorithm is that it is easy to
implement and it converges to a local maximum of the log-
likelihood function. However, one of the serious limitations of the
EM algorithm is that it can end up in a poor local maximum if not
properly initialized. The selection of the initial parameter values is still an open question that has been studied many times. Some recent efforts were reported in [3], [6], [17], [18], [19].
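To make the E and M steps above concrete, the following is a minimal NumPy sketch of one batch EM iteration for a mixture of full-covariance Gaussians. It is an illustration of the standard equations only, not the authors' code; the function names (`gaussian_pdf`, `em_step`) and the data layout (samples as rows of `X`) are our own choices.

```python
import numpy as np

def gaussian_pdf(X, mu, C):
    """Multivariate normal density N(x; mu, C) evaluated for each row of X."""
    d = X.shape[1]
    diff = X - mu
    Cinv = np.linalg.inv(C)
    mahal = np.einsum('ij,jk,ik->i', diff, Cinv, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(C))
    return np.exp(-0.5 * mahal) / norm

def em_step(X, pi, mus, Cs):
    """One EM iteration: the E-step computes the 'ownerships' o_m(x),
    the M-step re-estimates mixing weights, means, and covariances."""
    t, d = X.shape
    M = len(pi)
    # E-step: o_m(x^(i)) = pi_m p_m(x^(i)) / p(x^(i))
    o = np.column_stack([pi[m] * gaussian_pdf(X, mus[m], Cs[m]) for m in range(M)])
    o /= o.sum(axis=1, keepdims=True)
    # M-step: ownership-weighted ML estimates
    Nm = o.sum(axis=0)                      # "soft counts" per component
    pi_new = Nm / t
    mus_new = (o.T @ X) / Nm[:, None]
    Cs_new = []
    for m in range(M):
        diff = X - mus_new[m]
        Cs_new.append((o[:, m, None] * diff).T @ diff / Nm[m])
    return pi_new, mus_new, Cs_new
```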
3 MODEL SELECTION
Note that, in order to use the EM algorithm, we need to know the
appropriate number of components M. Too many components lead
to “overfitting” and too few to “underfitting.” Choosing an appropriate number of components is important. Sometimes, for example, the appropriate number of components can reveal important underlying structure that characterizes the data.
. Z. Zivkovic is with the Informatics Institute, University of Amsterdam,
Kruislaan 403, 1098SJ Amsterdam, The Netherlands.
E-mail: zivkovic@science.uva.nl.
. F. van der Heijden is with the Laboratory for Measurement and
Instrumentation, University of Twente, PO Box 217, 7500AE Enschede,
The Netherlands. E-mail: f.vanderheijden@utwente.nl.
Manuscript received 18 Nov. 2002; revised 24 June 2003; accepted 3 Nov.
2003.
Recommended for acceptance by Y. Amit.

Full Bayesian approaches sample from the full a posteriori distribution with the number of components M considered unknown. This is possible using Markov chain Monte Carlo methods as reported in [11], [10]. However, these methods are still far too computationally demanding. Most of the practical model selection techniques are based on maximizing the following type of criterion:
$$J(M, \vec{\theta}(M)) = \log p(\mathcal{X};\vec{\theta}(M)) - P(M). \qquad (3)$$

Here, $\log p(\mathcal{X};\vec{\theta}(M))$ is the log-likelihood for the available data.
This part can be maximized using the EM. However, introducing more mixture components always increases the log-likelihood. The balance is achieved by introducing $P(M)$, which penalizes complex solutions. Some examples of such criteria are the Akaike Information Criterion [1], the Bayesian Inference Criterion [14], the Minimum Description Length [12], the Minimum Message Length (MML) [20], etc. For a detailed review, see, for example, [8].
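As a concrete instance of a criterion of the form (3), the sketch below scores a fitted Gaussian mixture with a BIC-style penalty, one of the criteria listed above. The penalty choice and the helper names are ours, and a per-component density function (for example, the `gaussian_pdf` helper sketched in Section 2) is assumed to be passed in as `pdf`.

```python
import numpy as np

def mixture_loglik(X, pi, mus, Cs, pdf):
    """log p(X; theta) = sum_i log sum_m pi_m p_m(x^(i); theta_m)."""
    p = sum(pi[m] * pdf(X, mus[m], Cs[m]) for m in range(len(pi)))
    return np.sum(np.log(p))

def penalized_criterion(X, pi, mus, Cs, pdf):
    """J(M, theta(M)) = log-likelihood - P(M), with a BIC-style penalty P(M)."""
    t, d = X.shape
    M = len(pi)
    # free parameters: (M-1) mixing weights + M means + M full covariance matrices
    n_free = (M - 1) + M * (d + d * (d + 1) // 2)
    penalty = 0.5 * n_free * np.log(t)
    return mixture_loglik(X, pi, mus, Cs, pdf) - penalty
```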
4 SOLUTION USING MAP ESTIMATION
The standard procedure for selecting M is the following: Find the ML estimate for different M-s and choose the M that maximizes (3). Suppose that we introduce a prior $p(\vec{\theta}(M))$ for the mixture parameters that penalizes complex solutions in a similar way as $P(M)$ from (3). Instead of (3), we could use:

$$\log p(\mathcal{X};\vec{\theta}(M)) + \log p(\vec{\theta}(M)). \qquad (4)$$

As in [6] and [3], we use the simplest prior choice, the prior only on the mixing weights $\pi_m$-s. For example, the Dirichlet prior (see [7], chapter 16) for the mixing weights is given by:

$$p(\vec{\theta}(M)) \propto \exp\Big(\sum_{m=1}^{M} c_m \log \pi_m\Big) = \prod_{m=1}^{M} \pi_m^{c_m}. \qquad (5)$$
The procedure is then as follows: We start with a large number
of randomly initialized components M and search for the MAP
solution using some iterative procedure, for example, the
EM algorithm. The prior drives the irrelevant components to
extinction. In this way, while searching for the MAP solution,
the number of components M is reduced until the balance is
achieved.
It can be shown that the standard MML model selection criterion can be approximated by the Dirichlet prior with the coefficients $c_m$ equal to $-N/2$, where N is the number of parameters per component of the mixture. See [6] for details. The parameters $c_m$ have a meaningful interpretation. For a multinomial distribution, $c_m$ represents the prior evidence (in the MAP sense) for the class m (the number of samples a priori belonging to that class). Negative prior evidence means that we will accept that the class m exists only if there is enough evidence from the data for the existence of this class. If there are many parameters per component, we will need many data samples to estimate them. In this sense, the presented linear connection between the $c_m$ and N seems very logical. The procedure from [6] starts with all the $\pi_m$-s equal to $1/M$. Although there is no proof of optimality, it seems reasonable to discard the component m when its weight $\pi_m$ becomes negative. This also ensures that the mixing weights stay nonnegative.
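As a toy numerical illustration of this "negative prior evidence" interpretation (anticipating the MAP weight estimate (7) derived in Section 5, which subtracts $c = N/2$ soft counts from every component), consider the following sketch; all numbers and names are hypothetical example choices of ours.

```python
import numpy as np

d = 2                                   # data dimension (example value)
N = d + d * (d + 1) // 2                # parameters of one full-covariance Gaussian: N = 5
c = N / 2                               # prior "evidence" subtracted from each component

# Hypothetical accumulated ownerships sum_i o_m(x^(i)) for t = 100 samples, M = 4 components:
soft_counts = np.array([58.0, 38.0, 2.0, 2.0])
t, M = soft_counts.sum(), len(soft_counts)

# MAP estimate of the mixing weights: components whose support does not exceed c go negative
weights = (soft_counts - c) / (t - M * c)
print(weights)   # the last two components have negative weights and would be discarded
```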
The “entropic prior” from [3] has a similar form: $p(\vec{\theta}(M)) \propto \exp\big(-\gamma H(\pi_1, \ldots, \pi_M)\big)$, where $H(\pi_1, \ldots, \pi_M) = -\sum_{m=1}^{M} \pi_m \log \pi_m$ is the entropy measure for the underlying multinomial distribution and $\gamma$ is a parameter. We use the mentioned Dirichlet prior because it leads to a closed-form solution.
5 RECURSIVE (ONLINE) SOLUTION
For the ML estimate, the following holds: $\frac{\partial}{\partial\hat{\vec{\theta}}} \log p(\mathcal{X};\hat{\vec{\theta}}) = 0$. The mixing weights are constrained to sum up to 1. We take this into account by introducing the Lagrange multiplier $\lambda$ and get: $\frac{\partial}{\partial\hat{\pi}_m}\big(\log p(\mathcal{X};\hat{\vec{\theta}}) + \lambda(\sum_{m=1}^{M}\hat{\pi}_m - 1)\big) = 0$. From here, after getting rid of $\lambda$, it follows that the ML estimate for t data samples should satisfy $\hat{\pi}_m^{(t)} = \frac{1}{t}\sum_{i=1}^{t} o_m^{(t)}(\vec{x}^{(i)})$ with the “ownerships” defined as:

$$o_m^{(t)}(\vec{x}) = \hat{\pi}_m^{(t)}\, p_m(\vec{x};\hat{\vec{\theta}}_m^{(t)}) \,/\, p(\vec{x};\hat{\vec{\theta}}^{(t)}). \qquad (6)$$
Similarly, for the MAP solution, we have $\frac{\partial}{\partial\hat{\pi}_m}\big(\log p(\mathcal{X};\hat{\vec{\theta}}) + \log p(\hat{\vec{\theta}}) + \lambda(\sum_{m=1}^{M}\hat{\pi}_m - 1)\big) = 0$, where $p(\hat{\vec{\theta}})$ is the mentioned Dirichlet prior (5). For t data samples, we get:

$$\hat{\pi}_m^{(t)} = \frac{1}{K}\Big(\sum_{i=1}^{t} o_m^{(t)}(\vec{x}^{(i)}) - c\Big), \qquad (7)$$

where $K = \sum_{m=1}^{M}\big(\sum_{i=1}^{t} o_m^{(t)}(\vec{x}^{(i)}) - c\big) = t - Mc$ (since $\sum_{m=1}^{M} o_m^{(t)} = 1$). The parameters of the prior are $c_m = -c$ (and $c = N/2$ as mentioned before). We rewrite (7) as:

$$\hat{\pi}_m^{(t)} = \frac{\hat{\Pi}_m - c/t}{1 - Mc/t}, \qquad (8)$$

where $\hat{\Pi}_m = \frac{1}{t}\sum_{i=1}^{t} o_m^{(t)}(\vec{x}^{(i)})$ is the mentioned ML estimate and the bias from the prior is introduced through $c/t$. The bias decreases for larger data sets (larger t). However, if a small bias is acceptable, we can keep it constant by fixing $c/t$ to $c_T = c/T$ with some large T.
This means that the bias will always be the same as if it would have been for a data set with T samples. If we assume that the parameter estimates do not change much when a new sample $\vec{x}^{(t+1)}$ is added and, therefore, $o_m^{(t+1)}(\vec{x}^{(i)})$ can be approximated by $o_m^{(t)}(\vec{x}^{(i)})$ that uses the previous parameter estimates, we get the following well-behaved and easy-to-use recursive update equation:

$$\hat{\pi}_m^{(t+1)} = \hat{\pi}_m^{(t)} + (1+t)^{-1}\left(\frac{o_m^{(t)}(\vec{x}^{(t+1)})}{1 - Mc_T} - \hat{\pi}_m^{(t)}\right) - (1+t)^{-1}\frac{c_T}{1 - Mc_T}. \qquad (9)$$

Here, T should be sufficiently large to make sure that $Mc_T < 1$. We start with initial $\hat{\pi}_m^{(0)} = 1/M$ and discard the mth component when $\hat{\pi}_m^{(t+1)} < 0$. Note that the straightforward recursive version of (7), given by $\hat{\pi}_m^{(t+1)} = \hat{\pi}_m^{(t)} + (1 + t - Mc)^{-1}\big(o_m^{(t)}(\vec{x}^{(t+1)}) - \hat{\pi}_m^{(t)}\big)$, is not very useful. For small t, the update is negative and the weights for the components with high $o_m^{(t)}(\vec{x}^{(t+1)})$ are decreased instead of increased. In order to avoid the negative update, we could start with a larger value for t, but then we cancel out the influence of the prior. This motivates the important choice we made to fix the influence of the prior.
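The following small sketch illustrates this point numerically: the step size of the straightforward recursion is negative for small t, whereas the fixed-bias update (9) behaves well from the first sample. The particular values (d = 2, M = 30, T = 1000) are example choices of ours.

```python
import numpy as np

d, M = 2, 30                       # example: 2-D data, 30 initial components
N = d + d * (d + 1) // 2           # parameters per component -> N = 5
c = N / 2                          # prior coefficient, c = 2.5, so M*c = 75
T = 1000                           # "effective" prior sample size for the fixed bias
c_T = c / T

# Straightforward recursion of (7): the step size 1/(1 + t - M*c) is negative for t < M*c - 1,
# so components with large ownership would initially be down-weighted.
for t in [1, 10, 50, 80]:
    print(t, 1.0 / (1 + t - M * c))

# Fixed-bias update (9): the step size 1/(1 + t) stays positive for every t,
# and the prior enters only through the small constant c_T (here M*c_T = 0.075 < 1).
def update_weight(pi_m, o_m, t):
    step = 1.0 / (1 + t)
    return pi_m + step * (o_m / (1 - M * c_T) - pi_m) - step * c_T / (1 - M * c_T)

print(update_weight(pi_m=1.0 / M, o_m=0.9, t=1))   # the weight grows, as it should
```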
The most commonly used mixture is the Gaussian mixture. A mixture component $p_m(\vec{x};\vec{\theta}_m) = \mathcal{N}(\vec{x};\vec{\mu}_m, C_m)$ has its mean $\vec{\mu}_m$ and its covariance matrix $C_m$ as the parameters. The prior has influence only on the mixing weights and we can use the recursive equations:

$$\hat{\vec{\mu}}_m^{(t+1)} = \hat{\vec{\mu}}_m^{(t)} + (t+1)^{-1}\,\frac{o_m^{(t)}(\vec{x}^{(t+1)})}{\hat{\pi}_m^{(t)}}\,\big(\vec{x}^{(t+1)} - \hat{\vec{\mu}}_m^{(t)}\big) \qquad (10)$$

$$\hat{C}_m^{(t+1)} = \hat{C}_m^{(t)} + (t+1)^{-1}\,\frac{o_m^{(t)}(\vec{x}^{(t+1)})}{\hat{\pi}_m^{(t)}}\,\Big(\big(\vec{x}^{(t+1)} - \hat{\vec{\mu}}_m^{(t)}\big)\big(\vec{x}^{(t+1)} - \hat{\vec{\mu}}_m^{(t)}\big)^T - \hat{C}_m^{(t)}\Big) \qquad (11)$$

from [15] for the rest of the parameters.

6 A SIMPLE PRACTICAL ALGORITHM
For an online procedure, it is reasonable to fix the influence of the new samples by replacing the term $(1+t)^{-1}$ from the recursive update equations (9), (10), and (11) by $\alpha = 1/T$. There are also some practical reasons for using a fixed small constant $\alpha$. It reduces the problems with instability of the equations for small t. Furthermore, a fixed $\alpha$ helps in forgetting the out-of-date statistics (random initialization and component deletion) more rapidly. It is equivalent to introducing an exponentially decaying envelope: $(1-\alpha)^{t-i}$ is applied to the influence of the sample $\vec{x}^{(i)}$.
For the sake of clarity, we present here the whole algorithm we used in our experiments. We start with a large number of components M and with a random initialization of the parameters (see the next section for an example). We have $c_T = c/T = \alpha N/2$. Furthermore, we use Gaussian mixture components with full covariance matrices. Therefore, if the data is d-dimensional, we have $N = d + d(d+1)/2$ (the number of parameters for a Gaussian with a full covariance matrix). The online algorithm is then given by:
. Input: new data sample $\vec{x}^{(t+1)}$, current parameter estimates $\hat{\vec{\theta}}^{(t)}$.
. Calculate the “ownerships”: $o_m^{(t)}(\vec{x}^{(t+1)}) = \hat{\pi}_m^{(t)}\, p_m(\vec{x}^{(t+1)};\hat{\vec{\theta}}_m^{(t)}) \,/\, p(\vec{x}^{(t+1)};\hat{\vec{\theta}}^{(t)})$.
. Update the mixture weights: $\hat{\pi}_m^{(t+1)} = \hat{\pi}_m^{(t)} + \alpha\left(\frac{o_m^{(t)}(\vec{x}^{(t+1)})}{1 - Mc_T} - \hat{\pi}_m^{(t)}\right) - \alpha\,\frac{c_T}{1 - Mc_T}$.
. Check if there are irrelevant components: if $\hat{\pi}_m^{(t+1)} < 0$, discard the component m, set $M = M - 1$, and renormalize the remaining mixing weights.
. Update the rest of the parameters:
- $\hat{\vec{\mu}}_m^{(t+1)} = \hat{\vec{\mu}}_m^{(t)} + w\,\vec{\delta}$ (where $w = \alpha\, o_m^{(t)}(\vec{x}^{(t+1)}) / \hat{\pi}_m^{(t)}$ and $\vec{\delta} = \vec{x}^{(t+1)} - \hat{\vec{\mu}}_m^{(t)}$).
- $\hat{C}_m^{(t+1)} = \hat{C}_m^{(t)} + w\big(\vec{\delta}\vec{\delta}^T - \hat{C}_m^{(t)}\big)$ (tip: limit the update speed, $w = \min(20\alpha, w)$).
. Output: new parameter estimates $\hat{\vec{\theta}}^{(t+1)}$.
This simple algorithm can be implemented in only a few lines of code; a sketch is given below. The recommended upper limit $20\alpha$ for w simply means that the updating speed is limited for the covariance matrices of the components representing less than 5 percent of the data. This was necessary since $\vec{\delta}\vec{\delta}^T$ is a singular matrix and the covariance matrices may become singular if updated too fast.
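For concreteness, here is a minimal NumPy sketch of one pass of the update listed above: ownerships, the weight update with the fixed prior bias, discarding of irrelevant components, and the mean/covariance updates with the capped step w. The function names and data layout are our own choices, and the sketch is an illustration of the equations rather than the authors' implementation.

```python
import numpy as np

def gaussian_pdf(x, mu, C):
    """Density N(x; mu, C) of a single d-dimensional sample x."""
    d = x.shape[0]
    diff = x - mu
    mahal = diff @ np.linalg.solve(C, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(C))
    return np.exp(-0.5 * mahal) / norm

def online_update(x, pi, mus, Cs, alpha, c_T):
    """One recursive update of the mixture (pi: array, mus/Cs: lists) for a new sample x."""
    M = len(pi)
    # Ownerships o_m(x) = pi_m p_m(x) / p(x)
    o = np.array([pi[m] * gaussian_pdf(x, mus[m], Cs[m]) for m in range(M)])
    o /= o.sum()
    # Mixing-weight update with the fixed prior bias c_T
    pi = pi + alpha * (o / (1 - M * c_T) - pi) - alpha * c_T / (1 - M * c_T)
    # Discard irrelevant components (non-positive weight) and renormalize
    keep = pi > 0
    pi, o = pi[keep], o[keep]
    mus = [m for m, k in zip(mus, keep) if k]
    Cs = [C for C, k in zip(Cs, keep) if k]
    pi /= pi.sum()
    # Mean and covariance updates with the capped step w = min(20*alpha, alpha*o_m/pi_m)
    for m in range(len(pi)):
        w = min(20 * alpha, alpha * o[m] / pi[m])
        delta = x - mus[m]
        mus[m] = mus[m] + w * delta
        Cs[m] = Cs[m] + w * (np.outer(delta, delta) - Cs[m])
    return pi, mus, Cs
```

Under the choices made in the text, a typical call would use $\alpha = 1/T$ and $c_T = \alpha N/2$ with $N = d + d(d+1)/2$.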
7 EXPERIMENTS
In this section, we demonstrate the algorithm performance on a
few standard problems. We show summary results from 100 trials
for each data set. For the real-world data sets, we randomly sample
from the data to generate longer sequences needed for our
sequential algorithm. First, for each of the problems, we present in Fig. 1 how the selected number of components of the mixture changes as new samples are sequentially added. The number of components that was finally selected is presented in the form of a histogram for the 100 trials. In Fig. 2, we present a comparison with some batch algorithms and study the influence of the parameter $\alpha$.
The random initialization of the parameters is the same as in [6]. The means $\hat{\vec{\mu}}_m^{(0)}$ of the mixture components are initialized by some randomly chosen data points. The initial covariance matrices are a fraction ($1/10$ here) of the mean global diagonal covariance matrix:

$$C_m^{(0)} = \frac{1}{10d}\,\mathrm{trace}\left(\frac{1}{n}\sum_{i=1}^{n}\big(\vec{x}^{(i)} - \hat{\vec{\mu}}\big)\big(\vec{x}^{(i)} - \hat{\vec{\mu}}\big)^T\right) I,$$

where $\hat{\vec{\mu}} = \frac{1}{n}\sum_{i=1}^{n}\vec{x}^{(i)}$ is the global mean of the data and I is the identity matrix with proper dimensions. We used the first $n = 100$ samples (it is also possible to estimate this initial covariance matrix recursively). Finally, we set the initial mixing weights to $\hat{\pi}_m^{(0)} = 1/M$. The initial number of components M should be large enough so that the initialization reasonably covers the data. We used here the same initial number of components as in [6].
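The random initialization just described can be written down directly. The sketch below follows the stated recipe (randomly chosen data points as means, a 1/10 fraction of the mean global diagonal covariance as a scaled identity, equal mixing weights); the function name and the NumPy random generator are our own choices.

```python
import numpy as np

def random_init(X_first, M, frac=0.1):
    """Initialize an M-component Gaussian mixture from the first n samples X_first (n x d)."""
    n, d = X_first.shape
    rng = np.random.default_rng()
    # Means: M randomly chosen data points
    mus = [X_first[i].copy() for i in rng.choice(n, size=M, replace=False)]
    # Covariances: a fraction of the mean global diagonal covariance (scaled identity)
    global_mean = X_first.mean(axis=0)
    diff = X_first - global_mean
    sigma2 = frac / d * np.trace(diff.T @ diff / n)
    Cs = [sigma2 * np.eye(d) for _ in range(M)]
    # Equal initial mixing weights
    pi = np.full(M, 1.0 / M)
    return pi, mus, Cs
```

Feeding the remaining samples one at a time into an update step such as the one sketched in Section 6 then completes the procedure.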
7.1 The “Three Gaussians” Data Set
First, we analyze a Gaussian mixture with mixing weights $\pi_1 = \pi_2 = \pi_3 = 1/3$, means $\vec{\mu}_1 = [0 \;\; {-2}]^T$, $\vec{\mu}_2 = [0 \;\; 0]^T$, $\vec{\mu}_3 = [0 \;\; 2]^T$, and covariance matrices

$$C_1 = C_2 = C_3 = \begin{bmatrix} 2 & 0 \\ 0 & 0.2 \end{bmatrix}.$$
A modified version of the EM called “DAEM” from [17] was able to find the correct solution using a “bad” initialization. For a data set with 900 samples, they needed more than 200 iterations to get close to the solution. Here, we start with $M = 30$ mixture components. With random initialization, we performed 100 trials and the new algorithm was always able to find the correct solution while simultaneously estimating the parameters of the mixture and selecting the number of components. A similar batch algorithm from [6] needs about 200 iterations to identify the three components (on a data set with 900 samples). From the plot in Fig. 1, we see that already after 9,000 samples the new algorithm is usually able to identify the three components. The computation costs for 9,000 samples are approximately the same as for only 10 iterations of the EM algorithm on a data set with 900 samples. Consequently, the new algorithm for this data set is about 20 times faster in finding a similar solution (a typical solution is presented in Fig. 1 by the “$\sigma = 2$” contours of the Gaussian components). In [9], some approximate recursive versions of the EM algorithm were compared to the standard EM algorithm and it was shown that the recursive versions are usually faster. This is consistent with our results. Empirically, we decided that 50 samples per class are enough and used $\alpha = 1/150$.
7.2 The “Iris” Data Set
We disregard the class information from the well-known 3-class, 4-dimensional “Iris” data set [2]. From the 100 trials, the clusters were properly identified 81 times. This shows that the order in which the data is presented can influence the recursive solution. The data set had only 150 samples (50 per class) that were repeated many times. We expect that the algorithm would perform better with more data samples. We used $\alpha = 1/150$. The typical solution in Fig. 1 is presented by projecting the 4-dimensional data onto the first two principal components.
7.3 The “Shrinking Spiral” Data Set
This data set presents a 1-dimensional manifold (a “shrinking spiral”) in three dimensions with added noise: $\vec{x} = [(13 - 0.5t)\cos t \;\;\; (0.5t - 13)\sin t \;\;\; t]^T + \vec{n}$, with $t \sim \mathrm{Uniform}[0, 4\pi]$ and the noise $\vec{n} \sim \mathcal{N}(0, I)$. The modified EM called “SMEM” from [18] was reported to be able to fit a 10-component mixture in about 350 iterations. The batch algorithm from [6] fits the mixture and selects 11, 12, or 13 components using typically 300 to 400 iterations for a 900-sample data set. From the graph in Fig. 1, it is clear that we achieve similar results, but much faster. About 18,000 samples were enough to arrive at a similar solution. Consequently, again, the new algorithm is about 20 times faster. There are no clusters in this data set. The fixed $\alpha$ has the effect that the influence of the old data is downweighted by the exponentially decaying envelope $(1-\alpha)^{t-k}$ (for $k < t$). For comparison with the other algorithms that used 900 samples, we limited the influence of the older samples to 5 percent of the influence of the current sample by setting $\alpha = -\log(0.05)/900$. In Fig. 1, we present a typical solution by
showing for each component the eigenvector corresponding to the
largest eigenvalue of the covariance matrix.
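For reproducibility of this setup, the sketch below generates data from the stated “shrinking spiral” model and computes $\alpha$ from the 5 percent influence rule used here; the function name and the use of NumPy's default generator are our own choices.

```python
import numpy as np

def shrinking_spiral(n_samples, rng=None):
    """Samples x = [(13 - 0.5t)cos t, (0.5t - 13)sin t, t]^T + noise, t ~ Uniform[0, 4*pi]."""
    rng = rng or np.random.default_rng()
    t = rng.uniform(0.0, 4.0 * np.pi, size=n_samples)
    X = np.column_stack([(13 - 0.5 * t) * np.cos(t),
                         (0.5 * t - 13) * np.sin(t),
                         t])
    return X + rng.standard_normal((n_samples, 3))   # additive N(0, I) noise

# Down-weight samples older than 900 steps to at most 5 percent of the current influence:
# (1 - alpha)**900 = 0.05, so alpha is approximately -log(0.05)/900 (using log(1-a) ~ -a).
alpha = -np.log(0.05) / 900
```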
7.4 The “Enzyme” Data Set
The 1-dimensional “Enzyme” data set has 245 data samples. It was shown in [11] using MCMC that the number of components supported by the data is most likely four, but two and three are also good choices. Our algorithm arrived at similar solutions. In a similar way as before, we used $\alpha = -\log(0.05)/245$.
7.5 Comparison with Some Batch Algorithms
The following standard batch methods were considered for comparison: the EM algorithm initialized using the result from k-means clustering; the SMEM method [18]; and the greedy EM method [19] that starts with a single component and adds new ones, which was reported to be faster than the elaborate SMEM. We used 900 samples for the “Three Gaussians” and the “Shrinking Spiral” data sets. The batch algorithms assume a known number of components: three for the “Three Gaussians” and the “Iris” data, 13 for the “Shrinking Spiral,” and four for the “Enzyme” data set. Our new unsupervised recursive algorithm, RUEM, has selected on average approximately the same number of components for the chosen $\alpha$. All the iterative batch algorithms in our experiments stop if the change in the log-likelihood is less than $10^{-5}$. The results are presented in Fig. 2a. The best likelihood and the lowest standard deviation are reported in bold. We also added the ideal ML result obtained using a carefully initialized EM. For the “Iris” data, the EM was initialized using the means and the covariances of the three classes. However, the solution where the two close clusters are modeled using one component was better in terms of likelihood. This “wrong” solution was found occasionally by some of the algorithms. The results from the RUEM are biased. Furthermore, the parameter $\alpha$ controls the speed of updating the parameters and, therefore, also the effective amount of data that is considered. Therefore, we also present the results “polished” by additionally applying the EM algorithm and using the same sample size as for the batch algorithms. The RUEM results and the “polished” results are better than or similar to the batch results. We also observe that the greedy EM algorithm has problems with the “Iris” and the “Shrinking Spiral” data.
7.6 The Influence of the Parameter $\alpha$
In Figs. 2b and 2c, we show the influence of the parameter $\alpha$ on the selected number of components. We also plot the log-likelihood
Fig. 1. Model selection results for a few standard problems (summary from 100 trials).
