Articulated Body Motion Capture by Annealed Particle Filtering
Jonathan Deutscher
University of Oxford
Dept. of Engineering Science
Oxford, OX1 3PJ
United Kingdom
jdeutsch@robots.ox.ac.uk
Andrew Blake
Microsoft Research
1 Guildhall St,
Cambridge, CB2 3NH
United Kingdom
ablake@microsoft.com
Ian Reid
University of Oxford
Dept. of Engineering Science
Oxford, OX1 3PJ
United Kingdom
ian@robots.ox.ac.uk
Abstract
The main challenge in articulated body motion track-
ing is the large number of degrees of freedom (around 30)
to be recovered. Search algorithms, either deterministic or
stochastic, that search such a space without constraint, fall
foul of exponential computational complexity. One approach
is to introduce constraints: either labelling using markers
or colour coding, prior assumptions about motion trajec-
tories or view restrictions. Another is to relax constraints
arising from articulation, and track limbs as if their mo-
tions were independent. In contrast, here we aim for general
tracking without special preparation of subjects or restric-
tive assumptions.
The principal contribution of this paper is the develop-
ment of a modified particle filter for search in high dimen-
sional configuration spaces. It uses a continuation princi-
ple, based on annealing, to introduce the influence of narrow
peaks in the fitness function, gradually. The new algorithm,
termed annealed particle filtering, is shown to be capable of
recovering full articulated body motion efficiently.
1. Introduction
Marker-based human motion capture has been used commer-
cially [19] for a number of years with applications found
in special effects and biometrics. The use of markers, however, is intrusive, necessitates expensive specialised hardware, and can only be applied to footage taken especially for that purpose. A markerless system of human motion capture could be run using conventional cameras, without special apparel or other equipment. Combined with today's powerful off-the-shelf PCs, cost-effective and real-time markerless human motion capture has for the first time become a possibility. Such a system would have a greater number of applications than its marker-based predecessor, ranging from intelligent surveillance to character animation
and computer interfacing. For this reason the field of human
motion capture has recently seen somewhat of a renaissance.
Research into human motion capture has so far failed to
produce a full-body tracker general enough to handle real-
istic real-world applications. This gives an insight into the
difficulty of the problem. Research has concentrated on the
articulated-model based approach. The reason this approach
is popular is the high level output it produces in the form of
a model configuration for each frame. This output can easily
be used by higher-order processes to perform tasks such as
character animation.
The problem with using articulated models is the high
dimensionality of the configuration space and the exponen-
tially increasing computational cost that results. A realistic
articulated model (see figure 4) of the human body usually
has at least 25 DOF. The model used in this paper for ex-
ample has 29 DOF, and models employed for commercial
character animation usually have over 40.
A number of effective 2D systems have been presented
[7] [10]. These are good for applications such as surveillance; however, they do not provide output in the form of 3D
model configurations that are needed for applications such
as 3D character animation.
There are several possible strategies for reducing the di-
mensionality of the configuration space. Firstly it is possible
to restrict the range of movement of the subject. This ap-
proach has been pursued by Hogg [8], Rohr [17] and Niyogi
[15]. All three assume the subject is walking. Rohr even
reduces the dimension of the problem to the phase of the
walking cycle. Goncalves [6] and Deutscher [3] assume a
constant angle of view of the subject, as do Bregler [2] and Rehg [16]. Such an approach greatly restricts the resulting tracker's generality.
Another way to constrain the configuration space is to
perform a hierarchical search. If one part of an articulated model can be localised independently then it can be used to constrain the search for the rest of the model. Gavrila [4]
does just this when he uses what he terms search space de-
composition. He is able to localise the torso using colour
cues and uses this information to constrain the search for
the limbs. Without the assistance of colour cues (or other
labelling cues) however it is very hard to independently
localise specific body parts in realistic scenarios. This is
mainly due to the problem of self occlusion and rules out the use of a hierarchical search.
For a practical full body tracker to be developed it can-
not rely on assumptions about motion, angle of view or the
availability of labelling cues. The principal contribution of
this paper is the development of a modified particle filter for
searching high dimensional configuration spaces which does
not rely on such assumptions. It uses a continuation princi-
ple, based on annealing, to introduce the influence of narrow
peaks in the fitness function, gradually. The new algorithm,
termed annealed particle filtering, is shown to be capable of
recovering full articulated body motion efficiently.
2. Particle filters
Particle filtering (also known as the Condensation algorithm [9]) provides a robust Bayesian framework for human motion capture. The Condensation algorithm was developed for tracking objects in clutter, in which the posterior density $p(X \mid Z_k)$ and the observation process $p(Z_k \mid X)$ are often non-Gaussian or even multi-modal ($X$ denotes the model's configuration vector and $Z_k = \{Z_1, \ldots, Z_k\}$ denotes the history of observations up to time $t_k$). The complicated nature of the observation process during human motion capture causes the posterior density to be non-Gaussian and multi-modal, as shown by Deutscher [3]. It is well known that a Kalman filter will fail in this case. Deutscher et al. were able to show that the use of a particle filter will improve tracking performance.
The posterior density $p(X \mid Z_k)$ is represented by a set of weighted particles $\{(s_k^{(0)}, \pi_k^{(0)}), \ldots, (s_k^{(N)}, \pi_k^{(N)})\}$, where the weights $\pi_k^{(n)} \propto p(Z_k \mid X = s_k^{(n)})$ are normalised so that $\sum_N \pi_k^{(n)} = 1$. The state $X_k$ at each time step $t_k$ can be estimated by

$$X_k = E_k[X] = \sum_{n=1}^{N} \pi_k^{(n)} s_k^{(n)} \qquad (1)$$

or the mode

$$X_k = M_k[X] = s_k^{(j)}, \qquad \pi_k^{(j)} = \max_n\bigl(\pi_k^{(n)}\bigr) \qquad (2)$$

of the posterior density $p(X \mid Z_k)$.
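For concreteness, the estimates of equations 1 and 2 can be computed in a few lines. The sketch below is a generic Python illustration rather than the authors' implementation; the array names and shapes are assumptions.

import numpy as np

def estimate_state(particles, weights):
    """Posterior mean (equation 1) and mode (equation 2) of a weighted particle set.

    particles : (N, d) array, one model configuration s_k^(n) per row.
    weights   : (N,) array, proportional to p(Z_k | X = s_k^(n)).
    """
    pi = np.asarray(weights, dtype=float)
    pi = pi / pi.sum()                    # normalise so that the weights sum to 1
    mean = pi @ particles                 # E_k[X] = sum_n pi_k^(n) s_k^(n)
    mode = particles[np.argmax(pi)]       # s_k^(j), the particle with the largest weight
    return mean, mode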
Particle filtering works well because it can model uncertainty. Less likely model configurations will not be thrown away immediately but are given a chance to prove themselves later on, resulting in more robust tracking. However, a price is paid for these attributes in computational cost. The most expensive operation in the standard Condensation algorithm is an evaluation of the likelihood function $p(Z_k \mid X = s_k^{(n)})$, and this has to be done once at every time step for every particle. To maintain a fair representation of $p(X \mid Z_k)$ a certain number of particles is required, and this number grows with the size of the model's configuration space. In fact it has been shown by MacCormick and Blake [14] that

$$N \geq \frac{D_{\min}}{\alpha^{d}} \qquad (3)$$

where $N$ is the number of particles required and $d$ is the number of dimensions. The survival diagnostic $D_{\min}$ and the particle survival rate $\alpha$ are both constants, with $\alpha \ll 1$. An explanation of both of these constants can be found in section 5. Clearly, when $d$ is large, normal particle filtering becomes infeasible.
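The exponential dependence on $d$ in equation 3 is easily made concrete. The following arithmetic uses purely illustrative values for $D_{\min}$ and $\alpha$ (neither is fixed at this point in the paper); only the growth with $d$ matters.

# Illustrative only: D_min and alpha are assumed values, not taken from the paper.
D_min, alpha = 100, 0.5

for d in (1, 6, 12, 29):
    n_required = D_min / alpha ** d      # equation 3: N >= D_min / alpha^d
    print(f"d = {d:2d}  ->  N >= {n_required:.3g}")

# With these values, d = 29 already demands roughly 5e10 particles, which is
# why unconstrained particle filtering in a ~30-dimensional space is infeasible.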
Partitioned sampling was developed by MacCormick and Blake [13] as a variation on Condensation to reduce the number of particles needed to track more than one object. MacCormick [14] has also now applied this technique to tracking articulated objects. Using partitioned sampling reduces the number of particles required to

$$N \geq \frac{D_{\min}}{\alpha}, \qquad (4)$$

making the problem tractable. However, this assumes that the configuration space can be sliced so that one can construct an observation density $p(Z_k \mid x_k^{i})$ for each dimension $x_k^{i}$ of the model configuration vector $X = \{x_k^{0} \ldots x_k^{d}\}$. This assumption, that it is possible to independently localise separate parts of an articulated model, is similar to that made by Gavrila to enable a hierarchical search. It has already been argued that it is not possible to use this approach without the use of labelling cues.

Another variation on the standard particle filter used to reduce the number of particles needed to effectively represent a posterior density has been developed by Sullivan et al. [18]. Called layered sampling, it is centered around the concept of importance resampling. Experimental evidence, however, suggests that this technique is not sufficient to solve the problem of tracking with $d > 30$, reducing the number of particles required by at best a factor of 5 to 10 before the expected behaviour of the Condensation framework breaks down.
The second reason why Bayesian particle filtering may not be suitable for full body human motion capture is the difficulty of constructing a valid observation model $p(Z_k \mid X_k)$ as a normalised probability density distribution. Another factor is the computational cost of calculating $p(Z_k \mid X = s_k^{(n)})$. Often an intuitive weighting function $w(Z_k, X)$ can be constructed that approximates the probabilistic likelihood $p(Z_k \mid X_k)$ but requires much less computational effort to evaluate. Probabilistic observation models also have a tendency to utilise only the information that can be modelled well, discarding other available information.
Given these factors it was decided to reduce the problem from propagating the conditional density $p(X \mid Z_k)$ using $p(Z \mid X)$ to finding the configuration $X_k$ which returns the maximum value from a simple and efficient weighting function $w(Z_k, X)$ at each time $t_k$, given $X_{k-1}$. By doing this, gains will be made on two fronts. It should be possible to make do with fewer likelihood (or weighting function) evaluations, because the function $p(X \mid Z_k)$ no longer has to be fully represented, and an evaluation of a simple weighting function $w(Z_k, X)$ should require minimal computational effort when compared to an evaluation of the observation model $p(Z_k \mid X)$. The main disadvantage will be not being able to work within a robust Bayesian framework.

It was decided to continue to use a particle based stochastic framework because of its ability to handle multi-modal likelihoods or, in the case of a weighting function, one with many local maxima. The question is: what is an efficient way to perform a particle based stochastic search for the global maximum of a weighting function with many local maxima? It was decided to use an approach which is similar to that of simulated annealing.
3. Simulated annealing
The Markov chain based method of simulated annealing was developed by Kirkpatrick et al. [11] as a way of handling multiple modes in an optimisation context. It employs a series of distributions, with probability densities given by $p_0(x)$ to $p_M(x)$, in which each $p_m(x)$ differs only slightly from $p_{m+1}(x)$. Samples actually need to be drawn from the distribution $p_0(x)$. The distribution $p_M$ is designed so that the Markov chain used to sample from it allows movement between all regions of the state/search space. The usual method is to set $p_m(x) \propto p_0(x)^{\beta_m}$, for $1 = \beta_0 > \beta_1 > \ldots > \beta_M$.

An annealing run is started in some initial state, from which a Markov chain designed to converge to $p_M$ is first simulated. Some number of iterations of a Markov chain designed to converge to $p_{M-1}$ are simulated next, starting from the final state of the previous simulation. This process is continued in this fashion, using the final state of the simulation for $p_m$ as the initial state for the simulation for $p_{m-1}$, until the chain designed to converge to $p_0$ is finally simulated.

Note that if $p_0$ contains isolated modes, simply simulating the Markov chain designed to converge to $p_0$ starting from some arbitrary point could give very poor results, as it might become stuck in whatever mode is closest to the starting point, even if that mode has little of the total probability mass. The annealing process is a heuristic for avoiding this, by taking advantage of the freer movement possible under the other distributions. This is exactly the kind of behaviour needed for the stochastic search. One wants to move towards the global maximum of the weighting function $w(Z_k, X)$, using the overall trend of the matching function as a guide, without becoming misguided by local maxima, as seen in figure 1.
The idea of annealing for optimisation is now adapted to
perform a particle based stochastic search within the frame-
work of an annealed particle filter.
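Before turning to the filter itself, the annealing idea can be illustrated with a deliberately generic sketch: a Metropolis sampler run through a ladder of tempered densities $p_m(x) \propto p_0(x)^{\beta_m}$. The target function, schedule and step sizes below are assumptions chosen only to make the example self-contained; they are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def p0(x):
    # Assumed bimodal, un-normalised target density for illustration only.
    return np.exp(-0.5 * (x - 3.0) ** 2) + 0.3 * np.exp(-0.5 * (x + 2.0) ** 2 / 0.1)

def anneal(betas=(0.05, 0.2, 0.5, 1.0), iters=200, step=2.0, x=0.0):
    """Metropolis sampling through p_m(x) ~ p0(x)^beta_m, from broad to sharp."""
    for beta in betas:                        # beta_M, ..., beta_0 with beta_0 = 1
        for _ in range(iters):
            x_new = x + step * rng.normal()
            # Metropolis acceptance under the tempered density p0(x)^beta.
            if rng.random() < (p0(x_new) / p0(x)) ** beta:
                x = x_new
        step *= 0.7                           # shrink the moves as the density sharpens
    return x

print(anneal())    # typically finishes near the dominant mode at x = 3

Starting with the broad distributions lets the chain cross between modes before the sharp, final distribution locks it onto the dominant one, which is exactly the behaviour exploited by the annealed particle filter in the next section.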
4. Annealed particle filter
A series of weighting functions $w_0(Z, X)$ to $w_M(Z, X)$ are employed, in which each $w_m$ differs only slightly from $w_{m-1}$ (see figure 2, where $M = 3$). The function $w_M$ is designed to be very broad, representing the overall trend of the search space, while $w_0$ should be very peaked, emphasising local features. This is achieved by setting

$$w_m(Z, X) = w(Z, X)^{\beta_m}, \qquad (5)$$

for $\beta_0 > \beta_1 > \ldots > \beta_M$, where $w(Z, X)$ is the original weighting function. Because it is not the aim to sample from $w(Z, X)$ but only to find its maximum, it is not required that $\beta_0 = 1$.
One annealing run is performed at each time $t_k$ using image observations $Z_k$. The state of the tracker after each layer $m$ of an annealing run is represented by a set of $N$ weighted particles

$$S^{\pi}_{k,m} = \{(s^{(0)}_{k,m}, \pi^{(0)}_{k,m}), \ldots, (s^{(N)}_{k,m}, \pi^{(N)}_{k,m})\}. \qquad (6)$$

An unweighted set of particles will be denoted

$$S_{k,m} = \{s^{(0)}_{k,m}, \ldots, s^{(N)}_{k,m}\}. \qquad (7)$$

Each particle in the set $S^{\pi}_{k,m}$ is considered as an $(s^{(i)}_{k,m}, \pi^{(i)}_{k,m})$ pair, in which $s^{(i)}_{k,m}$ is an instance of the multi-variate model configuration $X$ and $\pi^{(i)}_{k,m}$ is the corresponding particle weighting. Each annealing run can be broken down as follows (the process is illustrated in figure 2).
1. For every time step $t_k$ an annealing run is started at layer $M$, with $m = M$.

2. Each layer of an annealing run is initialised by a set of un-weighted particles $S_{k,m}$.

3. Each of these particles is then assigned a weight

$$\pi^{(i)}_{k,m} \propto w_m(Z_k, s^{(i)}_{k,m}), \qquad (8)$$

and the weights are normalised so that $\sum_N \pi^{(i)}_{k,m} = 1$. The set of weighted particles $S^{\pi}_{k,m}$ has now been formed.

4. $N$ particles are drawn randomly from $S^{\pi}_{k,m}$ with replacement and with a probability equal to their weighting $\pi^{(i)}_{k,m}$. As the $n$th particle $s^{(n)}_{k,m}$ is chosen it is used to produce the particle $s^{(n)}_{k,m-1}$ using

$$s^{(n)}_{k,m-1} = s^{(n)}_{k,m} + B_m, \qquad (9)$$

where $B_m$ is a multi-variate Gaussian random variable with mean $\mathbf{0}$ and variance $P_m$.

5. The set $S_{k,m-1}$ has now been produced, which can be used to initialise layer $m-1$. The process is repeated until we arrive at the set $S^{\pi}_{k,0}$.

6. $S^{\pi}_{k,0}$ is used to estimate the optimal model configuration $X_k$ using

$$X_k = \sum_{i=1}^{N} s^{(i)}_{k,0} \pi^{(i)}_{k,0}. \qquad (10)$$

7. The set $S_{k+1,M}$ is then produced from $S^{\pi}_{k,0}$ using

$$s^{(n)}_{k+1,M} = s^{(n)}_{k,0} + B_0. \qquad (11)$$

This set is then used to initialise layer $M$ of the next annealing run at $t_{k+1}$.
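The seven steps above map almost line-for-line onto code. The sketch below is a minimal re-statement under stated assumptions: the weighting function w, the per-layer exponents and the diffusion scales are user-supplied placeholders here (their settings are discussed in sections 5-7), and a diagonal Gaussian diffusion is assumed.

import numpy as np

rng = np.random.default_rng()

def annealing_run(particles, w, betas, diffusion, Z_k):
    """One annealing run at time t_k (steps 1-7).

    particles : (N, d) array, the un-weighted set initialising layer M.
    w         : callable w(Z, x) -> non-negative float (assumed weighting function).
    betas     : sequence [beta_M, ..., beta_0] defining w_m = w^beta_m.
    diffusion : sequence [P_M, ..., P_0] of per-layer diffusion scales, each shape (d,).
    Z_k       : image observations at time t_k (opaque to this sketch).

    Returns the estimate X_k (equation 10) and the set S_{k+1,M} for the next frame.
    """
    N = len(particles)
    X_k = None
    for layer, (beta, P) in enumerate(zip(betas, diffusion)):    # layers M, ..., 0
        # Step 3: weight every particle with w_m = w(Z_k, s)^beta_m and normalise.
        weights = np.array([w(Z_k, s) ** beta for s in particles])
        weights /= weights.sum()
        if layer == len(betas) - 1:
            # Step 6: the weighted set S^pi_{k,0} yields the configuration estimate.
            X_k = weights @ particles
        # Step 4: resample with replacement according to the weights ...
        idx = rng.choice(N, size=N, p=weights)
        # ... and add zero-mean Gaussian diffusion B_m (equation 9).
        particles = particles[idx] + rng.normal(scale=P, size=particles.shape)
    # Step 7: the particles diffused with B_0 form S_{k+1,M} for time t_{k+1}.
    return X_k, particles

A tracker calls annealing_run once per frame, passing the returned particle set straight into the next call, as step 7 prescribes.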
Figure 1: Illustration of the annealed particle filter with $M = 1$. Even though a large number of particles are used (so that an equivalent number of weighting function evaluations are made as in figure 2), the search is misdirected by local maxima. From the resulting weighted set it is very hard to tell where the global maximum of $w_0$ lies.
5. Setting the tracking parameters
As stated previously, the function $w_m(Z_k, X)$ used in each layer of the annealing process is determined by

$$w_m(Z, X) = w(Z, X)^{\beta_m} \qquad (12)$$

with $\beta_0 > \beta_1 > \ldots > \beta_M$. The value of $\beta_m$ will determine the rate of annealing at each layer. A large $\beta_m$ will produce a peaked weighting function $w_m$, resulting in a high rate of annealing. Small values of $\beta_m$ will have the opposite effect. If the rate of annealing is too high, the influence of local maxima will distort the estimate of $X_k$, as seen in figure 1. If the rate is too low, $X_k$ will not be determined with enough resolution (unless more layers are used, wasting computational resources).

A good measure of the effective number of particles that will be chosen for propagation to the next layer is the survival diagnostic $D$ (taken from [14]), where

$$D = \left( \sum_{n=1}^{N} \bigl(\pi^{(n)}\bigr)^{2} \right)^{-1}, \qquad (13)$$

and from this a good measure for the rate of annealing can be derived, called the particle survival rate $\alpha$ [5] [12]:

$$\alpha = \frac{D}{N}. \qquad (14)$$
Figure 2: Illustration of the annealed particle filter with $M = 3$. With a multi-layered search the sparse particle set is able to gradually migrate towards the global maximum without being distracted by local maxima. The final set $S^{\pi}_{k,0}$ provides a good indication of the weighting function's global maximum.

Now that a measure for the rate of annealing has been derived, it is possible to set the values of $\beta_{k,0}, \ldots, \beta_{k,M}$ at each time step $t_k$. At layer $m$ in an annealing run, $\beta_{k-1,m}$ from $t_{k-1}$ is used to calculate a preliminary set of particle weights for $S^{\pi}_{k,m}$. From this set an initial rate of annealing $\alpha_{init}$ can be calculated using equations 13 and 14. It can be shown that $D(\beta)$ is monotonic decreasing in $\beta$ so that, given $\alpha$, the equation

$$D(\beta) = \alpha N \qquad (15)$$

has a unique solution for $\beta$. With this knowledge we can minimise the error function $\epsilon$ between the desired rate of annealing $\alpha_m$ and the initial rate of annealing $\alpha_{init}$,

$$\epsilon(\beta) = \alpha_m - \alpha_{init}(\beta), \qquad (16)$$

using gradient descent to find the desired $\beta_{k,m}$. Note that this does not mean the weights have to be completely re-evaluated each time $\beta_{k,m}$ is adjusted during gradient descent. Since $w_m(Z, X) = w(Z, X)^{\beta_m}$, the values $w(Z, X = s^{(i)}_{k,m})$, $i = 1 \ldots N$, can be stored for each set $S_{k,m}$, and $\beta_{k,m}$ applied to each individual weight as appropriate to produce $S^{\pi}_{k,m}$.
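In practice the adjustment just described reduces to a one-dimensional search over the stored raw weights. The sketch below exploits the monotonicity of $D(\beta)$ and uses bisection instead of the gradient descent mentioned in the text, which is a simplification; the bracket and tolerance are assumptions.

import numpy as np

def survival_rate(raw_w, beta):
    """alpha(beta) = D / N (equations 13 and 14) for the weights raw_w ** beta."""
    w = np.asarray(raw_w, dtype=float)
    w = (w / w.max()) ** beta            # rescale before exponentiating, for stability
    pi = w / w.sum()
    D = 1.0 / np.sum(pi ** 2)            # survival diagnostic, equation 13
    return D / len(pi)                   # particle survival rate, equation 14

def find_beta(raw_w, alpha_target, lo=1e-3, hi=50.0, tol=1e-3):
    """Solve alpha(beta) = alpha_target by bisection (assumed bracket [lo, hi])."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if survival_rate(raw_w, mid) > alpha_target:
            lo = mid                     # weighting still too flat: raise beta
        else:
            hi = mid                     # weighting too sharp: lower beta
    return 0.5 * (lo + hi)

Because the raw values $w(Z_k, s^{(i)}_{k,m})$ are stored once per layer, each trial value of $\beta$ only re-exponentiates them, exactly as noted above.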
Figure 3: Annealed particle filter in progress. The sets $S_{k,m}$ are plotted here, taken while tracking the walking person as seen in figure 9. Only the horizontal translation components $x_0$ and $x_1$ of the model configuration vector $X$ are shown. Starting with $S_{k-1,0}$ from the previous time step, the particles are diffused to form $S_{k,9}$, which easily covers the expected range of translational movement of the subject. The particles are then slowly annealed over 10 layers (the sets $S_{k,6}$ to $S_{k,4}$ are omitted for brevity) to produce $S_{k,0}$, which is clustered around the maximum of the weighting function.

How then are the appropriate values for $\beta_0 \ldots \beta_M$ determined? There are also a number of other tracking parameters that need to be set before tracking can begin, including the number of particles $N$, the number of annealing layers $M$ and the diffusion variance vectors $P_0 \ldots P_M$. A tentative framework has been developed to allocate values to these parameters, although it is acknowledged that more work needs to be done in this area.
1. The first step is to decide on how many annealing layers are needed. It was found that doubling the number of annealing layers reduces the number of particles needed for successful tracking by more than half. This will only work up to a point, however, as there seems to be a minimum number of particles ($N$) needed for tracking no matter how many layers are used. Using a 30 DOF model it was found that setting $M = 10$ with $N \approx 200$ worked well.
2. Each element in the vector $P_0$ is allocated a value equal to half the maximum expected movement of the corresponding model configuration parameter over one time step. In this way the set $S_{k+1,M}$ should cover all possible movements of the subject between time $t_k$ and $t_{k+1}$. The amount of diffusion added to each successive annealing layer should decrease at the same rate as the resolution of the set $S_{k,m}$ increases. It has been found that setting

$$P_m = P_0 \, (\alpha_M \, \alpha_{M-1} \cdots \alpha_m) \qquad (17)$$

produces good results (a short sketch of this schedule follows the list).

3. The appropriate rates of annealing $\alpha_0 \ldots \alpha_M$ are influenced by the number of annealing layers used. With a higher number of annealing layers a lower rate of annealing can be used to obtain the desired resolution. It was found that, while using 10 annealing layers, setting $\alpha_0 = \alpha_1 = \ldots = \alpha_M = 0.5$ provided sufficient resolution of $X_k$.
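Taken together, the three guidelines above define a complete parameter schedule. The sketch below (referenced from item 2) uses the values quoted in the text ($M = 10$, $N \approx 200$, $\alpha_m = 0.5$); the 29-entry vector of maximum expected movements is an assumed placeholder, and the diffusion is treated as a per-parameter scale.

import numpy as np

# Values quoted in the text; max_move is an assumed placeholder per model parameter.
M, N, alpha = 10, 200, 0.5
max_move = np.full(29, 0.1)

P = {0: 0.5 * max_move}                  # guideline 2: half the expected movement per frame
for m in range(M, 0, -1):
    # Equation 17 with alpha_M = ... = alpha_m = 0.5, applied for layers M down to 1.
    P[m] = P[0] * alpha ** (M - m + 1)

for m in range(M, -1, -1):
    print(f"layer {m:2d}: diffusion scale {P[m][0]:.5f}")

At layer 0 the diffusion $P_0$ carries the particle set across to layer $M$ of the next frame, as described in step 7 of section 4.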
6. The model
The articulated model of the human body used in this pa-
per is built around the framework of a kinematic chain, as
seen in figure 4. Each limb is fleshed out using conic sec-
tions with elliptical cross-sections. It is believed that such a
model has a number of advantages including computational
simplicity, high-level interpretation of output and compact
representation.
Figure 4: The model is based on a kinematic chain consisting of 17 segments (a). Six degrees of freedom are given to base translation and rotation. The shoulder and hip joints are treated as sockets with 3 degrees of freedom, the clavicle joints are given 2 degrees of freedom (they are not allowed to rotate about their own axis) and the remaining joints are modelled as hinges requiring only one. This results in a model with 29 degrees of freedom and a configuration vector $X = \{x_1 \ldots x_{29}\}$. The model is fleshed out by conical sections (b).
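As a quick check on the caption's accounting, the stated per-joint allocations can be summed; the split below is taken directly from the caption, with the number of one-degree-of-freedom hinges inferred as whatever remains to reach 29.

# Degree-of-freedom tally implied by the figure 4 caption.
base      = 6          # global translation + rotation of the root
hips      = 2 * 3      # two ball-and-socket joints
shoulders = 2 * 3      # two ball-and-socket joints
clavicles = 2 * 2      # no rotation about their own axis
explicit  = base + hips + shoulders + clavicles      # = 22
hinges    = 29 - explicit                            # seven remaining 1-DOF hinge joints
print(explicit + hinges, "degrees of freedom")       # 29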
7. The weighting function
When deciding which image features are to be used to con-
struct the weighting function a number of factors must be
taken into account.
Generality. The image features used should be invariant
under a wide range of conditions so that the same track-
ing framework will function well is a broad variety of
situations.

References

J. Geweke. Bayesian inference in econometric models using Monte Carlo integration. Econometrica, November 1989.

M. Isard and A. Blake. Contour tracking by stochastic propagation of conditional density. In Proc. European Conference on Computer Vision, 1996.

C. Bregler and J. Malik. Tracking people with twists and exponential maps. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 1998.

D. M. Gavrila and L. S. Davis. 3-D model-based tracking of humans in action: a multi-view approach. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 1996.

S. A. Niyogi and E. H. Adelson. Analyzing and recognizing walking figures in XYT. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 1994.