Topologically-Constrained Latent Variable Models
Raquel Urtasun rurtasun@csail.mit.edu
UC Berkeley EECS & ICSI; CSAIL MIT
David J. Fleet fleet@cs.toronto.edu
University of Toronto
Andreas Geiger geiger@mrt.uka.de
Karlsruhe Institute of Technology
Jovan Popović jovan@csail.mit.edu
CSAIL MIT
Trevor J. Darrell trevor@eecs.berkeley.edu
UC Berkeley EECS & ICSI; CSAIL MIT
Neil D. Lawrence Neil.Lawrence@manchester.ac.uk
University of Manchester
Abstract
In dimensionality reduction approaches, the data are typically embedded in a Euclidean latent space. However, for some data sets this is inappropriate. For example, in human motion data we expect latent spaces that are cylindrical or toroidal, which are poorly captured with a Euclidean space. In this paper, we present a range of approaches for embedding data in a non-Euclidean latent space. Our focus is the Gaussian Process latent variable model. In the context of human motion modeling this allows us to (a) learn models with interpretable latent directions enabling, for example, style/content separation, and (b) generalise beyond the data set, enabling us to learn transitions between motion styles even though such transitions are not present in the data.
1. Introduction
Dimensionality reduction is a popular approach to dealing with high dimensional data sets. It is often the case that linear dimensionality reduction, such as principal component analysis (PCA), does not adequately capture the structure of the data. For this
reason there has been considerable interest in the ma-
chine learning community in non-linear dimensionality
reduction. Approaches such as locally linear embed-
ding (LLE), Isomap and maximum variance unfold-
ing (MVU) (Roweis & Saul, 2000; Tenenbaum et al.,
2000; Weinberger et al., 2004) all define a topology
through interconnections between points in the data
space. However, if a given data set is relatively sparse
or particularly noisy, these interconnections can stray
beyond the ‘true’ local neighbourhood and the result-
ing embedding can be poor.
Probabilistic formulations of latent variable models do not usually include explicit constraints on the embedding and therefore the natural topology of the data manifold is not always respected¹. Even with the correct topology and dimension of the latent space, the learning might get stuck in local minima if the initialization of the model is poor. Moreover, the maximum likelihood solution may not be a good model, due, e.g., to the sparseness of the data. To get better models in such cases, more constraints on the model are needed.
This paper shows how explicit topological constraints
can be imposed within the context of probabilistic la-
tent variable models. We describe two approaches,
both within the context of the Gaussian process la-
tent variable model (GP-LVM) (Lawrence, 2005). The
first uses prior distributions on the latent space that encourage a given topology. The second influences the latent space and optimisation through constrained maximum likelihood.

¹An exception is the back-constrained GP-LVM (Lawrence & Quiñonero-Candela, 2006) where a constrained maximum likelihood algorithm is used to enforce these constraints.
Our approach is motivated by the problem of model-
ing human pose and motion for character animation.
Human motion is an interesting domain because, while
there is an increasing amount of motion capture data
available, the diversity of human motion means that
we will necessarily have to incorporate a large amount
of prior knowledge to learn probabilistic models that
can accurately reconstruct a wide range of motions.
Despite this, most existing methods for learning pose
and motion models (Elgammal & Lee, 2004; Grochow
et al., 2004; Urtasun et al., 2006) do not fully exploit
useful prior information, and many are limited to mod-
eling a single human activity (e.g., walking with a par-
ticular style).
This paper describes how prior information can be
used effectively to learn models with specific topologies
that reflect the nature of human motion. Importantly,
with this information we can also model multiple ac-
tivities, including transitions between them (e.g. from
walking to running), even when such transitions are
not present in the training data. As a consequence,
we can now learn latent variable models with training
motions comprising multiple subjects with stylistic di-
versity, as well as multiple activities, such as running
and walking. We demonstrate the effectiveness of our
approach in a character animation application, where
the user specifies a set of constraints (e.g., foot loca-
tions), and the remaining kinematic degrees of freedom
are inferred.
2. Gaussian Process Latent Variable
Models (GP-LVM)
We begin with a brief review of the GP-LVM
(Lawrence, 2005). The GP-LVM represents a high-
dimensional data set, Y, through a low dimensional
latent space, X, and a Gaussian process mapping
from the latent space to the data space. Let Y = [y_1, ..., y_N]^T be a matrix in which each row is a single training datum, y_i ∈ R^D. Let X = [x_1, ..., x_N]^T denote the matrix whose rows represent the corresponding positions in latent space, x_i ∈ R^d. Given a covariance function for the Gaussian process, k_Y(x, x'), the likelihood of the data given the latent positions is,

p(Y | X, \bar{\beta}) = \frac{1}{Z_1} \exp\left( -\frac{1}{2} \mathrm{tr}\left( K_Y^{-1} Y Y^T \right) \right) ,   (1)

where Z_1 is a normalization factor, K_Y is known as the kernel matrix, and β̄ denotes the kernel hyperparameters. The elements of the kernel matrix are defined by the covariance function, (K_Y)_{i,j} = k_Y(x_i, x_j). A common choice is the radial basis function (RBF), k_Y(x, x') = \beta_1 \exp\left(-\frac{\beta_2}{2} \|x - x'\|^2\right) + \frac{\delta_{x,x'}}{\beta_3}, where the kernel hyperparameters β̄ = {β_1, β_2, β_3} determine the output variance, the RBF support width, and the variance of the additive noise. Learning in the GP-LVM consists of maximizing (1) with respect to the latent positions, X, and the hyperparameters, β̄.
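For concreteness, here is a minimal sketch (ours, not code from the paper) of the RBF covariance and the negative log-likelihood of Eq. (1) in NumPy; the variable names and toy data are illustrative, and in practice X and the hyperparameters would be optimized by gradient-based methods.

```python
import numpy as np

def rbf_kernel(X, beta):
    """k_Y(x, x') = beta1 * exp(-beta2/2 * ||x - x'||^2) + delta_{x,x'} / beta3."""
    beta1, beta2, beta3 = beta
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return beta1 * np.exp(-0.5 * beta2 * sq) + np.eye(len(X)) / beta3

def gplvm_neg_log_likelihood(X, Y, beta):
    """-log p(Y | X, beta) up to an additive constant (Eq. 1)."""
    N, D = Y.shape
    K = rbf_kernel(X, beta)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))    # K^{-1} Y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * D * log_det + 0.5 * np.sum(Y * alpha)     # tr(K^{-1} Y Y^T) term

# toy usage: N=50 poses of dimension D=30, embedded in d=3 latent dimensions
Y = np.random.randn(50, 30)
X = np.random.randn(50, 3)    # would normally be initialized, e.g., with PCA
print(gplvm_neg_log_likelihood(X, Y, beta=(1.0, 1.0, 100.0)))
```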
When one has time-series data, Y represents a sequence of observations, and it is natural to augment the GP-LVM with an explicit dynamical model. For example, the Gaussian Process Dynamical Model (GPDM) models the sequence as a latent stochastic process with a Gaussian process prior (Wang et al., 2008), i.e.,

p(X | \bar{\alpha}) = \frac{p(x_1)}{Z_2} \exp\left( -\frac{1}{2} \mathrm{tr}\left( K_X^{-1} X_{out} X_{out}^T \right) \right)   (2)

where Z_2 is a normalization factor, X_out = [x_2, ..., x_N]^T, K_X ∈ R^{(N−1)×(N−1)} is the kernel matrix constructed from X_in = [x_1, ..., x_{N−1}], x_1 is given an isotropic Gaussian prior, and ᾱ are the kernel hyperparameters for K_X; below we use an RBF kernel for K_X. Like the GP-LVM, the GPDM provides a generative model for the data, but additionally it provides one for the dynamics. One can therefore predict future observation sequences given past observations, and simulate new sequences.
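Analogously, a rough sketch of the dynamics term in Eq. (2), assuming the same RBF form as above for K_X (our illustration, not the authors' implementation):

```python
import numpy as np

def gpdm_dynamics_neg_log_prior(X, alpha_hp):
    """-log of the GP dynamics term in Eq. (2), up to constants.

    X: (N, d) latent trajectory; alpha_hp: assumed RBF hyperparameters for K_X."""
    X_in, X_out = X[:-1], X[1:]
    a1, a2, a3 = alpha_hp
    sq = np.sum(X_in**2, 1)[:, None] + np.sum(X_in**2, 1)[None, :] - 2 * X_in @ X_in.T
    K_X = a1 * np.exp(-0.5 * a2 * sq) + np.eye(len(X_in)) / a3
    L = np.linalg.cholesky(K_X)
    Kinv_Xout = np.linalg.solve(L.T, np.linalg.solve(L, X_out))
    d = X.shape[1]
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    # last term: isotropic Gaussian prior on the first latent point
    return 0.5 * d * log_det + 0.5 * np.sum(X_out * Kinv_Xout) + 0.5 * np.sum(X[0]**2)
```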
3. Top Down Imposition of Topology
The smooth mapping in the GP-LVM ensures that distant points in data space remain distant in latent space. However, as discussed in (Lawrence & Quiñonero-Candela, 2006), the mapping in the opposite direction is not required to be smooth. While the GPDM may mitigate this effect, it often produces models that are not smooth and do not generalize well (Urtasun et al., 2006; Wang et al., 2008).
To help ensure smoother, well-behaved models, (Lawrence & Quiñonero-Candela, 2006) suggested the use of back-constraints, where each point in the latent space is a smooth function of its corresponding point in data space, x_{ij} = g_j(y_i; a_j), where {a_j}_{1≤j≤d} is the set of parameters of the mappings. One possible mapping is a kernel-based regression model, where regression on a kernel induced feature space provides the mapping,

x_{ij} = \sum_{m=1}^{N} a_{jm}\, k(y_i, y_m) .   (3)
This approach is known as the back-constrained GP-
LVM. When learning the back-constrained GP-LVM,

Figure 1. When training data contain large stylistic variations and multiple motions, the generic GPDM (a) and the back-constrained GPDM (b) do not produce useful models; simulations of both models do not look realistic. (c,d) Hybrid model learned using local linearities for smoothness (i.e., style) and back-constraints for topology (i.e., content). The training data are composed of 9 walks and 10 runs performed by different subjects at different speeds. (c) Likelihood for the reconstruction of the latent points. (d) 3D view of the latent trajectories for the training data (blue) and the automatically generated motions of Figs. 3 and 4 (green and red, respectively).
one needs to determine the hyperparameters of the kernel matrices (for the back-constraints and the covariance of the GP), as well as the mapping weights, {a_j}. (Lawrence & Quiñonero-Candela, 2006) fixed the hyperparameters of the back-constraint's kernel matrix, optimizing over the remaining parameters.
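To illustrate Eq. (3), here is a small sketch of the kernel-based back-constraint mapping (ours; the RBF width gamma and the weight matrix A are illustrative placeholders that would normally be learned):

```python
import numpy as np

def back_constraint_latents(Y, A, gamma=1.0):
    """Eq. (3): each latent coordinate is a kernel regression on the data,
    x_ij = sum_m a_jm k(y_i, y_m), with an RBF kernel k on data space."""
    sq = np.sum(Y**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * Y @ Y.T
    K = np.exp(-0.5 * gamma * sq)        # N x N kernel matrix on the data
    return K @ A                         # A is N x d: column j holds the weights {a_jm}

# hypothetical usage: N=50 data points of dimension 30 mapped to d=3 latent dims
Y = np.random.randn(50, 30)
X = back_constraint_latents(Y, np.random.randn(50, 3))
```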
Nevertheless, when learning human motion data with large stylistic variations or different motions, neither the GPDM nor the back-constrained GP-LVM produces smooth models that generalize well. Fig. 1 depicts three 3–D models learned from 9 walks and 10 runs. The GPDM (Fig. 1(a)) and the back-constrained GPDM² (Fig. 1(b)) do not generalize well to new runs and walks, nor do they produce realistic animations.
In this paper we show that with a well-designed set of back-constraints good models can be learned (Fig. 1(c)). We also consider an alternative approach to the hard constraints on the latent space arising from g_j(y_i; a_j). We introduce topological constraints through a prior distribution in the latent space, based on a neighborhood structure learned through a generalized local linear embedding (LLE) (Roweis & Saul, 2000). We then show how to incorporate domain-specific prior knowledge, which allows us to develop motion models with specific topologies that incorporate different activities within a single latent space and transitions between them.
3.1. Locally Linear GP-LVM
The locally linear embedding (LLE) (Roweis & Saul, 2000) preserves topological constraints by finding a representation based on reconstruction in a low dimensional space with an optimized set of local weightings. Here we show how the LLE objective can be combined with the GP-LVM, yielding a locally linear GP-LVM (LL-GPLVM).
²We use an RBF kernel for the inverse mapping in (3).
The locally linear embedding assumes that each data point and its neighbors lie on, or close to, a locally linear patch on the data manifold. The local geometry of these patches can then be characterized by linear coefficients that reconstruct each data point from its neighbors. This is done in a three step procedure: (1) the K nearest neighbors, {y_j}_{j∈η_i}, of each point, y_i, are computed using Euclidean distance in the input space, d_{ij} = ||y_i − y_j||²; (2) the weights w = {w_{ij}} that best reconstruct each data point from its neighbors are obtained by minimizing Φ(w) = Σ_{i=1}^{N} ||y_i − Σ_{j∈η_i} w_{ij} y_j||²; and (3) the latent positions x_i best reconstructed by the weights w_{ij} are computed by minimizing Φ(X) = Σ_{i=1}^{N} ||x_i − Σ_{j∈η_i} w_{ij} x_j||².
In the LLE, the weight matrix w is sparse (only a small number of neighbors is used), and the two minimizations can be computed in closed form. In particular, computing the weights can be done by solving, ∀ j ∈ η_i, the following system,

\sum_{k} C^{sim}_{kj} w^{sim}_{ij} = 1 ,   (4)

where C^{sim}_{kj} = (y_i − y_k)^T (y_i − y_j) if j, k ∈ η_i, and 0 otherwise. Once the weights are computed, they are rescaled so that Σ_j w_{ij} = 1.
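A minimal sketch of this weight computation (our own NumPy illustration; the neighbor count K and the regularizer eps are assumptions, not values from the paper):

```python
import numpy as np

def lle_weights(Y, K=5, eps=1e-3):
    """Reconstruction weights of Eq. (4): for each y_i, solve C w = 1 over its
    K nearest neighbors, then rescale so the weights sum to one."""
    N = Y.shape[0]
    W = np.zeros((N, N))
    dists = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    for i in range(N):
        nbrs = np.argsort(dists[i])[1:K + 1]        # skip the point itself
        diff = Y[i] - Y[nbrs]                       # rows are y_i - y_k
        C = diff @ diff.T                           # local Gram matrix C_{kj}
        C += eps * np.trace(C) * np.eye(K)          # regularize if C is near-singular
        w = np.linalg.solve(C, np.ones(K))
        W[i, nbrs] = w / w.sum()                    # rescale: sum_j w_ij = 1
    return W
```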
The LLE energy function can be interpreted, for a given set of weights w, as a prior that forces each latent point to be locally reconstructed by its neighbors, i.e., p(X | w) = (1/Z) exp(−(1/σ²) Φ(X)), where Z is a normalization constant, and σ² represents a global scaling of the prior. Note that strictly speaking this is not a proper prior as it is conditioned on the weights, which depend on the training data. Following (Roweis & Saul, 2000), we first compute the neighbors based on the Euclidean distance. For each training point y_i, we then compute the weights solving Eq. (4).
Learning the LL-GPLVM is then equivalent to minimizing the negative log posterior of the model,³ i.e.,

L_S = -\log\, p(Y | X, \bar{\beta})\, p(\bar{\beta})\, p(X | w)
    = \frac{D}{2} \ln |K_Y| + \frac{1}{2} \mathrm{tr}\left( K_Y^{-1} Y Y^T \right) + \sum_i \ln \beta_i + \frac{1}{\sigma^2} \sum_{k=1}^{d} \sum_{i=1}^{N} \left\| x_i^k - \sum_{j=1}^{N} w_{ij}^k x_j^k \right\|^2 + C ,   (5)

where C is a constant, and x_i^k is the k-th component of x_i. Note that we have extended the LLE to have a different prior for each dimension. This will be useful below as we incorporate different sources of prior knowledge. Fig. 2 (a) shows a model of 2 walks and 2 runs learned with the locally linear GPDM. Note how smooth the latent trajectories are.
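For concreteness, the locally linear prior contributes the following penalty, which is simply added to the GP-LVM negative log-likelihood when minimizing Eq. (5); this is our own sketch, with sigma and the per-dimension weight list as illustrative assumptions:

```python
import numpy as np

def locally_linear_prior_term(X, W_per_dim, sigma=1.0):
    """Last term of Eq. (5): (1/sigma^2) sum_k sum_i || x_i^k - sum_j w_ij^k x_j^k ||^2.

    X: (N, d) latent positions; W_per_dim: list of d (N, N) weight matrices,
    one per latent dimension (they may differ when dimensions use different
    distance measures, as in Section 4.1)."""
    total = 0.0
    for k, W in enumerate(W_per_dim):
        residual = X[:, k] - W @ X[:, k]     # reconstruction error in dimension k
        total += np.sum(residual**2)
    return total / sigma**2
```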
We now have general tools to influence the structure of the models. In what follows we generalize the top-down imposition of topology strategies (i.e., back-constraints and locally linear GP-LVM) to incorporate domain-specific prior knowledge.
4. Reflecting Knowledge in Latent
Space Structure
A problem for modeling human motion data is the
sparsity of the data relative to the diversity of natu-
rally plausible motions. For example, while we might
have a data set comprising different motions, such as
runs, walks etc., the data may not contain transitions
between motions. In practice, however, we know that these motions will be approximately cyclic and that transitions can only physically occur at specific points in the cycle. How can we encourage a model to respect such topological constraints, which arise from prior knowledge?
We consider two alternatives to solve this problem.
First, we show how one can adjust the distance metric
used in the locally linear embedding to better reflect
different types of prior knowledge. We then show how
one can define similarity measures for use with the
back-constrained GP-LVM. Both these approaches en-
courage the latent space to construct a representation
that reflects our prior knowledge. They are comple-
mentary and can be combined to learn better models.
³When learning a locally linear GPDM, the dynamics and the locally linear prior are combined as a product of potentials. The objective function becomes L_S + \frac{d}{2} \ln |K_X| + \frac{1}{2} \mathrm{tr}\left( K_X^{-1} X_{out} X_{out}^T \right) + \sum_i \ln \alpha_i, with L_S defined as in (5).
Figure 2. First two dimensions of 3–D models learned using (a) LL-GPDM, (b) LL-GPDM with topology, (c) LL-GPDM with topology and transitions, (d) back-constrained GPDM with an RBF mapping, (e) GPDM with topology through back-constraints, and (f) GPDM with back-constraints for the topology and transitions. For the models using topology, the cyclic structure is imposed in the last 2 dimensions. The two types of transition points (left and right leg contact points) are shown in red and green, and are used as prior knowledge in (c,f).
4.1. Prior Knowledge through Local
Linearities
We now turn to consider how one might incorporate
prior knowledge in the LL-GPLVM framework. This is
accomplished by replacing the local Euclidean distance
measures used in Section 3.1 with other similarity mea-
sures. That is, we can modify the covariance used to
compute the weights in Eq. (4) to reflect our prior
knowledge in the latent space. We consider two exam-
ples: the first involves transitions between activities;
with the second we show how topological constraints
can be placed on the form of the latent space.
Covariance for Transitions Modeling transitions between motions is important in character animation. Transitions can be inferred automatically based on similarity between poses (Kovar et al., 2002) or at points of non-linearity of the dynamics (Bissacco, 2005), and they can be used for learning. For example, for motions such as walking or running, two types of transitions can be identified: left and right foot ground contacts.

To model such transitions, we define an index on the frames of the motion sequence, {t_i}_{i=1}^{N}. We then define subsets of this set, {t̂_i}_{i=1}^{M}, which represent frames where transitions are possible. To capture transitions in the latent model we define the elements of the covariance matrix as follows,

C^{trans}_{kj} = 1 - \delta_{kj} \exp\left( -\zeta (t_k - t_j)^2 \right)   (6)

with ζ a constant, and δ_{ij} = 1 if t_i and t_j are in the same set {t̂_k}_{k=1}^{M}, and otherwise δ_{ij} = 0. This covariance encourages the latent points at which transitions are physically possible to be close together.
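A small sketch of how such a covariance could be assembled (ours; frame_times, the transition sets, and zeta are hypothetical inputs rather than values from the paper):

```python
import numpy as np

def transition_covariance(frame_times, transition_sets, zeta=0.1):
    """Build C_trans of Eq. (6): entries shrink towards 0 for pairs of frames that
    belong to the same transition set (e.g., both left-foot contacts) and are
    close in time; all other entries stay at 1."""
    t = np.asarray(frame_times, dtype=float)
    N = len(t)
    delta = np.zeros((N, N))
    for s in transition_sets:                  # e.g. [left_contact_frames, right_contact_frames]
        member = np.isin(frame_times, list(s))
        delta = np.maximum(delta, np.outer(member, member))
    return 1.0 - delta * np.exp(-zeta * (t[:, None] - t[None, :])**2)

# hypothetical usage: frames 0..99, with two transition types
C = transition_covariance(np.arange(100), [{10, 50, 90}, {30, 70}], zeta=0.05)
```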
Covariance for Topologies We now consider covariances that encourage the latent space to have a particular topology. Specifically we are interested in suitable topologies for walking and running data. Because the data are approximately periodic, it seems appropriate to have a non-Cartesian topology. To this end one can extract the phase of the motion,⁴ φ, and use it with a covariance to encourage the latent points to exhibit a periodic topological structure within a Cartesian space. As an example we consider a cylindrical topology within a 3–D latent space by constraining two of the latent dimensions with the phase. In particular, to represent the cyclic motion we construct a distance function on the unit circle, where a latent point corresponding to phase φ is represented with coordinates (cos(φ), sin(φ)). To force a cylindrical topology on the latent space, we specify different covariances for each latent dimension,

C^{\cos}_{k,j} = (\cos(\phi_i) - \cos(\phi_k))\,(\cos(\phi_i) - \cos(\phi_j))   (7)
C^{\sin}_{k,j} = (\sin(\phi_i) - \sin(\phi_k))\,(\sin(\phi_i) - \sin(\phi_j)) ,   (8)

with k, j ∈ η_i. The covariance for the remaining dimension is constructed as usual, based on Euclidean distance in the data space. Fig. 2 (b) shows a GPDM constrained in this way, and in Fig. 2 (c) the covariance is augmented with transitions.
Note that the use of different distance measures for
each dimension of the latent space implies that the
neighborhood and the weights in the locally linear
prior will also be different for each dimension. Here,
three different locally linear embeddings form the prior
distribution.
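A sketch of Eqs. (7) and (8) in code (ours; the phase array and the neighbor indices are assumed inputs coming from whatever neighborhood rule is in use):

```python
import numpy as np

def topology_covariances(phases):
    """Per-dimension covariances of Eqs. (7)-(8) for one point i and its neighbors.

    Returns a function that, given index i and its neighbor indices, produces
    the local matrices C_cos and C_sin used when solving for the weights."""
    c, s = np.cos(phases), np.sin(phases)

    def local_covs(i, neighbors):
        dc = c[i] - c[neighbors]          # cos(phi_i) - cos(phi_k)
        ds = s[i] - s[neighbors]
        C_cos = np.outer(dc, dc)          # (cos_i - cos_k)(cos_i - cos_j)
        C_sin = np.outer(ds, ds)
        return C_cos, C_sin

    return local_covs

# hypothetical usage with 100 frames of extracted phase
local_covs = topology_covariances(np.linspace(0, 4 * np.pi, 100) % (2 * np.pi))
C_cos, C_sin = local_covs(10, np.array([8, 9, 11, 12]))
```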
4.2. Prior Knowledge with Back Constraints
As explained above, we can also design back-constraints to influence the topology and learn useful transitions. This can be done by replacing the kernel of Eq. (3). Many kernels have interpretations as similarity measures. In particular, any similarity measure that leads to a positive semi-definite matrix can be interpreted as a kernel. Here, just as we define covariance matrices above, we extend the original formulation of back constraints by constructing similarity measures (i.e., kernels) to reflect prior knowledge.

⁴The phase can be easily extracted from the data by Fourier analysis or by detecting key postures and interpolating the phases between them. Another idea, not further explored here, would be to optimize the GP-LVM with respect to the phase.
Similarity for Transitions To capture transitions between two motions, we wish to design a kernel that expresses strong similarity between points in the respective motions where transitions may occur. We can encourage transition points of different sequences to be proximal with the following kernel matrix for the back-constraint mapping:

k^{trans}(t_i, t_j) = \sum_{m} \sum_{l} \delta_{ml}\, k(t_i, \hat{t}_m)\, k(t_j, \hat{t}_l)   (9)

where k(t_i, t̂_l) is an RBF centered at t̂_l, and δ_{ml} = 1 if t̂_m and t̂_l are in the same set. The influence of the back-constraints is controlled by the support width of the RBF kernel.
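The following sketch (ours; the RBF width and the transition sets are illustrative) evaluates Eq. (9) for a pair of frame indices:

```python
import numpy as np

def k_trans(t_i, t_j, transition_sets, width=2.0):
    """Transition similarity of Eq. (9): sum of products of RBFs centered at
    transition frames t_hat, restricted to pairs from the same transition set."""
    rbf = lambda a, b: np.exp(-0.5 * (a - b)**2 / width**2)
    total = 0.0
    for s in transition_sets:                  # delta_ml = 1 only within one set
        for t_m in s:
            for t_l in s:
                total += rbf(t_i, t_m) * rbf(t_j, t_l)
    return total

# hypothetical usage: two transition types (left / right foot contacts)
print(k_trans(12, 51, [{10, 50, 90}, {30, 70}], width=3.0))
```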
Topologically Constrained Latent Spaces We now consider kernels that force the latent space to have a particular topology. To force a cylindrical topology on the latent space, we can introduce similarity measures based on the phase, specifying different similarity measures for each latent dimension. As before, we construct a distance function on the unit circle that takes into account the phase. A periodic mapping can be constructed from a kernel matrix as follows,

x_{n,1} = \sum_{m=1}^{N} a^{\cos}_{m}\, k(\cos(\phi_n), \cos(\phi_m)) + a^{\cos}_{0}\, \delta_{n,m} ,
x_{n,2} = \sum_{m=1}^{N} a^{\sin}_{m}\, k(\sin(\phi_n), \sin(\phi_m)) + a^{\sin}_{0}\, \delta_{n,m} ,

where k is an RBF kernel function, and x_{n,i} is the i-th coordinate of the n-th latent point. These two mappings project onto two dimensions of the latent space, forcing them to have a periodic structure (which comes about through the sinusoidal dependence of the kernel on phase). Fig. 2 (e) shows a model learned using GPDM with the last two dimensions constrained in this way (the third dimension is out of plane). The first dimension is constrained by an RBF mapping on the input space. Each dimension's kernel matrix can then be augmented by adding the transition similarity of Eq. (9), resulting in the model shown in Fig. 2 (f).
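As a final sketch (ours; the weights a_cos and a_sin are random placeholders here, whereas in the model they are optimized during learning, and the small bias term a_0 δ_{n,m} is omitted for brevity), the periodic back-constraint mapping could look like:

```python
import numpy as np

def periodic_backconstraint(phases, a_cos, a_sin, width=0.5):
    """Map each frame's phase to two latent coordinates via RBF kernels on
    cos(phi) and sin(phi), giving the latent space a periodic structure."""
    rbf = lambda u, v: np.exp(-0.5 * (u[:, None] - v[None, :])**2 / width**2)
    K_cos = rbf(np.cos(phases), np.cos(phases))
    K_sin = rbf(np.sin(phases), np.sin(phases))
    x1 = K_cos @ a_cos        # first constrained latent dimension
    x2 = K_sin @ a_sin        # second constrained latent dimension
    return np.stack([x1, x2], axis=1)

# hypothetical usage
phases = np.linspace(0, 4 * np.pi, 100) % (2 * np.pi)
X12 = periodic_backconstraint(phases, np.random.randn(100), np.random.randn(100))
```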
4.3. Model Combination
One advantage of our framework is that covariance ma-
trices can be combined in a principled manner to form

References
Kovar, L., Gleicher, M., & Pighin, F. (2002). Motion graphs. ACM Transactions on Graphics, 21(3), 473–482.
Lawrence, N. D. (2005). Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6, 1783–1816.
Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326.
Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323.
Wang, J. M., Fleet, D. J., & Hertzmann, A. (2008). Gaussian process dynamical models for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 283–298.