Topologically-Constrained Latent Variable Models
Raquel Urtasun rurtasun@csail.mit.edu
UC Berkeley EECS & ICSI; CSAIL MIT
David J. Fleet fleet@cs.toronto.edu
University of Toronto
Andreas Geiger geiger@mrt.uka.de
Karlsruhe Institute of Technology
Jovan Popović jovan@csail.mit.edu
CSAIL MIT
Trevor J. Darrell trevor@eecs.berkeley.edu
UC Berkeley EECS & ICSI; CSAIL MIT
Neil D. Lawrence Neil.Lawrence@manchester.ac.uk
University of Manchester
Abstract
In dimensionality reduction approaches, the data are typically embedded in a Euclidean latent space. However, for some data sets this is inappropriate. For example, in human motion data we expect latent spaces that are cylindrical or toroidal, which are poorly captured with a Euclidean space. In this paper, we present a range of approaches for embedding data in a non-Euclidean latent space. Our focus is the Gaussian Process latent variable model. In the context of human motion modeling this allows us to (a) learn models with interpretable latent directions enabling, for example, style/content separation, and (b) generalise beyond the data set, enabling us to learn transitions between motion styles even though such transitions are not present in the data.
1. Introduction
Dimensionality reduction is a popular approach to dealing with high dimensional data sets. It is often the case that linear dimensionality reduction, such as principal component analysis (PCA), does not adequately capture the structure of the data. For this
reason there has been considerable interest in the ma-
chine learning community in non-linear dimensionality
reduction. Approaches such as locally linear embed-
ding (LLE), Isomap and maximum variance unfold-
ing (MVU) (Roweis & Saul, 2000; Tenenbaum et al.,
2000; Weinberger et al., 2004) all define a topology
through interconnections between points in the data
space. However, if a given data set is relatively sparse
or particularly noisy, these interconnections can stray
beyond the ‘true’ local neighbourhood and the result-
ing embedding can be poor.
Probabilistic formulations of latent variable models do not usually include explicit constraints on the embedding and therefore the natural topology of the data manifold is not always respected¹. Even with the correct topology and dimension of the latent space, the learning might get stuck in local minima if the initialization of the model is poor. Moreover, the maximum likelihood solution may not be a good model, due, e.g., to the sparseness of the data. To get better models in such cases, more constraints on the model are needed.
This paper shows how explicit topological constraints
can be imposed within the context of probabilistic la-
tent variable models. We describe two approaches,
both within the context of the Gaussian process la-
tent variable model (GP-LVM) (Lawrence, 2005). The
first uses prior distributions on the latent space that encourage a given topology. The second influences the latent space and optimisation through constrained maximum likelihood.

¹An exception is the back-constrained GP-LVM (Lawrence & Quiñonero-Candela, 2006) where a constrained maximum likelihood algorithm is used to enforce these constraints.
Our approach is motivated by the problem of model-
ing human pose and motion for character animation.
Human motion is an interesting domain because, while
there is an increasing amount of motion capture data
available, the diversity of human motion means that
we will necessarily have to incorporate a large amount
of prior knowledge to learn probabilistic models that
can accurately reconstruct a wide range of motions.
Despite this, most existing methods for learning pose
and motion models (Elgammal & Lee, 2004; Grochow
et al., 2004; Urtasun et al., 2006) do not fully exploit
useful prior information, and many are limited to mod-
eling a single human activity (e.g., walking with a par-
ticular style).
This paper describes how prior information can be
used effectively to learn models with specific topologies
that reflect the nature of human motion. Importantly,
with this information we can also model multiple ac-
tivities, including transitions between them (e.g. from
walking to running), even when such transitions are
not present in the training data. As a consequence,
we can now learn latent variable models with training
motions comprising multiple subjects with stylistic di-
versity, as well as multiple activities, such as running
and walking. We demonstrate the effectiveness of our
approach in a character animation application, where
the user specifies a set of constraints (e.g., foot loca-
tions), and the remaining kinematic degrees of freedom
are inferred.
2. Gaussian Process Latent Variable
Models (GP-LVM)
We begin with a brief review of the GP-LVM
(Lawrence, 2005). The GP-LVM represents a high-
dimensional data set, Y, through a low dimensional
latent space, X, and a Gaussian process mapping
from the latent space to the data space. Let Y = [y_1, ..., y_N]^T be a matrix in which each row is a single training datum, y_i ∈ R^D. Let X = [x_1, ..., x_N]^T denote the matrix whose rows represent the corresponding positions in latent space, x_i ∈ R^d. Given a covariance function for the Gaussian process, k_Y(x, x'), the likelihood of the data given the latent positions is,

p(Y | X, \bar{\beta}) = \frac{1}{Z_1} \exp\left( -\frac{1}{2} \mathrm{tr}\left( K_Y^{-1} Y Y^T \right) \right) ,   (1)

where Z_1 is a normalization factor, K_Y is known as the kernel matrix, and β̄ denotes the kernel hyperparameters. The elements of the kernel matrix are defined by the covariance function, (K_Y)_{i,j} = k_Y(x_i, x_j). A common choice is the radial basis function (RBF), k_Y(x, x') = \beta_1 \exp\left(-\frac{\beta_2}{2} \|x - x'\|^2\right) + \frac{\delta_{x,x'}}{\beta_3}, where the kernel hyperparameters β̄ = {β_1, β_2, β_3} determine the output variance, the RBF support width, and the variance of the additive noise. Learning in the GP-LVM consists of maximizing (1) with respect to the latent positions, X, and the hyperparameters, β̄.
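For concreteness, here is a minimal sketch (ours, not code from the paper) of the RBF covariance and the negative log-likelihood of Eq. (1) in NumPy; the variable names and toy data are illustrative, and in practice X and the hyperparameters would be optimized by gradient-based methods.

```python
import numpy as np

def rbf_kernel(X, beta):
    """k_Y(x, x') = beta1 * exp(-beta2/2 * ||x - x'||^2) + delta_{x,x'} / beta3."""
    beta1, beta2, beta3 = beta
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return beta1 * np.exp(-0.5 * beta2 * sq) + np.eye(len(X)) / beta3

def gplvm_neg_log_likelihood(X, Y, beta):
    """-log p(Y | X, beta) up to an additive constant (Eq. 1)."""
    N, D = Y.shape
    K = rbf_kernel(X, beta)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))    # K^{-1} Y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * D * log_det + 0.5 * np.sum(Y * alpha)     # tr(K^{-1} Y Y^T) term

# toy usage: N=50 poses of dimension D=30, embedded in d=3 latent dimensions
Y = np.random.randn(50, 30)
X = np.random.randn(50, 3)    # would normally be initialized, e.g., with PCA
print(gplvm_neg_log_likelihood(X, Y, beta=(1.0, 1.0, 100.0)))
```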
When one has time-series data, Y represents a sequence of observations, and it is natural to augment the GP-LVM with an explicit dynamical model. For example, the Gaussian Process Dynamical Model (GPDM) models the sequence as a latent stochastic process with a Gaussian process prior (Wang et al., 2008), i.e.,

p(X | \bar{\alpha}) = \frac{p(x_1)}{Z_2} \exp\left( -\frac{1}{2} \mathrm{tr}\left( K_X^{-1} X_{out} X_{out}^T \right) \right)   (2)

where Z_2 is a normalization factor, X_out = [x_2, ..., x_N]^T, K_X ∈ R^{(N−1)×(N−1)} is the kernel matrix constructed from X_in = [x_1, ..., x_{N−1}], x_1 is given an isotropic Gaussian prior, and ᾱ are the kernel hyperparameters for K_X; below we use an RBF kernel for K_X. Like the GP-LVM, the GPDM provides a generative model for the data, but additionally it provides one for the dynamics. One can therefore predict future observation sequences given past observations, and simulate new sequences.
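Analogously, a rough sketch of the dynamics term in Eq. (2), assuming the same RBF form as above for K_X (our illustration, not the authors' implementation):

```python
import numpy as np

def gpdm_dynamics_neg_log_prior(X, alpha_hp):
    """-log of the GP dynamics term in Eq. (2), up to constants.

    X: (N, d) latent trajectory; alpha_hp: assumed RBF hyperparameters for K_X."""
    X_in, X_out = X[:-1], X[1:]
    a1, a2, a3 = alpha_hp
    sq = np.sum(X_in**2, 1)[:, None] + np.sum(X_in**2, 1)[None, :] - 2 * X_in @ X_in.T
    K_X = a1 * np.exp(-0.5 * a2 * sq) + np.eye(len(X_in)) / a3
    L = np.linalg.cholesky(K_X)
    Kinv_Xout = np.linalg.solve(L.T, np.linalg.solve(L, X_out))
    d = X.shape[1]
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    # last term: isotropic Gaussian prior on the first latent point
    return 0.5 * d * log_det + 0.5 * np.sum(X_out * Kinv_Xout) + 0.5 * np.sum(X[0]**2)
```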
3. Top Down Imposition of Topology
The smooth mapping in the GP-LVM ensures that distant points in data space remain distant in latent space. However, as discussed in (Lawrence & Quiñonero-Candela, 2006), the mapping in the opposite direction is not required to be smooth. While the GPDM may mitigate this effect, it often produces models that are not smooth and do not generalize well (Urtasun et al., 2006; Wang et al., 2008).
To help ensure smoother, well-behaved models, (Lawrence & Quiñonero-Candela, 2006) suggested the use of back-constraints, where each point in the latent space is a smooth function of its corresponding point in data space, x_{ij} = g_j(y_i; a_j), where {a_j}_{1≤j≤d} is the set of parameters of the mappings. One possible mapping is a kernel-based regression model, where regression on a kernel induced feature space provides the mapping,

x_{ij} = \sum_{m=1}^{N} a_{jm}\, k(y_i, y_m) .   (3)
This approach is known as the back-constrained GP-
LVM. When learning the back-constrained GP-LVM,

Figure 1. When training data contain large stylistic variations and multiple motions, the generic GPDM (a) and the back-constrained GPDM (b) do not produce useful models; simulations of both models do not look realistic. (c,d) Hybrid model learned using local linearities for smoothness (i.e., style) and back-constraints for topology (i.e., content). The training data are composed of 9 walks and 10 runs performed by different subjects at different speeds. (c) Likelihood for the reconstruction of the latent points. (d) 3D view of the latent trajectories for the training data (blue) and the automatically generated motions of Figs. 3 and 4 (green and red, respectively).
one needs to determine the hyperparameters of the kernel matrices (for the back-constraints and the covariance of the GP), as well as the mapping weights, {a_j}. (Lawrence & Quiñonero-Candela, 2006) fixed the hyperparameters of the back-constraint's kernel matrix, optimizing over the remaining parameters.
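To illustrate Eq. (3), here is a small sketch of the kernel-based back-constraint mapping (ours; the RBF width gamma and the weight matrix A are illustrative placeholders that would normally be learned):

```python
import numpy as np

def back_constraint_latents(Y, A, gamma=1.0):
    """Eq. (3): each latent coordinate is a kernel regression on the data,
    x_ij = sum_m a_jm k(y_i, y_m), with an RBF kernel k on data space."""
    sq = np.sum(Y**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * Y @ Y.T
    K = np.exp(-0.5 * gamma * sq)        # N x N kernel matrix on the data
    return K @ A                         # A is N x d: column j holds the weights {a_jm}

# hypothetical usage: N=50 data points of dimension 30 mapped to d=3 latent dims
Y = np.random.randn(50, 30)
X = back_constraint_latents(Y, np.random.randn(50, 3))
```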
Nevertheless, when learning human motion data with large stylistic variations or different motions, neither the GPDM nor the back-constrained GP-LVM produces smooth models that generalize well. Fig. 1 depicts three 3–D models learned from 9 walks and 10 runs. The GPDM (Fig. 1(a)) and the back-constrained GPDM² (Fig. 1(b)) do not generalize well to new runs and walks, nor do they produce realistic animations.
In this paper we show that with a well-designed set of back-constraints good models can be learned (Fig. 1(c)). We also consider an alternative approach to the hard constraints on the latent space arising from g_j(y_i; a_j). We introduce topological constraints through a prior distribution in the latent space, based on a neighborhood structure learned through a generalized local linear embedding (LLE) (Roweis & Saul, 2000). We then show how to incorporate domain-specific prior knowledge, which allows us to develop motion models with specific topologies that incorporate different activities within a single latent space and transitions between them.
3.1. Locally Linear GP-LVM
The locally linear embedding (LLE) (Roweis & Saul, 2000) preserves topological constraints by finding a representation based on reconstruction in a low dimensional space with an optimized set of local weightings. Here we show how the LLE objective can be combined with the GP-LVM, yielding a locally linear GP-LVM (LL-GPLVM).
²We use an RBF kernel for the inverse mapping in (3).
The locally linear embedding assumes that each data point and its neighbors lie on, or close to, a locally linear patch on the data manifold. The local geometry of these patches can then be characterized by linear coefficients that reconstruct each data point from its neighbors. This is done in a three step procedure: (1) the K nearest neighbors, {y_j}_{j∈η_i}, of each point, y_i, are computed using Euclidean distance in the input space, d_{ij} = ||y_i − y_j||²; (2) the weights w = {w_{ij}} that best reconstruct each data point from its neighbors are obtained by minimizing Φ(w) = Σ_{i=1}^{N} ||y_i − Σ_{j∈η_i} w_{ij} y_j||²; and (3) the latent positions x_i best reconstructed by the weights w_{ij} are computed by minimizing Φ(X) = Σ_{i=1}^{N} ||x_i − Σ_{j∈η_i} w_{ij} x_j||².
In the LLE, the weight matrix w is sparse (only a small number of neighbors is used), and the two minimizations can be computed in closed form. In particular, computing the weights can be done by solving, ∀ j ∈ η_i, the following system,

\sum_{k} C^{sim}_{kj} w^{sim}_{ij} = 1 ,   (4)

where C^{sim}_{kj} = (y_i − y_k)^T (y_i − y_j) if j, k ∈ η_i, and 0 otherwise. Once the weights are computed, they are rescaled so that Σ_j w_{ij} = 1.
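A minimal sketch of this weight computation (our own NumPy illustration; the neighbor count K and the regularizer eps are assumptions, not values from the paper):

```python
import numpy as np

def lle_weights(Y, K=5, eps=1e-3):
    """Reconstruction weights of Eq. (4): for each y_i, solve C w = 1 over its
    K nearest neighbors, then rescale so the weights sum to one."""
    N = Y.shape[0]
    W = np.zeros((N, N))
    dists = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    for i in range(N):
        nbrs = np.argsort(dists[i])[1:K + 1]        # skip the point itself
        diff = Y[i] - Y[nbrs]                       # rows are y_i - y_k
        C = diff @ diff.T                           # local Gram matrix C_{kj}
        C += eps * np.trace(C) * np.eye(K)          # regularize if C is near-singular
        w = np.linalg.solve(C, np.ones(K))
        W[i, nbrs] = w / w.sum()                    # rescale: sum_j w_ij = 1
    return W
```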
The LLE energy function can be interpreted, for a given set of weights w, as a prior that forces each latent point to be locally reconstructed by its neighbors, i.e., p(X | w) = (1/Z) exp(−(1/σ²) Φ(X)), where Z is a normalization constant, and σ² represents a global scaling of the prior. Note that strictly speaking this is not a proper prior as it is conditioned on the weights, which depend on the training data. Following (Roweis & Saul, 2000), we first compute the neighbors based on the Euclidean distance. For each training point y_i, we then compute the weights solving Eq. (4).
Learning the LL-GPLVM is then equivalent to minimizing the negative log posterior of the model,³ i.e.,

L_S = -\log\, p(Y | X, \bar{\beta})\, p(\bar{\beta})\, p(X | w)
    = \frac{D}{2} \ln |K_Y| + \frac{1}{2} \mathrm{tr}\left( K_Y^{-1} Y Y^T \right) + \sum_i \ln \beta_i + \frac{1}{\sigma^2} \sum_{k=1}^{d} \sum_{i=1}^{N} \left\| x_i^k - \sum_{j=1}^{N} w_{ij}^k x_j^k \right\|^2 + C ,   (5)

where C is a constant, and x_i^k is the k-th component of x_i. Note that we have extended the LLE to have a different prior for each dimension. This will be useful below as we incorporate different sources of prior knowledge. Fig. 2 (a) shows a model of 2 walks and 2 runs learned with the locally linear GPDM. Note how smooth the latent trajectories are.
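For concreteness, the locally linear prior contributes the following penalty, which is simply added to the GP-LVM negative log-likelihood when minimizing Eq. (5); this is our own sketch, with sigma and the per-dimension weight list as illustrative assumptions:

```python
import numpy as np

def locally_linear_prior_term(X, W_per_dim, sigma=1.0):
    """Last term of Eq. (5): (1/sigma^2) sum_k sum_i || x_i^k - sum_j w_ij^k x_j^k ||^2.

    X: (N, d) latent positions; W_per_dim: list of d (N, N) weight matrices,
    one per latent dimension (they may differ when dimensions use different
    distance measures, as in Section 4.1)."""
    total = 0.0
    for k, W in enumerate(W_per_dim):
        residual = X[:, k] - W @ X[:, k]     # reconstruction error in dimension k
        total += np.sum(residual**2)
    return total / sigma**2
```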
We now have general tools to influence the structure of the models. In what follows we generalize the top-down imposition of topology strategies (i.e., back-constraints and locally linear GP-LVM) to incorporate domain-specific prior knowledge.
4. Reflecting Knowledge in Latent
Space Structure
A problem for modeling human motion data is the
sparsity of the data relative to the diversity of natu-
rally plausible motions. For example, while we might
have a data set comprising different motions, such as
runs, walks etc., the data may not contain transitions
between motions. In practice, however, we know that these motions will be approximately cyclic and that transitions can only physically occur at specific points in the cycle. How can we encourage a model to respect such topological constraints, which arise from prior knowledge?
We consider two alternatives to solve this problem.
First, we show how one can adjust the distance metric
used in the locally linear embedding to better reflect
different types of prior knowledge. We then show how
one can define similarity measures for use with the
back-constrained GP-LVM. Both these approaches en-
courage the latent space to construct a representation
that reflects our prior knowledge. They are comple-
mentary and can be combined to learn better models.
³When learning a locally linear GPDM, the dynamics and the locally linear prior are combined as a product of potentials. The objective function becomes L_S + \frac{d}{2} \ln |K_X| + \frac{1}{2} \mathrm{tr}\left( K_X^{-1} X_{out} X_{out}^T \right) + \sum_i \ln \alpha_i, with L_S defined as in (5).
Figure 2. First two dimensions of 3–D models learned using (a) LL-GPDM, (b) LL-GPDM with topology, (c) LL-GPDM with topology and transitions, (d) back-constrained GPDM with an RBF mapping, (e) GPDM with topology through back-constraints, and (f) GPDM with back-constraints for the topology and transitions. For the models using topology, the cyclic structure is imposed in the last 2 dimensions. The two types of transition points (left and right leg contact points) are shown in red and green, and are used as prior knowledge in (c,f).
4.1. Prior Knowledge through Local
Linearities
We now turn to consider how one might incorporate
prior knowledge in the LL-GPLVM framework. This is
accomplished by replacing the local Euclidean distance
measures used in Section 3.1 with other similarity mea-
sures. That is, we can modify the covariance used to
compute the weights in Eq. (4) to reflect our prior
knowledge in the latent space. We consider two exam-
ples: the first involves transitions between activities;
with the second we show how topological constraints
can be placed on the form of the latent space.
Covariance for Transitions Modeling transitions between motions is important in character animation. Transitions can be inferred automatically based on similarity between poses (Kovar et al., 2002) or at points of non-linearity of the dynamics (Bissacco, 2005), and they can be used for learning. For example, for motions such as walking or running, two types of transitions can be identified: left and right foot ground contacts.

To model such transitions, we define an index on the frames of the motion sequence, {t_i}_{i=1}^{N}. We then define subsets of this set, {t̂_i}_{i=1}^{M}, which represent frames where transitions are possible. To capture transitions in the latent model we define the elements of the covariance matrix as follows,

C^{trans}_{kj} = 1 - \delta_{kj} \exp\left( -\zeta (t_k - t_j)^2 \right)   (6)

with ζ a constant, and δ_{ij} = 1 if t_i and t_j are in the same set {t̂_k}_{k=1}^{M}, and otherwise δ_{ij} = 0. This covariance encourages the latent points at which transitions are physically possible to be close together.
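A small sketch of how such a covariance could be assembled (ours; frame_times, the transition sets, and zeta are hypothetical inputs rather than values from the paper):

```python
import numpy as np

def transition_covariance(frame_times, transition_sets, zeta=0.1):
    """Build C_trans of Eq. (6): entries shrink towards 0 for pairs of frames that
    belong to the same transition set (e.g., both left-foot contacts) and are
    close in time; all other entries stay at 1."""
    t = np.asarray(frame_times, dtype=float)
    N = len(t)
    delta = np.zeros((N, N))
    for s in transition_sets:                  # e.g. [left_contact_frames, right_contact_frames]
        member = np.isin(frame_times, list(s))
        delta = np.maximum(delta, np.outer(member, member))
    return 1.0 - delta * np.exp(-zeta * (t[:, None] - t[None, :])**2)

# hypothetical usage: frames 0..99, with two transition types
C = transition_covariance(np.arange(100), [{10, 50, 90}, {30, 70}], zeta=0.05)
```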
Covariance for Topologies We now consider covariances that encourage the latent space to have a particular topology. Specifically we are interested in suitable topologies for walking and running data. Because the data are approximately periodic, it seems appropriate to have a non-Cartesian topology. To this end one can extract the phase of the motion,⁴ φ, and use it with a covariance to encourage the latent points to exhibit a periodic topological structure within a Cartesian space. As an example we consider a cylindrical topology within a 3–D latent space by constraining two of the latent dimensions with the phase. In particular, to represent the cyclic motion we construct a distance function on the unit circle, where a latent point corresponding to phase φ is represented with coordinates (cos(φ), sin(φ)). To force a cylindrical topology on the latent space, we specify different covariances for each latent dimension,

C^{\cos}_{k,j} = (\cos(\phi_i) - \cos(\phi_k))\,(\cos(\phi_i) - \cos(\phi_j))   (7)
C^{\sin}_{k,j} = (\sin(\phi_i) - \sin(\phi_k))\,(\sin(\phi_i) - \sin(\phi_j)) ,   (8)

with k, j ∈ η_i. The covariance for the remaining dimension is constructed as usual, based on Euclidean distance in the data space. Fig. 2 (b) shows a GPDM constrained in this way, and in Fig. 2 (c) the covariance is augmented with transitions.
Note that the use of different distance measures for
each dimension of the latent space implies that the
neighborhood and the weights in the locally linear
prior will also be different for each dimension. Here,
three different locally linear embeddings form the prior
distribution.
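A sketch of Eqs. (7) and (8) in code (ours; the phase array and the neighbor indices are assumed inputs coming from whatever neighborhood rule is in use):

```python
import numpy as np

def topology_covariances(phases):
    """Per-dimension covariances of Eqs. (7)-(8) for one point i and its neighbors.

    Returns a function that, given index i and its neighbor indices, produces
    the local matrices C_cos and C_sin used when solving for the weights."""
    c, s = np.cos(phases), np.sin(phases)

    def local_covs(i, neighbors):
        dc = c[i] - c[neighbors]          # cos(phi_i) - cos(phi_k)
        ds = s[i] - s[neighbors]
        C_cos = np.outer(dc, dc)          # (cos_i - cos_k)(cos_i - cos_j)
        C_sin = np.outer(ds, ds)
        return C_cos, C_sin

    return local_covs

# hypothetical usage with 100 frames of extracted phase
local_covs = topology_covariances(np.linspace(0, 4 * np.pi, 100) % (2 * np.pi))
C_cos, C_sin = local_covs(10, np.array([8, 9, 11, 12]))
```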
4.2. Prior Knowledge with Back Constraints
As explained above, we can also design back-constraints to influence the topology and learn useful transitions. This can be done by replacing the kernel of Eq. (3). Many kernels have interpretations as similarity measures. In particular, any similarity measure that leads to a positive semi-definite matrix can be interpreted as a kernel. Here, just as we define covariance matrices above, we extend the original formulation of back constraints by constructing similarity measures (i.e., kernels) to reflect prior knowledge.

⁴The phase can be easily extracted from the data by Fourier analysis or by detecting key postures and interpolating the phases between them. Another idea, not further explored here, would be to optimize the GP-LVM with respect to the phase.
Similarity for Transitions To capture transitions between two motions, we wish to design a kernel that expresses strong similarity between points in the respective motions where transitions may occur. We can encourage transition points of different sequences to be proximal with the following kernel matrix for the back-constraint mapping:

k^{trans}(t_i, t_j) = \sum_{m} \sum_{l} \delta_{ml}\, k(t_i, \hat{t}_m)\, k(t_j, \hat{t}_l)   (9)

where k(t_i, t̂_l) is an RBF centered at t̂_l, and δ_{ml} = 1 if t̂_m and t̂_l are in the same set. The influence of the back-constraints is controlled by the support width of the RBF kernel.
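The following sketch (ours; the RBF width and the transition sets are illustrative) evaluates Eq. (9) for a pair of frame indices:

```python
import numpy as np

def k_trans(t_i, t_j, transition_sets, width=2.0):
    """Transition similarity of Eq. (9): sum of products of RBFs centered at
    transition frames t_hat, restricted to pairs from the same transition set."""
    rbf = lambda a, b: np.exp(-0.5 * (a - b)**2 / width**2)
    total = 0.0
    for s in transition_sets:                  # delta_ml = 1 only within one set
        for t_m in s:
            for t_l in s:
                total += rbf(t_i, t_m) * rbf(t_j, t_l)
    return total

# hypothetical usage: two transition types (left / right foot contacts)
print(k_trans(12, 51, [{10, 50, 90}, {30, 70}], width=3.0))
```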
Topologically Constrained Latent Spaces We now consider kernels that force the latent space to have a particular topology. To force a cylindrical topology on the latent space, we can introduce similarity measures based on the phase, specifying different similarity measures for each latent dimension. As before, we construct a distance function on the unit circle that takes into account the phase. A periodic mapping can be constructed from a kernel matrix as follows,

x_{n,1} = \sum_{m=1}^{N} a^{\cos}_{m}\, k(\cos(\phi_n), \cos(\phi_m)) + a^{\cos}_{0}\, \delta_{n,m} ,
x_{n,2} = \sum_{m=1}^{N} a^{\sin}_{m}\, k(\sin(\phi_n), \sin(\phi_m)) + a^{\sin}_{0}\, \delta_{n,m} ,

where k is an RBF kernel function, and x_{n,i} is the i-th coordinate of the n-th latent point. These two mappings project onto two dimensions of the latent space, forcing them to have a periodic structure (which comes about through the sinusoidal dependence of the kernel on phase). Fig. 2 (e) shows a model learned using GPDM with the last two dimensions constrained in this way (the third dimension is out of plane). The first dimension is constrained by an RBF mapping on the input space. Each dimension's kernel matrix can then be augmented by adding the transition similarity of Eq. (9), resulting in the model shown in Fig. 2 (f).
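As a final sketch (ours; the weights a_cos and a_sin are random placeholders here, whereas in the model they are optimized during learning, and the small bias term a_0 δ_{n,m} is omitted for brevity), the periodic back-constraint mapping could look like:

```python
import numpy as np

def periodic_backconstraint(phases, a_cos, a_sin, width=0.5):
    """Map each frame's phase to two latent coordinates via RBF kernels on
    cos(phi) and sin(phi), giving the latent space a periodic structure."""
    rbf = lambda u, v: np.exp(-0.5 * (u[:, None] - v[None, :])**2 / width**2)
    K_cos = rbf(np.cos(phases), np.cos(phases))
    K_sin = rbf(np.sin(phases), np.sin(phases))
    x1 = K_cos @ a_cos        # first constrained latent dimension
    x2 = K_sin @ a_sin        # second constrained latent dimension
    return np.stack([x1, x2], axis=1)

# hypothetical usage
phases = np.linspace(0, 4 * np.pi, 100) % (2 * np.pi)
X12 = periodic_backconstraint(phases, np.random.randn(100), np.random.randn(100))
```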
4.3. Model Combination
One advantage of our framework is that covariance ma-
trices can be combined in a principled manner to form

References
Kovar, L., Gleicher, M., & Pighin, F. (2002). Motion graphs. ACM Transactions on Graphics, 21(3), 473–482.
Lawrence, N. D. (2005). Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6, 1783–1816.
Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326.
Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323.
Wang, J. M., Fleet, D. J., & Hertzmann, A. (2008). Gaussian process dynamical models for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 283–298.