Proceedings ArticleDOI

Deep Learning on Lie Groups for Skeleton-Based Action Recognition

TL;DR: The Lie group structure is incorporated into a deep network architecture to learn more appropriate Lie group features for 3D action recognition, and a logarithm mapping layer is proposed to map the resulting manifold data into a tangent space that facilitates the application of regular output layers for the final classification.
Abstract: In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group with an additional dynamic time warping, and then shallowly learn favorable Lie group features. In this paper we incorporate the Lie group structure into a deep network architecture to learn more appropriate Lie group features for 3D action recognition. Within the network structure, we design rotation mapping layers to transform the input Lie group features into desirable ones, which are aligned better in the temporal domain. To reduce the high feature dimensionality, the architecture is equipped with rotation pooling layers for the elements on the Lie group. Furthermore, we propose a logarithm mapping layer to map the resulting manifold data into a tangent space that facilitates the application of regular output layers for the final classification. Evaluations of the proposed network for standard 3D human action recognition datasets clearly demonstrate its superiority over existing shallow Lie group feature learning methods as well as most conventional deep learning methods.

Summary (2 min read)

1. Introduction

  • The authors focus on studying manifold-based approaches [41, 3, 42] to learn more appropriate Lie group representations of skeletal action data, which have achieved state-of-the-art performance on some 3D human action recognition benchmarks.
  • To handle the temporal misalignment caused by speed variations, these methods typically employ dynamic time warping (DTW), as originally used in speech processing [30].
  • To address the high dimensionality of such representations, [41, 3, 42] attempt to first flatten the underlying manifold via tangent approximation or rolling maps, and then exploit SVM or PCA-like methods to learn features in the resulting flattened space.
  • The proposed network provides a paradigm to incorporate the Lie group structure into deep learning, which generalizes the traditional neural network model to non-Euclidean Lie groups.

2. Relevant Work

  • In particular, two sub-classes of the general Lie group learning theories were studied in detail, tackling first-order (gradient-based) and second-order (non-gradient-based) learning. [15] introduced deep symmetry networks, a generalization of convolutional networks that forms feature maps over arbitrary symmetry groups that are basically Lie groups.
  • The symnets utilize kernel-based interpolation to tractably tie parameters and pool over symmetry spaces of any dimension.
  • [10] proposed a spectral version of convolutional networks to handle graphs.
  • For shape analysis, [28] proposed a ‘geodesic convolution’ on local geodesic coordinate systems to extract local patches on the shape manifold.

3. Lie Group Representation for Skeletal Data

  • The local coordinate system of body part e_n is calculated by rotating with minimum rotation so that its starting joint becomes the origin and the part coincides with the x-axis.
  • When the anchor point is the identity matrix I_n ∈ SO_n, the resulting tangent space is known as the Lie algebra so_n.

4. Lie Group Network for Skeleton-based Action Recognition

  • For the problem of skeleton-based action recognition, the authors build a deep network architecture to learn the Lie group representations of skeletal data.
  • The network structure is dubbed LieNet, where each input is an element on the Lie group.
  • Like convolutional networks, the LieNet also exhibits fully connected convolution-like layers and pooling layers, named rotation mapping layers and rotation pooling layers respectively.
  • In particular, the proposed RotMap layers perform transformations on input rotation matrices to generate new rotation matrices, which have the same manifold property, and are expected to be aligned more accurately for more reliable matching.
  • This transforms the rotation matrices into the usual skew-symmetric matrices, which lie in Euclidean space and hence can be fed into any regular output layers.

4.4. Output Layers

  • After performing the LogMap layers, the outputs can be transformed into vector form and concatenated directly frame by frame within one sequence due to their Euclidean nature.
  • Then, the authors can add any regular network layers such as rectified linear unit (ReLU) layers and regular fully connected (FC) layers.
  • In the FC layer, the weight dimensionality is set to d_k × d_{k-1}, where d_k and d_{k-1} are the class number and the vector dimensionality, respectively.
  • Besides, as studied in [37, 26], learning temporal dependencies over the sequential data can improve human action recognition.
  • Because of the space limitation, the authors do not study this any further.

5. Training Procedure

  • In order to train the proposed LieNets, the authors exploit the stochastic gradient descent (SGD) algorithm, one of the most popular network training tools.
  • The gradients of the data involved in RotPooling, LogMap and regular output layers can be calculated by Eqn.14 as usual.
  • As a consequence, merely using Eqn.13 to compute their Euclidean gradients rather than Riemannian gradients in the procedure of backpropagation would not generate valid rotation weights.
  • To handle this problem, the authors propose a new approach of updating the weights used in Eqn.6 for the RotMap layers.
  • Then, such an update is mapped back to the SO_3 manifold with a retraction operation.

6.1. Evaluation Datasets

  • G3D-Gaming dataset [5] contains 663 sequences of 20 different gaming motions.
  • Each subject performed every action more than two times.
  • Due to its large scale, the dataset is highly suitable for deep learning.

6.2. Implementation Details

  • As a result, for each moving skeleton, the authors finally compute a Lie group curve of length 100, 16, 64 for the G3D-Gaming, HDM05 and NTU RGB-D datasets, respectively.
  • As the focus of this work is on skeleton-based action recognition, the authors mainly utilize manifold-based approaches for comparison.
  • For a fair comparison, the authors use the source codes from the original authors, and set the involved parameters as in the original papers.
  • For the proposed LieNet, the authors build its architecture with one or more blocks of RotMap/RotPooling layers, as illustrated in Fig. 1, followed by three final layers: LogMap, FC and softmax.
  • The LieNet achieves promising results on all datasets with the same configuration, which shows its insensitivity to the parameter settings.

6.3. Experimental Results

  • For the dataset, the authors follow a cross-subject test setting, where half the subjects are used for training and the other half are employed for testing.
  • As shown in Table 1, the LieNet shows its superiority over the two baseline methods SO and SE.
  • This extreme case would result in the loss of temporal resolution and thus undermine the performance of recognizing activities.
  • The left of Fig.3 verifies the necessity of using RotMap, RotPooling and LogMap layers to improve the proposed LieNet-3Blocks.


Deep Learning on Lie Groups for Skeleton-based Action Recognition
Zhiwu Huang, Chengde Wan, Thomas Probst, Luc Van Gool
Computer Vision Lab, ETH Zurich, Switzerland; VISICS, KU Leuven, Belgium
{zhiwu.huang, wanc, probstt, vangool}@vision.ee.ethz.ch
Abstract

In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group with an additional dynamic time warping, and then shallowly learn favorable Lie group features. In this paper we incorporate the Lie group structure into a deep network architecture to learn more appropriate Lie group features for 3D action recognition. Within the network structure, we design rotation mapping layers to transform the input Lie group features into desirable ones, which are aligned better in the temporal domain. To reduce the high feature dimensionality, the architecture is equipped with rotation pooling layers for the elements on the Lie group. Furthermore, we propose a logarithm mapping layer to map the resulting manifold data into a tangent space that facilitates the application of regular output layers for the final classification. Evaluations of the proposed network for standard 3D human action recognition datasets clearly demonstrate its superiority over existing shallow Lie group feature learning methods as well as most conventional deep learning methods.
1. Introduction

Due to the development of depth sensors, 3D human activity analysis [27, 45, 23, 43, 41, 3, 42, 37, 44, 26, 35, 17] has attracted more interest than ever before. Recent manifold-based approaches are quite successful at 3D human action recognition thanks to their view-invariant manifold-based representations for skeletal data. Typical examples include shape silhouettes in the Kendall's shape space [40, 3], linear dynamical systems on the Grassmann manifold [39], histograms of oriented optical flow on a hyper-sphere [11], and pairwise transformations of skeletal joints on a Lie group [41, 3, 42]. In this paper, we focus on studying manifold-based approaches [41, 3, 42] to learn more appropriate Lie group representations of skeletal action data, which have achieved state-of-the-art performance on some 3D human action recognition benchmarks.

As studied in [41, 3, 42], Lie group feature learning methods often suffer from speed variations (i.e., temporal misalignment), which tend to deteriorate classification accuracy. To handle this issue, they typically employ dynamic time warping (DTW), as originally used in speech processing [30]. Unfortunately, such a process costs additional time, and also results in a two-step system that typically performs worse than an end-to-end learning scheme. Moreover, such Lie group representations for action recognition tend to be extremely high-dimensional, in part because the features are extracted per skeletal segment and then stacked. As a result, any computation on such nonlinear trajectories is expensive and complicated. To address this problem, [41, 3, 42] attempt to first flatten the underlying manifold via tangent approximation or rolling maps, and then exploit SVM or PCA-like methods to learn features in the resulting flattened space. Although these methods achieve some success, they merely adopt shallow linear learning schemes, yielding suboptimal solutions on the specific nonlinear manifolds.

Deep neural networks have shown their great power in learning compact and discriminative representations for images and videos, thanks to their ability to perform nonlinear computations and the effectiveness of gradient descent training with backpropagation. This has motivated us to build a deep neural network architecture for representation learning on Lie groups. In particular, inspired by the classical manifold learning theory [38, 36, 4, 12, 20, 19], we equip the new network structure with rotation mapping layers, with which the input Lie group features are transformed to new ones with better alignment. As a result, the effect of speed variations can be appropriately mitigated. In order to reduce the high dimensionality of the Lie group features, we design special pooling layers to compose them in terms of spatial and temporal levels, respectively. As the output data reside on nonlinear manifolds, we also propose a Riemannian computing layer, whose outputs could be fed into any regular output layers such as a softmax layer. In short, our main contributions are:

  • A novel neural network architecture is introduced to deeply learn more desirable Lie group representations for the problem of skeleton-based action recognition.
  • The proposed network provides a paradigm to incorporate the Lie group structure into deep learning, which generalizes the traditional neural network model to non-Euclidean Lie groups.
  • To train the network within the backpropagation framework, a variant of stochastic gradient descent optimization is exploited in the context of Lie groups.
2. Relevant Work

Already quite some works [46, 34, 2, 29, 33, 14, 15] have applied aspects of Lie group theory to deep neural networks. For example, [33] investigated how stability properties of a continuous recursive neural network can be altered within neighbourhoods of equilibrium points by the use of Lie group projections operating on the synaptic weight matrix. [14] studied the behavior of unsupervised neural networks with orthonormality constraints, by exploiting the differential geometry of Lie groups. In particular, two sub-classes of the general Lie group learning theories were studied in detail, tackling first-order (gradient-based) and second-order (non-gradient-based) learning. [15] introduced deep symmetry networks (symnets), a generalization of convolutional networks that forms feature maps over arbitrary symmetry groups that are basically Lie groups. The symnets utilize kernel-based interpolation to tractably tie parameters and pool over symmetry spaces of any dimension.

Moreover, recently some deep learning models have emerged [10, 7, 28, 25, 18, 21] that deal with data in a non-Euclidean domain. For instance, [10] proposed a spectral version of convolutional networks to handle graphs. It exploits the notion of non shift-invariant convolution, relying on the analogy between the classical Fourier transform and the Laplace-Beltrami eigenbasis. [25] developed a scalable method for treating an arbitrary spatio-temporal graph as a rich recurrent neural network mixture, which can be used to transform any spatio-temporal graph by employing a certain set of well-defined steps. For shape analysis, [28] proposed a 'geodesic convolution' on local geodesic coordinate systems to extract local patches on the shape manifold. This approach performs convolutions by sliding a window over the manifold, and local geodesic coordinates are used instead of image patches. To deeply learn symmetric positive definite (SPD) matrices - used in many tasks - [18] developed a Riemannian network on the manifolds of SPD matrices, with some layers specially designed to deal with such structured matrices.

In summary, such works have applied some theories of Lie groups to regular networks, and even generalized the common networks to non-Euclidean domains. Nevertheless, to the best of our knowledge, this is the first work that studies a deep learning architecture on Lie groups to handle the problem of skeleton-based action recognition.
3. Lie Group Representation for Skeletal Data

Let S = (V, E) be a body skeleton, where V = {v_1, ..., v_N} denotes the set of body joints, and E = {e_1, ..., e_M} indicates the set of edges, i.e. oriented rigid body bones. As studied in [41, 3, 42], the relative geometry of a pair of body parts e_n and e_m can be represented in a local coordinate system attached to the other. The local coordinate system of body part e_n is calculated by rotating with minimum rotation so that its starting joint becomes the origin and it coincides with the x-axis. With this process, we consequently get the transformed 3D vectors \hat{e}_m, \hat{e}_n for the two edges e_m, e_n respectively. Then we can compute the rotation matrix R_{m,n} (R_{m,n}^T R_{m,n} = R_{m,n} R_{m,n}^T = I_n, |R_{m,n}| = 1) from e_m to the local coordinate system of e_n. Specifically, we can firstly calculate the axis-angle representation (\omega, \theta) for the rotation matrix R_{m,n} by

    \omega = \frac{\hat{e}_m \times \hat{e}_n}{\|\hat{e}_m \times \hat{e}_n\|},   (1)
    \theta = \arccos(\hat{e}_m \cdot \hat{e}_n),   (2)

where \times and \cdot denote the outer and inner products, respectively. Then, the axis-angle representation can be easily transformed to a rotation matrix R_{m,n}. In the same way, the rotation matrix R_{n,m} from e_n to the local coordinate system of e_m can be computed. To fully encode the relative geometry between e_m and e_n, R_{m,n} and R_{n,m} are both used. As a result, a skeleton S at the time instance t is represented by the form (R_{1,2}(t), R_{2,1}(t), \ldots, R_{M-1,M}(t), R_{M,M-1}(t)), where M is the number of body parts, and the number of rotation matrices is 2C_M^2 (C_M^2 is the combination formula).
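For illustration only, the construction of Eqns. 1-2 can be sketched in a few lines of NumPy; the function names below (`rodrigues`, `relative_rotation`) are our own, not code released with the paper, and the sketch assumes the two bone vectors are unit length and not (anti)parallel.

```python
import numpy as np

def rodrigues(omega, theta):
    """Convert an axis-angle pair (unit axis omega, angle theta) into a 3x3 rotation matrix."""
    K = np.array([[0.0, -omega[2], omega[1]],
                  [omega[2], 0.0, -omega[0]],
                  [-omega[1], omega[0], 0.0]])      # skew-symmetric cross-product matrix of omega
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def relative_rotation(e_m_hat, e_n_hat):
    """Rotation matrix R_{m,n} aligning bone e_m with bone e_n, following Eqns. (1)-(2)."""
    cross = np.cross(e_m_hat, e_n_hat)
    omega = cross / np.linalg.norm(cross)           # Eqn. (1): rotation axis (undefined for parallel bones)
    theta = np.arccos(np.clip(np.dot(e_m_hat, e_n_hat), -1.0, 1.0))  # Eqn. (2): rotation angle
    return rodrigues(omega, theta)

# A skeleton frame is then the tuple (R_{1,2}, R_{2,1}, ..., R_{M-1,M}, R_{M,M-1}).
```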
The set of n×n rotation matrices in R^{n×n} forms the special orthogonal group SO_n, which is actually a matrix Lie group [22, 9, 16]. Accordingly, each motion sequence of a moving skeleton is represented with a curve on the Lie group SO_3 × ... × SO_3. It is known that the matrix Lie group is endowed with a Riemannian manifold structure that is differentiable. Hence, at each point R_0 on SO_n, one can derive the tangent space T_{R_0}SO_n that is a vector space spanned by the set of skew-symmetric matrices. When the anchor point is the identity matrix I_n ∈ SO_n, the resulting tangent space is known as the Lie algebra so_n. As the tangent spaces are equipped with the inner product, the Riemannian metric on SO_n can be defined by the Frobenius inner product:

    \langle A_1, A_2 \rangle = \mathrm{trace}(A_1^T A_2), \quad A_1, A_2 \in T_{R_0}SO_n.   (3)

The logarithm map log_{R_0} and exponential map exp_{R_0} at R_0 on SO_n associated with the Riemannian metric can be expressed in terms of the usual matrix logarithm log and exponential exp as

    \log_{R_0}(R_1) = \log(R_1 R_0^T), \quad R_0, R_1 \in SO_n,   (4)
    \exp_{R_0}(A_1) = \exp(A_1)\,R_0^T, \quad A_1 \in T_{R_0}SO_n.   (5)
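As an aside, the logarithm map of Eqn. 4 can be evaluated with SciPy's matrix functions; the snippet below is our own illustration (not the authors' code), and Section 4.3 later gives the cheaper closed form that the network actually uses.

```python
import numpy as np
from scipy.linalg import expm, logm

def riemannian_log(R0, R1):
    """Logarithm map at the anchor R0, Eqn. (4): log_{R0}(R1) = log(R1 R0^T), a skew-symmetric matrix."""
    return np.real(logm(R1 @ R0.T))

# With the identity as anchor the result lies in the Lie algebra so_3,
# and the matrix exponential inverts it: expm(riemannian_log(np.eye(3), R)) recovers R.
```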
Figure 1. Conceptual illustration of the proposed Lie group Network (LieNet) architecture. In the network structure, the data space of each RotMap/RotPooling layer corresponds to a Lie group, while the weight spaces of the RotMap layers are Lie groups as well.
4. Lie Group Network for Skeleton-based Action Recognition

For the problem of skeleton-based action recognition, we build a deep network architecture to learn the Lie group representations of skeletal data. The network structure is dubbed LieNet, where each input is an element on the Lie group. Like convolutional networks (ConvNets), the LieNet also exhibits fully connected convolution-like layers and pooling layers, named rotation mapping (RotMap) layers and rotation pooling (RotPooling) layers respectively. In particular, the proposed RotMap layers perform transformations on input rotation matrices to generate new rotation matrices, which have the same manifold property, and are expected to be aligned more accurately for more reliable matching. The RotPooling layers aim to pool the resulting rotation matrices at both spatial and temporal levels such that the Lie group feature dimensionality can be reduced. Since the rotation matrices reside on non-Euclidean manifolds, we have to design a layer named logarithm mapping (LogMap) layer to perform the Riemannian computations on them. This transforms the rotation matrices into the usual skew-symmetric matrices, which lie in Euclidean space and hence can be fed into any regular output layers. The architecture of the proposed LieNet is shown in Fig. 1.
4.1. RotMap Layer

As well-known from classical manifold learning theory [38, 36, 4, 12, 20, 19], one can learn or preserve the original data structure to faithfully maintain geodesic distances for better classification. Accordingly, we design a RotMap layer to transform the input rotation matrices to new ones that are more suitable for the final classification. Formally, the RotMap layers adopt a rotation mapping f_r as

    f_r^{(k)}\big((R_1^{k-1}, R_2^{k-1}, \ldots, R_{\hat{M}}^{k-1}); W_1^{k}, W_2^{k}, \ldots, W_{\hat{M}}^{k}\big)
        = (W_1^{k} R_1^{k-1}, W_2^{k} R_2^{k-1}, \ldots, W_{\hat{M}}^{k} R_{\hat{M}}^{k-1})
        = (R_1^{k}, R_2^{k}, \ldots, R_{\hat{M}}^{k}),   (6)

where \hat{M} = 2C_M^2 (M is the number of body bones in one skeleton, C_M^2 is the combination computation), (R_1^{k-1}, R_2^{k-1}, \ldots, R_{\hat{M}}^{k-1}) ∈ SO_3 × SO_3 × ... × SO_3 is the input Lie group feature (i.e., product of rotation matrices) for one skeleton in the k-th layer, W_i^k ∈ R^{3×3} is the transformation matrix (connection weights), and (R_1^{k}, R_2^{k}, \ldots, R_{\hat{M}}^{k}) is the resulting Lie group representation. Note that although there is only one transformation matrix for each rotation matrix, it would be easily extended with multiple projections for each input. To ensure the form (R_1^{k}, R_2^{k}, \ldots, R_{\hat{M}}^{k}) becomes a valid product of rotation matrices residing on SO_3 × SO_3 × ... × SO_3, the transformation matrices W_1^{k}, W_2^{k}, \ldots, W_{\hat{M}}^{k} are all basically required to be rotation matrices. Accordingly, both the data and the weight spaces on each RotMap layer correspond to a Lie group SO_3 × SO_3 × ... × SO_3.

Since the RotMap layers are designed to work together with the classification layer, each resulting skeleton representation is tuned for more accurate classification in an end-to-end deep learning manner. In other words, the major purpose of designing the RotMap layers is to align the Lie group representations of a moving skeleton for more faithful matching.
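For concreteness, the RotMap forward pass of Eqn. 6 amounts to one learned rotation per input rotation matrix; the sketch below is illustrative only, with shapes and names of our own choosing rather than the paper's implementation.

```python
import numpy as np

def rotmap_forward(R_in, W):
    """RotMap layer, Eqn. (6): rotate each input rotation matrix by its own learned rotation weight.

    R_in : array of shape (M_hat, 3, 3), input rotation matrices for one skeleton frame.
    W    : array of shape (M_hat, 3, 3), layer weights, each constrained to lie in SO(3).
    """
    # Batched matrix product W_i @ R_i for every bone pair i; the output stays on SO(3) x ... x SO(3).
    return np.einsum('mij,mjk->mik', W, R_in)
```

Because the weights must remain valid rotations, they cannot be updated with a plain Euclidean gradient step; Section 5 describes the Riemannian update used for them, and a corresponding sketch appears after Eqn. 18.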
4.2. RotPooling Layer

In order to reduce the complexity of deep models, it is typically useful to reduce the size of the representations to decrease the amount of parameters and computation in the network. For this purpose, it is common to insert a pooling layer in-between successive convolutional layers in a typical ConvNet architecture. The pooling layers are often designed to compute statistics in local neighborhoods, such as sum aggregation, average energy and maximum activation.

Without loss of generality, we here just introduce max pooling¹ to the LieNet setting with the equivalent notion of neighborhood. Since the input and output of the special pooling layers are both expected to be rotation matrices, we call this kind of layer a rotation pooling (RotPooling) layer. For the RotPooling, we propose two different concepts of neighborhood in this work. The first one is on the spatial level. As shown in Fig. 2(a)(b), we first pool the Lie group features on each pair of basic bones e_m, e_n in the i-th frame, which is represented by the two rotation matrices R_{m,n}^{k-1,i}, R_{n,m}^{k-1,i} (here k-1 is the order of the layer) as aforementioned. Then, as depicted in Fig. 2(b)(c), we can perform pooling on the adjacent bones that belong to the same group (here, we can define five part groups, i.e., torso, two arms and two legs, of the body). However, the second step would inevitably result in a serious spatial misalignment problem, and thus lead to bad matching performances. Therefore, we finally only adopt the first step pooling. In this setting, the function of the max pooling is given by

    f_p^{(k)}(\{R_{m,n}^{k-1,i}, R_{n,m}^{k-1,i}\}) = \max(\{R_{m,n}^{k-1,i}, R_{n,m}^{k-1,i}\})
        = \begin{cases} R_{m,n}^{k-1,i}, & \text{if } \Theta(R_{m,n}^{k-1,i}) > \Theta(R_{n,m}^{k-1,i}), \\ R_{n,m}^{k-1,i}, & \text{otherwise}, \end{cases}   (7)

where \Theta(\cdot) is the representation of the given rotation matrix such as quaternion, Euler angle or Euler axis-angle. For example, the Euler axis \omega and angle \theta representations are typically calculated by

    \omega(R_{n,m}) = \frac{1}{2\sin(\theta(R_{n,m}))} \begin{bmatrix} R_{n,m}(3,2) - R_{n,m}(2,3) \\ R_{n,m}(1,3) - R_{n,m}(3,1) \\ R_{n,m}(2,1) - R_{n,m}(1,2) \end{bmatrix},   (8)

    \theta(R_{n,m}) = \arccos\left(\frac{\mathrm{trace}(R_{n,m}) - 1}{2}\right),   (9)

where R_{n,m}(i, j) is the i-th row, j-th column element of R_{n,m}. Unfortunately, except for the angle representation, it is non-trivial to define an ordering relation for a quaternion or an axis-angle representation. Hence, in this paper, we finally adopt the angle form Eqn. 9 of rotation matrices and its simple ordering relation to calculate the function \Theta(\cdot).
¹ In contrast to sum and mean poolings, max pooling can generate valid rotation matrices directly, and hence suits the proposed LieNets. On the other hand, leveraging Lie group computing to enable sum and mean pooling to work for the LieNets, however, goes beyond the scope of this paper.

Figure 2. Illustration of spatial pooling (SpaPooling) (a)(b)(c) and temporal pooling (TemPooling) (c)(d) schemes.

The other pooling scheme is on the temporal level. As shown in Fig. 2(c)(d), the aim of the temporal pooling is to obtain more compact representations for a motion sequence. This is because a sequence often contains many frames, which results in the problem of extremely high-dimensional representations. Thus, pooling in the temporal domain can reduce the model complexity as well. Formally, the function of this kind of max pooling is defined as

    f_p^{(k)}(\{(R_{1,2}^{k-1,1}, \ldots, R_{M-1,M}^{k-1,1}), \ldots, (R_{1,2}^{k-1,p}, \ldots, R_{M-1,M}^{k-1,p})\})
        = (\max(\{R_{1,2}^{k-1,1}, \ldots, R_{1,2}^{k-1,p}\}), \ldots, \max(\{R_{M-1,M}^{k-1,1}, \ldots, R_{M-1,M}^{k-1,p}\})),   (10)

where M is the number of body parts in one skeleton, p is the number of skeleton frames for pooling, and the function max(·) is defined in the way of Eqn. 7.
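To illustrate Eqns. 7-10, the sketch below keeps, within each neighborhood, the rotation with the largest angle Θ(·) of Eqn. 9; all function names are hypothetical and the code is our own illustration rather than the paper's implementation.

```python
import numpy as np

def rotation_angle(R):
    """Theta(R) of Eqn. (9), used as the ordering for max pooling."""
    return np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))

def rot_max(rotations):
    """Eqn. (7): return the rotation matrix with the largest angle in the neighborhood."""
    return max(rotations, key=rotation_angle)

def spatial_rotpooling(frame):
    """Pool each bone pair {R_{m,n}, R_{n,m}} of one frame down to a single rotation (Fig. 2(a)(b))."""
    return [rot_max(pair) for pair in frame]        # frame: list of (R_{m,n}, R_{n,m}) pairs

def temporal_rotpooling(frames, p):
    """Eqn. (10): pool every group of p consecutive frames, bone by bone."""
    pooled = []
    for start in range(0, len(frames), p):
        group = frames[start:start + p]
        pooled.append([rot_max([frame[j] for frame in group]) for j in range(len(group[0]))])
    return pooled
```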
4.3. LogMap Layer

Classification of curves on the Lie group SO_3 × ... × SO_3 is a complicated task due to the non-Euclidean nature of the underlying space. To address the problem as in [42], we design the logarithm map (LogMap) layer to flatten the Lie group SO_3 × ... × SO_3 to its Lie algebra so_3 × ... × so_3. Accordingly, by using the logarithm map Eqn. 4, the function of this layer can be defined as

    f_l^{(k)}\big((R_1^{k-1}, R_2^{k-1}, \ldots, R_{\hat{M}}^{k-1})\big) = \big(\log(R_1^{k-1}), \log(R_2^{k-1}), \ldots, \log(R_{\hat{M}}^{k-1})\big).   (11)

One typical approach to calculate the logarithm map is to use \log(R) = U \log(\Sigma) U^T, where R = U \Sigma U^T and \log(\Sigma) is the diagonal matrix of the eigenvalue logarithms. However, the spectral operation not only suffers from the problem of zeroes occurring in \log(\Sigma) due to the property of the rotation matrix R, but also consumes too much time for matrix gradient computation [24]. Therefore, we resort to other approaches to perform the function of this layer. Fortunately, we can explore the relationship between the logarithm map and the axis-angle representation as:

    \log(R) = \begin{cases} 0, & \text{if } \theta(R) = 0, \\ \dfrac{\theta(R)}{2\sin(\theta(R))}\,(R - R^T), & \text{otherwise}, \end{cases}   (12)

where \theta(R) is the angle Eqn. 9 of R. With this equation, the corresponding matrix gradient can be easily derived by traditional element-wise matrix calculation.
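The closed form of Eqns. 9 and 12 translates directly into a few lines of NumPy; this is our own illustrative sketch (and, like Eqn. 12 itself, it does not handle the degenerate case θ = π):

```python
import numpy as np

def rotation_angle(R):
    """Rotation angle theta(R) of a matrix in SO(3), Eqn. (9)."""
    return np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))

def logmap(R, eps=1e-8):
    """LogMap of a single rotation matrix, Eqn. (12): returns a skew-symmetric matrix in so(3)."""
    theta = rotation_angle(R)
    if theta < eps:                       # near the identity, log(R) is (approximately) the zero matrix
        return np.zeros((3, 3))
    return theta / (2.0 * np.sin(theta)) * (R - R.T)
```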
4.4. Output Layers

After performing the LogMap layers, the outputs can be transformed into vector form and concatenated directly frame by frame within one sequence due to their Euclidean nature. Then, we can add any regular network layers such as rectified linear unit (ReLU) layers and regular fully connected (FC) layers. In particular for the ReLU layer, we can simply set relatively small elements to zero as done in classical ReLU. In the FC layer, the dimensionality of the weight is set to d_k × d_{k-1}, where d_k and d_{k-1} are the class number and the vector dimensionality, respectively. For skeleton-based action recognition, we employ a common softmax layer as the final output layer. Besides, as studied in [37, 26], learning temporal dependencies over the sequential data can improve human action recognition. Hence, we can also feed the outputs into Long Short-Term Memory (LSTM) units to learn useful temporal features. Because of the space limitation, we do not study this any further.
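A minimal sketch of this output stage follows (vectorize the LogMap outputs, stack them over frames, then apply an FC layer and softmax); the shapes, names, and the standard ReLU used here are our own illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def output_layers(log_feats, W_fc, b_fc):
    """Output stage of Section 4.4.

    log_feats : array of shape (T, M_hat, 3, 3), skew-symmetric LogMap outputs for one sequence.
    W_fc      : array of shape (num_classes, d) fully connected weights; b_fc: (num_classes,) bias.
    """
    x = log_feats.reshape(-1)                 # vectorize and concatenate frame by frame
    x = np.maximum(x, 0.0)                    # ReLU-style thresholding (the paper zeroes "relatively small" entries)
    logits = W_fc @ x + b_fc                  # FC layer of size num_classes x d
    z = np.exp(logits - logits.max())         # numerically stable softmax
    return z / z.sum()
```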
5. Training Procedure

In order to train the proposed LieNets, we exploit the stochastic gradient descent (SGD) algorithm, which is one of the most popular network training tools. To begin with, let the LieNet model be represented as a sequence of function compositions f = f^{(l)} \circ f^{(l-1)} \circ \cdots \circ f^{(1)} with a parameter tuple W = (W_l, W_{l-1}, \ldots, W_1), where f^{(k)} is the function for the k-th layer, W_k (dropping the sample index for simplicity) represents the weight parameters of the k-th layer, and l is the number of layers. The loss of the k-th layer is defined by L^{(k)} = \ell \circ f^{(l)} \circ \cdots \circ f^{(k)}, where \ell is the loss function for the final output layer.

To optimize the deep model, one classical SGD algorithm needs to compute the gradient of the objective function, which is typically achieved by the backpropagation chain rule. In particular, the gradients of the weight W_k and the data R_{k-1} (dropping the sample index for simplicity) for the k-th layer can be respectively computed by the chain rule:

    \frac{\partial L^{(k)}(R_{k-1}, y)}{\partial W_k} = \frac{\partial L^{(k+1)}(R_k, y)}{\partial R_k} \cdot \frac{\partial f^{(k)}(R_{k-1})}{\partial W_k},   (13)

    \frac{\partial L^{(k)}(R_{k-1}, y)}{\partial R_{k-1}} = \frac{\partial L^{(k+1)}(R_k, y)}{\partial R_k} \cdot \frac{\partial f^{(k)}(R_{k-1})}{\partial R_{k-1}},   (14)

where y is the class label and R_k = f^{(k)}(R_{k-1}). Eqn. 13 is the gradient for updating W_k, while Eqn. 14 computes the gradients in the layers below to update R_{k-1}.

The gradients of the data involved in RotPooling, LogMap and regular output layers can be calculated by Eqn. 14 as usual. Particularly, the gradient for the data in RotPooling can be computed with the same gradient computing approach used in a regular max pooling layer in the context of traditional ConvNets. For the data in the LogMap layer, the gradient can be obtained by the element-wise gradient computation on the involved rotation matrices.
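Schematically, the backpropagation of Eqns. 13-14 walks the layers in reverse, propagating data gradients and collecting weight gradients. The sketch below is purely illustrative (the per-layer `backward` interface is our own assumption); RotMap weight gradients are set aside for the Riemannian update described next.

```python
def backward_pass(layers, caches, grad_output):
    """Schematic LieNet backpropagation following Eqns. (13)-(14).

    layers: list of layer objects, each with backward(cache, grad_out) -> (grad_input, grad_weight).
    caches: per-layer forward activations R_{k-1}; grad_output: dL/dR_l from the loss layer.
    """
    weight_grads = {}
    grad = grad_output
    for k in reversed(range(len(layers))):
        # Eqn. (14): gradient w.r.t. the layer input, passed further down;
        # Eqn. (13): gradient w.r.t. the layer weights (None for RotPooling/LogMap).
        grad, grad_w = layers[k].backward(caches[k], grad)
        if grad_w is not None:
            weight_grads[k] = grad_w   # RotMap weights later receive the Riemannian update of Eqns. (15)-(18)
    return weight_grads
```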
On the other hand, the computation of the gradients of the parameter weights defined in the RotMap layers is non-trivial. This is because the weight matrices are enforced to be on the Riemannian manifold SO_3 of the rotation matrices, i.e. the Lie group. As a consequence, merely using Eqn. 13 to compute their Euclidean gradients rather than Riemannian gradients in the procedure of backpropagation would not generate valid rotation weights. To handle this problem, we propose a new approach of updating the weights used in Eqn. 6 for the RotMap layers. As studied in [1], the steepest descent direction for the used loss function L^{(k)}(R_{k-1}, y) with respect to W_k on the manifold SO_3 is the Riemannian gradient \tilde{\nabla}L^{(k)}_{W_k}, which can be obtained by parallel transporting the Euclidean gradients onto the corresponding tangent space. In particular, transporting the gradient from a point W_k^t to another point W_k^{t+1} requires subtracting the normal component \bar{\nabla}L^{(k)}_{W_k} at W_k^{t+1}, which can be obtained as follows:

    \bar{\nabla}L^{(k)}_{W_k} = \nabla L^{(k)}_{W_k} W_k^T W_k,   (15)

where the Euclidean gradient \nabla L^{(k)}_{W_k} is computed by using Eqn. 13 as

    \nabla L^{(k)}_{W_k} = \frac{\partial L^{(k+1)}(R_k, y)}{\partial R_k} R_{k-1}^T.   (16)

Thanks to the parallel transport, the Riemannian gradient can be calculated by

    \tilde{\nabla}L^{(k)}_{W_k} = \nabla L^{(k)}_{W_k} - \bar{\nabla}L^{(k)}_{W_k}.   (17)

Searching along the tangential direction takes the update in the tangent space of the SO_3 manifold. Then, such an update is mapped back to the SO_3 manifold with a retraction operation. Consequently, an update of the weight W_k on the SO_3 manifold is of the following form:

    W_k^{t+1} = \Gamma\big(W_k^t - \lambda \tilde{\nabla}L^{(k)}_{W_k}\big),   (18)

where W_k^t is the current weight, \Gamma is the retraction operation, and \lambda is the learning rate.
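To make Eqns. 15-18 concrete, here is a minimal sketch of one RotMap weight update. It is our own illustration, not the released code: the tangent projection is written in the common Edelman-style form G - W Gᵀ W for a square orthogonal W, and the retraction Γ is taken to be an SVD-based projection onto SO(3); both are standard choices that the text above does not pin down.

```python
import numpy as np

def retract_to_SO3(M):
    """Map an arbitrary 3x3 matrix back onto SO(3); an SVD-based projection is one common retraction."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:              # enforce det = +1 (a proper rotation)
        U[:, -1] *= -1
        R = U @ Vt
    return R

def update_rotmap_weight(W, euclid_grad, lr):
    """One Riemannian SGD step for a RotMap weight W in SO(3), in the spirit of Eqns. (15)-(18).

    euclid_grad is the Euclidean gradient of Eqn. (16), i.e. (dL/dR_k) @ R_{k-1}.T.
    """
    # Remove the component normal to SO(3) at W to obtain a tangent-space (Riemannian) gradient.
    riem_grad = euclid_grad - W @ euclid_grad.T @ W
    # Step in the tangent direction, then retract back onto the manifold, Eqn. (18).
    return retract_to_SO3(W - lr * riem_grad)
```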

Citations
Journal ArticleDOI
TL;DR: This work introduces a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames, and investigates a novel one-shot 3D activity recognition problem on this dataset.
Abstract: Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding.

837 citations


Cites background from "Deep Learning on Lie Groups for Ske..."

  • ...Specifically, many of them have been evaluated based on the preliminary version [47] of our dataset, or pre-trained on it for transfer learning for other tasks [43], [61], [62], [63], [64], [65], [66], [67], [68], [69], [70], [71], [72], [73], [74], [75], [76], [77], [78], [79]....

    [...]

  • ...[62] incorporated Lie group structure into a deep architecture for skeleton-based action recognition....

    [...]

Proceedings ArticleDOI
13 Mar 2018
TL;DR: Independently Recurrent Neural Network (IndRNN) as discussed by the authors is a new type of RNN, where neurons in the same layer are independent of each other and they are connected across layers.
Abstract: Recurrent neural networks (RNNs) have been widely used for processing sequential data. However, RNNs are commonly difficult to train due to the well-known gradient vanishing and exploding problems and hard to learn long-term patterns. Long short-term memory (LSTM) and gated recurrent unit (GRU) were developed to address these problems, but the use of hyperbolic tangent and the sigmoid action functions results in gradient decay over layers. Consequently, construction of an efficiently trainable deep network is challenging. In addition, all the neurons in an RNN layer are entangled together and their behaviour is hard to interpret. To address these problems, a new type of RNN, referred to as independently recurrent neural network (IndRNN), is proposed in this paper, where neurons in the same layer are independent of each other and they are connected across layers. We have shown that an IndRNN can be easily regulated to prevent the gradient exploding and vanishing problems while allowing the network to learn long-term dependencies. Moreover, an IndRNN can work with non-saturated activation functions such as relu (rectified linear unit) and be still trained robustly. Multiple IndRNNs can be stacked to construct a network that is deeper than the existing RNNs. Experimental results have shown that the proposed IndRNN is able to process very long sequences (over 5000 time steps), can be used to construct very deep networks (21 layers used in the experiment) and still be trained robustly. Better performances have been achieved on various tasks by using IndRNNs compared with the traditional RNN and LSTM.

437 citations

Journal ArticleDOI
TL;DR: A new gating mechanism within LSTM module is introduced, with which the network can learn the reliability of the sequential data and accordingly adjust the effect of the input data on the updating procedure of the long-term context representation stored in the unit's memory cell.
Abstract: Skeleton-based human action recognition has attracted a lot of research attention during the past few years. Recent works attempted to utilize recurrent neural networks to model the temporal dependencies between the 3D positional configurations of human body joints for better analysis of human activities in the skeletal data. The proposed work extends this idea to spatial domain as well as temporal domain to better analyze the hidden sources of action-related information within the human skeleton sequences in both of these domains simultaneously. Based on the pictorial structure of Kinect's skeletal data, an effective tree-structure based traversal framework is also proposed. In order to deal with the noise in the skeletal data, a new gating mechanism within LSTM module is introduced, with which the network can learn the reliability of the sequential data and accordingly adjust the effect of the input data on the updating procedure of the long-term context representation stored in the unit's memory cell. Moreover, we introduce a novel multi-modal feature fusion strategy within the LSTM unit in this paper. The comprehensive experimental results on seven challenging benchmark datasets for human action recognition demonstrate the effectiveness of the proposed method.

436 citations


Additional excerpts

  • ...Skeleton-based action recognition has been explored in different aspects in recent years [40], [41], [42], [43], [44], [45], [46], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57]....

    [...]

Proceedings ArticleDOI
Yansong Tang, Yi Tian, Jiwen Lu, Peiyang Li, Jie Zhou
18 Jun 2018
TL;DR: A deep progressive reinforcement learning (DPRL) method for action recognition in skeleton-based videos, which aims to distil the most informative frames and discard ambiguous frames in sequences for recognizing actions.
Abstract: In this paper, we propose a deep progressive reinforcement learning (DPRL) method for action recognition in skeleton-based videos, which aims to distil the most informative frames and discard ambiguous frames in sequences for recognizing actions. Since the choices of selecting representative frames are multitudinous for each video, we model the frame selection as a progressive process through deep reinforcement learning, during which we progressively adjust the chosen frames by taking two important factors into account: (1) the quality of the selected frames and (2) the relationship between the selected frames to the whole video. Moreover, considering the topology of human body inherently lies in a graph-based structure, where the vertices and edges represent the hinged joints and rigid bones respectively, we employ the graph-based convolutional neural network to capture the dependency between the joints for action recognition. Our approach achieves very competitive performance on three widely used benchmarks.

380 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This paper develops the graph analogues of three prominent explainability methods for convolutional neural networks: contrastive gradient-based (CG) saliency maps, Class Activation Mapping (CAM), and Excitation Back-Propagation (EB) and their variants, gradient-weighted CAM (Grad-CAM) and contrastive EB (c-EB).
Abstract: With the growing use of graph convolutional neural networks (GCNNs) comes the need for explainability. In this paper, we introduce explainability methods for GCNNs. We develop the graph analogues of three prominent explainability methods for convolutional neural networks: contrastive gradient-based (CG) saliency maps, Class Activation Mapping (CAM), and Excitation Back-Propagation (EB) and their variants, gradient-weighted CAM (Grad-CAM) and contrastive EB (c-EB). We show a proof-of-concept of these methods on classification problems in two application domains: visual scene graphs and molecular graphs. To compare the methods, we identify three desirable properties of explanations: (1) their importance to classification, as measured by the impact of occlusions, (2) their contrastivity with respect to different classes, and (3) their sparseness on a graph. We call the corresponding quantitative metrics fidelity, contrastivity, and sparsity and evaluate them for each method. Lastly, we analyze the salient subgraphs obtained from explanations and report frequently occurring patterns.

329 citations


Cites methods from "Deep Learning on Lie Groups for Ske..."

  • ...In [36] GCNNs were used for shape segmentation, and in [14], they were used for skeleton-based action recognition....

    [...]

References
Journal ArticleDOI
22 Dec 2000-Science
TL;DR: Locally linear embedding (LLE) is introduced, an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs that learns the global structure of nonlinear manifolds.
Abstract: Many areas of science depend on exploratory data analysis and visualization. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction: how to discover compact representations of high-dimensional data. Here, we introduce locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike clustering methods for local dimensionality reduction, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima. By exploiting the local symmetries of linear reconstructions, LLE is able to learn the global structure of nonlinear manifolds, such as those generated by images of faces or documents of text.

15,106 citations


"Deep Learning on Lie Groups for Ske..." refers background or methods in this paper

  • ...In particular, inspired by the classical manifold learning theory [38, 36, 4, 12, 20, 19], we equip the new network structure with rotation mapping layers, with which the input Lie group features are transformed to new ones with better alignment....

    [...]

  • ...As well-known from classical manifold learning theory [38, 36, 4, 12, 20, 19], one can learn or preserve the original data structure to faithfully maintain geodesic distances for better classification....

    [...]

Journal ArticleDOI
22 Dec 2000-Science
TL;DR: An approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set and efficiently computes a globally optimal solution, and is guaranteed to converge asymptotically to the true structure.
Abstract: Scientists working with large volumes of high-dimensional data, such as global climate patterns, stellar spectra, or human gene distributions, regularly confront the problem of dimensionality reduction: finding meaningful low-dimensional structures hidden in their high-dimensional observations. The human brain confronts the same problem in everyday perception, extracting from its high-dimensional sensory inputs-30,000 auditory nerve fibers or 10(6) optic nerve fibers-a manageably small number of perceptually relevant features. Here we describe an approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set. Unlike classical techniques such as principal component analysis (PCA) and multidimensional scaling (MDS), our approach is capable of discovering the nonlinear degrees of freedom that underlie complex natural observations, such as human handwriting or images of a face under different viewing conditions. In contrast to previous algorithms for nonlinear dimensionality reduction, ours efficiently computes a globally optimal solution, and, for an important class of data manifolds, is guaranteed to converge asymptotically to the true structure.

13,652 citations


"Deep Learning on Lie Groups for Ske..." refers background or methods in this paper

  • ...In particular, inspired by the classical manifold learning theory [38, 36, 4, 12, 20, 19], we equip the new network structure with rotation mapping layers, with which the input Lie group features are transformed to new ones with better alignment....

    [...]

  • ...As well-known from classical manifold learning theory [38, 36, 4, 12, 20, 19], one can learn or preserve the original data structure to faithfully maintain geodesic distances for better classification....

    [...]

Journal ArticleDOI
TL;DR: In this article, the authors proposed a geometrically motivated algorithm for representing high-dimensional data, based on the correspondence between the graph Laplacian, the Laplace Beltrami operator on the manifold and the connections to the heat equation.
Abstract: One of the central problems in machine learning and pattern recognition is to develop appropriate representations for complex data. We consider the problem of constructing a representation for data lying on a low-dimensional manifold embedded in a high-dimensional space. Drawing on the correspondence between the graph Laplacian, the Laplace Beltrami operator on the manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for representing the high-dimensional data. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality-preserving properties and a natural connection to clustering. Some potential applications and illustrative examples are discussed.

7,210 citations

Book ChapterDOI
01 Jan 2010
TL;DR: A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems.
Abstract: During the last decade, the data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods is limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems. In particular, second order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass on the training set.

5,561 citations


"Deep Learning on Lie Groups for Ske..." refers methods in this paper

  • ...While the convergence of the used SGD algorithm on Riemannian manifolds has been studied well in [8, 6] already, the convergence behavior (see Fig....

    [...]

Proceedings Article
21 May 2014
TL;DR: This paper considers possible generalizations of CNNs to signals defined on more general domains without the action of a translation group, and proposes two constructions, one based upon a hierarchical clustering of the domain, and another based on the spectrum of the graph Laplacian.
Abstract: Convolutional Neural Networks are extremely efficient architectures in image and audio recognition tasks, thanks to their ability to exploit the local translational invariance of signal classes over their domain. In this paper we consider possible generalizations of CNNs to signals defined on more general domains without the action of a translation group. In particular, we propose two constructions, one based upon a hierarchical clustering of the domain, and another based on the spectrum of the graph Laplacian. We show through experiments that for low-dimensional graphs it is possible to learn convolutional layers with a number of parameters independent of the input size, resulting in efficient deep architectures.

3,460 citations


"Deep Learning on Lie Groups for Ske..." refers background in this paper

  • ...Moreover, recently some deep learning models have emerged [10, 7, 28, 25, 18, 21] that deal with data in a nonEuclidean domain....

    [...]

  • ...For instance, [10] proposed a spectral version of convolutional networks to handle graphs....

    [...]