Deep Learning on Lie Groups for Skeleton-Based Action Recognition
Summary (2 min read)
1. Introduction
- The authors focus on manifold-based approaches [41, 3, 42] that learn Lie group representations of skeletal action data and have achieved state-of-the-art performance on several 3D human action recognition benchmarks.
- To handle temporal misalignment between sequences, such approaches typically employ dynamic time warping (DTW), as originally used in speech processing [30].
- To address this problem, [41, 3, 42] first attempt to flatten the underlying manifold via tangent-space approximation or rolling maps, and then apply SVM- or PCA-like methods to learn features in the resulting flattened space.
- The proposed network provides a paradigm for incorporating the Lie group structure into deep learning, generalizing the traditional neural network model to non-Euclidean Lie groups.
2. Relevant Work
- In particular, two sub-classes of general Lie group learning theory were studied in detail, tackling first-order (gradient-based) and second-order (non-gradient-based) learning. [15] introduced deep symmetry networks, a generalization of convolutional networks that forms feature maps over arbitrary symmetry groups, which are essentially Lie groups.
- The symnets utilize kernel-based interpolation to tractably tie parameters and pool over symmetry spaces of any dimension.
- [10] proposed a spectral version of convolutional networks to handle graphs.
- For shape analysis, [28] proposed a ‘geodesic convolution’ on local geodesic coordinate systems to extract local patches on the shape manifold.
3. Lie Group Representation for Skeletal Data
- The local coordinate system of body part en is obtained with the minimal rotation such that its starting joint becomes the origin and the part coincides with the x-axis.
- When the anchor point is the identity matrix I_n ∈ SO(n), the resulting tangent space is known as the Lie algebra so(n).
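The map from a rotation matrix into this Lie algebra is the matrix logarithm. A minimal NumPy sketch for the SO(3) case, using the closed-form (Rodrigues-style) expression; the function name `log_map_so3` is ours, not from the paper:

```python
import numpy as np

def log_map_so3(R):
    """Logarithm map from SO(3) into its Lie algebra so(3).

    Returns the 3x3 skew-symmetric matrix whose matrix exponential is R.
    """
    # Rotation angle from the trace: tr(R) = 1 + 2*cos(theta).
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return np.zeros((3, 3))  # identity rotation maps to the zero element
    # Scaled skew-symmetric part, so that exp recovers R.
    return theta / (2.0 * np.sin(theta)) * (R - R.T)

# Example: a 90-degree rotation about the z-axis.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
A = log_map_so3(Rz)  # skew-symmetric; expm(A) reproduces Rz
```

Note the clipping of the arccos argument guards against floating-point values slightly outside [-1, 1]; the near-pi case, where the formula degenerates, is omitted for brevity.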
4. Lie Group Network for Skeleton-based Action Recognition
- For the problem of skeleton-based action recognition, the authors build a deep network architecture to learn the Lie group representations of skeletal data.
- The network structure is dubbed LieNet, where each input is an element of the Lie group.
- Like convolutional networks, the LieNet also exhibits fully connected convolution-like layers and pooling layers, named rotation mapping (RotMap) layers and rotation pooling (RotPooling) layers, respectively.
- In particular, the proposed RotMap layers perform transformations on input rotation matrices to generate new rotation matrices, which have the same manifold property, and are expected to be aligned more accurately for more reliable matching.
- The logarithm mapping (LogMap) layers then transform the rotation matrices into the usual skew-symmetric matrices, which lie in a Euclidean space and hence can be fed into any regular output layers.
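The key closure property behind the RotMap layers is that multiplying a rotation matrix by a (learned) rotation matrix yields another rotation matrix, so the features never leave the manifold. A hedged sketch of such a forward pass (the function name and the per-input weighting scheme are illustrative assumptions, not the authors' exact formulation):

```python
import numpy as np

def rotmap_layer(rotations, weights):
    """Sketch of a RotMap-style forward pass.

    rotations: list of 3x3 rotation matrices (one per body part / frame).
    weights:   list of learned 3x3 rotation matrices, one per input.
    Left-multiplying a rotation by a rotation yields another rotation,
    so each output stays on the SO(3) manifold.
    """
    return [W @ R for W, R in zip(weights, rotations)]

# Identity weights leave the input unchanged.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
out = rotmap_layer([R], [np.eye(3)])
```

Because the outputs are again rotation matrices, several such layers (interleaved with pooling) can be stacked before the final LogMap layer.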
4.4. Output Layers
- After the LogMap layers, the outputs can be transformed into vector form and, owing to their Euclidean nature, concatenated directly frame by frame within one sequence.
- Then, the authors can add any regular network layers such as rectified linear unit (ReLU) layers and regular fully connected (FC) layers.
- In the FC layer, the weight matrix has dimensionality d_k × d_{k−1}, where d_k and d_{k−1} are the number of classes and the input vector dimensionality, respectively.
- Moreover, as studied in [37, 26], learning temporal dependencies over sequential data can improve human action recognition.
- Due to space limitations, the authors do not study this further.
5. Training Procedure
- To train the proposed LieNets, the authors use stochastic gradient descent (SGD), one of the most popular network training algorithms.
- The gradients of the data involved in RotPooling, LogMap and regular output layers can be calculated by Eqn.14 as usual.
- However, since the RotMap weights must remain rotation matrices, merely using Eqn. 13 to compute their Euclidean gradients rather than Riemannian gradients during backpropagation would not generate valid rotation weights.
- To handle this problem, the authors propose a new approach to updating the weights used in Eqn. 6 for the RotMap layers.
- The update is then mapped back onto the SO(3) manifold with a retraction operation.
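One common way to realize such a manifold-constrained update is to project the Euclidean gradient onto the tangent space at the current weight, take a step, and retract the result back onto the manifold. The QR-based retraction below is an illustrative choice and not necessarily the authors' exact operation:

```python
import numpy as np

def riemannian_sgd_step(W, euclid_grad, lr):
    """One hedged SGD step for a rotation weight W in SO(3).

    1. Project the Euclidean gradient onto the tangent space at W
       (keep the skew-symmetric component of W^T G).
    2. Take a gradient step and retract back via QR decomposition.
    """
    A = W.T @ euclid_grad
    rgrad = W @ (A - A.T) / 2.0          # Riemannian gradient
    Q, R = np.linalg.qr(W - lr * rgrad)  # retraction onto orthogonal matrices
    # Fix column signs so the diagonal of R is positive (proper rotation).
    Q = Q @ np.diag(np.sign(np.diag(R)))
    return Q

W_new = riemannian_sgd_step(np.eye(3),
                            np.array([[0.0, 1.0, 0.0],
                                      [0.0, 0.0, 0.0],
                                      [0.0, 0.0, 0.0]]),
                            lr=0.1)
# W_new remains orthogonal with determinant +1.
```

For small steps from a valid rotation, the QR factor stays close to the unretracted update while restoring exact orthogonality.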
6.1. Evaluation Datasets
- G3D-Gaming dataset [5] contains 663 sequences of 20 different gaming motions.
- Each subject performed every action more than twice.
- The NTU RGB+D dataset, due to its large scale, is highly suitable for deep learning.
6.2. Implementation Details
- As a result, for each moving skeleton, the authors finally compute a Lie group curve of length 100, 16 and 64 for the G3D-Gaming, HDM05 and NTU RGB+D datasets, respectively.
- As the focus of this work is on skeleton-based action recognition, the authors mainly utilize manifold-based approaches for comparison.
- For a fair comparison, the authors use the source codes from the original authors, and set the involved parameters as in the original papers.
- For the proposed LieNet, the authors build its architecture with one or more blocks of RotMap/RotPooling layers, as illustrated in Fig. 1, followed by three final layers: LogMap, FC and softmax.
- Since the LieNet achieves promising results on all datasets with the same configuration, it appears insensitive to the parameter settings.
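The fixed curve lengths above imply resampling each raw sequence to a common temporal length. A simple nearest-frame stand-in (the exact interpolation scheme used by the authors is not specified in this summary):

```python
import numpy as np

def resample_curve(frames, target_len):
    """Resample a skeleton sequence (list of per-frame features) to a fixed
    length by nearest-frame indexing."""
    idx = np.linspace(0, len(frames) - 1, target_len).round().astype(int)
    return [frames[i] for i in idx]

seq = list(range(250))            # stand-in for a 250-frame sequence
fixed = resample_curve(seq, 100)  # e.g. length 100 for G3D-Gaming
```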
6.3. Experimental Results
- For the dataset, the authors follow a cross-subject test setting, where half of the subjects are used for training and the other half for testing.
- As shown in Table 1, the LieNet shows its superiority over the two baseline methods SO and SE.
- This extreme case would result in a loss of temporal resolution and thus undermine activity recognition performance.
- The left part of Fig. 3 verifies the necessity of the RotMap, RotPooling and LogMap layers in the proposed LieNet-3Blocks.