Open Access Proceedings Article

Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition

TLDR
The proposed AS-GCN achieves consistently large improvements over state-of-the-art methods and shows promising results for future pose prediction.
Abstract
Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, which capture only local physical dependencies among joints and may miss implicit joint correlations. To capture richer dependencies, we introduce an encoder-decoder structure, called the A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions. We also extend the existing skeleton graphs to represent higher-order dependencies, i.e. structural links. Combining the two types of links into a generalized skeleton graph, we further propose the actional-structural graph convolution network (AS-GCN), which stacks actional-structural graph convolution and temporal convolution as a basic building block, to learn both spatial and temporal features for action recognition. A future pose prediction head is added in parallel to the recognition head to help capture more detailed action patterns through self-supervision. We validate AS-GCN in action recognition using two skeleton data sets, NTU-RGB+D and Kinetics. The proposed AS-GCN achieves consistently large improvements over the state-of-the-art methods. As a side product, AS-GCN also shows promising results for future pose prediction.
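
The idea of combining structural and actional links can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hypothetical 5-joint chain skeleton, takes structural links to be powers of a row-normalized adjacency matrix, replaces the learned A-link inference module with a hand-written placeholder matrix (A_LINK), and omits learnable weights and the temporal convolution. All names below (structural_links, as_graph_conv, lam) are illustrative assumptions.

```python
# Toy sketch of an actional-structural graph aggregation step.
# Assumptions: pure-Python matrices, identity weights, hand-made actional links.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def row_normalize(a):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    out = []
    for row in a:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def structural_links(adj, max_order):
    """Return [A_hat^1, ..., A_hat^max_order] for the row-normalized adjacency A_hat."""
    a_hat = row_normalize(adj)
    powers = [a_hat]
    for _ in range(max_order - 1):
        powers.append(matmul(powers[-1], a_hat))
    return powers

def as_graph_conv(x, s_links, a_link, lam=0.5):
    """Sum features aggregated over each structural link, then add
    lam times the features aggregated over the actional link."""
    n, d = len(x), len(x[0])
    out = [[0.0] * d for _ in range(n)]
    for link in s_links:
        agg = matmul(link, x)
        out = [[out[i][j] + agg[i][j] for j in range(d)] for i in range(n)]
    act = matmul(a_link, x)
    return [[out[i][j] + lam * act[i][j] for j in range(d)] for i in range(n)]

# Hypothetical chain skeleton: hand(0) - elbow(1) - shoulder(2) - elbow(3) - hand(4)
ADJ = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

# Placeholder actional link joining the two hands, which are far apart on the skeleton.
A_LINK = [[0.0] * 5 for _ in range(5)]
A_LINK[0][4] = 1.0
A_LINK[4][0] = 1.0

X = [[1.0], [0.0], [0.0], [0.0], [2.0]]  # one scalar feature per joint
S_LINKS = structural_links(ADJ, max_order=2)
OUT = as_graph_conv(X, S_LINKS, A_LINK, lam=0.5)
```

In this toy setup the second-order structural link lets a hand reach the shoulder, which the plain skeleton adjacency does not connect directly, while the actional link passes information between the two hands in a single step.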


Citations
Proceedings Article

Skeleton-Based Action Recognition With Shift Graph Convolutional Network

TL;DR: The proposed Shift-GCN notably exceeds the state-of-the-art methods with more than 10 times less computational complexity, and is composed of novel shift graph operations and lightweight point-wise convolutions.
Proceedings Article

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

TL;DR: This work presents a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D, and combines them into a powerful feature extractor named MS-G3D, with which the model outperforms previous state-of-the-art methods on three large-scale datasets.
Posted Content

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

TL;DR: This work proposes a unified spatial-temporal graph convolutional operator named G3D, together with a multi-scale aggregation scheme that disentangles the importance of nodes in different neighborhoods for effective long-range modeling.
Proceedings Article

Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition

TL;DR: In this paper, a semantics-guided neural network (SGN) is proposed for skeleton-based action recognition, which explicitly introduces the high-level semantics of joints (joint type and frame index) into the network to enhance the feature representation capability.
Book Chapter

Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition

TL;DR: This paper rethinks the spatial aggregation in existing GCN-based skeleton action recognition methods, finds that they are limited by a coupling aggregation mechanism, and proposes a decoupling GCN that boosts graph modeling ability with no extra computation, no extra latency, no extra GPU memory cost, and less than 10% extra parameters.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Journal Article

The anatomy of a large-scale hypertextual Web search engine

TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Proceedings Article

Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields

TL;DR: This paper uses a nonparametric representation, referred to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image, and achieves state-of-the-art performance on the MPII Multi-Person benchmark.
Proceedings Article

Categorical Reparameterization with Gumbel-Softmax

TL;DR: This paper replaces non-differentiable samples from a categorical distribution with differentiable samples from a novel Gumbel-Softmax distribution, which has the essential property that it can be smoothly annealed into a categorical distribution.
Proceedings Article

Semi-Supervised Classification with Graph Convolutional Networks

TL;DR: In this paper, a scalable approach for semi-supervised learning on graph-structured data is presented based on an efficient variant of convolutional neural networks which operate directly on graphs.