Proceedings Article

Skeleton-based action recognition with convolutional neural networks

Chao Li, +3 more
- pp 597-600
TLDR
A novel convolutional neural network (CNN) based framework for both action classification and detection in skeleton-based action recognition, with a window proposal network that extracts temporal segment proposals, which are then classified within the same network.
Abstract
Current state-of-the-art approaches to skeleton-based action recognition are mostly based on recurrent neural networks (RNN). In this paper, we propose a novel convolutional neural network (CNN) based framework for both action classification and detection. Raw skeleton coordinates as well as skeleton motion are fed directly into the CNN for label prediction. A novel skeleton transformer module is designed to rearrange and select important skeleton joints automatically. With a simple 7-layer network, we obtain 89.3% accuracy on the validation set of the NTU RGB+D dataset. For action detection in untrimmed videos, we develop a window proposal network to extract temporal segment proposals, which are further classified within the same network. On the recent PKU-MMD dataset, we achieve 93.7% mAP, surpassing the baseline by a large margin.
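
The two inputs named in the abstract, raw joint coordinates and skeleton motion, plus a joint-rearranging step, can be illustrated with a short NumPy sketch. This is not the authors' implementation. It assumes motion is the frame-to-frame difference of joint coordinates and that the skeleton transformer can be approximated by a single linear map over the joint axis. The function names, the array sizes, and the random weight matrix (a stand-in for learned parameters) are all hypothetical.

```python
import numpy as np


def skeleton_motion(seq):
    """Frame-to-frame joint displacement.

    seq has shape (T, N, 3): T frames, N joints, xyz coordinates.
    The last frame has no successor, so its motion is left at zero.
    """
    motion = np.zeros_like(seq)
    motion[:-1] = seq[1:] - seq[:-1]
    return motion


def transform_joints(seq, weight):
    """Linearly re-mix the joint axis (assumed form of a joint transformer).

    seq: (T, N, 3), weight: (N, M) -> output: (T, M, 3).
    Each output "joint" is a weighted combination of the N input joints.
    """
    return np.einsum("tnc,nm->tmc", seq, weight)


# Hypothetical sizes: 32 frames, 25 input joints, 30 output joints.
rng = np.random.default_rng(0)
T, N, M = 32, 25, 30
seq = rng.standard_normal((T, N, 3))
weight = rng.standard_normal((N, M)) * 0.1  # stand-in for learned weights

joints = transform_joints(seq, weight)
motion = transform_joints(skeleton_motion(seq), weight)

# Stack the two streams so a CNN could consume them as separate inputs.
inputs = np.stack([joints, motion])  # shape (2, T, M, 3)
```

With an identity matrix as `weight`, `transform_joints` returns the input unchanged; a permutation matrix would reorder joints. That is one way to read "rearrange and select" as a learnable operation.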

Citations
Proceedings Article

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

TL;DR: This paper proposes a novel model of dynamic skeletons called Spatial-Temporal Graph Convolutional Networks (ST-GCN), which moves beyond the limitations of previous methods by automatically learning both the spatial and temporal patterns from data.
Proceedings Article

Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition

TL;DR: This paper proposes a two-stream adaptive graph convolutional network (2s-AGCN) that models both first-order and second-order information simultaneously, which notably improves recognition accuracy.
Proceedings Article

Skeleton-Based Action Recognition With Directed Graph Neural Networks

TL;DR: A novel directed graph neural network is designed to extract information about joints, bones, and their relations and to make predictions based on the extracted features. It is tested on two large-scale datasets, NTU-RGBD and Skeleton-Kinetics, and exceeds state-of-the-art performance on both.
Proceedings Article

Skeleton-Based Action Recognition With Shift Graph Convolutional Network

TL;DR: The proposed Shift-GCN, composed of novel shift graph operations and lightweight point-wise convolutions, notably exceeds state-of-the-art methods with more than 10 times less computational complexity.
Proceedings Article

Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation

TL;DR: This paper introduces a global spatial aggregation scheme, which learns better joint co-occurrence features than local aggregation and consistently outperforms other state-of-the-art methods on action recognition and detection benchmarks such as NTU RGB+D, SBU Kinect Interaction, and PKU-MMD.
References
Journal Article

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Proceedings Article

Two-Stream Convolutional Networks for Action Recognition in Videos

TL;DR: This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
Proceedings Article

Maxout Networks

TL;DR: A simple new model called maxout is defined, designed both to facilitate optimization by dropout and to improve the accuracy of dropout's fast approximate model averaging technique.
Posted Content

NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

TL;DR: In this paper, a large-scale dataset for RGB+D human action recognition was introduced with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects.