Action Recognition From Depth Maps Using Deep Convolutional Neural Networks

doi:10.1109/THMS.2015.2504550

Journal Article


Pichao Wang, +5 more

- 01 Aug 2016 -

IEEE Transactions on Human-Machine Syste...

- Vol. 46, Iss: 4, pp 498-509


TLDR

The proposed method achieved 2–9% better results on most of the individual datasets and maintained its performance on a large dataset constructed from them, whereas the performance of existing methods decreased as the number of actions increased.

Abstract:

This paper proposes a new method, i.e., weighted hierarchical depth motion maps (WHDMM) + three-channel deep convolutional neural networks (3ConvNets), for human action recognition from depth maps on small training datasets. Three strategies are developed to leverage the capability of ConvNets in mining discriminative features for recognition. First, different viewpoints are mimicked by rotating the 3-D points of the captured depth maps. This not only synthesizes more data, but also makes the trained ConvNets view-tolerant. Second, WHDMMs at several temporal scales are constructed to encode the spatiotemporal motion patterns of actions into 2-D spatial structures. The 2-D spatial structures are further enhanced for recognition by converting the WHDMMs into pseudocolor images. Finally, the three ConvNets are initialized with the models obtained from ImageNet and fine-tuned independently on the color-coded WHDMMs constructed in three orthogonal planes. The proposed algorithm was evaluated on the MSRAction3D, MSRAction3DExt, UTKinect-Action, and MSRDailyActivity3D datasets using cross-subject protocols. In addition, the method was evaluated on the large dataset constructed from the above datasets. The proposed method achieved 2–9% better results on most of the individual datasets. Furthermore, the proposed method maintained its performance on the large dataset, whereas the performance of existing methods decreased with the increased number of actions.
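The first two strategies in the abstract (rotating the 3-D points to mimic viewpoints, and accumulating motion over several temporal scales) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the rotation order, the recursive weighting with `gamma`, and the stride-based temporal scales are all assumptions made for illustration, and the pseudocolor coding and ConvNet fine-tuning steps are omitted.

```python
import numpy as np


def rotate_points(points, theta, beta):
    """Rotate an (N, 3) array of 3-D points by theta about the y-axis,
    then by beta about the x-axis (angles in radians).

    Assumption: the rotation order and axes are illustrative only.
    """
    ct, st = np.cos(theta), np.sin(theta)
    cb, sb = np.cos(beta), np.sin(beta)
    r_y = np.array([[ct, 0.0, st], [0.0, 1.0, 0.0], [-st, 0.0, ct]])
    r_x = np.array([[1.0, 0.0, 0.0], [0.0, cb, -sb], [0.0, sb, cb]])
    # Each row p of `points` becomes (r_x @ r_y) @ p.
    return points @ (r_x @ r_y).T


def weighted_motion_map(frames, gamma=0.5):
    """Collapse a (T, H, W) sequence of projected depth frames into one
    2-D map by recursively blending absolute frame differences.

    Assumption: this recursive weighting is a hypothetical stand-in for
    the paper's weighting; recent motion contributes more than old motion.
    """
    frames = np.asarray(frames, dtype=np.float64)
    acc = np.zeros_like(frames[0])
    for prev, curr in zip(frames[:-1], frames[1:]):
        acc = gamma * np.abs(curr - prev) + (1.0 - gamma) * acc
    return acc


def temporal_scales(frames, num_scales=3):
    """Subsample the sequence with strides 1, 2, ..., num_scales.

    Assumption: stride-based subsampling is one simple way to obtain
    several temporal scales; the paper's exact scheme may differ.
    """
    return [frames[::stride] for stride in range(1, num_scales + 1)]
```

Under these assumptions, one motion map per temporal scale would be computed for each of the three orthogonal projection planes, and each rotated copy of the point cloud would yield an additional synthesized training sample.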

Citations


Posted Content

NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

Amir Shahroudy, +3 more

- 11 Apr 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: In this paper, a large-scale dataset for RGB+D human action recognition was introduced with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects.


Proceedings Article

NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

Amir Shahroudy, +3 more

TL;DR: This paper introduces a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects, and proposes a new recurrent neural network structure that models the long-term temporal correlation of the features for each body part and uses them for better action classification.


Journal Article

NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

Jun Liu, +5 more

- 01 Oct 2020 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work introduces a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames, and investigates a novel one-shot 3D activity recognition problem on this dataset.


Proceedings Article

Joint Geometrical and Statistical Alignment for Visual Domain Adaptation

Jing Zhang, +2 more

TL;DR: Joint Geometrical and Statistical Alignment (JGSA) learns two coupled projections that project the source-domain and target-domain data into low-dimensional subspaces where the geometrical shift and distribution shift are reduced simultaneously.


Proceedings Article

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

Pichao Wang, +3 more

TL;DR: This paper encodes the spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTMs), and adopts ConvNets to exploit the discriminative features for real-time human action recognition.


References


Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: A deep convolutional neural network achieved state-of-the-art performance on ImageNet classification; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.


Journal Article

Receptive fields, binocular interaction and functional architecture in the cat's visual cortex

David H. Hubel, +1 more

- 01 Jan 1962 -

The Journal of Physiology

TL;DR: This method is used to examine receptive fields of a more complex type and to make additional observations on binocular interaction and this approach is necessary in order to understand the behaviour of individual cells, but it fails to deal with the problem of the relationship of one cell to its neighbours.


Book Chapter

Visualizing and Understanding Convolutional Networks

Matthew D. Zeiler, +1 more

TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models, used in a diagnostic role to find model architectures that outperform Krizhevsky et al on the ImageNet classification benchmark.


Posted Content

Caffe: Convolutional Architecture for Fast Feature Embedding

Yangqing Jia, +7 more

- 20 Jun 2014 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.


Proceedings Article

Caffe: Convolutional Architecture for Fast Feature Embedding

Yangqing Jia, +7 more

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.



Related Papers (5)

Action recognition based on a bag of 3D points

Mining actionlet ensemble for action recognition with depth cameras

Hierarchical recurrent neural network for skeleton based action recognition

View invariant human action recognition using histograms of 3D joints

Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group