Proceedings ArticleDOI

ConvNets-Based Action Recognition from Depth Maps through Virtual Cameras and Pseudocoloring

TLDR
Through transfer learning from models originally trained on ImageNet for image classification, three ConvNets are trained independently on the color-coded DMMs constructed in three orthogonal planes, achieving state-of-the-art results on the evaluated datasets.
Abstract
In this paper, we propose to adopt ConvNets to recognize human actions from depth maps on relatively small datasets based on Depth Motion Maps (DMMs). In particular, three strategies are developed to effectively leverage the capability of ConvNets in mining discriminative features for recognition. Firstly, different viewpoints are mimicked by rotating virtual cameras around the subject represented by the 3D points of the captured depth maps. This not only synthesizes more data from the captured sequences, but also makes the trained ConvNets view-tolerant. Secondly, DMMs are constructed and further enhanced for recognition by encoding them into pseudo-RGB images, turning the spatio-temporal motion patterns into textures and edges. Lastly, through transfer learning from models originally trained on ImageNet for image classification, three ConvNets are trained independently on the color-coded DMMs constructed in three orthogonal planes. The proposed algorithm was extensively evaluated on the MSRAction3D, MSRAction3DExt and UTKinect-Action datasets and achieved state-of-the-art results on these datasets.
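As a rough, illustrative sketch of the pipeline the abstract describes (not the authors' code), the Python fragment below rotates the captured depth points about a vertical axis through the subject's centroid to mimic a virtual camera, accumulates frame-to-frame differences into a simplified single-view DMM, and pseudo-colors the result so that motion energy shows up as texture and edges. The function names, the jet colormap, and the choice of rotation axis are assumptions made for illustration.

```python
import numpy as np
from matplotlib import cm  # used only for a rainbow-style pseudo-color mapping


def rotate_points(points, yaw_deg):
    """Rotate an (N, 3) cloud of depth points about a vertical axis through its
    centroid, mimicking a virtual camera moved around the subject."""
    t = np.deg2rad(yaw_deg)
    R = np.array([[np.cos(t), 0.0, np.sin(t)],
                  [0.0,       1.0, 0.0],
                  [-np.sin(t), 0.0, np.cos(t)]])
    c = points.mean(axis=0)
    return (points - c) @ R.T + c


def depth_motion_map(depth_frames):
    """Accumulate absolute frame-to-frame differences of projected depth maps
    into a single motion map (one of the three orthogonal projections)."""
    dmm = np.zeros_like(depth_frames[0], dtype=np.float32)
    for prev, curr in zip(depth_frames[:-1], depth_frames[1:]):
        dmm += np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    return dmm


def pseudocolor(dmm):
    """Normalize a DMM to [0, 1] and map it to a pseudo-RGB image so that
    motion energy becomes texture and edges a ConvNet can exploit."""
    norm = (dmm - dmm.min()) / (np.ptp(dmm) + 1e-8)
    return (cm.jet(norm)[..., :3] * 255).astype(np.uint8)
```

In the paper's setting, such a pseudo-colored DMM would be produced for each of the three orthogonal projection planes and fed to its own ConvNet.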


Citations
Journal ArticleDOI

Enhanced skeleton visualization for view invariant human action recognition

TL;DR: The enhanced skeleton visualization method encodes spatio-temporal skeletons as visual and motion-enhanced color images in a compact yet distinctive manner, and consistently achieves the highest accuracies on four datasets, including the largest and most challenging NTU RGB+D dataset for skeleton-based action recognition.
Proceedings ArticleDOI

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

TL;DR: In this article, the spatio-temporal information carried in 3D skeleton sequences is encoded into multiple 2D images, referred to as Joint Trajectory Maps (JTMs), and ConvNets are adopted to exploit the discriminative features for real-time human action recognition.
Journal ArticleDOI

Action Recognition From Depth Maps Using Deep Convolutional Neural Networks

TL;DR: The proposed method maintained its performance on the large dataset, whereas the performance of existing methods decreased as the number of actions increased, and it achieved 2-9% better results on most of the individual datasets.
Journal ArticleDOI

Skeleton Optical Spectra-Based Action Recognition Using Convolutional Neural Networks

TL;DR: This letter presents an effective method to encode the spatiotemporal information of a skeleton sequence into color texture images, referred to as skeleton optical spectra, and employs convolutional neural networks (ConvNets) to learn the discriminative features for action recognition.
Journal ArticleDOI

Joint Distance Maps Based Action Recognition With Convolutional Neural Networks

TL;DR: An effective yet simple method is proposed to encode the spatio-temporal information of skeleton sequences into color texture images, referred to as joint distance maps (JDMs), and convolutional neural networks are employed to exploit the discriminative features from the JDMs for human action and interaction recognition.
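To illustrate the joint-distance-map idea in the citation above (not the authors' exact encoding, which also uses multiple projections and a tailored color mapping), the sketch below computes pairwise joint distances per frame and maps them to a color texture image; the function name and the jet colormap are assumptions.

```python
import numpy as np
from matplotlib import cm


def joint_distance_map(skeleton, colormap=cm.jet):
    """Build a simplified JDM-style image from a skeleton sequence.

    skeleton: array of shape (T, J, 3) -- T frames, J joints, 3-D coordinates.
    Each column of the output image corresponds to one frame, each row to one
    joint pair; the pairwise Euclidean distance is mapped to a color.
    """
    T, J, _ = skeleton.shape
    pairs = [(i, j) for i in range(J) for j in range(i + 1, J)]
    dist = np.zeros((len(pairs), T), dtype=np.float32)
    for t in range(T):
        for r, (i, j) in enumerate(pairs):
            dist[r, t] = np.linalg.norm(skeleton[t, i] - skeleton[t, j])
    norm = (dist - dist.min()) / (np.ptp(dist) + 1e-8)      # scale to [0, 1]
    return (colormap(norm)[..., :3] * 255).astype(np.uint8)  # (pairs, T, 3) image
```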
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art image classification performance on ImageNet.
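To make the layer layout summarized above concrete, here is a minimal PyTorch sketch with five convolutional layers, some followed by max-pooling, and three fully-connected layers ending in 1000 output classes. It omits details of the original model such as local response normalization and the two-GPU grouping, so treat it as an illustration rather than the reference implementation.

```python
import torch.nn as nn


class AlexNetSketch(nn.Module):
    """Five conv layers (some followed by max-pooling) plus three FC layers,
    producing 1000 logits for the ImageNet classes (input: 3 x 227 x 227)."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # softmax is applied by the loss at training time
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```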
Posted Content

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings ArticleDOI

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings Article

Two-Stream Convolutional Networks for Action Recognition in Videos

TL;DR: This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
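As a schematic of the two-stream idea (with a deliberately small placeholder backbone rather than the CNN-M-style networks used in that work), the PyTorch sketch below feeds a single RGB frame to a spatial stream, a stack of optical-flow fields to a temporal stream, and fuses the two softmax score vectors by averaging.

```python
import torch
import torch.nn as nn


def make_stream(in_channels, num_classes):
    """A tiny placeholder ConvNet standing in for each stream's backbone."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 96, kernel_size=7, stride=2), nn.ReLU(inplace=True),
        nn.MaxPool2d(3, 2),
        nn.Conv2d(96, 256, kernel_size=5, stride=2), nn.ReLU(inplace=True),
        nn.MaxPool2d(3, 2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(256, num_classes),
    )


class TwoStreamNet(nn.Module):
    """Spatial stream sees a single RGB frame; temporal stream sees a stack of
    L horizontal/vertical optical-flow fields (2*L channels). Class scores from
    the two softmaxes are fused by averaging (late fusion)."""

    def __init__(self, num_classes, flow_stack=10):
        super().__init__()
        self.spatial = make_stream(3, num_classes)
        self.temporal = make_stream(2 * flow_stack, num_classes)

    def forward(self, rgb, flow):
        p_spatial = torch.softmax(self.spatial(rgb), dim=1)
        p_temporal = torch.softmax(self.temporal(flow), dim=1)
        return (p_spatial + p_temporal) / 2  # average the class scores
```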