Proceedings ArticleDOI

ConvNets-Based Action Recognition from Depth Maps through Virtual Cameras and Pseudocoloring

TLDR
Through transfer learning from models originally trained on ImageNet for image classification, three ConvNets are trained independently on the color-coded DMMs constructed in three orthogonal planes, achieving state-of-the-art results on the evaluated datasets.
Abstract
In this paper, we propose to adopt ConvNets to recognize human actions from depth maps on relatively small datasets based on Depth Motion Maps (DMMs). In particular, three strategies are developed to effectively leverage the capability of ConvNets in mining discriminative features for recognition. Firstly, different viewpoints are mimicked by rotating virtual cameras around the subject represented by the 3D points of the captured depth maps. This not only synthesizes more data from the captured sequences, but also makes the trained ConvNets view-tolerant. Secondly, DMMs are constructed and further enhanced for recognition by encoding them into pseudo-RGB images, turning the spatio-temporal motion patterns into textures and edges. Lastly, through transfer learning from models originally trained on ImageNet for image classification, three ConvNets are trained independently on the color-coded DMMs constructed in three orthogonal planes. The proposed algorithm was extensively evaluated on the MSRAction3D, MSRAction3DExt and UTKinect-Action datasets and achieved state-of-the-art results on these datasets.
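As a rough, illustrative sketch of the pipeline the abstract describes (not the authors' code), the Python fragment below rotates the captured depth points about a vertical axis through the subject's centroid to mimic a virtual camera, accumulates frame-to-frame differences into a simplified single-view DMM, and pseudo-colors the result so that motion energy shows up as texture and edges. The function names, the jet colormap, and the choice of rotation axis are assumptions made for illustration.

```python
import numpy as np
from matplotlib import cm  # used only for a rainbow-style pseudo-color mapping


def rotate_points(points, yaw_deg):
    """Rotate an (N, 3) cloud of depth points about a vertical axis through its
    centroid, mimicking a virtual camera moved around the subject."""
    t = np.deg2rad(yaw_deg)
    R = np.array([[np.cos(t), 0.0, np.sin(t)],
                  [0.0,       1.0, 0.0],
                  [-np.sin(t), 0.0, np.cos(t)]])
    c = points.mean(axis=0)
    return (points - c) @ R.T + c


def depth_motion_map(depth_frames):
    """Accumulate absolute frame-to-frame differences of projected depth maps
    into a single motion map (one of the three orthogonal projections)."""
    dmm = np.zeros_like(depth_frames[0], dtype=np.float32)
    for prev, curr in zip(depth_frames[:-1], depth_frames[1:]):
        dmm += np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    return dmm


def pseudocolor(dmm):
    """Normalize a DMM to [0, 1] and map it to a pseudo-RGB image so that
    motion energy becomes texture and edges a ConvNet can exploit."""
    norm = (dmm - dmm.min()) / (np.ptp(dmm) + 1e-8)
    return (cm.jet(norm)[..., :3] * 255).astype(np.uint8)
```

In the paper's setting, such a pseudo-colored DMM would be produced for each of the three orthogonal projection planes and fed to its own ConvNet.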


Citations
Journal ArticleDOI

Enhanced skeleton visualization for view invariant human action recognition

TL;DR: The enhanced skeleton visualization method encodes spatio-temporal skeletons as visual and motion-enhanced color images in a compact yet distinctive manner, and consistently achieves the highest accuracies on four datasets, including the largest and most challenging NTU RGB+D dataset for skeleton-based action recognition.
Proceedings ArticleDOI

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

TL;DR: In this article, the spatio-temporal information carried in 3D skeleton sequences is encoded into multiple 2D images, referred to as Joint Trajectory Maps (JTMs), and ConvNets are adopted to exploit the discriminative features for real-time human action recognition.
Journal ArticleDOI

Action Recognition From Depth Maps Using Deep Convolutional Neural Networks

TL;DR: The proposed method maintained its performance on the large dataset, whereas the performance of existing methods decreased as the number of actions increased, and it achieved 2-9% better results on most of the individual datasets.
Journal ArticleDOI

Skeleton Optical Spectra-Based Action Recognition Using Convolutional Neural Networks

TL;DR: This letter presents an effective method to encode the spatiotemporal information of a skeleton sequence into color texture images, referred to as skeleton optical spectra, and employs convolutional neural networks (ConvNets) to learn the discriminative features for action recognition.
Journal ArticleDOI

Joint Distance Maps Based Action Recognition With Convolutional Neural Networks

TL;DR: An effective yet simple method is proposed to encode the spatio-temporal information of skeleton sequences into color texture images, referred to as joint distance maps (JDMs), and convolutional neural networks are employed to exploit the discriminative features from the JDMs for human action and interaction recognition.
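To illustrate the joint-distance-map idea in the citation above (not the authors' exact encoding, which also uses multiple projections and a tailored color mapping), the sketch below computes pairwise joint distances per frame and maps them to a color texture image; the function name and the jet colormap are assumptions.

```python
import numpy as np
from matplotlib import cm


def joint_distance_map(skeleton, colormap=cm.jet):
    """Build a simplified JDM-style image from a skeleton sequence.

    skeleton: array of shape (T, J, 3) -- T frames, J joints, 3-D coordinates.
    Each column of the output image corresponds to one frame, each row to one
    joint pair; the pairwise Euclidean distance is mapped to a color.
    """
    T, J, _ = skeleton.shape
    pairs = [(i, j) for i in range(J) for j in range(i + 1, J)]
    dist = np.zeros((len(pairs), T), dtype=np.float32)
    for t in range(T):
        for r, (i, j) in enumerate(pairs):
            dist[r, t] = np.linalg.norm(skeleton[t, i] - skeleton[t, j])
    norm = (dist - dist.min()) / (np.ptp(dist) + 1e-8)      # scale to [0, 1]
    return (colormap(norm)[..., :3] * 255).astype(np.uint8)  # (pairs, T, 3) image
```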
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art image classification performance on ImageNet.
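To make the layer layout summarized above concrete, here is a minimal PyTorch sketch with five convolutional layers, some followed by max-pooling, and three fully-connected layers ending in 1000 output classes. It omits details of the original model such as local response normalization and the two-GPU grouping, so treat it as an illustration rather than the reference implementation.

```python
import torch.nn as nn


class AlexNetSketch(nn.Module):
    """Five conv layers (some followed by max-pooling) plus three FC layers,
    producing 1000 logits for the ImageNet classes (input: 3 x 227 x 227)."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # softmax is applied by the loss at training time
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```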
Posted Content

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings ArticleDOI

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings Article

Two-Stream Convolutional Networks for Action Recognition in Videos

TL;DR: This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
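As a schematic of the two-stream idea (with a deliberately small placeholder backbone rather than the CNN-M-style networks used in that work), the PyTorch sketch below feeds a single RGB frame to a spatial stream, a stack of optical-flow fields to a temporal stream, and fuses the two softmax score vectors by averaging.

```python
import torch
import torch.nn as nn


def make_stream(in_channels, num_classes):
    """A tiny placeholder ConvNet standing in for each stream's backbone."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 96, kernel_size=7, stride=2), nn.ReLU(inplace=True),
        nn.MaxPool2d(3, 2),
        nn.Conv2d(96, 256, kernel_size=5, stride=2), nn.ReLU(inplace=True),
        nn.MaxPool2d(3, 2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(256, num_classes),
    )


class TwoStreamNet(nn.Module):
    """Spatial stream sees a single RGB frame; temporal stream sees a stack of
    L horizontal/vertical optical-flow fields (2*L channels). Class scores from
    the two softmaxes are fused by averaging (late fusion)."""

    def __init__(self, num_classes, flow_stack=10):
        super().__init__()
        self.spatial = make_stream(3, num_classes)
        self.temporal = make_stream(2 * flow_stack, num_classes)

    def forward(self, rgb, flow):
        p_spatial = torch.softmax(self.spatial(rgb), dim=1)
        p_temporal = torch.softmax(self.temporal(flow), dim=1)
        return (p_spatial + p_temporal) / 2  # average the class scores
```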