Book Chapter DOI

Human Action Recognition from 3D Landmark Points of the Performer

TLDR
In this article, the 3D landmark points of the human pose were extracted from a single image and then used as features for action recognition by applying an autoencoder architecture followed by a regression layer.
Abstract
Recognizing human actions is an active research area, where the pose of the performer is an important cue for recognition. However, applying the 3D landmark points of the performer to action recognition is a relatively less explored direction, owing to the challenge of extracting 3D landmark points from a single view of the performer. With recent advancements in 3D landmark point detection, exploiting these landmark points for recognizing human action becomes attractive. We propose a technique for human action recognition that learns the 3D landmark points of the human pose obtained from a single image. We apply an autoencoder architecture followed by a regression layer to estimate pose parameters such as shape, gesture and camera position, which are then mapped to 3D landmark points by the Skinned Multi-Person Linear (SMPL) model. The proposed method is a novel attempt to apply a CNN-based 3D pose reconstruction model (autoencoder) to action recognition. Further, instead of using the autoencoder as a classifier over 3D poses, we replace the decoder by a regressor that produces the landmark points, which are then fed into a classifier: the 3D landmark points of the human performer(s) at each frame serve as features for a neural network classifier that recognizes the action.
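As a rough illustration of the pipeline described above, the sketch below flattens per-frame 3D landmark points (24 joints are assumed, as in the SMPL model) into a feature vector and scores it with a small feedforward neural network classifier. The normalization step, layer sizes and random weights are hypothetical choices for the example, not the chapter's trained network.

```python
import numpy as np

def landmarks_to_features(joints):
    # joints: (24, 3) array of SMPL 3D landmark points for one frame.
    # Root-center and scale-normalize so the classifier sees
    # translation/scale-invariant coordinates (a common preprocessing
    # choice; the chapter does not specify its normalization).
    centered = joints - joints[0]           # subtract the root joint
    scale = np.linalg.norm(centered, axis=1).max() + 1e-8
    return (centered / scale).reshape(-1)   # 72-dim feature vector

def classify(features, W1, b1, W2, b2):
    # One-hidden-layer neural network classifier (hypothetical sizes).
    h = np.maximum(0.0, features @ W1 + b1)  # ReLU hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()                       # softmax over action classes

# Toy usage with random weights: 72 -> 64 -> 10 action classes.
rng = np.random.default_rng(0)
joints = rng.normal(size=(24, 3))
W1, b1 = rng.normal(size=(72, 64)) * 0.1, np.zeros(64)
W2, b2 = rng.normal(size=(64, 10)) * 0.1, np.zeros(10)
probs = classify(landmarks_to_features(joints), W1, b1, W2, b2)
```

In practice the weights would be trained on labeled action sequences; the point here is only the data flow from landmark points to class probabilities.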


References
Proceedings Article DOI

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
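For illustration, the core HOG idea (per-cell orientation histograms weighted by gradient magnitude) can be sketched as follows; the full Dalal–Triggs detector additionally applies block normalization and a sliding-window classifier, which are omitted here.

```python
import numpy as np

def hog_cells(img, n_bins=9, cell=8):
    # Simplified histogram of oriented gradients: for each cell of
    # `cell` x `cell` pixels, accumulate gradient magnitude into
    # `n_bins` unsigned-orientation bins (0-180 degrees).
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    H, W = img.shape
    ch, cw = H // cell, W // cell
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros((ch, cw, n_bins))
    for i in range(ch * cell):
        for j in range(cw * cell):
            hist[i // cell, j // cell, bins[i, j]] += mag[i, j]
    return hist

# Toy usage: a vertical edge produces non-zero histograms.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
h = hog_cells(img)
```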
Posted Content

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

TL;DR: This work introduces UCF101, which is currently the largest dataset of human actions, and provides baseline action recognition results on this new dataset using a standard bag-of-words approach, with an overall performance of 44.5%.
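The bag-of-words baseline mentioned above can be sketched as: quantize each local descriptor to its nearest codeword and build a normalized histogram over the codebook. The codebook here is a hypothetical stand-in for one learned by k-means over training descriptors.

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    # descriptors: (N, D) local features from a video;
    # codebook: (K, D) codewords. Assign each descriptor to its
    # nearest codeword and count, giving a fixed-length K-dim
    # representation suitable for an SVM or other classifier.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy usage: three descriptors, two codewords.
desc = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]])
cb = np.array([[0.0, 0.0], [1.0, 1.0]])
h = bag_of_words(desc, cb)
```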
Proceedings Article

3D Convolutional Neural Networks for Human Action Recognition

TL;DR: A novel 3D CNN model for action recognition that extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.
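The key idea of the 3D CNN, convolving jointly over space and time so that a kernel spans multiple adjacent frames, can be illustrated with a naive single-kernel 3D convolution (valid padding, no learned weights):

```python
import numpy as np

def conv3d_valid(volume, kernel):
    # Naive 3D convolution (valid padding) over a (T, H, W) clip.
    # A 3D kernel covers `t` frames at once, so the response encodes
    # motion across adjacent frames as well as spatial appearance.
    T, H, W = volume.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# Toy usage: 4-frame clip of 5x5 frames, 2x2x2 averaging-style kernel.
clip = np.ones((4, 5, 5))
kern = np.ones((2, 2, 2))
out = conv3d_valid(clip, kern)
```

A real 3D CNN stacks many such kernels with learned weights, nonlinearities and pooling; this sketch only shows the spatio-temporal receptive field.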
Proceedings Article DOI

Learning realistic human actions from movies

TL;DR: A new method for video classification that builds upon and extends several recent ideas, including local space-time features, space-time pyramids and multi-channel non-linear SVMs, is presented and shown to improve state-of-the-art results on the standard KTH action dataset.
Proceedings Article DOI

Action Recognition with Improved Trajectories

TL;DR: Dense trajectories, previously shown to be an efficient video representation for action recognition, are improved by taking camera motion into account to correct them, achieving state-of-the-art results on a variety of datasets.