An Extensive Analysis of the Vision-based Deep Learning Techniques for Action Recognition
TLDR
This paper summarizes the evolution of action localization, classification, and detection algorithms applied to data from vision-based sensors, and reviews the datasets used for action classification, localization, and detection.
Abstract:
Action recognition involves localizing and classifying actions in a video over a sequence of frames. It can be thought of as an image classification task extended along the temporal dimension: the information obtained from many frames is aggregated to produce the action classification output. Applications of action recognition systems range from healthcare assistance to human-machine interaction. Action recognition has proven to be a challenging task because of obstacles including high computation cost, the need to capture extended context, the complexity of the required architectures, and the lack of benchmark datasets. Increasing the efficiency of human action recognition algorithms can significantly improve the likelihood of deploying them in real-world scenarios. This paper summarizes the evolution of various action localization, classification, and detection algorithms applied to data from vision-based sensors. We also review the datasets that have been used for action classification, localization, and detection. We further explore the areas of action classification and of temporal and spatiotemporal action detection, which use convolutional neural networks, recurrent neural networks, or a combination of both.
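The idea of aggregating per-frame information into a single action label can be illustrated with a minimal sketch. This is only an assumption-laden toy example, not a method taken from the paper: the function names `softmax` and `classify_clip` and the logit values are hypothetical, and simple averaging of per-frame class probabilities stands in for whatever aggregation a given model actually uses.

```python
import math


def softmax(logits):
    """Convert one frame's raw class scores into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(frame_logits):
    """Average per-frame class probabilities over a clip.

    Returns (predicted class index, averaged probabilities).
    Averaging is an illustrative choice, not the paper's method.
    """
    if not frame_logits:
        raise ValueError("clip has no frames")
    num_frames = len(frame_logits)
    num_classes = len(frame_logits[0])
    avg = [0.0] * num_classes
    for logits in frame_logits:
        for c, p in enumerate(softmax(logits)):
            avg[c] += p / num_frames
    label = max(range(num_classes), key=lambda c: avg[c])
    return label, avg


# Hypothetical logits for a 3-frame clip over 3 action classes.
# The middle frame alone favors class 1, but the clip-level average favors class 0.
clip = [[2.0, 0.5, 0.1], [1.5, 1.7, 0.2], [2.2, 0.3, 0.4]]
label, avg = classify_clip(clip)
print(label, [round(p, 3) for p in avg])
```

In this toy clip a single ambiguous frame does not change the clip-level decision, which is the intuition behind aggregating over frames rather than classifying each frame in isolation.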