Open Access Journal Article (DOI)

A Comprehensive Survey of Vision-Based Human Action Recognition Methods.

TLDR
This survey paper provides a comprehensive overview of recent approaches in human action recognition research, including progress in hand-designed action features in RGB and depth data, current deep learning-based action feature representation methods, advances in human–object interaction recognition methods, and the current prominent research topic of action detection methods.
Abstract
Although widely used in many applications, accurate and efficient human action recognition remains a challenging area of research in the field of computer vision. Most recent surveys have focused on narrow problems such as human action recognition methods using depth data, 3D-skeleton data, still image data, spatiotemporal interest point-based methods, and human walking motion recognition. However, there has been no systematic survey of human action recognition. To this end, we present a thorough review of human action recognition methods and provide a comprehensive overview of recent approaches in human action recognition research, including progress in hand-designed action features in RGB and depth data, current deep learning-based action feature representation methods, advances in human–object interaction recognition methods, and the current prominent research topic of action detection methods. Finally, we present several analysis recommendations for researchers. This survey paper provides an essential reference for those interested in further research on human action recognition.


Citations
Journal Article (DOI)

Vision-based human activity recognition: a survey

TL;DR: Most computer vision applications, such as human–computer interaction, virtual reality, security, video surveillance, and home monitoring, are highly correlated with HAR tasks, which establishes new trends and milestones in the development cycle of HAR systems.
Book

Synthetic Data for Deep Learning

TL;DR: The synthetic-to-real domain adaptation problem that inevitably arises in applications of synthetic data is discussed, including synthetic-to-real refinement with GAN-based models and domain adaptation at the feature/model level without explicit data transformations.
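One common way to realize feature-level adaptation without transforming the data itself is a domain-adversarial (gradient-reversal) head. The minimal PyTorch sketch below is illustrative only and is not taken from the book; the module names, layer sizes, and the choice of DANN-style training are assumptions.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainAdversarialHead(nn.Module):
    """Feature-level adaptation sketch: a domain classifier trained through a
    gradient-reversal layer, pushing the backbone toward domain-invariant features."""
    def __init__(self, feat_dim=256, lambd=1.0):  # hypothetical dimensions
        super().__init__()
        self.lambd = lambd
        self.domain_clf = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 2)
        )

    def forward(self, features):
        reversed_feats = GradReverse.apply(features, self.lambd)
        return self.domain_clf(reversed_feats)  # logits: synthetic vs. real
```

In training, the domain-classification loss on these logits is added to the task loss; the reversal makes the backbone maximize domain confusion while the head minimizes it.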
Journal Article (DOI)

Skeleton-based action recognition via spatial and temporal transformer networks

TL;DR: This work proposes a novel Spatial-Temporal Transformer network (ST-TR) which models dependencies between joints using the Transformer self-attention operator, outperforming the state-of-the-art on NTU-RGB+D w.r.t. models using the same input data.
Book Chapter (DOI)

Spatial Temporal Transformer Network for Skeleton-based Action Recognition

TL;DR: This work proposes a novel Spatial-Temporal Transformer network (ST-TR) which models dependencies between joints using the Transformer self-attention operator, outperforming the state-of-the-art on NTU-RGB+D w.r.t. models using the same input data consisting of joint information.
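As a rough illustration of the core idea, modelling dependencies between joints with self-attention, here is a minimal PyTorch sketch of a single spatial attention layer over skeleton joints. It is not the authors' ST-TR implementation; the joint count, embedding size, and head count are assumed values.

```python
import torch
from torch import nn

class SpatialJointSelfAttention(nn.Module):
    """One frame of skeleton data: every joint attends to every other joint."""
    def __init__(self, in_channels=3, embed_dim=64, num_heads=8):
        super().__init__()
        self.embed = nn.Linear(in_channels, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, joints):
        # joints: (batch, num_joints, in_channels), e.g. 25 NTU-RGB+D joints with (x, y, z)
        x = self.embed(joints)
        out, _ = self.attn(x, x, x)  # joint-to-joint dependencies via self-attention
        return out

# usage: a batch of 4 skeleton frames with 25 joints each
out = SpatialJointSelfAttention()(torch.randn(4, 25, 3))
print(out.shape)  # torch.Size([4, 25, 64])
```

A full model would stack such layers, add a temporal counterpart over frames, and pool into action logits.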
Journal Article (DOI)

Vision-based human action recognition: An overview and real world challenges

TL;DR: This paper provides an overview of existing methods organized according to the kind of issue they address, and presents a comparison of the datasets introduced for the human action recognition field.
References
Journal Article (DOI)

Representation Learning: A Review and New Perspectives

TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
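For readers new to unsupervised feature learning, a toy autoencoder shows the basic idea of learning a representation from reconstruction alone. The PyTorch sketch below is a generic illustration, not code from the review; the layer sizes and the MSE objective are assumptions.

```python
import torch
from torch import nn

class AutoEncoder(nn.Module):
    """Encode the input into a low-dimensional code, decode it back, and train
    on reconstruction error alone; the code serves as a learned feature."""
    def __init__(self, in_dim=784, code_dim=32):  # hypothetical dimensions
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        code = self.encoder(x)           # learned representation
        return self.decoder(code), code

model = AutoEncoder()
x = torch.rand(16, 784)
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)  # reconstruction objective, no labels needed
```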
Proceedings Article

Two-Stream Convolutional Networks for Action Recognition in Videos

TL;DR: This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
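A minimal sketch of the two-stream idea, one ConvNet on a single RGB frame and another on a stack of optical-flow fields, with class scores fused by averaging, is given below in PyTorch. The tiny per-stream network, the 10-frame flow stack, and the averaging fusion are simplifying assumptions rather than the paper's exact architecture.

```python
import torch
from torch import nn

def small_convnet(in_channels, num_classes):
    """Tiny stand-in for each stream's ConvNet (the paper uses much deeper networks)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes),
    )

class TwoStreamNet(nn.Module):
    """Spatial stream sees one RGB frame; temporal stream sees a stack of L
    optical-flow fields (2L channels); class scores are fused late by averaging."""
    def __init__(self, num_classes=101, flow_len=10):
        super().__init__()
        self.spatial = small_convnet(3, num_classes)
        self.temporal = small_convnet(2 * flow_len, num_classes)

    def forward(self, rgb_frame, flow):
        return (self.spatial(rgb_frame) + self.temporal(flow)) / 2

net = TwoStreamNet()
scores = net(torch.randn(2, 3, 224, 224), torch.randn(2, 20, 224, 224))
```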
Posted Content

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

TL;DR: This work introduces UCF101, which is currently the largest dataset of human actions, and provides baseline action recognition results on this new dataset using a standard bag-of-words approach, with an overall performance of 44.5%.
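A bag-of-words video baseline of this kind can be sketched as: cluster local descriptors into a visual vocabulary, histogram each video against it, and train a linear SVM. The scikit-learn sketch below assumes the local spatiotemporal descriptors have already been extracted, and the vocabulary size is an arbitrary choice, so it should not be read as the exact UCF101 baseline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def bovw_histogram(descriptors, kmeans):
    """Quantize one video's local descriptors against the codebook and return
    a normalized visual-word histogram."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-8)

def train_bovw_baseline(descriptors_per_video, labels, vocab_size=400):
    """descriptors_per_video: list of (n_i, d) arrays of local spatiotemporal
    features (their extraction is assumed); labels: per-video action classes."""
    all_desc = np.vstack(descriptors_per_video)
    kmeans = KMeans(n_clusters=vocab_size, n_init=4).fit(all_desc)   # visual vocabulary
    X = np.stack([bovw_histogram(d, kmeans) for d in descriptors_per_video])
    clf = LinearSVC().fit(X, labels)                                 # linear classifier
    return kmeans, clf
```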
Posted Content

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

TL;DR: A novel recurrent convolutional architecture suitable for large-scale visual learning that is end-to-end trainable is proposed, and such models are shown to have distinct advantages over state-of-the-art models for recognition or generation whose components are separately defined and/or optimized.
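A rough PyTorch sketch of the recurrent-convolutional pattern, per-frame CNN features fed into an LSTM and classified from the final hidden state, follows. The tiny CNN, hidden sizes, and last-state readout are assumptions for illustration, not the configuration from the paper.

```python
import torch
from torch import nn

class RecurrentConvSketch(nn.Module):
    """Per-frame CNN features are passed to an LSTM; both parts train end to end,
    and the clip-level class score is read from the last hidden state."""
    def __init__(self, num_classes=101, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, 256, batch_first=True)
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, clip):
        # clip: (batch, time, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)  # per-frame features
        _, (h, _) = self.lstm(feats)                          # temporal aggregation
        return self.classifier(h[-1])

logits = RecurrentConvSketch()(torch.randn(2, 16, 3, 112, 112))
```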
Posted Content

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

TL;DR: This work presents an approach to efficiently detect the 2D pose of multiple people in an image using a nonparametric representation, which it refers to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image.
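One way to use a PAF for association is to score each candidate limb by how well the field aligns with the segment joining two detected joints. The NumPy sketch below assumes the field is a (2, H, W) array of unit vectors and that the sampled coordinates stay inside the image; it simplifies the paper's line-integral scoring.

```python
import numpy as np

def paf_limb_score(paf, joint_a, joint_b, num_samples=10):
    """Score a candidate limb between two detected joints by sampling the Part
    Affinity Field along the segment and measuring alignment with its direction.
    paf: (2, H, W) vector field; joint_a, joint_b: (x, y) pixel coordinates."""
    joint_a = np.asarray(joint_a, dtype=float)
    joint_b = np.asarray(joint_b, dtype=float)
    direction = joint_b - joint_a
    norm = np.linalg.norm(direction)
    if norm < 1e-8:
        return 0.0
    direction /= norm
    score = 0.0
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = (joint_a + t * (joint_b - joint_a)).round().astype(int)
        score += float(np.dot(paf[:, y, x], direction))  # alignment at this sample
    return score / num_samples
```

These scores feed a matching step that assigns body parts to individuals, which is the part-to-person association the TL;DR refers to.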