Open Access Journal ArticleDOI

Quo Vadis, Skeleton Action Recognition?

TLDR
Skeletics-152 as discussed by the authors is a curated, 3D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset; the authors also introduce Skeleton-Mimetics, a dataset derived from the Mimetics dataset, and Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and of interpretative dance performances.
Abstract
In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. To study skeleton-based action recognition in the wild, we introduce Skeletics-152, a curated and 3D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset. We extend our study to include out-of-context actions by introducing Skeleton-Mimetics, a dataset derived from the recently introduced Mimetics dataset. We also introduce Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and interpretative dance performances. We benchmark state-of-the-art models on the NTU-120 dataset and provide a multi-layered assessment of the results. The results from benchmarking the top performers of NTU-120 on the newly introduced datasets reveal the challenges and domain gap induced by actions in the wild. Overall, our work characterizes the strengths and limitations of existing approaches and datasets. Via the introduced datasets, our work enables new frontiers for human action recognition.


Citations
Proceedings ArticleDOI

Revisiting Skeleton-based Action Recognition

TL;DR: PoseConv3D as mentioned in this paper uses a 3D heatmap volume instead of a graph sequence as the base representation of human skeletons, which is more effective in learning spatio-temporal features, more robust against pose estimation noises, and generalizes better in cross-dataset settings.
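The heatmap-volume representation can be illustrated with a minimal sketch: each 2D joint is rendered as a per-frame Gaussian heatmap, and the frames are stacked into a volume that a 3D CNN can consume. The function name, grid size, and sigma below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def keypoints_to_heatmap_volume(keypoints, H=56, W=56, sigma=0.6):
    """Render per-joint 2D Gaussian heatmaps for each frame and stack
    frames into a (T, K, H, W) volume.

    `keypoints` is a (T, K, 2) array of (x, y) joint coordinates,
    assumed to be already scaled to the H x W grid.
    """
    T, K, _ = keypoints.shape
    ys = np.arange(H, dtype=np.float32).reshape(H, 1)  # row index = y
    xs = np.arange(W, dtype=np.float32).reshape(1, W)  # col index = x
    volume = np.zeros((T, K, H, W), dtype=np.float32)
    for t in range(T):
        for k in range(K):
            x, y = keypoints[t, k]
            # isotropic Gaussian centered on the joint location
            volume[t, k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2)
                                  / (2.0 * sigma ** 2))
    return volume
```

Compared with a graph sequence, this dense grid representation lets standard 3D convolutions learn spatio-temporal features directly and degrades gracefully when individual pose estimates are noisy.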
Journal ArticleDOI

Zoom Transformer for Skeleton-Based Group Activity Recognition

TL;DR: Zoom Transformer as mentioned in this paper exploits both low-level single-person motion information and the high-level multi-person interaction information in a uniform model structure with carefully designed Relation-aware Maps.
Journal ArticleDOI

Motion Guided Attention Learning for Self-Supervised 3D Human Action Recognition

TL;DR: Li et al. as mentioned in this paper propose a Motion Guided Attention Learning (MG-AL) framework, which formulates action representation learning as a self-supervised motion-attention prediction problem.
Journal ArticleDOI

PyHAPT: A Python-based Human Activity Pose Tracking data processing framework

Hao Quan, +1 more
TL;DR: PyHAPT as discussed by the authors is a novel Python-based human activity pose tracking data processing framework, which provides the functionality to efficiently process annotated human pose tracking raw video data collected in unconstrained environments.
References
Proceedings ArticleDOI

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

TL;DR: In this article, a Two-Stream Inflated 3D ConvNet (I3D) is proposed to learn seamless spatio-temporal feature extractors from video while leveraging successful ImageNet architecture designs and their parameters.
Journal ArticleDOI

OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

TL;DR: OpenPose as mentioned in this paper uses Part Affinity Fields (PAFs) to learn to associate body parts with individuals in the image, which achieves high accuracy and real-time performance.
Proceedings ArticleDOI

Mining actionlet ensemble for action recognition with depth cameras

TL;DR: An actionlet ensemble model is learned to represent each action and to capture the intra-class variance, and novel features suitable for depth data are proposed.
Proceedings ArticleDOI

Action recognition based on a bag of 3D points

TL;DR: To recognize human actions from sequences of depth maps, an action graph is employed to explicitly model the dynamics of the actions, together with a bag of 3D points that characterizes a set of salient postures corresponding to the nodes in the action graph.
Proceedings ArticleDOI

Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group

TL;DR: A new skeletal representation that explicitly models the 3D geometric relationships between various body parts using rotations and translations in 3D space is proposed and outperforms various state-of-the-art skeleton-based human action recognition approaches.
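A core building block of such a rotation-based representation is the 3D rotation relating one limb direction to another; a minimal sketch via Rodrigues' formula is shown below. The function names and the handling of degenerate cases are illustrative assumptions, not details from the paper.

```python
import numpy as np

def _skew(a):
    """Skew-symmetric cross-product matrix of a 3-vector."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def rotation_between(u, v):
    """Rotation matrix (an element of SO(3)) taking direction u to
    direction v, computed with Rodrigues' formula."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    c = float(np.dot(u, v))          # cos of the angle between limbs
    axis = np.cross(u, v)            # rotation axis, scaled by sin
    s = float(np.linalg.norm(axis))
    if s < 1e-9:
        if c > 0:
            return np.eye(3)         # already aligned
        # anti-parallel: rotate by pi about any axis orthogonal to u
        p = (np.array([0.0, 1.0, 0.0]) if abs(u[0]) > 0.9
             else np.array([1.0, 0.0, 0.0]))
        k = np.cross(u, p)
        K = _skew(k / np.linalg.norm(k))
        return np.eye(3) + 2.0 * K @ K   # R(pi) = I + 2 K^2
    K = _skew(axis)  # unnormalized axis; the (1 - c) / s^2 factor compensates
    return np.eye(3) + K + K @ K * ((1.0 - c) / s ** 2)
```

Collecting such rotations (and translations) for all pairs of body parts over time yields a trajectory on a Lie group, which the paper's representation then models for classification.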