Open Access · Posted Content

Quo Vadis, Skeleton Action Recognition?

TLDR
Benchmarking the top performers of NTU-120 on Skeletics-152 reveals the challenges and domain gap induced by actions 'in the wild'; on this basis, new frontiers for human action recognition are proposed.
Abstract
In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. To begin with, we benchmark state-of-the-art models on the NTU-120 dataset and provide a multi-layered assessment of the results. To examine skeleton action recognition 'in the wild', we introduce Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset. The results from benchmarking the top performers of NTU-120 on Skeletics-152 reveal the challenges and domain gap induced by actions 'in the wild'. We extend our study to out-of-context actions by introducing Skeleton-Mimetics, a dataset derived from the recently introduced Mimetics dataset. Finally, as a new frontier for action recognition, we introduce Metaphorics, a dataset of caption-style annotated YouTube videos of the popular social game Dumb Charades and of interpretative dance performances. Overall, our work characterizes the strengths and limitations of existing approaches and datasets. It also assesses top-performing approaches across a spectrum of activity settings and, via the introduced datasets, proposes new frontiers for human action recognition.


Citations
Journal ArticleDOI

The AMIRO Social Robotics Framework: Deployment and Evaluation on the Pepper Robot

TL;DR: The AMIRO social robotics framework is designed in a modular and robust way for assistive care scenarios, providing robotic services for navigation, person detection and recognition, multi-lingual natural language interaction and dialogue management, as well as activity recognition and general behavior composition.
Journal ArticleDOI

Zoom Transformer for Skeleton-Based Group Activity Recognition

TL;DR: Zoom Transformer exploits both low-level single-person motion information and high-level multi-person interaction information in a uniform model structure with carefully designed relation-aware maps.
Journal ArticleDOI

Motion Guided Attention Learning for Self-Supervised 3D Human Action Recognition

TL;DR: Li et al. propose a Motion Guided Attention Learning (MG-AL) framework that formulates action representation learning as a self-supervised motion-attention prediction problem.
Journal ArticleDOI

ConfLab: A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions In-the-Wild

TL;DR: ConfLab is a multimodal, multisensor dataset of real-life, in-the-wild, free-standing social interactions, collected as a Conference Living Lab; it aims to bridge the gap between traditional computer vision tasks and ecologically valid, socially motivated in-the-wild tasks.
Posted ContentDOI

Semi-supervised sequence modeling for improved behavioral segmentation

TL;DR: A semi-supervised approach to quantifying animal behavior from video data is proposed, constructing a sequence-model loss function from (1) a standard supervised loss that classifies a sparse set of hand labels; (2) a weakly supervised loss that classifies a set of easy-to-compute heuristic labels; and (3) a self-supervised loss that predicts the evolution of the behavioral features.
References
Proceedings ArticleDOI

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

TL;DR: A Two-Stream Inflated 3D ConvNet (I3D) is proposed that learns seamless spatio-temporal feature extractors from video while leveraging successful ImageNet architecture designs and their parameters.
Journal ArticleDOI

OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

TL;DR: OpenPose uses Part Affinity Fields (PAFs) to learn to associate body parts with individuals in an image, achieving high accuracy and real-time performance.
Proceedings Article

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

TL;DR: A novel model of dynamic skeletons, Spatial-Temporal Graph Convolutional Networks (ST-GCN), is proposed, which moves beyond the limitations of previous methods by automatically learning both spatial and temporal patterns from data.
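The core idea in ST-GCN-style models is to treat the skeleton as a graph (joints as nodes, bones as edges) and convolve over it. The sketch below is a minimal illustration of the spatial step only, not the authors' implementation: the toy 5-joint skeleton, its edge list, and the weight shapes are all illustrative assumptions, and a full ST-GCN block would additionally convolve over the temporal axis.

```python
import numpy as np

# Toy "stick figure": 5 joints, edges are hypothetical bone connections.
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
V = 5  # number of joints

# Adjacency with self-loops, then symmetric normalization D^-1/2 A D^-1/2.
A = np.eye(V)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(d, d))

# A skeleton sequence: T frames x V joints x C coordinate channels.
T, C = 30, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((T, V, C))

# Spatial graph convolution: aggregate each joint's neighbors via A_norm,
# then mix channels with a weight matrix (learnable in a real model).
W = rng.standard_normal((C, 8))  # C input channels -> 8 output channels
y = np.einsum('uv,tvc,co->tuo', A_norm, x, W)
print(y.shape)  # (30, 5, 8)
```

Stacking such layers, interleaved with 1-D convolutions along the frame axis, is what lets the model pick up both spatial (pose) and temporal (motion) patterns.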
Proceedings ArticleDOI

Mining actionlet ensemble for action recognition with depth cameras

TL;DR: An actionlet ensemble model is learnt to represent each action and to capture intra-class variance, and novel features suited to depth data are proposed.
Posted Content

NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

TL;DR: NTU RGB+D is introduced as a large-scale dataset for 3D human activity analysis, with more than 56 thousand video samples and 4 million frames collected from 40 distinct subjects.