Open Access · Posted Content

Quo Vadis, Skeleton Action Recognition?

TLDR
Benchmarking the top performers of NTU-120 on Skeletics-152 reveals the challenges and domain gap induced by actions 'in the wild'; on this basis, new frontiers for human action recognition are proposed.
Abstract
In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. To begin with, we benchmark state-of-the-art models on the NTU-120 dataset and provide a multi-layered assessment of the results. To examine skeleton action recognition 'in the wild', we introduce Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset. The results from benchmarking the top performers of NTU-120 on Skeletics-152 reveal the challenges and domain gap induced by actions 'in the wild'. We extend our study to out-of-context actions by introducing Skeleton-Mimetics, a dataset derived from the recently introduced Mimetics dataset. Finally, as a new frontier for action recognition, we introduce Metaphorics, a dataset of caption-style annotated YouTube videos of the popular social game Dumb Charades and of interpretative dance performances. Overall, our work characterizes the strengths and limitations of existing approaches and datasets. It also assesses top-performing approaches across a spectrum of activity settings and, via the introduced datasets, proposes new frontiers for human action recognition.


Citations
Journal ArticleDOI

The AMIRO Social Robotics Framework: Deployment and Evaluation on the Pepper Robot

TL;DR: The AMIRO social robotics framework is designed in a modular and robust way for assistive care scenarios, providing robotic services for navigation, person detection and recognition, multi-lingual natural language interaction and dialogue management, as well as activity recognition and general behavior composition.
Journal ArticleDOI

Zoom Transformer for Skeleton-Based Group Activity Recognition

TL;DR: Zoom Transformer exploits both low-level single-person motion information and high-level multi-person interaction information in a uniform model structure with carefully designed relation-aware maps.
Journal ArticleDOI

Motion Guided Attention Learning for Self-Supervised 3D Human Action Recognition

TL;DR: Li et al. propose a Motion Guided Attention Learning (MG-AL) framework that formulates action representation learning as a self-supervised motion-attention prediction problem.
Journal ArticleDOI

ConfLab: A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions In-the-Wild

TL;DR: ConfLab is a multimodal, multisensor dataset of real-life, in-the-wild, free-standing social interactions, collected as a Conference Living Lab; it aims to bridge the gap between traditional computer vision tasks and ecologically valid, socially motivated in-the-wild tasks.
Posted ContentDOI

Semi-supervised sequence modeling for improved behavioral segmentation

TL;DR: A semi-supervised approach to quantifying animal behavior from video data is proposed, constructing a sequence-model loss function from (1) a standard supervised loss that classifies a sparse set of hand labels; (2) a weakly supervised loss that classifies a set of easy-to-compute heuristic labels; and (3) a self-supervised loss that predicts the evolution of the behavioral features.
References
Proceedings ArticleDOI

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

TL;DR: A Two-Stream Inflated 3D ConvNet (I3D) is proposed that learns seamless spatio-temporal feature extractors from video while leveraging successful ImageNet architecture designs and their parameters.
Journal ArticleDOI

OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

TL;DR: OpenPose uses Part Affinity Fields (PAFs) to learn to associate body parts with individuals in an image, achieving high accuracy and real-time performance.
Proceedings Article

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

TL;DR: A novel model of dynamic skeletons, Spatial-Temporal Graph Convolutional Networks (ST-GCN), is proposed, which moves beyond the limitations of previous methods by automatically learning both spatial and temporal patterns from data.
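The core idea in ST-GCN-style models is to treat the skeleton as a graph (joints as nodes, bones as edges) and convolve over it. The sketch below is a minimal illustration of the spatial step only, not the authors' implementation: the toy 5-joint skeleton, its edge list, and the weight shapes are all illustrative assumptions, and a full ST-GCN block would additionally convolve over the temporal axis.

```python
import numpy as np

# Toy "stick figure": 5 joints, edges are hypothetical bone connections.
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
V = 5  # number of joints

# Adjacency with self-loops, then symmetric normalization D^-1/2 A D^-1/2.
A = np.eye(V)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(d, d))

# A skeleton sequence: T frames x V joints x C coordinate channels.
T, C = 30, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((T, V, C))

# Spatial graph convolution: aggregate each joint's neighbors via A_norm,
# then mix channels with a weight matrix (learnable in a real model).
W = rng.standard_normal((C, 8))  # C input channels -> 8 output channels
y = np.einsum('uv,tvc,co->tuo', A_norm, x, W)
print(y.shape)  # (30, 5, 8)
```

Stacking such layers, interleaved with 1-D convolutions along the frame axis, is what lets the model pick up both spatial (pose) and temporal (motion) patterns.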
Proceedings ArticleDOI

Mining actionlet ensemble for action recognition with depth cameras

TL;DR: An actionlet ensemble model is learnt to represent each action and to capture intra-class variance, and novel features suited to depth data are proposed.
Posted Content

NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

TL;DR: NTU RGB+D is introduced as a large-scale dataset for 3D human activity analysis, with more than 56 thousand video samples and 4 million frames collected from 40 distinct subjects.