NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

doi:10.1109/TPAMI.2019.2916873

Open Access, Journal Article

Jun Liu, +5 more

01 Oct 2020

IEEE Transactions on Pattern Analysis an...

Vol. 42, Iss. 10, pp. 2684-2701

TLDR

This work introduces a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames, and investigates a novel one-shot 3D activity recognition problem on this dataset.

Abstract:

Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding.
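
To make the one-shot setting mentioned in the abstract concrete, the following is a minimal sketch of a generic nearest-exemplar baseline: each novel class is represented by a single feature vector, and a query is assigned the label of its most similar exemplar. This is not the APSR framework, whose details the abstract does not give; the function names, class labels, and feature values below are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def one_shot_classify(query, exemplars):
    """Return the label whose single exemplar is most similar to the query.

    exemplars maps each novel class label to exactly one feature vector,
    which is what makes the setting "one-shot".
    """
    return max(exemplars, key=lambda label: cosine_similarity(query, exemplars[label]))


# Toy, made-up features (assumption): in practice these would come from
# a feature extractor trained on the non-novel classes.
exemplars = {
    "novel_class_a": [1.0, 0.0, 0.2],
    "novel_class_b": [0.0, 1.0, 0.1],
}
query = [0.9, 0.1, 0.3]
predicted = one_shot_classify(query, exemplars)
```

In this sketch the quality of the prediction depends entirely on the feature vectors, so in a real pipeline most of the effort would presumably go into how those features are learned.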
