Constructing Stronger and Faster Baselines for Skeleton-based Action Recognition.

Open Access, Posted Content

- 29 Jun 2021 -

arXiv: Computer Vision and Pattern Recog...

TLDR

The authors design a compound scaling strategy that expands the model's width and depth synchronously, obtaining a family of efficient GCN baselines with high accuracy and few trainable parameters, termed EfficientGCN-Bx, where "x" denotes the scaling coefficient.

Abstract:

One essential problem in skeleton-based action recognition is how to extract discriminative features over all skeleton joints. However, recent State-Of-The-Art (SOTA) models for this task tend to be exceedingly sophisticated and over-parameterized. The low efficiency of model training and inference has increased the cost of validating model architectures on large-scale datasets. To address this issue, recent advanced separable convolutional layers are embedded into an early fused Multiple Input Branches (MIB) network, constructing an efficient Graph Convolutional Network (GCN) baseline for skeleton-based action recognition. In addition, based on this baseline, we design a compound scaling strategy to expand the model's width and depth synchronously, and eventually obtain a family of efficient GCN baselines with high accuracies and small amounts of trainable parameters, termed EfficientGCN-Bx, where "x" denotes the scaling coefficient. On two large-scale datasets, i.e., NTU RGB+D 60 and 120, the proposed EfficientGCN-B4 baseline outperforms other SOTA methods, e.g., achieving 91.7% accuracy on the cross-subject benchmark of the NTU 60 dataset, while being 3.15x smaller and 3.21x faster than MS-G3D, which is one of the best SOTA methods. The source code in PyTorch and the pretrained models are available at this https URL.
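
As a rough illustration of what scaling width and depth "synchronously" can look like, the sketch below applies one coefficient x to both dimensions. The function name, the multipliers `alpha` and `beta`, the rounding rules, and the base configuration are all assumptions made for this example; the abstract does not state them, and this is not the authors' implementation.

```python
import math

def compound_scale(base_widths, base_depths, x, alpha=1.2, beta=1.35):
    """Scale per-stage depths by alpha**x and widths by beta**x.

    alpha and beta are placeholder multipliers chosen for illustration.
    """
    depth_mult = alpha ** x
    width_mult = beta ** x
    # Depths are rounded up so that every stage keeps at least one layer.
    depths = [max(1, math.ceil(d * depth_mult)) for d in base_depths]
    # Widths are rounded to the nearest multiple of 16 (an arbitrary choice here).
    widths = [max(16, int(round(w * width_mult / 16)) * 16) for w in base_widths]
    return widths, depths

# Hypothetical two-stage base model; x = 0 leaves it unchanged.
base_widths, base_depths = [48, 64], [1, 2]
for x in (0, 2, 4):
    print(x, compound_scale(base_widths, base_depths, x))
```

A single coefficient keeps the family easy to enumerate: each value of x yields one model, in the spirit of the "Bx" naming in the abstract.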

Citations

Posted Content

Human Action Recognition from Various Data Modalities: A Review

Zehua Sun, +5 more

- 22 Dec 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper reviews both the hand-crafted feature-based and deep learning-based methods for single data modalities and also the methods based on multiple modalities, including the fusion-based frameworks and the co-learning-based approaches for HAR.

Journal Article

A union of deep learning and swarm-based optimization for 3D human action recognition

Hritam Basak, +5 more

- 31 Mar 2022 -

Scientific Reports

TL;DR: DSwarm-Net employs deep learning and a swarm intelligence-based metaheuristic for HAR, using 3D skeleton data for action classification. It extracts four types of features from the skeletal data, namely Distance, Distance Velocity, Angle, and Angle Velocity, which capture complementary information from the skeleton joints, and encodes them into images.

Journal Article

Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data?

Oumaima Moutik, +6 more

- 01 Jan 2023 -

Sensors

TL;DR: The authors conduct a comparative study of the accuracy-complexity trade-off between CNNs and Transformers for action recognition in video clips and, based on the outcome of the performance analysis, discuss whether CNNs or Vision Transformers will win the race.

Journal Article

Human Action Recognition: A Taxonomy-Based Survey, Updates, and Opportunities

Md. Golam Morshed, +3 more

- 01 Feb 2023 -

Sensors

TL;DR: This paper presents a rigorous, taxonomy-based study of human activity recognition techniques, discussing the best ways to acquire human action features derived from RGB and depth data, as well as the latest research on deep learning and hand-crafted techniques.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.

Posted Content

Semi-Supervised Classification with Graph Convolutional Networks

Thomas Kipf, +1 more

- 09 Sep 2016 -

arXiv: Learning

TL;DR: A scalable approach for semi-supervised learning on graph-structured data, based on an efficient variant of convolutional neural networks that operates directly on graphs and outperforms related methods by a significant margin.
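
To make "operates directly on graphs" concrete, here is a minimal sketch of one graph-convolution step in the form used by Kipf and Welling: the adjacency matrix with self-loops is symmetrically normalized, then used to mix neighbouring node features before a linear map and ReLU. The 3-joint chain, the toy features and weights, and all function names are assumptions for this example only.

```python
import math

def normalized_adjacency(edges, num_nodes):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list of lists."""
    # Start from the identity so that every node has a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]

def matmul(x, y):
    """Plain dense matrix product for small lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def gcn_layer(a_hat, h, w):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]

# Hypothetical 3-joint chain (0-1-2) with 2 input features per joint.
a_hat = normalized_adjacency([(0, 1), (1, 2)], 3)
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0], [1.0]]  # toy weights mapping 2 features to 1
out = gcn_layer(a_hat, h, w)
print(out)
```

In skeleton-based models the graph nodes are typically body joints and the edges are bones, which is why this layer type is a natural fit for the task described in the abstract above.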

Journal Article

Squeeze-and-Excitation Networks

Jie Hu, +4 more

TL;DR: This work proposes a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
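
The channel recalibration described above can be sketched in three steps: squeeze (global average per channel), excitation (a small two-layer gate ending in a sigmoid), and scale (multiply each channel by its gate). The function name, the omission of biases, and the toy inputs are assumptions for this illustration, not details taken from the summary.

```python
import math

def se_recalibrate(feature_map, w1, w2):
    """Apply a simplified SE block.

    feature_map: list of C channels, each a flat list of activations.
    w1: (C/r) x C weight rows, w2: C x (C/r) weight rows (biases omitted).
    """
    # Squeeze: one descriptor per channel via global average pooling.
    z = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid to get gates in (0, 1).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
             for row in w2]
    # Scale: reweight every activation in a channel by that channel's gate.
    return [[g * a for a in ch] for g, ch in zip(gates, feature_map)]

# Toy input: 2 channels with 2 activations each, bottleneck of size 1.
features = [[1.0, 3.0], [2.0, 2.0]]
scaled = se_recalibrate(features, w1=[[1.0, 1.0]], w2=[[0.0], [0.0]])
print(scaled)
```

With the second weight matrix set to zero, every gate is sigmoid(0) = 0.5, so each channel is simply halved; trained weights would instead emphasize some channels over others.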

Posted Content

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew Howard, +7 more

- 17 Apr 2017 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work introduces two simple global hyper-parameters that efficiently trade off between latency and accuracy and demonstrates the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.
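
For context, the two hyper-parameters in the MobileNets paper are a width multiplier and a resolution multiplier, applied to layers built from depthwise separable convolutions. The sketch below counts multiply-adds for one such layer against a standard convolution; the function names and the example layer sizes are assumptions for illustration.

```python
def standard_conv_cost(k, c_in, c_out, size):
    """Multiply-adds of a standard k x k convolution on a size x size map."""
    return k * k * c_in * c_out * size * size

def separable_conv_cost(k, c_in, c_out, size, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise separable convolution.

    alpha thins the channel counts (width multiplier) and rho shrinks
    the spatial size (resolution multiplier).
    """
    m = int(alpha * c_in)
    n = int(alpha * c_out)
    s = int(rho * size)
    depthwise = k * k * m * s * s   # one k x k filter per input channel
    pointwise = m * n * s * s       # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 -> 64 channels, 56 x 56 feature map.
full = standard_conv_cost(3, 64, 64, 56)
sep = separable_conv_cost(3, 64, 64, 56)
print(sep / full)  # equals 1/c_out + 1/k**2 for this layer
print(separable_conv_cost(3, 64, 64, 56, alpha=0.5) / full)
```

The same factorization is what makes separable layers attractive for the efficient GCN baseline described in the abstract at the top of this page.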

Proceedings Article

MobileNetV2: Inverted Residuals and Linear Bottlenecks

Mark Sandler, +4 more

TL;DR: MobileNetV2 is based on an inverted residual structure in which the shortcut connections are between the thin bottleneck layers, and the intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity.

Related Papers (5)

Fast and Memory Efficient Graph Optimization via ICM for Visual Place Recognition

Stefan Schubert, +2 more

Projection Bank: From High-dimensional Data to Medium-length Binary Codes

Li Liu, +2 more

- 16 Sep 2015 -

arXiv: Computer Vision and Pattern Recog...

MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution

Taojiannan Yang, +5 more

- 27 Sep 2019 -

arXiv: Computer Vision and Pattern Recog...

Efficient end-to-end learning for quantizable representations

Yeonwoo Jeong, +1 more

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

Tharun Medini, +4 more
