Open Access Proceedings Article DOI

Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

TL;DR
This paper proposes a network architecture that computes and integrates the most important visual cues for action recognition (pose, motion, and the raw images) and introduces a Markov chain model that adds these cues successively.
Abstract
General human action recognition requires understanding of various visual cues. In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images. For the integration, we introduce a Markov chain model which adds cues successively. The resulting approach is efficient and applicable to action classification as well as to spatial and temporal action localization. The two contributions clearly improve the performance over respective baselines. The overall approach achieves state-of-the-art action classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover, it yields state-of-the-art spatio-temporal action localization results on UCF101 and J-HMDB.
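To make the chaining idea concrete, here is a minimal sketch (not the authors' released code) of a network that adds cue streams successively, with each stage classifying from its own cue plus the features of all earlier stages; the tiny backbones, feature sizes, and cue order are placeholder assumptions.

```python
import torch
import torch.nn as nn

class CueStream(nn.Module):
    """Tiny placeholder CNN standing in for one per-cue backbone (e.g. RGB, flow, pose)."""
    def __init__(self, in_channels, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class ChainedNet(nn.Module):
    """Adds cues successively: stage t classifies from its own cue plus all earlier stages' features."""
    def __init__(self, num_classes, cue_channels=(3, 2, 3), feat_dim=128):
        super().__init__()
        self.streams = nn.ModuleList([CueStream(c, feat_dim) for c in cue_channels])
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim * (i + 1), num_classes) for i in range(len(cue_channels))]
        )

    def forward(self, cues):
        chained, logits = [], []
        for stream, head, cue in zip(self.streams, self.heads, cues):
            chained.append(stream(cue))                     # add this cue's features to the chain
            logits.append(head(torch.cat(chained, dim=1)))  # refine the prediction at this stage
        return logits                                       # the last entry is the final prediction

model = ChainedNet(num_classes=51)
rgb, flow, pose = torch.randn(2, 3, 64, 64), torch.randn(2, 2, 64, 64), torch.randn(2, 3, 64, 64)
stage_predictions = model([rgb, flow, pose])
```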


Citations
Journal Article DOI

Mosaic: Advancing User Quality of Experience in 360-Degree Video Streaming With Machine Learning

TL;DR: Mosaic combines a powerful neural-network-based viewport prediction with a rate-control mechanism that assigns rates to the different tiles of the 360-degree frame such that the video quality of experience is optimized subject to a given network capacity.
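As a rough illustration of the rate-control side only, the following is a simple greedy tile-rate allocator under a capacity budget; the rate levels, ordering rule, and capacity check are assumptions for illustration and not Mosaic's actual mechanism.

```python
def allocate_tile_rates(view_probs, rate_levels, capacity):
    """Assign each 360-degree tile a bitrate, spending capacity on likely-viewed tiles first.

    view_probs: dict tile_id -> predicted probability the user looks at the tile
    rate_levels: ascending list of available bitrates (e.g. in kbps)
    capacity: total bitrate budget for the frame
    """
    rates = {tile: rate_levels[0] for tile in view_probs}   # start every tile at lowest quality
    budget = capacity - sum(rates.values())
    # Visit tiles from most to least likely to be in the viewport.
    for tile in sorted(view_probs, key=view_probs.get, reverse=True):
        for rate in reversed(rate_levels):                   # try the highest affordable upgrade
            upgrade = rate - rates[tile]
            if upgrade <= budget:
                rates[tile] += upgrade
                budget -= upgrade
                break
    return rates

print(allocate_tile_rates({"t0": 0.7, "t1": 0.2, "t2": 0.1}, [300, 1000, 3000], 5000))
```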
Proceedings Article DOI

JOLO-GCN: Mining Joint-Centered Light-Weight Information for Skeleton-Based Action Recognition

TL;DR: This paper proposes JOLO-GCN, a two-stream graph convolutional network that captures the local subtle motion around each joint as pivotal joint-centered visual information.
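The joint-centered idea can be sketched as cropping a small patch around every skeleton joint in each frame so a second stream can observe the local motion there; this NumPy snippet is an illustration with an assumed patch size, not the JOLO-GCN implementation.

```python
import numpy as np

def joint_centered_patches(frame, joints_xy, size=16):
    """Return one (size, size, C) patch per joint, zero-padded at image borders."""
    half = size // 2
    padded = np.pad(frame, ((half, half), (half, half), (0, 0)))
    patches = []
    for x, y in joints_xy:
        x, y = int(round(x)) + half, int(round(y)) + half
        patches.append(padded[y - half:y + half, x - half:x + half])
    return np.stack(patches)

frame = np.random.rand(240, 320, 3)
joints = [(160, 120), (150, 90), (170, 200)]          # toy 2D joint locations
print(joint_centered_patches(frame, joints).shape)    # (3, 16, 16, 3)
```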
Posted Content

Pose and Joint-Aware Action Recognition

TL;DR: A new model for joint-based action recognition first extracts motion features from each joint separately through a shared motion encoder and then performs collective reasoning; it outperforms the existing baseline on Mimetics, a dataset of out-of-context actions.
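A minimal sketch of that pipeline, assuming a shared MLP as the per-joint motion encoder and mean pooling as a stand-in for the collective reasoning step (the real model differs in both respects):

```python
import torch
import torch.nn as nn

class JointMotionModel(nn.Module):
    def __init__(self, frames, num_classes, hidden=64):
        super().__init__()
        # One encoder shared by every joint: input is that joint's (x, y) trajectory.
        self.motion_encoder = nn.Sequential(
            nn.Linear(frames * 2, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, trajectories):                  # (batch, joints, frames, 2)
        b, j = trajectories.shape[:2]
        per_joint = self.motion_encoder(trajectories.reshape(b, j, -1))  # (b, joints, hidden)
        pooled = per_joint.mean(dim=1)                # stand-in for collective reasoning
        return self.classifier(pooled)

model = JointMotionModel(frames=30, num_classes=10)
print(model(torch.randn(4, 17, 30, 2)).shape)         # torch.Size([4, 10])
```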
Posted Content

Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

TL;DR: The accuracy and transferability of the proposed body language recognition method are validated on several public action recognition datasets, and the framework outperforms other methods on the URMC dataset.
Journal Article DOI

A tensor framework for geosensor data forecasting of significant societal events

TL;DR: A tensor pattern is used to model the geosensor data, a tensor decomposition algorithm is developed on this basis to estimate future values of the geosensor data, and a rank-increasing strategy and a sliding-window strategy are used to improve the prediction accuracy.
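A toy version of the forecasting idea, assuming the tensorly library for CP (PARAFAC) decomposition and a simple linear extrapolation of the temporal factor chosen here for illustration; the paper's rank-increasing and sliding-window strategies are not shown.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Toy geosensor tensor: time steps x sensor locations x measured variables.
data = np.random.rand(30, 6, 4)
weights, factors = parafac(tl.tensor(data), rank=3)
time_f, loc_f, var_f = factors

# Forecast one step ahead by linearly extrapolating the temporal factor (an assumption).
next_t = 2 * time_f[-1] - time_f[-2]
forecast = np.einsum('r,r,ir,jr->ij', tl.to_numpy(weights), next_t, loc_f, var_f)
print(forecast.shape)   # (6, 4): one predicted value per location and variable
```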
References
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
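The depth recipe can be illustrated with a VGG-style stack of small 3x3 convolutions; this sketch is shaped roughly like VGG-16 but is not the published configuration.

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, convs):
    """A block of 3x3 convolutions followed by 2x2 max pooling, as in VGG-style networks."""
    layers = []
    for i in range(convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1), nn.ReLU()]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

# Roughly VGG-16-shaped: 13 convolutional layers plus 3 fully connected ones (224x224 input).
features = nn.Sequential(
    vgg_block(3, 64, 2), vgg_block(64, 128, 2), vgg_block(128, 256, 3),
    vgg_block(256, 512, 3), vgg_block(512, 512, 3),
)
classifier = nn.Sequential(
    nn.Flatten(), nn.Linear(512 * 7 * 7, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 1000),
)
```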
Book Chapter DOI

U-Net: Convolutional Networks for Biomedical Image Segmentation

TL;DR: Ronneberger et al. propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently; it can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
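A compact U-Net-style sketch showing the encoder, decoder, and skip connection; the real architecture has more resolution levels and uses unpadded convolutions with cropping.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=2):
        super().__init__()
        self.enc1, self.enc2 = double_conv(in_ch, 32), double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = double_conv(64, 32)               # 64 = 32 upsampled + 32 from the skip
        self.head = nn.Conv2d(32, out_ch, 1)

    def forward(self, x):
        s1 = self.enc1(x)                             # skip connection source
        bottom = self.enc2(self.pool(s1))
        up = self.up(bottom)
        return self.head(self.dec1(torch.cat([up, s1], dim=1)))

print(TinyUNet()(torch.randn(1, 1, 64, 64)).shape)    # torch.Size([1, 2, 64, 64])
```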
Journal Article DOI

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for document recognition, and gradient-based learning is used to synthesize a complex decision surface that can classify high-dimensional patterns such as handwritten characters.
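A hedged sketch of the gradient-based learning recipe: a small LeNet-style convolutional network and a single backpropagation/SGD step on dummy digit data (layer sizes are approximate, not the paper's exact model).

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 6, 5, padding=2), nn.Tanh(), nn.AvgPool2d(2),
    nn.Conv2d(6, 16, 5), nn.Tanh(), nn.AvgPool2d(2),
    nn.Flatten(), nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(), nn.Linear(84, 10),
)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

images, labels = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
loss = nn.CrossEntropyLoss()(net(images), labels)  # the loss shapes the decision surface
loss.backward()                                    # gradients via backpropagation
optimizer.step()                                   # one gradient-based update
```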
Proceedings Article DOI

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
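For a quick feel of the descriptor, the snippet below computes HOG features with scikit-image (a later reimplementation, used here purely for illustration) on a typical detection window.

```python
import numpy as np
from skimage.feature import hog

window = np.random.rand(128, 64)                   # typical 64x128 person-detection window
descriptor = hog(window, orientations=9,
                 pixels_per_cell=(8, 8), cells_per_block=(2, 2))
print(descriptor.shape)   # flattened grid of block-normalized orientation histograms
# In the classic pipeline, a linear SVM is trained on such descriptors for detection.
```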
Proceedings Article DOI

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
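The core idea can be sketched as a network with only convolutions and pooling, a 1x1 convolution producing per-class score maps, and upsampling back to the input resolution; this tiny model is an illustration, not the architecture from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.score = nn.Conv2d(64, num_classes, 1)    # 1x1 conv instead of fully connected layers

    def forward(self, x):
        scores = self.score(self.backbone(x))
        # Upsample the score map to the input size, so output size follows input size.
        return F.interpolate(scores, size=x.shape[-2:], mode='bilinear', align_corners=False)

fcn = TinyFCN()
for size in [(64, 64), (100, 180)]:                   # arbitrary input sizes work
    print(fcn(torch.randn(1, 3, *size)).shape)
```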