Open Access Proceedings Article DOI

Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

TL;DR
This paper proposes a network architecture that computes and integrates the most important visual cues for action recognition (pose, motion, and the raw images) and introduces a Markov chain model that adds these cues successively.
Abstract
General human action recognition requires understanding of various visual cues. In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images. For the integration, we introduce a Markov chain model which adds cues successively. The resulting approach is efficient and applicable to action classification as well as to spatial and temporal action localization. The two contributions clearly improve the performance over respective baselines. The overall approach achieves state-of-the-art action classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover, it yields state-of-the-art spatio-temporal action localization results on UCF101 and J-HMDB.
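To make the chaining idea concrete, here is a minimal sketch (not the authors' released code) of a network that adds cue streams successively, with each stage classifying from its own cue plus the features of all earlier stages; the tiny backbones, feature sizes, and cue order are placeholder assumptions.

```python
import torch
import torch.nn as nn

class CueStream(nn.Module):
    """Tiny placeholder CNN standing in for one per-cue backbone (e.g. RGB, flow, pose)."""
    def __init__(self, in_channels, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class ChainedNet(nn.Module):
    """Adds cues successively: stage t classifies from its own cue plus all earlier stages' features."""
    def __init__(self, num_classes, cue_channels=(3, 2, 3), feat_dim=128):
        super().__init__()
        self.streams = nn.ModuleList([CueStream(c, feat_dim) for c in cue_channels])
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim * (i + 1), num_classes) for i in range(len(cue_channels))]
        )

    def forward(self, cues):
        chained, logits = [], []
        for stream, head, cue in zip(self.streams, self.heads, cues):
            chained.append(stream(cue))                     # add this cue's features to the chain
            logits.append(head(torch.cat(chained, dim=1)))  # refine the prediction at this stage
        return logits                                       # the last entry is the final prediction

model = ChainedNet(num_classes=51)
rgb, flow, pose = torch.randn(2, 3, 64, 64), torch.randn(2, 2, 64, 64), torch.randn(2, 3, 64, 64)
stage_predictions = model([rgb, flow, pose])
```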


Citations
Journal Article DOI

Mosaic: Advancing User Quality of Experience in 360-Degree Video Streaming With Machine Learning

TL;DR: Mosaic combines a powerful neural-network-based viewport prediction with a rate-control mechanism that assigns rates to the different tiles of the 360-degree frame such that the video quality of experience is optimized subject to a given network capacity.
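As a rough illustration of the rate-control side only, the following is a simple greedy tile-rate allocator under a capacity budget; the rate levels, ordering rule, and capacity check are assumptions for illustration and not Mosaic's actual mechanism.

```python
def allocate_tile_rates(view_probs, rate_levels, capacity):
    """Assign each 360-degree tile a bitrate, spending capacity on likely-viewed tiles first.

    view_probs: dict tile_id -> predicted probability the user looks at the tile
    rate_levels: ascending list of available bitrates (e.g. in kbps)
    capacity: total bitrate budget for the frame
    """
    rates = {tile: rate_levels[0] for tile in view_probs}   # start every tile at lowest quality
    budget = capacity - sum(rates.values())
    # Visit tiles from most to least likely to be in the viewport.
    for tile in sorted(view_probs, key=view_probs.get, reverse=True):
        for rate in reversed(rate_levels):                   # try the highest affordable upgrade
            upgrade = rate - rates[tile]
            if upgrade <= budget:
                rates[tile] += upgrade
                budget -= upgrade
                break
    return rates

print(allocate_tile_rates({"t0": 0.7, "t1": 0.2, "t2": 0.1}, [300, 1000, 3000], 5000))
```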
Proceedings Article DOI

JOLO-GCN: Mining Joint-Centered Light-Weight Information for Skeleton-Based Action Recognition

TL;DR: This paper proposes JOLO-GCN, a two-stream graph convolutional network that captures the local subtle motion around each joint as pivotal joint-centered visual information.
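The joint-centered idea can be sketched as cropping a small patch around every skeleton joint in each frame so a second stream can observe the local motion there; this NumPy snippet is an illustration with an assumed patch size, not the JOLO-GCN implementation.

```python
import numpy as np

def joint_centered_patches(frame, joints_xy, size=16):
    """Return one (size, size, C) patch per joint, zero-padded at image borders."""
    half = size // 2
    padded = np.pad(frame, ((half, half), (half, half), (0, 0)))
    patches = []
    for x, y in joints_xy:
        x, y = int(round(x)) + half, int(round(y)) + half
        patches.append(padded[y - half:y + half, x - half:x + half])
    return np.stack(patches)

frame = np.random.rand(240, 320, 3)
joints = [(160, 120), (150, 90), (170, 200)]          # toy 2D joint locations
print(joint_centered_patches(frame, joints).shape)    # (3, 16, 16, 3)
```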
Posted Content

Pose and Joint-Aware Action Recognition

TL;DR: A new model for joint-based action recognition first extracts motion features from each joint separately through a shared motion encoder and then performs collective reasoning; it outperforms the existing baseline on Mimetics, a dataset of out-of-context actions.
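A minimal sketch of that pipeline, assuming a shared MLP as the per-joint motion encoder and mean pooling as a stand-in for the collective reasoning step (the real model differs in both respects):

```python
import torch
import torch.nn as nn

class JointMotionModel(nn.Module):
    def __init__(self, frames, num_classes, hidden=64):
        super().__init__()
        # One encoder shared by every joint: input is that joint's (x, y) trajectory.
        self.motion_encoder = nn.Sequential(
            nn.Linear(frames * 2, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, trajectories):                  # (batch, joints, frames, 2)
        b, j = trajectories.shape[:2]
        per_joint = self.motion_encoder(trajectories.reshape(b, j, -1))  # (b, joints, hidden)
        pooled = per_joint.mean(dim=1)                # stand-in for collective reasoning
        return self.classifier(pooled)

model = JointMotionModel(frames=30, num_classes=10)
print(model(torch.randn(4, 17, 30, 2)).shape)         # torch.Size([4, 10])
```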
Posted Content

Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

TL;DR: The accuracy and transferability of the proposed body language recognition method are validated on several public action recognition datasets, and the framework outperforms other methods on the URMC dataset.
Journal Article DOI

A tensor framework for geosensor data forecasting of significant societal events

TL;DR: A tensor pattern is used to model the geosensor data, a tensor decomposition algorithm is developed on this basis to estimate future values of the geosensor data, and a rank-increasing strategy and a sliding-window strategy are used to improve the prediction accuracy.
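A toy version of the forecasting idea, assuming the tensorly library for CP (PARAFAC) decomposition and a simple linear extrapolation of the temporal factor chosen here for illustration; the paper's rank-increasing and sliding-window strategies are not shown.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Toy geosensor tensor: time steps x sensor locations x measured variables.
data = np.random.rand(30, 6, 4)
weights, factors = parafac(tl.tensor(data), rank=3)
time_f, loc_f, var_f = factors

# Forecast one step ahead by linearly extrapolating the temporal factor (an assumption).
next_t = 2 * time_f[-1] - time_f[-2]
forecast = np.einsum('r,r,ir,jr->ij', tl.to_numpy(weights), next_t, loc_f, var_f)
print(forecast.shape)   # (6, 4): one predicted value per location and variable
```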
References
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
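The depth recipe can be illustrated with a VGG-style stack of small 3x3 convolutions; this sketch is shaped roughly like VGG-16 but is not the published configuration.

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, convs):
    """A block of 3x3 convolutions followed by 2x2 max pooling, as in VGG-style networks."""
    layers = []
    for i in range(convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1), nn.ReLU()]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

# Roughly VGG-16-shaped: 13 convolutional layers plus 3 fully connected ones (224x224 input).
features = nn.Sequential(
    vgg_block(3, 64, 2), vgg_block(64, 128, 2), vgg_block(128, 256, 3),
    vgg_block(256, 512, 3), vgg_block(512, 512, 3),
)
classifier = nn.Sequential(
    nn.Flatten(), nn.Linear(512 * 7 * 7, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 1000),
)
```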
Book Chapter DOI

U-Net: Convolutional Networks for Biomedical Image Segmentation

TL;DR: Ronneberger et al. propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently; it can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
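A compact U-Net-style sketch showing the encoder, decoder, and skip connection; the real architecture has more resolution levels and uses unpadded convolutions with cropping.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=2):
        super().__init__()
        self.enc1, self.enc2 = double_conv(in_ch, 32), double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = double_conv(64, 32)               # 64 = 32 upsampled + 32 from the skip
        self.head = nn.Conv2d(32, out_ch, 1)

    def forward(self, x):
        s1 = self.enc1(x)                             # skip connection source
        bottom = self.enc2(self.pool(s1))
        up = self.up(bottom)
        return self.head(self.dec1(torch.cat([up, s1], dim=1)))

print(TinyUNet()(torch.randn(1, 1, 64, 64)).shape)    # torch.Size([1, 2, 64, 64])
```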
Journal Article DOI

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for document recognition, and gradient-based learning is used to synthesize a complex decision surface that can classify high-dimensional patterns such as handwritten characters.
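A hedged sketch of the gradient-based learning recipe: a small LeNet-style convolutional network and a single backpropagation/SGD step on dummy digit data (layer sizes are approximate, not the paper's exact model).

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 6, 5, padding=2), nn.Tanh(), nn.AvgPool2d(2),
    nn.Conv2d(6, 16, 5), nn.Tanh(), nn.AvgPool2d(2),
    nn.Flatten(), nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(), nn.Linear(84, 10),
)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

images, labels = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
loss = nn.CrossEntropyLoss()(net(images), labels)  # the loss shapes the decision surface
loss.backward()                                    # gradients via backpropagation
optimizer.step()                                   # one gradient-based update
```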
Proceedings Article DOI

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
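For a quick feel of the descriptor, the snippet below computes HOG features with scikit-image (a later reimplementation, used here purely for illustration) on a typical detection window.

```python
import numpy as np
from skimage.feature import hog

window = np.random.rand(128, 64)                   # typical 64x128 person-detection window
descriptor = hog(window, orientations=9,
                 pixels_per_cell=(8, 8), cells_per_block=(2, 2))
print(descriptor.shape)   # flattened grid of block-normalized orientation histograms
# In the classic pipeline, a linear SVM is trained on such descriptors for detection.
```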
Proceedings Article DOI

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
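The core idea can be sketched as a network with only convolutions and pooling, a 1x1 convolution producing per-class score maps, and upsampling back to the input resolution; this tiny model is an illustration, not the architecture from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.score = nn.Conv2d(64, num_classes, 1)    # 1x1 conv instead of fully connected layers

    def forward(self, x):
        scores = self.score(self.backbone(x))
        # Upsample the score map to the input size, so output size follows input size.
        return F.interpolate(scores, size=x.shape[-2:], mode='bilinear', align_corners=False)

fcn = TinyFCN()
for size in [(64, 64), (100, 180)]:                   # arbitrary input sizes work
    print(fcn(torch.randn(1, 3, *size)).shape)
```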