Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection
Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox
- pp 2923-2932
TL;DR: This paper proposes a network architecture that computes and integrates the most important visual cues for action recognition (pose, motion, and the raw images) and introduces a Markov chain model that adds the cues successively.
Abstract:
General human action recognition requires understanding of various visual cues. In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images. For the integration, we introduce a Markov chain model which adds cues successively. The resulting approach is efficient and applicable to action classification as well as to spatial and temporal action localization. The two contributions clearly improve the performance over respective baselines. The overall approach achieves state-of-the-art action classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover, it yields state-of-the-art spatio-temporal action localization results on UCF101 and J-HMDB.
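The idea of adding cues successively can be illustrated with a small sketch. This is not the authors' implementation: the function names, the cue order (pose, then motion, then raw-image features), the layer sizes, the tanh nonlinearity, and the random weights are all assumptions chosen for illustration. Each stage receives one cue together with the hidden state of the previous stage and emits its own class distribution, so later stages can refine earlier predictions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rng, rows, cols, scale=0.1):
    """Hypothetical stand-in for learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def chained_predict(cues, hidden_dim=4, num_classes=3, seed=0):
    """Process cues one after another; each stage sees the previous hidden state."""
    rng = random.Random(seed)
    hidden = [0.0] * hidden_dim  # nothing has been observed before the first cue
    predictions = []
    for features in cues:
        w_in = random_matrix(rng, hidden_dim, len(features) + hidden_dim)
        w_out = random_matrix(rng, num_classes, hidden_dim)
        # Combine the current cue with what earlier stages have accumulated.
        hidden = [math.tanh(a) for a in linear(w_in, features + hidden)]
        predictions.append(softmax(linear(w_out, hidden)))
    return predictions


# Toy feature vectors standing in for pose, motion, and raw-image features.
cues = [[0.2, -0.1], [0.5, 0.3, -0.4], [0.1, 0.0, 0.7, -0.2]]
predictions = chained_predict(cues)
```

In this sketch the last entry of `predictions` has seen all three cues, while the earlier entries are intermediate predictions made with fewer cues.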
Citations
Journal Article
Mosaic: Advancing User Quality of Experience in 360-Degree Video Streaming With Machine Learning
TL;DR: Mosaic combines a powerful neural network-based viewport prediction with a rate control mechanism that assigns rates to different tiles in the 360-degree frame such that the video quality of experience is optimized subject to a given network capacity.
Proceedings Article
JOLO-GCN: Mining Joint-Centered Light-Weight Information for Skeleton-Based Action Recognition
TL;DR: The authors propose a two-stream graph convolutional network (JOLO-GCN) that captures the local subtle motion around each joint as pivotal joint-centered visual information.
Posted Content
Pose And Joint-Aware Action Recognition.
Anshul Shah, Shlok Kumar Mishra, Ankan Bansal, Jun-Cheng Chen, Rama Chellappa, Abhinav Shrivastava
TL;DR: A new model for joint-based action recognition is presented, which first extracts motion features from each joint separately through a shared motion encoder before performing collective reasoning, and which outperforms the existing baseline on Mimetics, a dataset with out-of-context actions.
Posted Content
Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation
TL;DR: The accuracy and transferability of the proposed body language recognition method are validated on several public action recognition datasets, and the framework outperforms other methods on the URMC dataset.
Journal Article
A tensor framework for geosensor data forecasting of significant societal events
TL;DR: A tensor pattern is used to model the geosensor data. Based on this model, a tensor decomposition algorithm is developed to estimate future values of geosensor data, and a rank increasing strategy and a sliding window strategy are used to improve the prediction accuracy.
References
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Book Chapter
U-Net: Convolutional Networks for Biomedical Image Segmentation
TL;DR: Ronneberger et al. proposed a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Journal Article
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
TL;DR: This article shows that gradient-based learning, given an appropriate network architecture, can synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters, and it proposes graph transformer networks (GTNs) for training multi-module recognition systems globally.
Proceedings Article
Histograms of oriented gradients for human detection
Navneet Dalal, Bill Triggs
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Proceedings Article
Fully convolutional networks for semantic segmentation
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.