Two-Stream SR-CNNs for Action Recognition in Videos.

doi:10.5244/C.30.108

Open Access, Proceedings Article

Yifan Wang, +4 more

TLDR

This paper proposes two-stream semantic region based CNNs (SR-CNNs), a deep architecture that incorporates human/object detection results into the two-stream framework. The architecture retains the modeling capacity of the original two-stream CNNs while adding the flexibility to leverage semantic cues for action understanding.

Abstract:

Human action is a high-level concept in computer vision research and understanding it may benefit from different semantics, such as human pose, interacting objects, and scene context. In this paper, we explicitly exploit semantic cues with the aid of existing human/object detectors for action recognition in videos, and thoroughly study their effect on the recognition performance for different types of actions. Specifically, we propose a new deep architecture by incorporating human/object detection results into the framework, called two-stream semantic region based CNNs (SR-CNNs). Our proposed architecture not only shares great modeling capacity with the original two-stream CNNs, but also exhibits the flexibility of leveraging semantic cues (e.g., scene, person, object) for action understanding. We perform experiments on the UCF101 dataset and demonstrate its superior performance to the original two-stream CNNs. In addition, we systematically study the effect of incorporating semantic cues on the recognition performance for different types of action classes, and try to provide some insights for building more reasonable action benchmarks and developing better recognition algorithms.
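The abstract does not say how the scores from the two streams and the semantic cues (scene, person, object) are combined. As a rough illustration only, the sketch below assumes a simple late-fusion scheme: each (stream, cue) channel produces raw class scores, these are turned into probabilities, and a weighted average gives the final prediction. The channel names, the weights, and the example scores are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(channel_scores, weights=None):
    """Weighted average of per-channel class probabilities.

    channel_scores maps a hypothetical channel name (e.g. "spatial/person")
    to a list of raw class scores; weights maps the same names to
    non-negative fusion weights (uniform if omitted).
    """
    names = list(channel_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    num_classes = len(channel_scores[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(channel_scores[name])
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total_weight
    return fused


# Made-up scores for three action classes, for illustration only.
scores = {
    "spatial/scene": [2.0, 1.0, 0.0],
    "spatial/person": [0.5, 2.5, 0.0],
    "temporal/person": [0.0, 3.0, 0.5],
    "temporal/object": [1.0, 1.5, 0.0],
}
fused = fuse_scores(scores)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(predicted, fused)
```

Averaging probabilities rather than raw scores keeps channels with different score ranges comparable; a different fusion rule, or fusion inside the network, would be equally compatible with the abstract's description.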

Citations

Journal Article

RGB-D-based human motion recognition with deep learning: A survey

Pichao Wang, +4 more

- 01 Jun 2018 -

Computer Vision and Image Understanding

TL;DR: This paper presents a detailed overview of recent advances in RGB-D-based motion recognition. The reviewed methods are broadly categorized into four groups, depending on the modality adopted for recognition: RGB-based, depth-based, skeleton-based, and RGB+D-based.

Journal Article

Multi-stream CNN: Learning representations based on human-related regions for action recognition

Zhigang Tu, +6 more

- 01 Jul 2018 -

Pattern Recognition

TL;DR: A human-related multi-stream CNN (HR-MSCNN) architecture is introduced that encodes appearance, motion, and the captured tubes of human-related regions; it achieves state-of-the-art results on four datasets.

Journal Article

TS-LSTM and temporal-inception: Exploiting spatiotemporal dynamics for activity recognition

Chih-Yao Ma, +3 more

- 01 Feb 2019 -

Signal Processing: Image Communication

TL;DR: In this article, a baseline two-stream convolutional neural network (two-stream ConvNet) is combined with an LSTM-based Temporal Segment RNN and an Inception-style Temporal-ConvNet to extract spatiotemporal information.

Posted Content

RGB-D-based Human Motion Recognition with Deep Learning: A Survey

Pichao Wang, +4 more

- 31 Oct 2017 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper presents a detailed overview of recent advances in RGB-D-based motion recognition. The reviewed methods are broadly categorized into four groups, depending on the modality adopted for recognition: RGB-based, depth-based, skeleton-based, and RGB+D-based.

Journal Article

Knowledge Guided Disambiguation for Large-Scale Scene Classification With Multi-Resolution CNNs

Limin Wang, +4 more

- 01 Apr 2017 -

IEEE Transactions on Image Processing

TL;DR: Wang et al. propose a multi-resolution CNN architecture that captures visual content and structure at multiple levels, and design two knowledge-guided disambiguation techniques to deal with the problem of label ambiguity.

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

Journal Article

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

TL;DR: In this article, gradient-based learning is shown to synthesize complex decision surfaces that can classify high-dimensional patterns, such as handwritten characters, and graph transformer networks (GTNs) are proposed for globally training multi-module recognition systems.

Journal Article

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection covering hundreds of object categories and millions of images. It has been run annually from 2010 to present, attracting participation from more than fifty institutions.

Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 04 Jun 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) that generates high-quality region proposals, which are then used by Fast R-CNN for detection.
