Videos as Space-Time Region Graphs.

Open AccessPosted Content

Videos as Space-Time Region Graphs.

Xiaolong Wang, +1 more

- 05 Jun 2018 -

arXiv: Computer Vision and Pattern Recog...

Chats0

TLDR

In this paper, the authors propose to represent videos as space-time region graphs which capture temporal shape dynamics and functional relationships between humans and objects, and perform reasoning on this graph representation via Graph Convolutional Networks.

Abstract:

How do humans recognize the action "opening a book" ? We argue that there are two important cues: modeling temporal shape dynamics and modeling functional relationships between humans and objects. In this paper, we propose to represent videos as space-time region graphs which capture these two important cues. Our graph nodes are defined by the object region proposals from different frames in a long range video. These nodes are connected by two types of relations: (i) similarity relations capturing the long range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects. We perform reasoning on this graph representation via Graph Convolutional Networks. We achieve state-of-the-art results on both Charades and Something-Something datasets. Especially for Charades, we obtain a huge 4.4% gain when our model is applied in complex environments.

Citations

PDF

Open Access

More filters

Proceedings ArticleDOI

SlowFast Networks for Video Recognition

Christoph Feichtenhofer, +3 more

TL;DR: This work presents SlowFast networks for video recognition, which achieves strong performance for both action classification and detection in video, and large improvements are pin-pointed as contributions by the SlowFast concept.

...read moreread less

Posted Content

TSM: Temporal Shift Module for Efficient Video Understanding

Ji Lin, +2 more

- 20 Nov 2018 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: A generic and effective Temporal Shift Module (TSM) that can achieve the performance of 3D CNN but maintain 2D CNN’s complexity and is extended to online setting, which enables real-time low-latency online video recognition and video object detection.

...read moreread less

Proceedings ArticleDOI

Skeleton-Based Action Recognition With Directed Graph Neural Networks

Lei Shi, +3 more

TL;DR: A novel directed graph neural network is designed specially to extract the information of joints, bones and their relations and make prediction based on the extracted features and is tested on two large-scale datasets, NTU-RGBD and Skeleton-Kinetics, and exceeds state-of-the-art performance on both of them.

...read moreread less

Proceedings ArticleDOI

Feature Selective Anchor-Free Module for Single-Shot Object Detection

Chenchen Zhu, +2 more

TL;DR: The FSAF module robustly improves the baseline RetinaNet by a large margin under various settings, while introducing nearly free inference overhead, and the resulting best model can achieve a state-of-the-art 44.6% mAP, outperforming all existing single-shot detectors on COCO.

...read moreread less

Journal ArticleDOI

Graph convolutional networks: a comprehensive review

Si Zhang, +3 more

- 01 Dec 2019 -

Computational Social Networks

TL;DR: A comprehensive review specifically on the emerging field of graph convolutional networks, which is one of the most prominent graph deep learning models, is conducted and several open challenges are presented and potential directions for future research are discussed.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.

...read moreread less

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

...read moreread less

Journal ArticleDOI

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.

...read moreread less

Book ChapterDOI

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.

...read moreread less

Journal ArticleDOI

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 01 Jun 2017 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals and further merge RPN and Fast R-CNN into a single network by sharing their convolutionAL features.

...read moreread less