Anticipating the future by watching unlabeled video.

Open Access, Posted Content

Carl Vondrick, +2 more

- 29 Apr 2015 -

arXiv: Computer Vision and Pattern Recog...

TLDR

The paper presents a large-scale framework that capitalizes on temporal structure in unlabeled video to learn to anticipate both actions and objects in the future. Its results suggest that learning with unlabeled videos significantly helps forecast actions and anticipate objects.

Abstract:

In many computer vision applications, machines will need to reason beyond the present, and predict the future. This task is challenging because it requires leveraging extensive commonsense knowledge of the world that is difficult to write down. We believe that a promising resource for efficiently obtaining this knowledge is through the massive amounts of readily available unlabeled video. In this paper, we present a large scale framework that capitalizes on temporal structure in unlabeled video to learn to anticipate both actions and objects in the future. The key idea behind our approach is that we can train deep networks to predict the visual representation of images in the future. We experimentally validate this idea on two challenging "in the wild" video datasets, and our results suggest that learning with unlabeled videos significantly helps forecast actions and anticipate objects.
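The key idea in the abstract, regressing the visual representation of a future frame from the current one, can be illustrated with a toy sketch. This is not the authors' implementation: a single linear map stands in for the deep network, the 2-D "features" are synthetic, and the names `make_pairs`, `train`, and `mean_squared_error` are hypothetical. The only supervision is the pairing of a feature at one time with the feature at a later time, which unlabeled video provides for free.

```python
import math
import random


def make_pairs(num_pairs, seed=0):
    """Toy stand-in for (current, future) frame features from unlabeled video.

    Assumption: the future feature is a fixed rotation of the current one,
    playing the role of regular temporal structure in real video.
    """
    rng = random.Random(seed)
    angle = math.pi / 6  # hypothetical "motion" between the two time steps
    c, s = math.cos(angle), math.sin(angle)
    pairs = []
    for _ in range(num_pairs):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = [c * x[0] - s * x[1], s * x[0] + c * x[1]]
        pairs.append((x, y))
    return pairs


def predict(weights, x):
    """Apply the linear map (a stand-in for a deep network) to one feature."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train(pairs, epochs=200, lr=0.1):
    """Fit the map by stochastic gradient descent on squared error."""
    dim = len(pairs[0][0])
    weights = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for x, y in pairs:
            pred = predict(weights, x)
            for i in range(dim):
                err = pred[i] - y[i]
                for j in range(dim):
                    # Gradient of 0.5 * err**2 with respect to weights[i][j].
                    weights[i][j] -= lr * err * x[j]
    return weights


def mean_squared_error(weights, pairs):
    """Average squared distance between predicted and true future features."""
    total = 0.0
    for x, y in pairs:
        pred = predict(weights, x)
        total += sum((p - t) ** 2 for p, t in zip(pred, y))
    return total / len(pairs)


if __name__ == "__main__":
    model = train(make_pairs(100, seed=0))
    print(mean_squared_error(model, make_pairs(50, seed=1)))
```

In a fuller pipeline, a recognition model trained on such representations could then be applied to the predicted future feature to name the anticipated action or object; that step is omitted here.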

Citations

Proceedings Article

Social LSTM: Human Trajectory Prediction in Crowded Spaces

Alexandre Alahi, +5 more

TL;DR: This work proposes an LSTM model that learns general human movement and predicts future trajectories, outperforming state-of-the-art methods on some public datasets.

Proceedings Article

Deep multi-scale video prediction beyond mean square error

Michael Mathieu, +4 more

TL;DR: This work trains a convolutional network to generate future frames given an input sequence and proposes three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function.

Posted Content

Deep multi-scale video prediction beyond mean square error

Michael Mathieu, +4 more

- 17 Nov 2015 -

arXiv: Learning

TL;DR: This paper proposes a multi-scale architecture, an adversarial training method, and an image gradient difference loss function to predict future frames from a video sequence and to counter the blurry predictions produced by a plain mean squared error loss.

Proceedings Article

The “Something Something” Video Database for Learning and Evaluating Visual Common Sense

Raghav Goyal, +13 more

TL;DR: This work describes the ongoing collection of the “something-something” database of video prediction tasks whose solutions require a common sense understanding of the depicted situation, and describes the challenges in crowd-sourcing this data at scale.

Book Chapter

Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification

Ishan Misra, +2 more

TL;DR: This paper presents an approach for learning a visual representation from the raw spatiotemporal signals in videos using a Convolutional Neural Network, and shows that the method captures information that is temporally varying, such as human pose.

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: A deep convolutional neural network achieved state-of-the-art ImageNet classification results; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Proceedings Article

Going deeper with convolutions

Christian Szegedy, +8 more

TL;DR: Inception is a deep convolutional neural network architecture that set a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

Journal Article

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava, +4 more

- 01 Jan 2014 -

Journal of Machine Learning Research

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

Journal Article

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images. It has been run annually from 2010 to present, attracting participation from more than fifty institutions.

Related Papers (5)

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

Khurram Soomro, +2 more

- 03 Dec 2012 -

arXiv: Computer Vision and Pattern Recog...

Generative Adversarial Nets

Deep Residual Learning for Image Recognition

Two-Stream Convolutional Networks for Action Recognition in Videos