Proceedings ArticleDOI

Deep Future Gaze: Gaze Anticipation on Egocentric Videos Using Adversarial Networks

TLDR
A new generative adversarial network based model, Deep Future Gaze (DFG), generates multiple future frames conditioned on a single current frame, anticipates the corresponding future gazes over the next few seconds, and achieves better gaze prediction on current frames than state-of-the-art methods.
Abstract
We introduce a new problem of gaze anticipation on egocentric videos. This substantially extends the conventional gaze prediction problem to future frames by no longer confining it to the current frame. To solve this problem, we propose a new generative adversarial network based model, Deep Future Gaze (DFG). DFG generates multiple future frames conditioned on a single current frame and anticipates the corresponding future gazes over the next few seconds. It consists of two networks: a generator and a discriminator. The generator uses a two-stream spatial-temporal convolution architecture (3D-CNN) that explicitly untangles the foreground from the background to generate future frames. It then attaches another 3D-CNN for gaze anticipation based on these synthetic frames. The discriminator plays against the generator by differentiating the generator's synthetic frames from real frames. Through competition with the discriminator, the generator progressively improves the quality of the future frames and thus anticipates future gaze better. Experimental results on publicly available egocentric datasets show that DFG significantly outperforms all well-established baselines. Moreover, DFG achieves better gaze prediction on current frames than state-of-the-art methods, a benefit of the motion-discriminative representations it learns during frame generation. We further contribute a new egocentric dataset (OST) for the object search task, on which DFG also achieves the best performance.
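The generator architecture lends itself to a compact sketch. Below is a minimal PyTorch rendering of the two-stream idea from the abstract: a 3D-CNN foreground stream that produces a short clip plus a per-frame blending mask, and a static background stream, composed into future frames. For brevity it generates from a latent code rather than an encoded current frame, and all layer sizes and module names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TwoStreamGenerator(nn.Module):
    """Hypothetical sketch of a foreground/background-untangled video
    generator; produces a 16-frame, 32x32 clip from a latent code."""

    def __init__(self, z_dim=100):
        super().__init__()
        # Foreground stream: 3D transposed convolutions generate a clip.
        self.fg = nn.Sequential(
            nn.ConvTranspose3d(z_dim, 256, kernel_size=(2, 4, 4)),
            nn.BatchNorm3d(256), nn.ReLU(True),
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1),
            nn.BatchNorm3d(128), nn.ReLU(True),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(True),
        )
        self.fg_rgb = nn.ConvTranspose3d(64, 3, 4, stride=2, padding=1)
        self.fg_mask = nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1)
        # Background stream: a single static image, broadcast over time.
        self.bg = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4),
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):
        h = self.fg(z.view(z.size(0), -1, 1, 1, 1))
        fg = torch.tanh(self.fg_rgb(h))           # (B, 3, T, H, W) foreground
        m = torch.sigmoid(self.fg_mask(h))        # (B, 1, T, H, W) blend mask
        bg = self.bg(z.view(z.size(0), -1, 1, 1))  # (B, 3, H, W) static scene
        bg = bg.unsqueeze(2).expand_as(fg)         # broadcast over time
        return m * fg + (1 - m) * bg               # composed future frames
```

In the paper's pipeline, a second 3D-CNN would then run on the composed clip to anticipate gaze positions in each synthetic frame.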



Citations
Journal ArticleDOI

Generative Adversarial Networks: An Overview

TL;DR: Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data by deriving backpropagation signals through a competitive process involving a pair of networks. The review provides an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible, and points to remaining challenges in their theory and application.
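The competitive process this overview describes reduces to a short training step. The sketch below is a generic, minimal GAN update in PyTorch, not code from the survey; G, D, and their optimizers are assumed to be any suitable generator/discriminator pair.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z):
    """One adversarial update: D learns to separate real from fake,
    then G learns to fool D. All arguments are assumed given."""
    # Discriminator update on real and generated batches.
    logits_real = D(real)
    logits_fake = D(G(z).detach())  # detach: no generator gradients here
    loss_D = (
        F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
        + F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake))
    )
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator update: push D's output on fakes toward the "real" label.
    logits_fake = D(G(z))
    loss_G = F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```

The backpropagation signal for G flows entirely through D's judgment of the generated samples, which is the "competitive process" the TL;DR refers to.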
Posted Content

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

TL;DR: This paper attempts to provide a review on various GANs methods from the perspectives of algorithms, theory, and applications, and compares the commonalities and differences of these GAns methods.
Book ChapterDOI

In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video

TL;DR: A novel deep model is proposed for joint gaze estimation and action recognition in First Person Vision that describes the participant’s gaze as a probabilistic variable and models its distribution using stochastic units in a deep network to generate an attention map.
Proceedings ArticleDOI

Joint Pose and Expression Modeling for Facial Expression Recognition

TL;DR: An end-to-end deep learning model is proposed that jointly exploits different poses and expressions for simultaneous facial image synthesis and pose-invariant facial expression recognition, based on a generative adversarial network (GAN).
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
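The update rule is compact enough to state directly. Below is a plain-Python rendering of Adam's bias-corrected moment estimates for a single scalar parameter, following the paper's standard defaults.

```python
import math

def adam_update(theta, grad, m, v, t, lr=1e-3,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step. m and v are running estimates of the first and
    second moments of the gradient; t is the 1-indexed step count."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (mean)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

The bias correction matters early in training, when m and v are still dominated by their zero initialization.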
Book ChapterDOI

Visualizing and Understanding Convolutional Networks

TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models; used in a diagnostic role, it finds model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
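The paper projects activations back to pixel space with a deconvnet; as a much lighter stand-in for inspecting intermediate feature layers, a PyTorch forward hook can expose the feature maps one would visualize. The network and layer choice here are arbitrary illustrations, not the paper's setup.

```python
import torch
import torchvision.models as models

model = models.vgg16(weights=None).eval()
activations = {}

def grab(name):
    # Store the layer's output tensor under a readable key.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Hook a mid-level convolution; any layer of interest works the same way.
model.features[10].register_forward_hook(grab("conv_mid"))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # dummy image

print(activations["conv_mid"].shape)  # torch.Size([1, 256, 56, 56])
```

Each of the 256 channels can then be rendered as a grayscale map to see what that layer responds to in a given input.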
Journal ArticleDOI

A feature-integration theory of attention

TL;DR: A new hypothesis about the role of focused attention is proposed, which offers a new set of criteria for distinguishing separable from integral features and a new rationale for predicting which tasks will show attention limits and which will not.
Journal ArticleDOI

A model of saliency-based visual attention for rapid scene analysis

TL;DR: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented, in which multiscale image features are combined into a single topographical saliency map; by rapidly selecting conspicuous locations to be analyzed in detail, the system breaks down the complex problem of scene understanding.
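A heavily simplified sketch of the model's intensity channel conveys the center-surround idea; the full model adds color and orientation channels, across-scale combination on a Gaussian pyramid, and a winner-take-all selection stage. The scale pairs below are arbitrary choices for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def intensity_saliency(img, scales=((1, 4), (2, 8))):
    """Toy center-surround saliency on a 2-D intensity image.
    Each (center, surround) sigma pair approximates one
    fine-vs-coarse scale comparison from the original model."""
    sal = np.zeros_like(img, dtype=float)
    for c_sigma, s_sigma in scales:
        center = gaussian_filter(img, c_sigma)      # fine scale
        surround = gaussian_filter(img, s_sigma)    # coarse scale
        sal += np.abs(center - surround)            # center-surround contrast
    return sal / (sal.max() + 1e-8)                 # normalized saliency map
```

Regions that differ strongly from their surroundings at any scale light up in the map, which is what makes them candidate fixation targets.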