Proceedings ArticleDOI
Deep Future Gaze: Gaze Anticipation on Egocentric Videos Using Adversarial Networks
Mengmi Zhang, Keng Teck Ma, Joo-Hwee Lim, Qi Zhao, Jiashi Feng, et al.
- pp 3539-3548
TL;DR: A new generative adversarial network based model, Deep Future Gaze (DFG), generates multiple future frames conditioned on a single current frame and anticipates the corresponding future gazes in the next few seconds; it also achieves better gaze prediction on current frames than state-of-the-art methods.
Abstract:
We introduce a new problem of gaze anticipation on egocentric videos. This substantially extends the conventional gaze prediction problem to future frames by no longer confining it to the current frame. To solve this problem, we propose a new generative adversarial neural network based model, Deep Future Gaze (DFG). DFG generates multiple future frames conditioned on a single current frame and anticipates the corresponding future gazes in the next few seconds. It consists of two networks: a generator and a discriminator. The generator uses a two-stream spatial-temporal convolution architecture (3D-CNN) that explicitly untangles the foreground and the background to generate future frames. It then attaches another 3D-CNN for gaze anticipation based on these synthetic frames. The discriminator plays against the generator by differentiating the generator's synthetic frames from real frames. Through competition with the discriminator, the generator progressively improves the quality of the future frames and thus anticipates future gaze better. Experimental results on publicly available egocentric datasets show that DFG significantly outperforms all well-established baselines. Moreover, DFG achieves better gaze prediction on current frames than state-of-the-art methods, a benefit of the motion-discriminative representations learned during frame generation. We further contribute a new egocentric dataset (OST) for the object search task, on which DFG also achieves the best performance.
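The abstract describes the generator as untangling a moving foreground from a static background before composing future frames. One common way to realize this (in the style of two-stream video GANs) is a per-pixel soft-mask composite of the two streams' outputs. The sketch below illustrates that composition with random stand-in tensors; the array names and sizes are illustrative assumptions, not DFG's actual layers.

```python
import numpy as np

# Hedged sketch of foreground/background untangling: one stream produces
# per-frame foreground, another a static background, and a soft mask in
# [0, 1] composites them pixel-wise into each synthetic future frame.
rng = np.random.default_rng(0)
T, H, W = 4, 8, 8                      # future frames, height, width (illustrative)
foreground = rng.random((T, H, W))     # stand-in for the foreground stream output
background = rng.random((H, W))        # stand-in for the static background output
mask = rng.random((T, H, W))           # soft mask, one value per pixel per frame

# Per-pixel convex combination: frame_t = m_t * fg_t + (1 - m_t) * bg.
# The static background broadcasts across the time dimension.
future_frames = mask * foreground + (1.0 - mask) * background
```

Because each output pixel is a convex combination of the two streams, the composite stays within the value range of its inputs, which keeps the synthetic frames well-behaved for the downstream gaze-anticipation 3D-CNN.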
Citations
Journal ArticleDOI
Generative Adversarial Networks: An Overview
TL;DR: The aim of this review article is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible, and point to remaining challenges in their theory and application.
Posted Content
A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications
TL;DR: This paper provides a review of GAN methods from the perspectives of algorithms, theory, and applications, and compares the commonalities and differences of these methods.
Book ChapterDOI
In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video
Yin Li, Miao Liu, James M. Rehg, et al.
TL;DR: A novel deep model is proposed for joint gaze estimation and action recognition in first-person video; it describes the participant's gaze as a probabilistic variable and models its distribution using stochastic units in a deep network to generate an attention map.
Proceedings ArticleDOI
Joint Pose and Expression Modeling for Facial Expression Recognition
TL;DR: An end-to-end deep learning model, based on a generative adversarial network (GAN), that exploits different poses and expressions jointly for simultaneous facial image synthesis and pose-invariant facial expression recognition.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
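The Adam update maintains exponentially decaying estimates of the gradient's first moment (mean) and second moment (uncentered variance), corrects both for initialization bias, and scales the step by their ratio. A minimal sketch, using the paper's default hyperparameters on an illustrative quadratic objective:

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=300):
    """Minimize a scalar objective given its gradient, via the Adam rule."""
    x = float(x0)
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g       # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias-corrected mean
        v_hat = v / (1 - beta2 ** t)          # bias-corrected variance
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Illustrative use: minimize f(x) = x^2 (gradient 2x) from x = 5.
x_star = adam_minimize(lambda x: 2.0 * x, 5.0)
```

Because the effective step size is roughly bounded by `lr` regardless of the gradient's scale, Adam is a common default for GAN training, including models like DFG.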
Book ChapterDOI
Visualizing and Understanding Convolutional Networks
Matthew D. Zeiler, Rob Fergus
TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large convolutional network models; used in a diagnostic role, it finds model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
Journal ArticleDOI
A feature-integration theory of attention
Anne Treisman, Garry A. Gelade
TL;DR: A new hypothesis about the role of focused attention is proposed, which offers a new set of criteria for distinguishing separable from integral features and a new rationale for predicting which tasks will show attention limits and which will not.
Journal ArticleDOI
A model of saliency-based visual attention for rapid scene analysis
TL;DR: A visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system, in which multiscale image features are combined into a single topographical saliency map.
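The core mechanism of this saliency model is center-surround contrast: a feature map is compared against coarser versions of itself, and the differences across scales are summed into one topographical map. A minimal, intensity-only sketch (the full model uses Gaussian pyramids over colour, intensity, and orientation channels; the box filter and scale choices here are simplifying assumptions):

```python
import numpy as np

def box_blur(img, k):
    # Box filter of width 2k+1, computed by summing shifted copies of an
    # edge-padded image; a stand-in for the model's Gaussian pyramid levels.
    pad = np.pad(img, k, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out += pad[k + dy : k + dy + img.shape[0],
                       k + dx : k + dx + img.shape[1]]
    return out / (2 * k + 1) ** 2

def saliency(intensity, scales=(2, 4, 8)):
    center = intensity.astype(float)
    sal = np.zeros_like(center)
    for k in scales:
        surround = box_blur(center, k)    # coarse "surround" estimate
        sal += np.abs(center - surround)  # center-surround contrast
    return sal / sal.max()                # normalized saliency map

# Illustrative use: a single bright pixel on a dark background should
# dominate the resulting saliency map.
img = np.zeros((32, 32))
img[16, 16] = 1.0
sal = saliency(img)
```

Models of this kind are the classical bottom-up baselines that learned gaze-prediction approaches such as DFG are compared against.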