Proceedings ArticleDOI

Deep Future Gaze: Gaze Anticipation on Egocentric Videos Using Adversarial Networks

TLDR
A new generative adversarial network based model, Deep Future Gaze (DFG), generates multiple future frames conditioned on a single current frame, anticipates the corresponding future gazes over the next few seconds, and achieves better gaze prediction on current frames than state-of-the-art methods.
Abstract
We introduce a new problem of gaze anticipation on egocentric videos. This substantially extends the conventional gaze prediction problem to future frames by no longer confining it to the current frame. To solve this problem, we propose a new generative adversarial network based model, Deep Future Gaze (DFG). DFG generates multiple future frames conditioned on a single current frame and anticipates the corresponding future gazes over the next few seconds. It consists of two networks: a generator and a discriminator. The generator uses a two-stream spatial-temporal convolution architecture (3D-CNN) that explicitly untangles the foreground from the background to generate future frames. It then attaches another 3D-CNN for gaze anticipation based on these synthetic frames. The discriminator plays against the generator by differentiating the generator's synthetic frames from real frames. Through competition with the discriminator, the generator progressively improves the quality of the future frames and thus anticipates future gaze better. Experimental results on publicly available egocentric datasets show that DFG significantly outperforms all well-established baselines. Moreover, DFG achieves better gaze prediction on current frames than state-of-the-art methods, a benefit of the motion-discriminative representations it learns during frame generation. We further contribute a new egocentric dataset (OST) for the object search task, on which DFG also achieves the best performance.
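The generator architecture lends itself to a compact sketch. Below is a minimal PyTorch rendering of the two-stream idea from the abstract: a 3D-CNN foreground stream that produces a short clip plus a per-frame blending mask, and a static background stream, composed into future frames. For brevity it generates from a latent code rather than an encoded current frame, and all layer sizes and module names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TwoStreamGenerator(nn.Module):
    """Hypothetical sketch of a foreground/background-untangled video
    generator; produces a 16-frame, 32x32 clip from a latent code."""

    def __init__(self, z_dim=100):
        super().__init__()
        # Foreground stream: 3D transposed convolutions generate a clip.
        self.fg = nn.Sequential(
            nn.ConvTranspose3d(z_dim, 256, kernel_size=(2, 4, 4)),
            nn.BatchNorm3d(256), nn.ReLU(True),
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1),
            nn.BatchNorm3d(128), nn.ReLU(True),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(True),
        )
        self.fg_rgb = nn.ConvTranspose3d(64, 3, 4, stride=2, padding=1)
        self.fg_mask = nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1)
        # Background stream: a single static image, broadcast over time.
        self.bg = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4),
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):
        h = self.fg(z.view(z.size(0), -1, 1, 1, 1))
        fg = torch.tanh(self.fg_rgb(h))           # (B, 3, T, H, W) foreground
        m = torch.sigmoid(self.fg_mask(h))        # (B, 1, T, H, W) blend mask
        bg = self.bg(z.view(z.size(0), -1, 1, 1))  # (B, 3, H, W) static scene
        bg = bg.unsqueeze(2).expand_as(fg)         # broadcast over time
        return m * fg + (1 - m) * bg               # composed future frames
```

In the paper's pipeline, a second 3D-CNN would then run on the composed clip to anticipate gaze positions in each synthetic frame.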



Citations
Journal ArticleDOI

Generative Adversarial Networks: An Overview

TL;DR: Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data by deriving backpropagation signals through a competitive process involving a pair of networks. The review provides an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible, and points to remaining challenges in their theory and application.
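The competitive process this overview describes reduces to a short training step. The sketch below is a generic, minimal GAN update in PyTorch, not code from the survey; G, D, and their optimizers are assumed to be any suitable generator/discriminator pair.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z):
    """One adversarial update: D learns to separate real from fake,
    then G learns to fool D. All arguments are assumed given."""
    # Discriminator update on real and generated batches.
    logits_real = D(real)
    logits_fake = D(G(z).detach())  # detach: no generator gradients here
    loss_D = (
        F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
        + F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake))
    )
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator update: push D's output on fakes toward the "real" label.
    logits_fake = D(G(z))
    loss_G = F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```

The backpropagation signal for G flows entirely through D's judgment of the generated samples, which is the "competitive process" the TL;DR refers to.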
Posted Content

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

TL;DR: This paper attempts to provide a review on various GANs methods from the perspectives of algorithms, theory, and applications, and compares the commonalities and differences of these GAns methods.
Book ChapterDOI

In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video

TL;DR: A novel deep model is proposed for joint gaze estimation and action recognition in First Person Vision that describes the participant’s gaze as a probabilistic variable and models its distribution using stochastic units in a deep network to generate an attention map.
Proceedings ArticleDOI

Joint Pose and Expression Modeling for Facial Expression Recognition

TL;DR: An end-to-end deep learning model is proposed that jointly exploits different poses and expressions for simultaneous facial image synthesis and pose-invariant facial expression recognition, based on a generative adversarial network (GAN).
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
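The update rule is compact enough to state directly. Below is a plain-Python rendering of Adam's bias-corrected moment estimates for a single scalar parameter, following the paper's standard defaults.

```python
import math

def adam_update(theta, grad, m, v, t, lr=1e-3,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step. m and v are running estimates of the first and
    second moments of the gradient; t is the 1-indexed step count."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (mean)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

The bias correction matters early in training, when m and v are still dominated by their zero initialization.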
Book ChapterDOI

Visualizing and Understanding Convolutional Networks

TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models; used in a diagnostic role, it finds model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
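The paper projects activations back to pixel space with a deconvnet; as a much lighter stand-in for inspecting intermediate feature layers, a PyTorch forward hook can expose the feature maps one would visualize. The network and layer choice here are arbitrary illustrations, not the paper's setup.

```python
import torch
import torchvision.models as models

model = models.vgg16(weights=None).eval()
activations = {}

def grab(name):
    # Store the layer's output tensor under a readable key.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Hook a mid-level convolution; any layer of interest works the same way.
model.features[10].register_forward_hook(grab("conv_mid"))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # dummy image

print(activations["conv_mid"].shape)  # torch.Size([1, 256, 56, 56])
```

Each of the 256 channels can then be rendered as a grayscale map to see what that layer responds to in a given input.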
Journal ArticleDOI

A feature-integration theory of attention

TL;DR: A new hypothesis about the role of focused attention is proposed, which offers a new set of criteria for distinguishing separable from integral features and a new rationale for predicting which tasks will show attention limits and which will not.
Journal ArticleDOI

A model of saliency-based visual attention for rapid scene analysis

TL;DR: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented, in which multiscale image features are combined into a single topographical saliency map; by rapidly selecting conspicuous locations to be analyzed in detail, the system breaks down the complex problem of scene understanding.
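A heavily simplified sketch of the model's intensity channel conveys the center-surround idea; the full model adds color and orientation channels, across-scale combination on a Gaussian pyramid, and a winner-take-all selection stage. The scale pairs below are arbitrary choices for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def intensity_saliency(img, scales=((1, 4), (2, 8))):
    """Toy center-surround saliency on a 2-D intensity image.
    Each (center, surround) sigma pair approximates one
    fine-vs-coarse scale comparison from the original model."""
    sal = np.zeros_like(img, dtype=float)
    for c_sigma, s_sigma in scales:
        center = gaussian_filter(img, c_sigma)      # fine scale
        surround = gaussian_filter(img, s_sigma)    # coarse scale
        sal += np.abs(center - surround)            # center-surround contrast
    return sal / (sal.max() + 1e-8)                 # normalized saliency map
```

Regions that differ strongly from their surroundings at any scale light up in the map, which is what makes them candidate fixation targets.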