Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model
TLDR
The authors propose a convolutional long short-term memory (LSTM) network that iteratively refines the predicted saliency map by focusing on the most salient regions of the input image.

Abstract
Data-driven saliency has recently gained a lot of attention thanks to the use of convolutional neural networks for predicting gaze fixations. In this paper, we go beyond standard approaches to saliency prediction, in which gaze maps are computed with a feed-forward network, and present a novel model which can predict accurate saliency maps by incorporating neural attentive mechanisms. The core of our solution is a convolutional long short-term memory that focuses on the most salient regions of the input image to iteratively refine the predicted saliency map. In addition, to tackle the center bias typical of human eye fixations, our model can learn a set of prior maps generated with Gaussian functions. We show, through an extensive evaluation, that the proposed architecture outperforms the current state-of-the-art on public saliency prediction datasets. We further study the contribution of each key component to demonstrate their robustness in different scenarios.
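The learned prior maps mentioned in the abstract can be sketched as parameterized 2D Gaussians over normalized image coordinates. The snippet below is a minimal illustration, not the paper's implementation; the function name `gaussian_prior_map` and the specific parameter values are assumptions, with `mu_*` and `sigma_*` standing in for the learnable parameters that capture the center bias.

```python
import numpy as np

def gaussian_prior_map(height, width, mu_x, mu_y, sigma_x, sigma_y):
    """One 2D Gaussian prior map over normalized [0, 1] image coordinates.

    mu_* and sigma_* stand in for the learnable parameters the model
    optimizes to capture the center bias of human fixations.
    """
    ys = np.linspace(0.0, 1.0, height)[:, None]  # row coordinates (column vector)
    xs = np.linspace(0.0, 1.0, width)[None, :]   # column coordinates (row vector)
    return np.exp(-(((xs - mu_x) ** 2) / (2.0 * sigma_x ** 2)
                    + ((ys - mu_y) ** 2) / (2.0 * sigma_y ** 2)))

# A centered prior: with the mean at (0.5, 0.5) the peak sits mid-image.
prior = gaussian_prior_map(33, 33, mu_x=0.5, mu_y=0.5, sigma_x=0.2, sigma_y=0.2)
print(prior.shape, prior.argmax())  # (33, 33) 544 -> center pixel (16, 16)
```

In the paper several such maps are learned jointly with the network; multiplying or concatenating them with intermediate feature maps biases predictions toward image regions where fixations are statistically more likely.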
Citations
Proceedings Article
Salient Object Detection Driven by Fixation Prediction
TL;DR: A novel Attentive Saliency Network (ASNet) learns to detect salient objects from fixation maps, using an efficient recurrent mechanism to sequentially refine the segmentation map.
Journal Article
Predicting the Driver's Focus of Attention: The DR(eye)VE Project
TL;DR: A model based on a multi-branch deep architecture predicts the driver's focus of attention while driving, i.e., which part of the scene around the vehicle is most critical for the task.
Proceedings Article
How Much Position Information Do Convolutional Neural Networks Encode?
TL;DR: A comprehensive set of experiments supports the hypothesis that CNNs implicitly encode absolute position information, shedding light on how and where this information is represented and offering clues to where it is derived from in deep CNNs.
Posted Content
Faster gaze prediction with dense networks and Fisher pruning
TL;DR: Through a combination of knowledge distillation and Fisher pruning, the authors obtain much more runtime-efficient architectures for saliency prediction, achieving a 10x speedup at the same AUC performance as a state-of-the-art network on the CAT2000 dataset.
Journal Article
Contextual Encoder-Decoder Network for Visual Saliency Prediction
TL;DR: This work proposes a saliency model based on a convolutional neural network pre-trained on a large-scale image classification task; the contextual encoder-decoder achieves competitive and consistent results across multiple evaluation metrics on public saliency benchmarks, and its effectiveness is demonstrated on five datasets and selected examples.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
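The residual-learning idea summarized above can be sketched in a few lines: each block fits a residual function F(x) and outputs F(x) + x, so an untrained block defaults to the identity mapping. This is an illustrative sketch, not the paper's implementation; `residual_block` and the single linear-plus-ReLU form of F are assumptions for brevity.

```python
import numpy as np

def residual_block(x, weight):
    """Residual learning sketch: the block fits F(x) and outputs F(x) + x,
    so the identity mapping is the easy default. `weight` is a stand-in
    for the block's learned parameters (one linear map + ReLU here)."""
    fx = np.maximum(0.0, x @ weight)  # F(x)
    return fx + x                     # identity shortcut connection

x = np.ones((1, 4))
w = np.zeros((4, 4))                  # untrained block: F(x) = 0
y = residual_block(x, w)
print(np.allclose(y, x))              # True: the block reduces to identity
```

The shortcut is what eases optimization of very deep stacks: gradients reach earlier layers through the additive identity path even when F is poorly conditioned.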
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers ending in a 1000-way softmax achieved state-of-the-art performance on ImageNet classification.
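The summary above ends with a 1000-way softmax classifier. As a minimal, hedged sketch (the layer list is only a shape outline of the described architecture, and `softmax` here is the standard numerically stabilized form, not code from the paper):

```python
import numpy as np

# Shape outline of the described architecture: five conv layers, then
# three fully-connected layers, the last being a 1000-way softmax.
arch = ["conv"] * 5 + ["fc", "fc", "fc -> softmax(1000)"]

def softmax(logits):
    """Numerically stable softmax over a vector of class logits."""
    z = logits - logits.max()  # shift for stability before exponentiating
    e = np.exp(z)
    return e / e.sum()

# Random logits stand in for the output of the last fully-connected layer.
probs = softmax(np.random.default_rng(1).normal(size=1000))
print(len(arch), probs.size)  # 8 1000
```

The shift by `logits.max()` leaves the result unchanged mathematically but prevents overflow in `exp`, which matters at this output width.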
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
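The "constant error carousel" in the summary is the additively updated cell state. The sketch below is a minimal single-step LSTM cell, not the original formulation's exact parameterization: packing the four gate weight matrices into one `W` and omitting biases are simplifying assumptions, and `lstm_step` is a hypothetical name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W):
    """One LSTM step. The cell state c is the 'constant error carousel':
    it is updated only by elementwise gating, c = f*c + i*g, so error can
    flow across many time steps without vanishing. W packs the four gate
    weight matrices for the concatenated input [x, h] (biases omitted)."""
    z = np.concatenate([x, h]) @ W  # pre-activations for all four gates
    n = h.size
    i = sigmoid(z[:n])              # input gate
    f = sigmoid(z[n:2 * n])         # forget gate
    o = sigmoid(z[2 * n:3 * n])     # output gate
    g = np.tanh(z[3 * n:])          # candidate cell update
    c = f * c + i * g               # carousel: gated additive update
    h = o * np.tanh(c)              # gated exposure of the cell state
    return h, c

# Run the cell over a short random sequence.
rng = np.random.default_rng(0)
m, n = 3, 5
W = rng.normal(size=(m + n, 4 * n)) * 0.1
h, c = np.zeros(n), np.zeros(n)
for t in range(10):
    h, c = lstm_step(rng.normal(size=m), h, c, W)
print(h.shape, c.shape)  # (5,) (5,)
```

The convolutional LSTM used in the paper above replaces these matrix products with convolutions over feature maps, but the gating structure is the same.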
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings Article
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).