Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model
TLDR
The authors propose a convolutional long short-term memory (LSTM) network that iteratively refines the predicted saliency map by focusing on the most salient regions of the input image.

Abstract
Data-driven saliency has recently gained a lot of attention thanks to the use of convolutional neural networks for predicting gaze fixations. In this paper, we go beyond standard approaches to saliency prediction, in which gaze maps are computed with a feed-forward network, and present a novel model which can predict accurate saliency maps by incorporating neural attentive mechanisms. The core of our solution is a convolutional long short-term memory that focuses on the most salient regions of the input image to iteratively refine the predicted saliency map. In addition, to tackle the center bias typical of human eye fixations, our model can learn a set of prior maps generated with Gaussian functions. We show, through an extensive evaluation, that the proposed architecture outperforms the current state-of-the-art on public saliency prediction datasets. We further study the contribution of each key component to demonstrate their robustness in different scenarios.
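The learned prior maps mentioned in the abstract can be sketched as parameterized 2D Gaussians over normalized image coordinates. The snippet below is a minimal illustration, not the paper's implementation; the function name `gaussian_prior_map` and the specific parameter values are assumptions, with `mu_*` and `sigma_*` standing in for the learnable parameters that capture the center bias.

```python
import numpy as np

def gaussian_prior_map(height, width, mu_x, mu_y, sigma_x, sigma_y):
    """One 2D Gaussian prior map over normalized [0, 1] image coordinates.

    mu_* and sigma_* stand in for the learnable parameters the model
    optimizes to capture the center bias of human fixations.
    """
    ys = np.linspace(0.0, 1.0, height)[:, None]  # row coordinates (column vector)
    xs = np.linspace(0.0, 1.0, width)[None, :]   # column coordinates (row vector)
    return np.exp(-(((xs - mu_x) ** 2) / (2.0 * sigma_x ** 2)
                    + ((ys - mu_y) ** 2) / (2.0 * sigma_y ** 2)))

# A centered prior: with the mean at (0.5, 0.5) the peak sits mid-image.
prior = gaussian_prior_map(33, 33, mu_x=0.5, mu_y=0.5, sigma_x=0.2, sigma_y=0.2)
print(prior.shape, prior.argmax())  # (33, 33) 544 -> center pixel (16, 16)
```

In the paper several such maps are learned jointly with the network; multiplying or concatenating them with intermediate feature maps biases predictions toward image regions where fixations are statistically more likely.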
Citations
Proceedings Article
Salient Object Detection Driven by Fixation Prediction
TL;DR: A novel Attentive Saliency Network (ASNet) learns to detect salient objects from fixation maps, using an efficient recurrent mechanism to sequentially refine the segmentation map.
Journal Article
Predicting the Driver's Focus of Attention: The DR(eye)VE Project
TL;DR: A model based on a multi-branch deep architecture predicts the driver's focus of attention while driving, i.e., which part of the scene around the vehicle is most critical for the task.
Proceedings Article
How Much Position Information Do Convolutional Neural Networks Encode?
TL;DR: A comprehensive set of experiments supports the hypothesis that CNNs implicitly encode absolute position information, shedding light on how and where this information is represented and offering clues to where it is derived from in deep CNNs.
Posted Content
Faster gaze prediction with dense networks and Fisher pruning
TL;DR: Through a combination of knowledge distillation and Fisher pruning, the authors obtain much more runtime-efficient architectures for saliency prediction, achieving a 10x speedup at the same AUC performance as a state-of-the-art network on the CAT2000 dataset.
Journal Article
Contextual Encoder-Decoder Network for Visual Saliency Prediction
TL;DR: This work proposes a saliency model based on a convolutional neural network pre-trained on a large-scale image classification task; the contextual encoder-decoder achieves competitive and consistent results across multiple evaluation metrics on public saliency benchmarks, and its effectiveness is demonstrated on five datasets and selected examples.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
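The residual-learning idea summarized above can be sketched in a few lines: each block fits a residual function F(x) and outputs F(x) + x, so an untrained block defaults to the identity mapping. This is an illustrative sketch, not the paper's implementation; `residual_block` and the single linear-plus-ReLU form of F are assumptions for brevity.

```python
import numpy as np

def residual_block(x, weight):
    """Residual learning sketch: the block fits F(x) and outputs F(x) + x,
    so the identity mapping is the easy default. `weight` is a stand-in
    for the block's learned parameters (one linear map + ReLU here)."""
    fx = np.maximum(0.0, x @ weight)  # F(x)
    return fx + x                     # identity shortcut connection

x = np.ones((1, 4))
w = np.zeros((4, 4))                  # untrained block: F(x) = 0
y = residual_block(x, w)
print(np.allclose(y, x))              # True: the block reduces to identity
```

The shortcut is what eases optimization of very deep stacks: gradients reach earlier layers through the additive identity path even when F is poorly conditioned.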
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers ending in a 1000-way softmax achieved state-of-the-art performance on ImageNet classification.
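The summary above ends with a 1000-way softmax classifier. As a minimal, hedged sketch (the layer list is only a shape outline of the described architecture, and `softmax` here is the standard numerically stabilized form, not code from the paper):

```python
import numpy as np

# Shape outline of the described architecture: five conv layers, then
# three fully-connected layers, the last being a 1000-way softmax.
arch = ["conv"] * 5 + ["fc", "fc", "fc -> softmax(1000)"]

def softmax(logits):
    """Numerically stable softmax over a vector of class logits."""
    z = logits - logits.max()  # shift for stability before exponentiating
    e = np.exp(z)
    return e / e.sum()

# Random logits stand in for the output of the last fully-connected layer.
probs = softmax(np.random.default_rng(1).normal(size=1000))
print(len(arch), probs.size)  # 8 1000
```

The shift by `logits.max()` leaves the result unchanged mathematically but prevents overflow in `exp`, which matters at this output width.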
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
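The "constant error carousel" in the summary is the additively updated cell state. The sketch below is a minimal single-step LSTM cell, not the original formulation's exact parameterization: packing the four gate weight matrices into one `W` and omitting biases are simplifying assumptions, and `lstm_step` is a hypothetical name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W):
    """One LSTM step. The cell state c is the 'constant error carousel':
    it is updated only by elementwise gating, c = f*c + i*g, so error can
    flow across many time steps without vanishing. W packs the four gate
    weight matrices for the concatenated input [x, h] (biases omitted)."""
    z = np.concatenate([x, h]) @ W  # pre-activations for all four gates
    n = h.size
    i = sigmoid(z[:n])              # input gate
    f = sigmoid(z[n:2 * n])         # forget gate
    o = sigmoid(z[2 * n:3 * n])     # output gate
    g = np.tanh(z[3 * n:])          # candidate cell update
    c = f * c + i * g               # carousel: gated additive update
    h = o * np.tanh(c)              # gated exposure of the cell state
    return h, c

# Run the cell over a short random sequence.
rng = np.random.default_rng(0)
m, n = 3, 5
W = rng.normal(size=(m + n, 4 * n)) * 0.1
h, c = np.zeros(n), np.zeros(n)
for t in range(10):
    h, c = lstm_step(rng.normal(size=m), h, c, W)
print(h.shape, c.shape)  # (5,) (5,)
```

The convolutional LSTM used in the paper above replaces these matrix products with convolutions over feature maps, but the gating structure is the same.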
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings Article
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).