Proceedings ArticleDOI
Understanding Low- and High-Level Contributions to Fixation Prediction
Matthias Kümmerer, Thomas S. A. Wallis, Leon A. Gatys, Matthias Bethge
- pp. 4799–4808
TLDR
Comparing different features within the same powerful readout architecture allows us to better understand the relevance of low- versus high-level features in predicting fixation locations, while simultaneously achieving state-of-the-art saliency prediction.
Abstract:
Understanding where people look in images is an important problem in computer vision. Despite significant research, it remains unclear to what extent human fixations can be predicted by low-level (contrast) compared to high-level (presence of objects) image features. Here we address this problem by introducing two novel models that use different feature spaces but the same readout architecture. The first model predicts human fixations based on deep neural network features trained on object recognition. This model sets a new state of the art in fixation prediction by achieving top performance in area under the curve metrics on the MIT300 hold-out benchmark (AUC = 88%, sAUC = 77%, NSS = 2.34). The second model uses purely low-level (isotropic contrast) features. This model achieves better performance than all models not using features pretrained on object recognition, making it a strong baseline to assess the utility of high-level features. We then evaluate and visualize which fixations are better explained by low-level compared to high-level image features. Surprisingly, we find that a substantial proportion of fixations are better explained by the simple low-level model than by the state-of-the-art model. Comparing different features within the same powerful readout architecture allows us to better understand the relevance of low- versus high-level features in predicting fixation locations, while simultaneously achieving state-of-the-art saliency prediction.
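The NSS score quoted in the abstract (NSS = 2.34) can be made concrete with a small sketch. Normalized Scanpath Saliency z-scores the saliency map and averages it at the recorded fixation locations; the function name and the toy map below are illustrative, not taken from the paper's code.

```python
import numpy as np

def nss(saliency_map, fixations):
    """Normalized Scanpath Saliency: mean of the z-scored saliency
    map sampled at the given fixation coordinates (row, col)."""
    s = (saliency_map - saliency_map.mean()) / saliency_map.std()
    rows, cols = zip(*fixations)
    return float(s[list(rows), list(cols)].mean())

# Toy example: a 5x5 map peaked at the center, one fixation on the peak.
smap = np.zeros((5, 5))
smap[2, 2] = 1.0
print(nss(smap, [(2, 2)]))  # high score: the fixation lands on the peak
```

A fixation that lands off the peak pulls the average down, so chance-level maps score near zero and well-matched maps score well above one, which is why NSS is reported alongside AUC and sAUC.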
Citations
Journal ArticleDOI
Advertising Image Saliency Prediction Method Based on Score Level Fusion
TL;DR: Proposes a saliency prediction algorithm for advertisement images in which two text candidate regions, based on an intensity feature and an improved MESR algorithm, are first obtained and then integrated to produce a two-dimensional text confidence score.
Proceedings ArticleDOI
Bayesian inference for an exploration-exploitation model of human gaze control
TL;DR: This work develops a discrete-time probabilistic generative model with a Markovian structure, where at each step the next fixation location is selected using one of two strategies: exploitation or exploration. Efficient Bayesian inference for hyperparameter estimation is implemented using an HMC-within-Gibbs approach.
Posted Content
Data augmentation and image understanding.
TL;DR: Data augmentation is a commonly used technique for training artificial neural networks that enlarges a data set through transformations of its images, chosen because they correspond to the transformations we see in our visual world.
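The label-preserving transformations that the entry above describes can be sketched in a few lines; a random horizontal flip plus a random padded crop are two standard choices. The function name and padding amount below are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

def augment(image, rng):
    """Apply two common label-preserving transformations to a 2-D
    grayscale image: a random horizontal flip and a random crop
    taken from a zero-padded copy (shape is preserved)."""
    if rng.random() < 0.5:
        image = image[:, ::-1]  # horizontal flip
    pad = 2
    padded = np.pad(image, pad, mode="constant")
    r = rng.integers(0, 2 * pad + 1)  # random crop offsets
    c = rng.integers(0, 2 * pad + 1)
    h, w = image.shape
    return padded[r:r + h, c:c + w]

rng = np.random.default_rng(0)
img = np.arange(16, dtype=float).reshape(4, 4)
out = augment(img, rng)
print(out.shape)  # crop restores the original 4x4 shape
```

Both transformations mimic variation we see in the visual world (mirror symmetry, small translations), which is why they enlarge a data set without changing image labels.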
Journal ArticleDOI
Where to look at the movies: Analyzing visual attention to understand movie editing
TL;DR: In this paper, a new eye-tracking database containing gaze-pattern information on movie sequences, together with editing annotations, is introduced, and the behavior of state-of-the-art computational saliency techniques on this dataset is analyzed.
Proceedings ArticleDOI
Improving saliency models’ predictions of the next fixation with humans’ intrinsic cost of gaze shifts
TL;DR: In this paper , a sequential decision-making algorithm is proposed to predict the next gaze target by converting a static saliency map into a sequence of dynamic history-dependent value maps, which are recomputed after each gaze shift.
References
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: Introduces a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, which achieved state-of-the-art classification performance.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings ArticleDOI
ImageNet: A large-scale hierarchical image database
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Proceedings ArticleDOI
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Posted Content
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
TL;DR: Caffe as discussed by the authors is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.