Proceedings ArticleDOI

Understanding Low- and High-Level Contributions to Fixation Prediction

TLDR
Comparing different features within the same powerful readout architecture allows us to better understand the relevance of low- versus high-level features in predicting fixation locations, while simultaneously achieving state-of-the-art saliency prediction.
Abstract
Understanding where people look in images is an important problem in computer vision. Despite significant research, it remains unclear to what extent human fixations can be predicted by low-level (contrast) compared to high-level (presence of objects) image features. Here we address this problem by introducing two novel models that use different feature spaces but the same readout architecture. The first model predicts human fixations based on deep neural network features trained on object recognition. This model sets a new state of the art in fixation prediction by achieving top performance in area under the curve metrics on the MIT300 hold-out benchmark (AUC = 88%, sAUC = 77%, NSS = 2.34). The second model uses purely low-level (isotropic contrast) features. This model achieves better performance than all models not using features pretrained on object recognition, making it a strong baseline to assess the utility of high-level features. We then evaluate and visualize which fixations are better explained by low-level compared to high-level image features. Surprisingly, we find that a substantial proportion of fixations are better explained by the simple low-level model than by the state-of-the-art model. Comparing different features within the same powerful readout architecture allows us to better understand the relevance of low- versus high-level features in predicting fixation locations, while simultaneously achieving state-of-the-art saliency prediction.
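In rough code, the metrics quoted above can be written compactly. The following is a minimal NumPy sketch (an illustration, not the MIT300 benchmark's reference implementation): NSS is the mean z-scored saliency at fixated pixels, and AUC is computed here in Mann-Whitney form with all image pixels as negatives; AUC variants differ in how negatives are sampled, and shuffled AUC (sAUC) draws them from fixations on other images.

import numpy as np

def nss(smap, fix):
    # Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels.
    # smap: 2-D saliency map; fix: (N, 2) integer array of (row, col) fixations.
    z = (smap - smap.mean()) / smap.std()
    return z[fix[:, 0], fix[:, 1]].mean()

def auc(smap, fix):
    # AUC in Mann-Whitney form: the probability that a fixated pixel outscores
    # a randomly drawn image pixel (ties are ignored in this sketch).
    pos = smap[fix[:, 0], fix[:, 1]].astype(float)
    neg = smap.ravel().astype(float)
    ranks = np.concatenate([pos, neg]).argsort().argsort() + 1  # 1-based ranks
    n_pos, n_neg = len(pos), len(neg)
    return (ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)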

Citations
Journal ArticleDOI

Advertising Image Saliency Prediction Method Based on Score Level Fusion

TL;DR: Zhang et al. propose a saliency prediction algorithm for advertisement images in which two sets of text candidate regions, one based on an intensity feature and one on an improved MSER algorithm, are first obtained and then integrated to produce a two-dimensional text confidence score.
Proceedings ArticleDOI

Bayesian inference for an exploration-exploitation model of human gaze control

TL;DR: This work develops a discrete-time probabilistic generative model with a Markovian structure, in which at each step the next fixation location is selected using one of two strategies, exploitation or exploration, and implements efficient Bayesian inference for hyperparameter estimation using an HMC-within-Gibbs approach.
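To make the two-strategy idea concrete, here is a toy forward simulation; the Gaussian exploitation step, the saliency-weighted exploration step, and all parameter names are illustrative assumptions, and the paper's actual model and its HMC-within-Gibbs inference are more involved.

import numpy as np

def simulate_scanpath(smap, start, n_steps=10, p_exploit=0.6, sigma=15.0, seed=0):
    # Toy two-strategy gaze model with Markovian structure: each step depends
    # only on the last fixation and either exploits (small Gaussian step around
    # it) or explores (samples a location from the normalized saliency map).
    rng = np.random.default_rng(seed)
    probs = smap.ravel() / smap.sum()
    fixations = [np.asarray(start, dtype=float)]
    for _ in range(n_steps):
        if rng.random() < p_exploit:  # exploitation
            nxt = fixations[-1] + rng.normal(0.0, sigma, size=2)
            nxt = np.clip(nxt, 0, np.asarray(smap.shape) - 1)
        else:                         # exploration
            idx = rng.choice(probs.size, p=probs)
            nxt = np.asarray(np.unravel_index(idx, smap.shape), dtype=float)
        fixations.append(nxt)
    return np.array(fixations)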
Posted Content

Data augmentation and image understanding.

TL;DR: Data augmentation, as mentioned in this paper, is a commonly used technique for training artificial neural networks that enlarges data sets through transformations of the images, on the grounds that these correspond to the transformations we see in our visual world.
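As a minimal, generic illustration of the idea (not code from the paper), here are two label-preserving image transformations of the kind described:

import numpy as np

def augment(img, rng, pad=4):
    # Random horizontal flip and random crop after padding: transformations
    # that mimic the position and viewpoint changes we see in the visual world.
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip
    h, w = img.shape[:2]
    padded = np.pad(img, ((pad, pad), (pad, pad)) + ((0, 0),) * (img.ndim - 2))
    r = rng.integers(0, 2 * pad + 1)
    c = rng.integers(0, 2 * pad + 1)
    return padded[r:r + h, c:c + w]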
Journal ArticleDOI

Where to look at the movies: Analyzing visual attention to understand movie editing

TL;DR: In this paper, a new eye-tracking database containing gaze-pattern information on movie sequences, as well as editing annotations, is proposed, and the behavior of state-of-the-art computational saliency techniques on this dataset is analyzed.
Proceedings ArticleDOI

Improving saliency models’ predictions of the next fixation with humans’ intrinsic cost of gaze shifts

TL;DR: In this paper, a sequential decision-making algorithm is proposed to predict the next gaze target by converting a static saliency map into a sequence of dynamic, history-dependent value maps, which are recomputed after each gaze shift.
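A toy rendering of this idea (illustrative assumptions of my own, not the authors' algorithm): penalize a static saliency map by a cost that grows with saccade amplitude, suppress already-fixated regions, and recompute the value map after every gaze shift.

import numpy as np

def scanpath_from_saliency(smap, start, n_steps=5, cost=0.01, ior_radius=20):
    # Toy history-dependent value maps: static saliency minus an assumed
    # linear cost of shifting gaze, plus inhibition of return so the simulated
    # gaze keeps moving; the value map is recomputed after each gaze shift.
    smap = smap.astype(float).copy()
    rows, cols = np.indices(smap.shape)
    fixations = [tuple(start)]
    for _ in range(n_steps):
        r, c = fixations[-1]
        dist = np.hypot(rows - r, cols - c)  # saccade amplitude
        smap[dist < ior_radius] = -np.inf    # inhibition of return
        value = smap - cost * dist           # history-dependent value map
        fixations.append(np.unravel_index(value.argmax(), value.shape))
    return fixations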