Proceedings ArticleDOI

Understanding Low- and High-Level Contributions to Fixation Prediction

TLDR
Comparing different features within the same powerful readout architecture allows us to better understand the relevance of low- versus high-level features in predicting fixation locations, while simultaneously achieving state-of-the-art saliency prediction.
Abstract
Understanding where people look in images is an important problem in computer vision. Despite significant research, it remains unclear to what extent human fixations can be predicted by low-level (contrast) compared to high-level (presence of objects) image features. Here we address this problem by introducing two novel models that use different feature spaces but the same readout architecture. The first model predicts human fixations based on deep neural network features trained on object recognition. This model sets a new state of the art in fixation prediction by achieving top performance in area under the curve metrics on the MIT300 hold-out benchmark (AUC = 88%, sAUC = 77%, NSS = 2.34). The second model uses purely low-level (isotropic contrast) features. This model achieves better performance than all models not using features pretrained on object recognition, making it a strong baseline to assess the utility of high-level features. We then evaluate and visualize which fixations are better explained by low-level compared to high-level image features. Surprisingly we find that a substantial proportion of fixations are better explained by the simple low-level model than the state-of-the-art model. Comparing different features within the same powerful readout architecture allows us to better understand the relevance of low- versus high-level features in predicting fixation locations, while simultaneously achieving state-of-the-art saliency prediction.
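The abstract reports NSS (normalized scanpath saliency) among its benchmark scores. As a point of reference, here is a minimal sketch of how NSS is commonly computed: the mean of the z-scored saliency map values at the human fixation locations. The function name and array conventions are illustrative, not taken from the paper's code.

```python
import numpy as np

def nss(saliency_map: np.ndarray, fixations: np.ndarray) -> float:
    """Normalized Scanpath Saliency: mean z-scored saliency at fixations.

    saliency_map: 2D array of predicted saliency values.
    fixations: (N, 2) integer array of (row, col) fixation coordinates.
    """
    # Z-scoring makes the score invariant to the map's scale and offset.
    z = (saliency_map - saliency_map.mean()) / saliency_map.std()
    rows, cols = fixations[:, 0], fixations[:, 1]
    return float(z[rows, cols].mean())
```

Higher is better: an NSS of 2.34 means that, on average, fixated locations score 2.34 standard deviations above the map's mean.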

Citations
Proceedings ArticleDOI

Saliency Map Extraction in Human Crowd RGB Data

TL;DR: This work proposes a novel convolutional neural network based method for predicting salient regions that attract human visual attention in crowded scenes, and it outperforms state-of-the-art methods on the Eyecrowd human-crowd saliency dataset.
Posted Content

Visual Attention: Deep Rare Features

TL;DR: Contribution-DeepRare2019 (DR) uses the power of DNN feature extraction and the genericity of feature-engineered algorithms to provide accurate visual attention prediction in any situation.
Journal ArticleDOI

A Novel Lightweight Audio-visual Saliency Model for Videos

TL;DR: Wang et al., as discussed by the authors, proposed a lightweight audio-visual saliency (LAVS) model for video sequences, which utilizes audio cues in an efficient deep-learning model for video saliency estimation.
Posted Content

Deep Saliency Prior for Reducing Visual Distraction.

TL;DR: In this article, a saliency model is used to parameterize a differentiable editing operator such that the saliency within the masked region is reduced; the resulting effects are consistent with cognitive research on the human visual system (e.g., since color mismatch is salient, the recoloring operator learns to harmonize objects' colors with their surroundings to reduce their saliency).
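The summary above describes an optimization loop: an editing operator with differentiable parameters is tuned so that a frozen saliency model predicts less saliency inside a user-provided mask. Below is a conceptual sketch of that loop; `saliency_model`, the per-channel color-shift operator, and all hyperparameters are illustrative assumptions, not the paper's actual method or code.

```python
import torch

def reduce_distraction(image, mask, saliency_model, steps=200, lr=0.05):
    """Conceptual sketch: tune a simple recoloring edit so a frozen
    saliency model predicts less saliency inside the masked region.

    image: (3, H, W) tensor; mask: (1, H, W) tensor in {0, 1};
    saliency_model: hypothetical differentiable map from image to saliency.
    """
    # Parameterize the edit as a learnable per-channel color shift
    # applied only inside the masked region.
    shift = torch.zeros(3, 1, 1, requires_grad=True)
    opt = torch.optim.Adam([shift], lr=lr)
    for _ in range(steps):
        edited = image + mask * shift       # differentiable editing operator
        sal = saliency_model(edited)        # predicted saliency map
        loss = (sal * mask).mean()          # penalize saliency in the mask
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (image + mask * shift).detach()
```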
Posted Content

TranSalNet: Visual saliency prediction using transformers.

TL;DR: Zhang et al., as discussed by the authors, proposed a novel saliency model that integrates transformer components into CNNs to capture long-range contextual information; the proposed model achieves promising results in predicting saliency.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network, consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art image classification performance as discussed by the authors.
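For readers who want the described topology concretely, here is a minimal PyTorch sketch matching the layer counts in the summary above. Channel counts, kernel sizes, and strides follow the original AlexNet paper, but local response normalization and the original two-GPU split are omitted, so this is an approximation rather than the exact published model.

```python
import torch.nn as nn

# Approximate AlexNet-style topology (expects 227x227 RGB input):
# five conv layers, three max-pooling layers, three fully-connected layers.
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1000-way output; softmax applied in the loss
)
```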
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Posted Content

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe as discussed by the authors is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.