Proceedings ArticleDOI
Understanding Low- and High-Level Contributions to Fixation Prediction
Matthias Kümmerer, Thomas S. A. Wallis, Leon A. Gatys, Matthias Bethge
pp. 4799–4808
TLDR
Comparing different features within the same powerful readout architecture allows us to better understand the relevance of low- versus high-level features in predicting fixation locations, while simultaneously achieving state-of-the-art saliency prediction.
Abstract:
Understanding where people look in images is an important problem in computer vision. Despite significant research, it remains unclear to what extent human fixations can be predicted by low-level (contrast) compared to high-level (presence of objects) image features. Here we address this problem by introducing two novel models that use different feature spaces but the same readout architecture. The first model predicts human fixations based on deep neural network features trained on object recognition. This model sets a new state of the art in fixation prediction by achieving top performance in area-under-the-curve metrics on the MIT300 hold-out benchmark (AUC = 88%, sAUC = 77%, NSS = 2.34). The second model uses purely low-level (isotropic contrast) features. This model achieves better performance than all models not using features pretrained on object recognition, making it a strong baseline to assess the utility of high-level features. We then evaluate and visualize which fixations are better explained by low-level compared to high-level image features. Surprisingly, we find that a substantial proportion of fixations are better explained by the simple low-level model than by the state-of-the-art model. Comparing different features within the same powerful readout architecture allows us to better understand the relevance of low- versus high-level features in predicting fixation locations, while simultaneously achieving state-of-the-art saliency prediction.
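The abstract reports standard fixation-prediction metrics (AUC, sAUC, NSS). As a rough illustration of what these numbers measure, AUC and NSS can be computed from a saliency map and a binary fixation mask as in the minimal numpy sketch below. This is an illustrative simplification, not the official MIT300 evaluation (which uses specific AUC variants, blurring, and careful tie handling); the function names are ours.

```python
import numpy as np

def nss(saliency, fixations):
    """Normalized Scanpath Saliency: z-score the saliency map,
    then average its values at the fixated pixels."""
    z = (saliency - saliency.mean()) / saliency.std()
    return z[fixations].mean()

def auc(saliency, fixations):
    """AUC via the Mann-Whitney U statistic: the probability that a
    fixated pixel receives a higher saliency value than a non-fixated
    one (ties broken arbitrarily by the ordinal ranking)."""
    pos = saliency[fixations]          # saliency values at fixations
    neg = saliency[~fixations]        # saliency values elsewhere
    scores = np.concatenate([pos, neg])
    ranks = scores.argsort().argsort() + 1  # 1-based ordinal ranks
    u = ranks[:len(pos)].sum() - len(pos) * (len(pos) + 1) / 2
    return u / (len(pos) * len(neg))
```

Shuffled AUC (sAUC) differs only in where the negatives come from: instead of all non-fixated pixels of the same image, they are drawn from fixation locations of other images, which penalizes models that merely predict the dataset-wide center bias.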
Citations
Proceedings ArticleDOI
Saliency Map Extraction in Human Crowd RGB Data
TL;DR: This work proposes a novel convolutional-neural-network-based method for predicting the saliency of regions that attract human visual attention in crowded human scenes, and outperforms state-of-the-art methods on the Eyecrowd human-crowd saliency dataset.
Posted Content
Visual Attention: Deep Rare Features
TL;DR: Contribution-DeepRare2019 (DR) combines the power of DNN feature extraction with the genericity of feature-engineered algorithms to provide accurate visual attention prediction in any situation.
Journal ArticleDOI
A Novel Lightweight Audio-visual Saliency Model for Videos
TL;DR: Wang et al. proposed a lightweight audio-visual saliency (LAVS) model for video sequences, which exploits audio cues in an efficient deep-learning model for video saliency estimation.
Posted Content
Deep Saliency Prior for Reducing Visual Distraction.
Kfir Aberman, Junfeng He, Yossi Gandelsman, Inbar Mosseri, David E. Jacobs, Kai Kohlhoff, Yael Pritch, Michael Rubinstein
TL;DR: In this article, a saliency model is used to parameterize a differentiable editing operator such that the saliency within a masked region is reduced; the resulting effects are consistent with cognitive research on the human visual system (e.g., since color mismatch is salient, the recoloring operator learns to harmonize objects' colors with their surroundings to reduce their saliency).
Posted Content
TranSalNet: Visual saliency prediction using transformers.
TL;DR: Zhang et al. proposed a novel saliency model that integrates transformer components into CNNs to capture long-range contextual information; the proposed model achieves promising results in predicting saliency.
References
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art classification performance on ImageNet.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: In this paper, the authors investigated the effect of convolutional network depth on accuracy in the large-scale image recognition setting and showed that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16–19 layers.
Proceedings ArticleDOI
ImageNet: A large-scale hierarchical image database
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Proceedings ArticleDOI
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Posted Content
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
TL;DR: Caffe as discussed by the authors is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.