Analysis of Scores, Datasets, and Models in Visual Saliency Prediction
Summary
1. Introduction
- A large number of models have been proposed for predicting where people look in scenes [1].
- Saliency modeling also benefits many engineering applications (e.g., object detection and segmentation, content-aware image re-targeting, image in-painting, visual tracking, image and video compression, crowd analysis and social gaming [2][6][24][37][30], determining the importance of objects in a scene [48][44], memorability of image regions [49], and object recall [50]).
- While effective, previous comparisons have not properly addressed all of the challenging parameters that affect model accuracy.
- Here, the authors thoroughly investigate these shortcomings and additionally compare models over scanpath sequences.
- The authors provide the latest update on saliency modeling, with the most comprehensive set of models, challenges/parameters, datasets, and measures.
2. Basic concepts and definitions
- Here, the authors lay out the ground for the rest of the paper and explain some basic concepts of visual attention.
- Saliency is a property of the perceived visual stimulus (bottom-up (BU)) or, at most, of the features that the visual system extracts from the stimulus (which can be manipulated by top-down (TD) cues).
- A major distinction between mechanisms of visual attention is the bottom-up vs. top-down dissociation.
- Note that these are not exclusive concepts.
3. Analysis of challenges and open problems
- The authors discuss challenges that have emerged as more models have been proposed.
- Some models have added a center-bias (location prior), either explicitly (e.g., Judd) or implicitly (e.g., GBVS), which makes fair comparison challenging.
- Two other issues regarding scores are sensitivity to map normalization (a.k.a. re-parameterization) and having well-defined bounds (and a chance level).
- When using the method in [25] (i.e., using saliency from other images but at fixations of the current image), this type of AUC yields exactly 0.5 for the central Gaussian (see Supp.); a hedged sketch of shuffled AUC is given below.
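A minimal code sketch of one common formulation of shuffled AUC (sAUC), consistent with the citing excerpt quoted later on this page: positives are the current map's values at the current image's fixations, and negatives are its values at fixation locations pooled from other images, which drives a pure central Gaussian toward the chance level of 0.5. The library choices (NumPy, scikit-learn), the function name, and the sampling details are assumptions, not the authors' implementation.

```python
# Hedged sketch of shuffled AUC (sAUC); not the paper's exact implementation.
import numpy as np
from sklearn.metrics import roc_auc_score

def shuffled_auc(sal_map, fixations, other_fixations, n_neg=1000, seed=0):
    """sal_map: 2-D saliency map, same size as the image.
    fixations: (N, 2) integer array of (row, col) fixations on this image (positives).
    other_fixations: (M, 2) integer array of fixation locations pooled from OTHER
    images (negatives); assumed already clipped to this image's bounds."""
    rng = np.random.default_rng(seed)
    pos = sal_map[fixations[:, 0], fixations[:, 1]]
    idx = rng.choice(len(other_fixations), size=min(n_neg, len(other_fixations)), replace=False)
    neg = sal_map[other_fixations[idx, 0], other_fixations[idx, 1]]
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    scores = np.concatenate([pos, neg])
    return roc_auc_score(labels, scores)  # approaches 0.5 for a pure central Gaussian map
```

Because the negatives share the same center-bias as the positives, a map that encodes only the central Gaussian gains no advantage, which is the property discussed above.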
4. Saliency benchmark
- The authors resized saliency maps to the size of the original images on which eye movements were recorded.
- Over the largest dataset (i.e., MIT), the AWS, LG, AIM, and Torralba models performed better than the other models.
- Fig. 4 shows model performance over the stimulus categories of the NUSEF dataset, for each model and averaged over all models.
- Algorithm 1 (scanpath evaluation), Phase 1, generates the human scanpaths and fixation clusters, taking the subjects' fixations as input; a rough sketch is given below.
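Only the header of Algorithm 1 (Phase 1) is quoted above. The following is a rough, hedged sketch of what such a phase could look like: all subjects' fixations are clustered (mean-shift is used here, echoing the kernel-density excerpt in the References section, but the paper's exact clustering choice and bandwidth may differ), and cluster centers are ordered by the mean fixation time of their members to form a representative human scanpath.

```python
# Hedged sketch of generating a representative human scanpath from pooled fixations.
import numpy as np
from sklearn.cluster import MeanShift

def representative_scanpath(fixations, timestamps, bandwidth=40.0):
    """fixations: (N, 2) array of (x, y) fixation locations pooled over subjects.
    timestamps: (N,) array of fixation onset times (or ordinal ranks).
    Returns cluster centers ordered by the mean time of their member fixations."""
    ms = MeanShift(bandwidth=bandwidth).fit(fixations)
    centers, labels = ms.cluster_centers_, ms.labels_
    # Order clusters by the average timestamp of the fixations assigned to them.
    mean_times = np.array([timestamps[labels == k].mean() for k in range(len(centers))])
    return centers[np.argsort(mean_times)]
```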
5. Model comparison over applications
- Since models are already quite good at predicting fixations, new measures are necessary to draw finer distinctions among them.
- Then, the histogram of a saccade statistic for an image is computed from all observers and is L1-normalized; a sketch is given after this list.
- Fig. 10 shows the performance of individual features in classification.
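A minimal sketch of the histogram construction described above: feature values (e.g., saccade velocities or amplitudes) are pooled over all observers of an image, binned, and L1-normalized. The bin count and value range are illustrative assumptions.

```python
# Hedged sketch: per-image histogram of a saccade statistic, L1-normalized.
import numpy as np

def saccade_feature_histogram(values_per_observer, n_bins=20, value_range=None):
    """values_per_observer: list of 1-D arrays, one array of feature values
    (e.g., saccade velocities) per observer for the same image."""
    values = np.concatenate(values_per_observer)        # pool all observers
    hist, _ = np.histogram(values, bins=n_bins, range=value_range)
    hist = hist.astype(float)
    return hist / max(hist.sum(), 1.0)                  # L1 normalization
```

Histograms like this, together with fixation and saliency statistics, feed the stimulus-category classifier whose per-feature performance Fig. 10 reports.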
6. Conclusions and future directions
- The authors' comparisons show that, in general, the AWS, LG, HouNIPS, Judd, Rarity-G (smoothed version), AIM, and Torralba models performed better than the other models.
- Thus the authors believe it is important to gather larger datasets, especially over new stimulus categories.
- The authors showed that, from statistics of fixations, saccades, and saliency at fixations, it is possible to decode the stimulus category.
- Another promising research direction is designing better saliency evaluation scores which: (1) are able to better distinguish fixated vs. non-fixated locations, and (2) are able to discount confounding parameters such as center-bias.
Citations
608 citations
Cites methods from "Analysis of Scores, Datasets, and Models in Visual Saliency Prediction"
...tion and salient object segmentation. 2.2 Models in Closely Related areas 2.2.1 Fixation Prediction Models Reviewing all fixation prediction models goes beyond the scope of this paper (See [46], [143]–[145] for reviews and benchmarks of these models). Here we give pointers to the most important trends and works in this domain. Inclusion of these models here is to measure their performance versus salient...
...t of fixation prediction models considered in this study. All of these models are based on pure low-level mechanisms and have shown to be very efficient in previous fixation prediction benchmarks [144], [145]. 2.2.2 Image Segmentation Models Segmentation is a fundamental problem studied in computer vision and usually adopted as a pre-process step to image analysis. Without any prior knowledge of the conte...
564 citations
Cites methods from "Analysis of Scores, Datasets, and Models in Visual Saliency Prediction"
...As mentioned by the previous studies [7], [43], and [44], in the branch of bottom-up SOD, approaches are to detect saliency under free viewing, which is automatically determined by the physical characteristics of the scene, while approaches in the other branch are to detect the task-driven saliency determined by the current goals of the observer....
526 citations
Cites background from "Analysis of Scores, Datasets, and Models in Visual Saliency Prediction"
...The shuffled AUC metric, sAUC [8], [20], [73], [74], [85] samples negatives from fixation locations from other images, instead of uniformly at random....
...Dozens of computational saliency models are available to choose from [7], [8], [11], [12], [37], but objectively determining which model offers the “best” approximation to human eye fixations remains a challenge....
...Differences in how saliency and ground truth are represented and which attributes of saliency models should be rewarded/penalized leads to different choices of metrics for reporting performance [8], [12], [42],...
...[8] compared 32 saliency models with 3 metrics for fixation prediction and additional metrics for scanpath prediction on 4 datasets....
...Most eye-tracking datasets have been shown to be center biased, containing a larger number of fixations near the image center, across different image types, videos, and even observer tasks [7], [8], [14], [16], [33], [36]....
References
11,727 citations
"Analysis of Scores, Datasets, and M..." refers background in this paper
...The density function can be defined in terms of kernel K(x) with bandwidth h as follows [14]: $\hat{f}_{h,K}(x) = \frac{c_{k,d}}{n h^{d}} \sum_{i=1}^{n} K\!\left( \left\| \frac{x - x_i}{h} \right\|^{2} \right)$...
11,452 citations
"Analysis of Scores, Datasets, and M..." refers background in this paper
...What is the unit of attention? Do we attend to spatial locations, objects, or features? [5][27] A great deal of neurophysiological and behavioral evidence exists for all three....
Additional excerpts
...In ITTI98, each feature map’s contribution to the saliency map is weighted by the squared difference between the globally most active location and the average activity of all other local maxima in the feature map [3]....
[Flattened excerpt of the paper's model overview table, listing each model's reference, publication year, code type, and category; the original table layout is not recoverable here.]
Frequently Asked Questions (11)
Q2. What future works have the authors mentioned in the paper "Analysis of scores, datasets, and models in visual saliency prediction" ?
The authors found that some stimulus categories (e.g., nature, nude, and portrait) are harder for models and warrant more attention in future work. In this regard, it will also be interesting to test the feasibility of predicting whether a scene is natural or man-made from saliency and fixations. The authors also believe it is important to constantly measure the gap between the IO model and saliency models to find out in which directions models lag behind human performance.
Q3. What are the common types of stimuli used in neurophysiological and modeling works?
Visual stimuli used in neurophysiological and modeling works include static stimuli (synthetic pop-out and conjunction search arrays, cartoons, or photographs) and spatio-temporal dynamic stimuli (movies and interactive video games).
Q4. What are the two main causes of CB?
Two important causes of CB are: (1) viewing strategy, whereby subjects tend to start looking from the image center, and (2) a perhaps stronger photographer bias, which is the tendency of photographers to frame interesting objects near the center.
Q5. What is the difficult challenge in the fixation datasets?
A difficult challenge in fixation datasets which has affected fair model comparison is “Center-Bias (CB)”, whereby humans often appear to preferentially look near an image’s center [28].
Q6. Why are there still inconsistencies in the results of previous benchmarks?
Due to the lack of an exhaustive, coherent benchmarking system addressing several issues, such as evaluation measures (e.g., at least 4 types of AUC measures have been used; see the supplement), center-bias, map characteristics (e.g., smoothing), and dataset bias, many inconsistencies still exist in the results of previous benchmarks.
Q7. What is the reason for the lack of models to predict scanpath sequence?
In the context of saliency modeling, few models have aimed to predict scanpath sequences, partly due to the difficulty of measuring and quantizing scanpaths.
Q8. How do the authors make a fixation histogram?
The fixation histogram is built by dividing the image into a 16 × 16 grid and counting the number of fixations falling in each cell.
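A minimal sketch of this construction, assuming (row, col) pixel coordinates and flattening the grid into a 256-dimensional vector; both choices are assumptions for illustration.

```python
# Hedged sketch: 16 x 16 fixation histogram over an image.
import numpy as np

def fixation_histogram(fixations, image_shape, grid=(16, 16)):
    """fixations: (N, 2) array of (row, col) fixation coordinates in pixels.
    image_shape: (height, width) of the image in pixels."""
    h, w = image_shape
    rows = np.clip((fixations[:, 0] * grid[0] / h).astype(int), 0, grid[0] - 1)
    cols = np.clip((fixations[:, 1] * grid[1] / w).astype(int), 0, grid[1] - 1)
    hist = np.zeros(grid)
    np.add.at(hist, (rows, cols), 1)   # count fixations falling in each cell
    return hist.ravel()                # 256-D descriptor for the image
```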
Q9. What is the way to compute the histograms for a given image?
To compute the histograms for a given image, the authors first compute the corresponding features (e.g., saccade velocity) for each observer and quantize the values into several bins.
Q10. Why do the authors believe it is important to measure the gap between the IO model and models?
The authors believe it is important to constantly measure the gap between the IO model and models to find out in which directions models lag behind human performance.
Q11. What is the way to tune the parameters in a model?
Properly tuning these parameters is important for fair model comparison and is perhaps best left to the model developers to optimize themselves.