Saliency Detection for Stereoscopic Images
Summary
- In these applications, the salient regions extracted by saliency detection models receive special processing, since they attract much more human attention than other regions.
- To achieve depth perception, binocular depth cues (such as binocular disparity) are introduced and merged adaptively with monocular cues based on the viewing-space conditions.
- The features of color, luminance, texture and depth are extracted from DCT (Discrete Cosine Transform) coefficients of image patches.
- Existing 3D saliency detection models usually adopt depth information to weight the traditional 2D saliency map [19], [20], or simply combine the depth saliency map with the traditional 2D saliency map [21], [23] to obtain the saliency map of 3D images.
III. THE PROPOSED MODEL
- Firstly, the color, luminance, texture, and depth features are extracted from the input stereoscopic image.
- Based on these features, feature contrast is computed to build the feature maps.
- A fusion method is designed to combine the feature maps into the saliency map.
- Additionally, the authors use a center-bias factor and a model of human visual acuity to enhance the saliency map based on characteristics of the HVS (human visual system).
- The authors will describe each step in detail in the following subsections.
A. Feature Extraction
- The input image is divided into small image patches and then the DCT coefficients are adopted to represent the energy for each image patch.
- The input RGB image is converted to YCbCr color space due to its perceptual property.
- As the high-frequency coefficients in the bottom-right corner of the DCT block carry little energy, the authors use only the first few AC coefficients to represent the texture feature of image patches; a code sketch of this patch-based extraction is given after this list.
- The depth map M of perceived depth information is computed from the disparity as [23]: M = (V · d) / (d − P · W/H), where V represents the viewing distance of the observer; d denotes the interocular distance; P is the disparity (in pixels) between corresponding points; and W and H represent the width (in cm) and horizontal resolution (in pixels) of the display screen, respectively, so that P · W/H converts the pixel disparity to centimeters.
- The authors will introduce how to calculate the feature map based on these extracted features in the next subsection.
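The extraction mechanics are left implicit above, so here is a minimal sketch of the patch-based DCT feature extraction. The 8×8 patch size, the BT.601 conversion matrix, and the zig-zag AC ordering are illustrative assumptions; the summary only states that DC coefficients represent the color, luminance, and depth features and that the first 9 low-frequency AC coefficients represent texture.

```python
import numpy as np
from scipy.fftpack import dct

def rgb_to_ycbcr(img):
    """Approximate BT.601 RGB -> YCbCr conversion (img is float, HxWx3)."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.169, -0.331,  0.500],
                  [ 0.500, -0.419, -0.081]])
    return img @ m.T

def dct2(block):
    """2D orthonormal DCT of a square block."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def extract_patch_features(ycbcr, depth, patch=8, n_ac=9):
    """Per patch: DC coefficients of Y/Cb/Cr (luminance, color), the DC of
    the depth map (D = M_DC), and the first n_ac zig-zag AC coefficients
    of the luminance channel (texture)."""
    # first few AC positions of an 8x8 block in JPEG zig-zag order
    zigzag = [(0, 1), (1, 0), (2, 0), (1, 1), (0, 2),
              (0, 3), (1, 2), (2, 1), (3, 0)][:n_ac]
    h, w = depth.shape
    feats = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            blocks = [dct2(ycbcr[y:y+patch, x:x+patch, c]) for c in range(3)]
            d_blk = dct2(depth[y:y+patch, x:x+patch])
            feats.append({
                'L': blocks[0][0, 0],                     # luminance DC
                'C': (blocks[1][0, 0], blocks[2][0, 0]),  # chroma DCs
                'T': np.array([blocks[0][i, j] for i, j in zigzag]),  # texture
                'D': d_blk[0, 0],                         # depth DC
            })
    return feats
```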
B. Feature Map Calculation
- As the authors have explained before, salient regions in visual scenes pop out due to their feature contrast with surrounding regions.
- The authors estimate the saliency value of each image patch based on its feature contrast with all other patches in the image: for any patch i, the saliency value is computed from the center-surround differences between patch i and every other patch.
- The center-surround differences are weighted by a Gaussian model of the spatial distance between image patches.
- Since the color, luminance, and depth features are each represented by a single DC coefficient per image patch, the feature contrast between two image patches i and j for these features is simply the difference between the two corresponding DC coefficients (for texture, the contrast is computed via the L2 norm over the AC coefficients); a code sketch follows this list.
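As a concrete instantiation of this contrast-plus-Gaussian-weighting scheme, the sketch below scores each patch by its DC-coefficient differences to all other patches, weighted by a Gaussian of the normalized distance between patch centers. The bandwidth sigma is an assumed value, not taken from the paper.

```python
import numpy as np

def feature_map(dc, centers, sigma=0.25):
    """Per-patch feature map: sum over all other patches j of |DC_i - DC_j|,
    weighted by a Gaussian of the normalized spatial distance between
    patch centers (coordinates in `centers` scaled to [0, 1])."""
    diff = np.abs(dc[:, None] - dc[None, :])                    # |DC_i - DC_j|
    dist2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    weight = np.exp(-dist2 / (2.0 * sigma ** 2))                # Gaussian weighting
    np.fill_diagonal(weight, 0.0)                               # exclude j == i
    sal = (weight * diff).sum(axis=1)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)  # map to [0, 1]
```

For the texture channel, the scalar difference would be replaced by the L2 norm between the 9-element AC vectors, e.g. `np.linalg.norm(T[:, None, :] - T[None, :, :], axis=-1)`.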
C. Saliency Estimation from Feature Map Fusion
- After calculating the feature maps as in Eq. (2), the authors fuse the feature maps from color, luminance, texture, and depth to compute the final saliency map.
- It is well accepted that different visual dimensions in natural scenes compete with one another when combined into the final saliency map [40], [41].
- During the fusion of different feature maps, the authors assign larger weights to feature maps with small, compact salient regions and smaller weights to those with more spread-out salient regions.
- Here, the authors define the compactness measure as the spatial variance of the feature map (see the sketch after this list).
- Experimental results in the next section show that the proposed fusion method can obtain promising performance.
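The compactness-weighted fusion can be sketched as follows. Weighting each map by the inverse of its spatial variance is one simple reading of the rule above; the paper's exact weighting function may differ.

```python
import numpy as np

def spatial_variance(fmap):
    """Spatial variance of a nonnegative feature map: how spread its
    saliency mass is around its saliency-weighted centroid."""
    h, w = fmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = fmap.sum() + 1e-12
    cy = (ys * fmap).sum() / total
    cx = (xs * fmap).sum() / total
    return (((ys - cy) ** 2 + (xs - cx) ** 2) * fmap).sum() / total

def fuse(feature_maps):
    """Weight each feature map inversely to its spatial variance, so that
    compact maps dominate the fused saliency map, then sum."""
    weights = np.array([1.0 / (spatial_variance(m) + 1e-12) for m in feature_maps])
    weights /= weights.sum()
    fused = sum(w * m for w, m in zip(weights, feature_maps))
    return fused / (fused.max() + 1e-12)
```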
D. Saliency Enhancement
- Eye tracking experiments in existing studies [43], [44] have shown that human fixations are biased towards the screen center, a phenomenon known as center bias.
- The density of cone photoreceptor cells decreases with increasing retinal eccentricity.
- Visual acuity therefore decreases with increasing eccentricity from the fixation point [36], [38].
- Enhancement by the center-bias factor increases the saliency values of central image regions, while enhancement by the human visual acuity model decreases the saliency values of non-salient regions, yielding a visually cleaner saliency map; a sketch of both operations follows this list.
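A minimal sketch of the two enhancement operations, assuming a Gaussian center-bias map and a simple exponential acuity falloff with eccentricity. The bandwidth, the viewing distance in pixels, and the falloff constant are all illustrative assumptions; the paper's acuity model (a contrast-sensitivity function of eccentricity [36], [38]) is more detailed.

```python
import numpy as np

def enhance(sal, sigma_c=0.3, view_dist_px=1500.0, alpha=0.1):
    """Enhance a saliency map `sal` (values in [0, 1]) by
    (1) a center-bias factor: Gaussian of the normalized distance
        to the image center, and
    (2) an acuity factor: exponential falloff with retinal eccentricity
        (in degrees) from the strongest saliency peak."""
    h, w = sal.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # (1) center-bias factor
    d2 = ((ys - (h - 1) / 2.0) / h) ** 2 + ((xs - (w - 1) / 2.0) / w) ** 2
    center = np.exp(-d2 / (2.0 * sigma_c ** 2))
    # (2) acuity factor, anchored at the saliency peak
    py, px = np.unravel_index(sal.argmax(), sal.shape)
    ecc = np.degrees(np.arctan(np.hypot(ys - py, xs - px) / view_dist_px))
    acuity = np.exp(-alpha * ecc)
    out = sal * center * acuity
    return out / (out.max() + 1e-12)
```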
IV. EXPERIMENT EVALUATION
- The authors conduct experiments to demonstrate the performance of the proposed 3D saliency detection model.
- The authors first present the evaluation methodology and quantitative evaluation metrics.
- Following this, the performance comparison between different feature maps is given in Subsection IV-B.
- In Subsection IV-C, the authors compare the performance of the proposed method with that of existing ones.
A. Evaluation Methodology
- In the experiment, the authors adopt the eye tracking database [29] proposed in the study [23] to evaluate the performance of the proposed model.
- This database includes 18 stereoscopic images of various types, such as outdoor scenes, indoor scenes, scenes containing objects, and scenes without any distinct object.
- Depth of field (DOF) is normally associated with free viewing in real applications, where objects exist at different distances from the observer.
- The data were collected with an SMI RED 500 remote eye-tracker, and a chin rest was used to stabilize the observer's head.
- PLCC (Pearson Linear Correlation Coefficient), KLD (Kullback-Leibler Divergence), and AUC (Area Under the Receiver Operating Characteristics Curve) are used to evaluate the quantitative performance of the proposed stereoscopic saliency detection model; a sketch of these metrics follows this list.
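A minimal sketch of the three metrics, computed from dense maps and a boolean fixation mask. The AUC variant here is the rank-based (Mann-Whitney) estimate over all pixels, which is one of several AUC formulations used in saliency evaluation and not necessarily the paper's exact variant.

```python
import numpy as np

def plcc(sal, fixmap):
    """Pearson linear correlation between the predicted saliency map
    and the fixation density map (higher is better)."""
    s = (sal - sal.mean()) / (sal.std() + 1e-12)
    f = (fixmap - fixmap.mean()) / (fixmap.std() + 1e-12)
    return float((s * f).mean())

def kld(sal, fixmap, eps=1e-12):
    """KL divergence between the fixation and predicted distributions,
    both normalized to sum to 1 (lower is better)."""
    p = fixmap.ravel() / (fixmap.sum() + eps)
    q = sal.ravel() / (sal.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def auc(sal, fix_mask):
    """AUC via the Mann-Whitney U statistic: the probability that a randomly
    chosen fixated pixel outscores a randomly chosen non-fixated one.
    `fix_mask` is a boolean array of actual gaze locations."""
    pos, neg = sal[fix_mask], sal[~fix_mask]
    ranks = np.concatenate([pos, neg]).argsort().argsort() + 1
    u = ranks[:pos.size].sum() - pos.size * (pos.size + 1) / 2.0
    return float(u / (pos.size * neg.size))
```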
B. Experiment 1: Comparison Between Different Feature Channels
- The authors compare the performance of different feature maps from color, luminance, texture and depth.
- Table I provides the quantitative comparison results for these feature maps.
- The KLD value of the texture feature map is also higher (i.e., worse) than those of the other features.
- From Fig. 5, the authors can see that the feature maps from color, luminance, and depth are better than those from the texture feature.
- The overall saliency map by combining feature maps can obtain the best saliency estimation, as shown in Fig. 5(g) .
C. Experiment 2: Comparison Between the Proposed Method and Other Existing Ones
- The authors compare the proposed 3D saliency detection model with the existing models evaluated in [23].
- For the saliency results of the fusion model that combines the 2D saliency model in [3] with the depth saliency in [23], some background regions are falsely detected as salient, as shown in the saliency maps in Fig. 6(d).
- Similarly, the 3D model combining the proposed 2D saliency map with the depth saliency map (DSM) in [23] performs better than models combining other 2D models with the DSM in [23].
- That database [26] includes 600 stereoscopic images of indoor and outdoor scenes.
- Please note that the AUC and CC values of other existing models are from the original paper [26] .
V. CONCLUSION
- The authors propose a new stereoscopic saliency detection model for 3D images.
- The features of color, luminance, texture and depth are extracted from DCT coefficients to represent the energy for small image patches.
- Saliency is estimated from the energy contrast weighted by a Gaussian model of the spatial distance between image patches, accounting for both local and global contrast.
- A new fusion method is designed to combine the feature maps for the final saliency map.
- Experimental results on recent eye-tracking databases show the promising performance of the proposed saliency detection model for stereoscopic images.
Frequently Asked Questions (10)
Q2. What is the effect of the enhancement on the saliency map?
Enhancement by the center-bias factor increases the saliency values of central image regions, while enhancement by the human visual acuity model decreases the saliency values of non-salient regions, yielding a visually cleaner saliency map.
Q3. How many low-frequency AC coefficients are used to normalize the feature contrast?
Since the texture feature is represented by 9 low-frequency AC coefficients, the authors calculate the texture feature contrast as the L2 norm over the coefficient differences.
Q4. How to build a 3D saliency detection model?
Another method of 3D saliency detection model is built by incorporating depth saliency map into the traditional 2D saliency detection methods.
Q5. How does the DC coefficient of patches in depth map work?
Similar to the feature extraction for color and luminance, the authors adopt the DC coefficients of patches in the depth map computed in Eq. (1) as D = M_DC (where M_DC denotes the DC coefficient of the image patch in depth map M).
Q6. How many low-frequency AC coefficients are used to represent the texture feature of an image patch?
Based on the study [35], the authors use the first 9 low-frequency AC coefficients to represent the texture feature of each image patch as T = {Y_AC1, Y_AC2, ..., Y_AC9}.
Q7. What are the measures used to evaluate the performance of the proposed stereoscopic saliency?
Among these measures, PLCC and KLD are calculated directly from the comparison between the fixation density map and the predicted saliency map, while AUC is computed from the comparison between the actual gaze points and the predicted saliency map.
Q8. What are the performance values of the proposed saliency map?
Compared with the feature maps from the low-level features of color, luminance, texture, and depth, the final saliency map calculated with the proposed fusion method achieves much better saliency estimation performance for 3D images, as shown by the PLCC, KLD, and AUC values in Table I.
Q9. What is the retina eccentricity between the salient pixel and nonsalient pixel?
The retinal eccentricity e between a salient pixel and a non-salient pixel can be computed from its relationship with the spatial distance between the image pixels.
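One standard way to make this relationship concrete (an illustrative formula, not necessarily the paper's exact expression): with viewing distance V and on-screen pixel pitch W/H (cm per pixel, using the quantities defined in Eq. (1)), a pixel at distance d pixels from the fixated pixel lies at eccentricity e = arctan(d · W / (H · V)).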
Q10. What is the proposed saliency detection model?
The proposed 3D saliency detection model obtains promising saliency estimation performance for 3D images, as shown in the experiment section.