Background Prior-Based Salient Object Detection via Deep Reconstruction Residual
Summary (4 min read)
Introduction
- A few recent approaches tried to learn better representations from natural scenes for saliency detection by using independent component analysis (ICA) [8], sparse coding [9, 10], and low-rank matrix recovery [11].
- To be specific, in [15] and [16] the global contrast is derived in the frequency domain with the hypothesis that salient regions are normally less frequent.
- They represent the image as a closed-loop graph with superpixels as nodes.
- Fig. 2 illustrates the workflow of the proposed framework.
II. THE PROPOSED APPROACH
- The authors discuss the proposed method for salient object detection in detail.
- It includes three subsections, which in turn introduce SDAE, the proposed saliency detection framework, and two useful post-processing steps.
A. Stacked Denoising Autoencoder (SDAE)
- Autoencoders are simple neural networks that aim to reconstruct their inputs with as little distortion as possible, thereby learning latent patterns of the given data.
- Specifically, it includes an encoding process and a decoding process.
- Usually, training a DAE is straightforward: the back-propagation algorithm can be used to compute the gradient of the objective function [26, 27], and the same target activation function can be used in all layers when training the SDAE.
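As a concrete illustration of the corrupt/encode/decode cycle described above, below is a minimal single-layer denoising autoencoder in NumPy. The layer sizes, learning rate, and sigmoid activations are illustrative assumptions; only the 10% masking-noise fraction comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, frac=0.1):
    """Masking noise: force a random fraction of the inputs to zero
    (10% in the paper)."""
    return x * (rng.random(x.shape) >= frac)

# Tiny DAE: d-dim input -> h-dim code -> d-dim reconstruction.
d, h = 49, 20                       # e.g. 7x7 patches (illustrative sizes)
W1 = rng.normal(0, 0.1, (h, d)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (d, h)); b2 = np.zeros(d)

def forward(x_in):
    code = sigmoid(W1 @ x_in + b1)  # encoder
    recon = sigmoid(W2 @ code + b2) # decoder
    return code, recon

def train_step(x, lr=0.1):
    """One SGD step on the squared reconstruction error, measured
    against the CLEAN input while feeding the corrupted one."""
    global W1, b1, W2, b2
    x_tilde = corrupt(x)
    code, recon = forward(x_tilde)
    err = recon - x                      # gradient of 0.5*||recon - x||^2
    d2 = err * recon * (1 - recon)       # through decoder sigmoid
    d1 = (W2.T @ d2) * code * (1 - code) # through encoder sigmoid
    W2 -= lr * np.outer(d2, code); b2 -= lr * d2
    W1 -= lr * np.outer(d1, x_tilde); b1 -= lr * d1
    return 0.5 * np.sum(err ** 2)
```

In an SDAE, layers trained this way are stacked: the code of one trained layer becomes the input of the next.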
B. Saliency Detection via Deep Reconstruction Residual
- As the authors mentioned in Section I, local and global contrast-based methods lack the ability to precisely compute the contrast between foreground objects and the background.
- The authors follow the basic rule of photographic composition and assume that the image boundary is mostly background.
- Specifically, the authors define four separate boundary regions for each image, as shown in the side-specific SDAE training stage of Fig.
- Finally, the four residual maps are linearly combined to generate the saliency map: S_R = (R_top + R_bottom + R_left + R_right) / 4 (Eq. 12).
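The side-specific residual fusion of Eq. (12) can be sketched as follows. The per-map min-max normalization is an assumption, since the summary does not state the exact normalization used before averaging.

```python
import numpy as np

def normalize(m):
    """Scale a residual map to [0, 1] (a common choice; the paper's
    exact normalization may differ)."""
    lo, hi = m.min(), m.max()
    return (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m)

def combine_residuals(r_top, r_bottom, r_left, r_right):
    """Eq. (12): S_R = (R_top + R_bottom + R_left + R_right) / 4,
    applied to the normalized side-specific residual maps."""
    maps = [normalize(r) for r in (r_top, r_bottom, r_left, r_right)]
    return sum(maps) / 4.0
```

Each input map holds the per-pixel reconstruction residual of the SDAE trained on one image boundary.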
C. Post Processing
- As discussed above, the authors compute the saliency map S_R at five different image scales to account for scale changes of salient objects.
- To integrate salient regions across scales, the authors use the average of the five single-scale saliency maps to generate the multi-scale integrated saliency map S_R.
- To further refine the results, two post-processing steps are adopted on the basis of the image organization priors and the region property, as presented in detail below.
1) Image organization refinement
- According to the visual organization rules in [33], these cases can be refined by considering the visual contextual effect.
- In the first component, following [34], which observes that salient pixels tend to group together since they typically correspond to real objects in the scene, the authors first use a self-adaptive threshold t = mean(S_R) to obtain the salient cluster.
- In the second component, to deal with the case where the highlighted regions miss part of the real foreground, the authors follow [35] to include the immediate context by weighting the saliency value of each pixel based on its distance to the high-salient pixel locations.
- To encode immediate context information, the high-salient pixel locations Φ = {S_R > t} are found, and the saliency value at every pixel is weighted by its distance to Φ.
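The two refinement components above can be sketched as below. The Gaussian falloff and its sigma are assumptions for illustration; the paper follows [35] for the actual distance weighting.

```python
import numpy as np

def dist_to_set(mask):
    """Euclidean distance from every pixel to the nearest True pixel
    in `mask` (brute force; fine for small maps)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([ys, xs], axis=1).astype(float)    # (k, 2) points of Phi
    yy, xx = np.indices(mask.shape)
    grid = np.stack([yy, xx], axis=-1).astype(float)  # (H, W, 2)
    d = np.linalg.norm(grid[:, :, None, :] - pts[None, None, :, :], axis=-1)
    return d.min(axis=-1)

def organization_refine(s, sigma=20.0):
    """Threshold at the map mean, then down-weight every pixel by its
    distance to the high-saliency set Phi (Gaussian falloff assumed)."""
    t = s.mean()                   # self-adaptive threshold t = mean(S_R)
    phi = s > t                    # high-salient pixel locations
    if not phi.any():
        return s.copy()
    dist = dist_to_set(phi)
    return s * np.exp(-dist ** 2 / (2.0 * sigma ** 2))
```

Pixels inside Φ keep their saliency (distance zero), while isolated responses far from the salient cluster are suppressed.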
2) Region smoothing
- In order to highlight the entire salient object uniformly and recover more edge information, inspired by [35], the authors refine the saliency of each pixel using the region information.
- Specifically, a graph based segmentation algorithm [36] is used to decompose the image into a number of small regions and the final saliency of each region is calculated by the average saliency value of all the pixels within it.
- Examples of region smoothing results are shown in the fifth column of Fig. 4.
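The region smoothing step above can be sketched as follows; `labels` stands in for the output of the graph-based segmentation of [36], but any integer label map works for illustration.

```python
import numpy as np

def region_smooth(s, labels):
    """Replace each pixel's saliency with the mean saliency of its
    region, so the whole salient object is highlighted uniformly."""
    out = np.empty_like(s, dtype=float)
    for lab in np.unique(labels):
        mask = labels == lab
        out[mask] = s[mask].mean()
    return out
```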
III. EXPERIMENTS
- To evaluate the performance of the proposed salient object detection framework, the authors compared it with 9 state-of-the-art approaches published within the last three years in top journals or conferences.
- To obtain the performance of these 9 methods, the authors adopted either the author-provided implementations or author-provided saliency maps.
- To the best of their knowledge, this dataset is one of the largest test sets for salient object detection whose ground truth is in the form of manually labeled accurate object contours instead of rough bounding boxes.
- It can be observed that, compared with PD, GBMR, GS-S, GS-G, BLSM, and CNTX, the proposed method can highlight salient regions more uniformly.
A. Evaluation Metrics
- Following previous works [9, 12, 15, 16, 34, 41-43], four metrics are adopted in the experiments to quantitatively measure the performance of the saliency maps: the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), the precision-recall (PR) curve, and the average precision (AP).
- Observing the Gaussian-like distribution of the saliency values in the proposed saliency maps, an adaptive threshold T = μ + σ, as suggested in [44], is used to segment the saliency maps.
- For each segmented foreground binary map S_F^T under the adaptive threshold T, the authors follow [51] to evaluate it using the weighted F-measure.
- In order to take into consideration both the dependency between pixels and the location of the errors, a weighting function is applied to the errors: E^w = min(E, E·A)·B.
- Then, the weighted true positives TP^w, weighted false positives FP^w, and weighted false negatives FN^w can be calculated.
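A simplified sketch of the adaptive-threshold evaluation: segment at T = μ + σ and compute an F-measure from TP, FP, and FN. The dependency and importance weighting of [51] (matrices A and B) is omitted here, so this is the unweighted variant.

```python
import numpy as np

def adaptive_threshold_fmeasure(s, gt, beta2=1.0):
    """Segment the saliency map at T = mu + sigma (per [44]) and
    compute a plain F-measure against the boolean ground truth `gt`.
    The weighting of [51] is deliberately left out of this sketch."""
    T = s.mean() + s.std()          # adaptive threshold T = mu + sigma
    fg = s >= T                     # segmented foreground binary map
    tp = np.logical_and(fg, gt).sum()
    fp = np.logical_and(fg, ~gt).sum()
    fn = np.logical_and(~fg, gt).sum()
    prec = tp / max(tp + fp, 1)
    rec = tp / max(tp + fn, 1)
    if prec + rec == 0:
        return 0.0
    return (1 + beta2) * prec * rec / (beta2 * prec + rec)
```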
B. Parameters Analysis and Model Evaluation
- The authors analyze the effect of a few key parameters in the proposed model on performance.
- Here the authors conducted the evaluation on the SOD and SED datasets.
- Some examples of the experimental results obtained under different β are also given in Fig.
- From the second and the third columns of Fig. 7, the authors can see that for images with cluttered backgrounds, the sparsity is an essential element for suppressing the saliency of the background regions.
- A similar phenomenon is also observed in [48, 49].
C. Evaluations on the ASD dataset
- The authors conducted quantitative comparisons with state-of-the-art methods on the ASD dataset using ROC, PR, AUC, AP, and the weighted performance metric.
- The ROC curves show that the proposed method achieves the highest true positive rate when the false positive rate is between about 0.05 and 1; as a result, the proposed method outperforms the other 9 algorithms in terms of ROC and AUC.
- These statistics reflect the distributions of the true salient pixels and the true background pixels over the computed saliency values.
D. Evaluations on the SOD dataset
- The authors also conducted comparisons with state-of-the-art methods on the more challenging SOD dataset.
- All the comparison results, including ROC, AUC, PR, AP, and weighted F-measure, are shown in Figs. 13-15.
- From Fig. 15, it is observed that the proposed approach achieves the highest weighted F-measure; the figure also shows that the weighted recall values of most state-of-the-art methods are less than 0.6, whereas the proposed approach achieves the highest weighted recall value, around 0.64, which indicates the proposed method tends to highlight the entire salient objects.
- Similar observations can be made for the foreground and background distributions when comparing the results obtained by different approaches.
- As shown in Fig. 16, the distributions on the SOD dataset degrade noticeably.
E. Evaluations on the SED dataset
- The proposed approach was also tested on the SED database, another challenging dataset.
- As GS-S and GS-G provided their codes and their results on this dataset, the authors are able to include their scores.
- More encouragingly, compared with the other state-of-the-art methods, the proposed method achieves a higher true positive rate along the whole ROC curve, as well as higher precision values along almost the whole PR curve.
- Similar to the SOD dataset, SED dataset also contains a large number of images with complicated content and multiple salient objects.
- The experimental results show that the proposed algorithm is more capable of handling such images.
F. Running time
- Table II lists the average execution time in processing an image of size 400×300 by using different approaches.
- For the implementation of the proposed method, the authors used the parallel computing toolbox of MATLAB and executed the code on an NVIDIA GeForce GTX Titan Black GPU.
- For other state-of-the-art approaches, the authors used the source codes provided by their authors.
- The authors did not compare with GS because the corresponding codes have not been released by the authors.
- As can be seen, the proposed algorithm has moderate computational complexity.
IV. CONCLUSION
- The authors have proposed a bottom-up salient object detection framework based on the background prior.
- The main difference between the proposed framework and previous approaches is twofold.
- First, instead of using traditional hand-designed features, the proposed algorithm adopted SDAE with deep structures to learn more powerful representations for saliency computation.
- For future work, the authors intend to extend the proposed work in the following directions.
Citations
1,424 citations
1,370 citations
770 citations
Cites background or methods from "Background Prior-Based Salient Obje..."
...Background prior [9-11] hypothesizes that regions near image boundaries are probably backgrounds....
[...]
..., superpixels used in [9-11, 16-19] and object proposals used in [14]) either as the basic computational units to predict saliency or as the post-processing methods to smooth saliency maps....
[...]
767 citations
564 citations
Cites background from "Background Prior-Based Salient Obje..."
...One of the earliest pioneering works is [45], where Han et al....
[...]
References
16,717 citations
11,201 citations
"Background Prior-Based Salient Obje..." refers background in this paper
...These deep architectures have been shown to lead to state-of-the-art results on a number of classification and regression problems [24]....
[...]
...As a form of neural network, the classical autoencoder [24] is an unsupervised learning algorithm that applies backpropagation and sets the target values of the network outputs to be equal to the inputs....
[...]
...According to [24] and [25], x̃i = q D(x̃i|xi) is implemented by randomly selecting a fraction (10% in this paper) of the input data and forcing them to be zero....
[...]
10,525 citations
8,566 citations
"Background Prior-Based Salient Obje..." refers background or methods in this paper
...(b) Results from one local contrast method [5]....
[...]
...the center-surround difference [5], [6], [12], [13], incremental coding length [10], and self-resemblance [14]....
[...]
...[5] proposed three biological plausible features including color, intensity, and orientation....
[...]
5,791 citations
"Background Prior-Based Salient Obje..." refers methods in this paper
...Specifically, a graph-based segmentation algorithm [36] is used to decompose the image into a number...
[...]
Related Papers (5)
Frequently Asked Questions (15)
Q2. What are the future works in this paper?
For future work, the authors intend to extend the proposed work in the following directions. Second, the proposed method can be extended to saliency detection in dynamic videos and many other applications such as image retrieval, image categorization, and image collection visualization.
Q3. What is the definition of the autoencoder?
As a form of neural network, the classical autoencoder [24] is an unsupervised learning algorithm that applies back-propagation and sets the target values of the network outputs to be equal to the inputs.
Q4. How is the training of a DAE?
Usually, training a DAE is straightforward: the back-propagation algorithm can be used to compute the gradient of the objective function [26, 27], and the same target activation function can be used in all layers when training the SDAE.
Q5. What is the weighting function for the error?
In order to take into consideration both the dependency between pixels and the location of the errors, a weighting function is applied to the errors: E^w = min(E, E·A)·B.
Q6. What is the method for whitening the deep reconstruction residuals?
After normalization, the deep reconstruction residual maps R_top, R_bottom, R_left, and R_right are obtained based on the SDAE models for the top, bottom, left, and right image boundary subsets, respectively.
Q7. How many foreground patches are used in the training process?
For the small number of foreground patches, the learning process of SDAE could decrease their influence by minimizing the objective function with the reconstruction error term when modeling the background.
Q8. What is the definition of an autoencoder?
Autoencoders are simple neural networks that aim to reconstruct their inputs with as little distortion as possible, thereby learning latent patterns of the given data.
Q9. How can the authors extend the proposed method to other applications?
The proposed method can be extended to saliency detection in dynamic videos and many other applications such as image retrieval, image categorization, and image collection visualization.
Q10. What is the proposed method for calculating residual of SDAE?
The proposed work cast the separation of salient objects from the background as the problem of calculating the reconstruction residual of an SDAE.
Q11. What is the effect of the sparsity constraint on the detection of the background regions?
If the sparsity constraint is set too large, it normally leads to less stable and discontinuous detection results (as shown in the fourth column of Fig. 7).
Q12. What is the description of the salient object detection framework?
To the best of their knowledge, this dataset is one of the largest test sets for salient object detection whose ground truth is in the form of manually labeled accurate object contours instead of rough bounding boxes.
Q13. What is the effect of the proposed method on the saliency map?
The subjective evaluations by comparing with the ground truth suggest that the proposed method can yield saliency maps correctly and robustly in all three datasets.
Q14. What is the method for achieving the highest weighted recall value?
From Fig. 15, it is observed that the proposed approach achieves the highest weighted F-measure; the figure also shows that the weighted recall values of most state-of-the-art methods are less than 0.6, whereas the proposed approach achieves the highest weighted recall value, around 0.64, which indicates the proposed method tends to highlight the entire salient objects.
Q15. What is the weighted f-measure for the foreground?
As defined in [51], the matrix A captures the dependency between foreground pixels based on the Euclidean distance, and the matrix B assigns importance weights to false detections according to their distance from the foreground.