Saliency Detection via Absorbing Markov Chain
Summary
1. Introduction
- Saliency detection in computer vision aims to find the most informative and interesting region in a scene.
- All bottom-up saliency methods rely on some prior knowledge about salient objects and backgrounds, such as contrast, compactness, etc.
- These models still have certain limitations.
- Hence, the absorbed time starting from object nodes is longer than that from background nodes.
3.3. Saliency Measure
- Given an input image represented as a Markov chain and some background absorbing states, the saliency of each transient state is defined as the expected number of steps before it is absorbed into the absorbing nodes (its absorbed time), computed by Eq. 2.
- Because the authors compute the full resolution saliency map, some virtual nodes are added to the graph as absorbing states, which is detailed in the next section.
- In the conventional absorbing Markov chain problems, the absorbing nodes are manually labelled with the groundtruth.
- As absorbing nodes for saliency detection are selected by the proposed algorithm, some of them may be incorrect.
- They have insignificant effects on the final results, which are explained in the following sections.
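The absorbed time described above can be sketched with standard absorbing-chain theory: if Q is the transient-to-transient block of the transition matrix, the fundamental matrix is N = (I - Q)^(-1) and the expected number of steps before absorption is the row sum of N. The toy transition matrix below and the function name `absorbed_time` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def absorbed_time(P, t):
    """Expected number of steps before absorption for each transient state.

    P is a row-stochastic transition matrix in canonical form: the first t
    states are transient and the remaining states are absorbing.
    """
    Q = P[:t, :t]  # transient-to-transient block
    # Solving (I - Q) y = 1 is equivalent to y = N @ 1 with N = (I - Q)^(-1).
    return np.linalg.solve(np.eye(t) - Q, np.ones(t))

# Toy chain (hypothetical): states 0, 1, 2 are transient, state 3 is absorbing.
P = np.array([
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
y = absorbed_time(P, t=3)
saliency = y / y.max()  # normalised absorbed time used as saliency
```

In this toy chain the state directly linked to the absorbing node has the shortest absorbed time, and the state farthest from it has the longest, which mirrors the intuition that background-like nodes are absorbed quickly.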
4. Graph Construction
- The authors construct a single-layer graph G(V, E) with superpixels [3] as nodes V and the links between pairs of nodes as edges E. Because salient objects seldom occupy all image borders [33], they duplicate the boundary superpixels around the image borders as virtual background absorbing nodes, as shown in Figure 2.
- In addition, the authors enforce that all the transient nodes around the image borders (i.e., boundary nodes) are fully connected with each other, which can reduce the geodesic distance between similar superpixels.
- The authors first renumber the nodes so that the first t nodes are transient nodes and the last r nodes are absorbing nodes, then define the affinity matrix A, which represents the relevance of nodes, as EQUATION, where N(i) denotes the nodes connected to node i.
- One is the spatial distance between the two nodes.
- The larger their distance, the longer the expected time.
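As a rough illustration of the graph construction above, the sketch below builds an affinity matrix for t transient nodes followed by r absorbing nodes and row-normalises it into a transition matrix. The edge weight (a Gaussian-style function of feature distance), the parameter `sigma2`, the unit self-affinity, and the function name are all assumptions made for this example; the summary does not give the exact weight definition.

```python
import numpy as np

def transition_matrix(features, edges, t, sigma2=0.1):
    """Row-normalised affinity matrix for t transient + r absorbing nodes.

    features : (t + r, d) array, one feature vector per node (assumed input)
    edges    : iterable of (i, j) pairs where i is a transient node (i < t)
    """
    n = features.shape[0]
    A = np.eye(n)  # assumed self-affinity of 1 for every node
    for i, j in edges:
        # Assumed weight: similar features give a weight close to 1.
        w = np.exp(-np.linalg.norm(features[i] - features[j]) / sigma2)
        A[i, j] = w
        if j < t:  # transient-transient links are symmetric
            A[j, i] = w
    # Absorbing rows keep only their self-loop, so they never leave.
    return A / A.sum(axis=1, keepdims=True)

# Three transient nodes in a line; node 3 is an absorbing copy of node 0.
features = np.array([[0.1], [0.5], [0.9], [0.1]])
edges = [(0, 1), (1, 2), (0, 3)]
P = transition_matrix(features, edges, t=3)
```

Because node 0 and the absorbing node share the same feature value here, the walk leaves node 0 towards the absorbing node far more readily than towards its dissimilar neighbour.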
5. Saliency Detection
- Most saliency maps generated by the normalized absorbed time y are effective, but some background nodes near the image center may not be adequately suppressed when they lie in a long-range homogeneous region, as shown in Figure 3.
- Most nodes in this kind of background region have large transition probabilities, which means that the random walk may transfer many times among these nodes before reaching the absorbing nodes.
- The background regions near the image center may present saliency comparable to that of salient objects, thereby decreasing the contrast between objects and backgrounds in the resulting saliency maps.
- A larger score means that there are longer-range regions with mid-level saliency in the saliency map.
- It should be noted that the absorbing nodes may include object nodes when the salient objects touch the image boundaries, as shown in Figure 4.
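The re-weighting of the absorbed time can be sketched as follows, assuming the weight u is the normalised mean recurrent time of the ergodic chain whose transition matrix is the row-normalised Q, and that the weighted absorbed time is N u with N = (I - Q)^(-1). Dividing the recurrent times by their sum is an assumption of this sketch (the summary only says "normalized"), and the power iteration assumes the chain is ergodic.

```python
import numpy as np

def weighted_absorbed_time(Q, n_iter=1000):
    """Weighted absorbed time y_w = N u for a transient block Q."""
    t = Q.shape[0]
    N = np.linalg.inv(np.eye(t) - Q)           # fundamental matrix
    Q_erg = Q / Q.sum(axis=1, keepdims=True)   # row-normalised Q
    pi = np.full(t, 1.0 / t)
    for _ in range(n_iter):                    # power iteration: pi <- pi Q_erg
        pi = pi @ Q_erg
    h = 1.0 / pi                               # mean recurrent time of each state
    u = h / h.sum()                            # assumed normalisation
    return N @ u

# Hypothetical transient block: three nodes in a line, self-loop on the last.
Q = np.array([
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
])
y_w = weighted_absorbed_time(Q)
```

States that the ergodic chain visits often have a short recurrent time and therefore a small weight, which is what damps the contribution of large homogeneous regions.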
6. Experimental Results
- The authors evaluate the proposed method on four benchmark datasets.
- The second one is the ASD dataset, a subset of the MSRA dataset, which contains 1,000 images with accurate human-labelled ground truth provided by [2].
- The third one is the SED dataset [28], which contains the single-object sub-dataset SED1 and the two-object sub-dataset SED2.
- This dataset is first used for salient object segmentation evaluation [23], where seven subjects are asked to label the foreground salient object masks.
- The authors select and combine the object masks whose consistency scores are higher than 0.7 as the final ground truth, as done in [33].
Evaluation Metrics:
- The authors evaluate all methods by precision, recall and F-measure.
- The precision is defined as the ratio of salient pixels correctly assigned to all the pixels of extracted regions.
- The recall is defined as the ratio of detected salient pixels to the ground-truth number.
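- The F-measure is the overall performance measurement, computed as the weighted harmonic mean of precision and recall: $F_\beta = \frac{(1 + \beta^2)\,\mathrm{Precision} \times \mathrm{Recall}}{\beta^2\,\mathrm{Precision} + \mathrm{Recall}}$.
- As in previous works, two evaluation criteria are used in the experiments.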
- First, the authors bisegment the saliency map using every threshold in the range [0 : 0.05 : 1], and compute precision and recall at each value of the threshold to plot the precision-recall curve.
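The evaluation protocol above can be sketched in a few lines: threshold the saliency map, compare it with a binary ground-truth mask, and compute precision, recall, and the weighted harmonic mean. The default `beta2=0.3` is a value commonly used in the saliency literature and is an assumption here, as are the function names; the adaptive threshold of twice the mean saliency follows the definition quoted from the paper elsewhere on this page.

```python
import numpy as np

def precision_recall_f(saliency, gt, threshold, beta2=0.3):
    """Precision, recall and F-measure of a thresholded saliency map."""
    pred = saliency >= threshold
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)   # guard against empty predictions
    recall = tp / max(gt.sum(), 1)
    denom = beta2 * precision + recall
    f = (1 + beta2) * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f

def pr_curve(saliency, gt):
    """(precision, recall) pairs for thresholds 0, 0.05, ..., 1."""
    return [precision_recall_f(saliency, gt, th)[:2]
            for th in np.linspace(0.0, 1.0, 21)]

def adaptive_scores(saliency, gt):
    """Scores at the adaptive threshold: twice the mean saliency."""
    return precision_recall_f(saliency, gt, 2.0 * saliency.mean())

# Tiny hypothetical map with saliency values in [0, 1].
saliency = np.array([[0.9, 0.8, 0.1, 0.0],
                     [0.7, 0.1, 0.0, 0.0]])
gt = np.array([[True, True, False, False],
               [False, False, False, False]])
p, r, f = adaptive_scores(saliency, gt)
curve = pr_curve(saliency, gt)
```

Here the adaptive threshold is 0.65, so three pixels are selected and two of them are truly salient.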
ASD:
- The authors evaluate the performance of the proposed method against fifteen state-of-the-art methods.
- The two evaluation criteria consistently show the proposed method outperforms all the other methods, where the CB [17], SVO [7], RC [8] and CA [27] are top-performance methods for saliency detection in a recent benchmark study [5].
- Some visual comparison examples are shown in Figure 9 and more results can be found in the supplementary material.
- The authors note that the proposed method highlights the salient regions more uniformly and suppresses the backgrounds more adequately than the other methods.
MSRA:
- This dataset contains the ground truth of salient regions marked as bounding boxes by nine subjects.
- The authors accumulate the nine ground-truth annotations, then choose the pixels with a consistency score higher than 0.5 as the salient region and fit a bounding box to that region.
- Figure 7 shows that the proposed method performs better than the other methods on this large dataset.
- Their recalls for adaptive thresholds are quite high and close to 1.
- That is because, when the background is suppressed badly, the thresholded saliency map contains almost the entire image, which gives low precision.
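The MSRA ground-truth preparation described above (accumulate the nine boxes, keep pixels with consistency above 0.5, fit a box) might look like the sketch below. It assumes the consistency score of a pixel is the fraction of subjects whose box covers it, and the function name and box convention are hypothetical.

```python
import numpy as np

def consensus_box(boxes, height, width, min_consistency=0.5):
    """Fit one box around pixels that enough annotators marked as salient.

    boxes : list of (top, left, bottom, right), bottom and right exclusive.
    Returns a box in the same convention, or None if no pixel qualifies.
    """
    votes = np.zeros((height, width))
    for top, left, bottom, right in boxes:
        votes[top:bottom, left:right] += 1      # accumulate the annotations
    salient = votes / len(boxes) > min_consistency
    rows = np.flatnonzero(salient.any(axis=1))
    cols = np.flatnonzero(salient.any(axis=0))
    if rows.size == 0:
        return None
    return (int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1)

# Nine hypothetical annotations: five subjects agree on one box, four on another.
boxes = [(2, 2, 6, 6)] * 5 + [(0, 0, 4, 4)] * 4
box = consensus_box(boxes, height=8, width=8)
```

Only the region marked by the five-subject majority passes the 0.5 threshold, so the fitted box coincides with their annotation.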
SED:
- As shown in Figure 6, the proposed method performs best on the SED1 dataset, but performs worse than the RC and CB methods at recall values from 0.7 to 1 on the SED2 dataset.
- That is because the method usually highlights one of the two objects while the other has low saliency values, due to the appearance diversity of the two objects.
SOD:
- On this most challenging dataset, the authors evaluate the performance of the post-process step against the map obtained directly from the absorbed time (denoted 'Before') and twelve state-of-the-art methods, as shown in Figure 7.
- The two evaluation criteria show the proposed method performs equally well or slightly better than the GS [33] method.
- The authors' approach exploits the boundary prior to determine the absorbing nodes; therefore, small salient objects touching the image boundaries may be incorrectly suppressed.
- Figure 8 shows the typical failure cases.
- The authors compare the execution time of different methods.
7. Conclusion
- Based on the boundary prior, the authors set the virtual boundary nodes as absorbing nodes.
- The saliency of each node is computed as its absorbed time to absorbing nodes.
- Furthermore, the authors exploit the equilibrium distribution of an ergodic Markov chain to weight the absorbed time, thereby suppressing the saliency in long-range background regions.
- Experimental results show that the proposed method outperforms fifteen state-of-the-art methods on the four public datasets and is computationally efficient.
Citations
1,372 citations
Additional excerpts
...149 16 DRFI [79] CVPR 2013 C .697 17 PCA [80] CVPR 2013 M + C 4.34 18 LBI [81] CVPR 2013 M + C 251. 19 GC [82] ICCV 2013 C .037 20 CHM [83] ICCV 2013 M + C 15.4 21 DSR [84] ICCV 2013 M + C 10.2 22 MC [85] ICCV 2013 M + C .195 23 UFO [86] ICCV 2013 M + C 20.3 24 MNP [52] Vis.Comp. 2013 M + C 21.0 25 GR [87] SPL 2013 M + C 1.35 26 RBD [88] CVPR 2014 M .269 27 HDCT [89] CVPR 2014 M 4.12 28 ST [90] TIP 20...
[...]
1,057 citations
Cites background or methods from "Saliency Detection via Absorbing Ma..."
...To save the space, we only consider the top four models ranked in the survey [23]: SVO [51], CA [17], CB [32], and RC [15] and recently-developed methods: SF [21], LRK [78], HS [33], GMR [48], PCA [31], MC [50], DSR [49], RBD [55] that are not covered in [23]....
[...]
...Object prior, such as connectivity prior [45], concavity context [20], and auto-context cue [46], backgroundness prior [47]–[50], generic objectness prior [51]–[53], and background connectivity prior [38], [54], [55] are also studied for saliency computation....
[...]
983 citations
Cites background from "Saliency Detection via Absorbing Ma..."
...A large number of approaches [63, 52, 40, 39, 32, 35, 60, 57, 56, 47, 41, 31, 27, 25, 24, 23, 11, 44, 17, 8, 13, 1, 21] are proposed to capture different saliency cues....
[...]
690 citations
Additional excerpts
...Saliency cues such as center and object bias [31, 22], contrast information [38] and background prior [33, 15] have been shown to be effective in previous work....
[...]
608 citations
Cites background or methods from "Saliency Detection via Absorbing Ma..."
...[85] propose to formulate the saliency detection via absorbing Markov Chain where the transient and absorbing nodes are superpixels around the image center and border, respectively....
[...]
...To this end, the backgroundness prior is adopted for salient object detection [85, 129, 210, 218], assuming that a narrow border of the image is the background region, i....
[...]
References
10,525 citations
8,566 citations
"Saliency Detection via Absorbing Ma..." refers background or methods in this paper
...SED: On this single object and two object dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....
[...]
...While the CA [27], IT [16], FT [2], SR [15] and LC [37] methods also show the same imbalance....
[...]
...[16] extract center-surround contrast at multiple spatial scales to find the prominent region....
[...]
...We compare our method with fifteen state-of-the-art saliency detection algorithms: the IT [16], MZ [20], LC [37], GB [14], SR [15], AC [1], FT [2], SER [31], CA [27], RC [8], CB [17], SVO [7], SF [25], LR [29] and GS [33] methods....
[...]
...MSRA: On the MSRA dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....
[...]
6,505 citations
3,723 citations
"Saliency Detection via Absorbing Ma..." refers background or methods in this paper
...The second one is the ASD dataset, a subset of the MSRA dataset, which contains 1,000 images with accurate human-labelled ground truth provided by [2]....
[...]
...Second, we compute the precision, recall and F-measure with an adaptive threshold proposed in [2], which is defined as twice the mean saliency of the image....
[...]
3,653 citations
"Saliency Detection via Absorbing Ma..." refers background or methods in this paper
...Region contrast based methods [8, 17] first segment the image and then compute the global contrast of those segments as saliency, which can usually highlight the entire object....
[...]
...The two evaluation criteria consistently show the proposed method outperforms all the other methods, where the CB [17], SVO [7], RC [8] and CA [27] are top-performance methods for saliency detection in a recent benchmark study [5]....
[...]
Frequently Asked Questions (15)
Q2. What is the effect of using the boundary nodes as absorbing nodes?
Since the boundary nodes usually contain the global characteristics of the image background, by using them as absorbing nodes, the absorbed time of each transient node can reflect its overall similarity with the background, which helps to distinguish salient nodes from background nodes.
Q3. Why is the proposed method performing poorly on this dataset?
Because most images have scrambled backgrounds and heterogeneous foregrounds, and top-down prior knowledge is lacking, the overall performance of the existing bottom-up saliency detection methods is low on this dataset.
Q4. What is the effect of the boundary nodes on the saliency of the image background?
As the absorbed time is the expected time to all the absorbing nodes, it covers the effect of all the boundary nodes, which can alleviate the influence of particular regions and encourage similar nodes in a local context to have similar saliency, thereby overcoming the defects of using the equilibrium distribution [9, 14, 11, 31].
Q5. How can random walk be used to detect saliency?
As salient objects seldom occupy all four image boundaries [33, 5] and the background regions often have appearance connectivity with image boundaries, when the authors use the boundary nodes as absorbing nodes, the random walk starting in background nodes can easily reach the absorbing nodes.
Q6. What are the limitations of equilibrium distribution based saliency models?
In addition, equilibrium distribution based saliency models only highlight the boundaries of the salient object, while the object interior still has low saliency values.
Q7. What is the effect of the sparse connectivity of the graph?
The sparse connectivity of the graph means that the background nodes near the image center have a longer absorbed time than similar nodes near the image boundaries.
Q8. What is the method used to compute precision and recall?
The authors bisegment the saliency map using every threshold in the range [0 : 0.05 : 1], and compute precision and recall at each value of the threshold to plot the precision-recall curve.
Q9. Why are some virtual nodes added to the graph as absorbing states?
Because the authors compute the full resolution saliency map, some virtual nodes are added to the graph as absorbing states, which is detailed in the next section.
Q10. How can the saliency map be suppressed?
Through the update process, the saliency of the long-range homogeneous regions near the image center can be suppressed, as Figure 3 illustrates.
Q11. How do the authors solve the saliency detection problem?
The authors further explore the effect of the equilibrium probability in saliency detection, and exploit it to regulate the absorbed time, thereby suppressing the saliency of this kind of region.
Q12. How can a Markov chain be completely specified?
Given a set of states S = {s_1, s_2, ..., s_m}, a Markov chain can be completely specified by the m × m transition matrix P, in which p_ij is the probability of moving from state s_i to state s_j.
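As a small illustration of how the transition matrix alone specifies the chain, the sketch below propagates a starting distribution over the states for k steps by multiplying with the k-th power of P. The two-state matrix and the function name are made up for this example.

```python
import numpy as np

def step_distribution(p0, P, k):
    """Distribution over states after k steps, starting from distribution p0."""
    return p0 @ np.linalg.matrix_power(P, k)

# Hypothetical two-state chain: entry P[i, j] is the probability of moving i -> j.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
p0 = np.array([1.0, 0.0])          # start in the first state with certainty
p2 = step_distribution(p0, P, 2)   # distribution after two steps
```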
Q13. What is the weighting column vector for the normalized absorbed time?
In this work, the authors use the normalized recurrent time of an ergodic Markov chain, of which the transition matrix is the row-normalized Q, as the weight u.
Q14. How do the authors reduce the saliency map?
To alleviate this problem, the authors update the saliency map by using a weighted absorbed time y_w, which can be denoted as y_w = N × u (10), where u is the weighting column vector.
Q15. What is the saliency of each transient state?
Given an input image represented as a Markov chain and some background absorbing states, the saliency of each transient state is defined as the expected number of times before being absorbed into all absorbing nodes by Eq 2.