Proceedings Article • DOI

Saliency Detection via Absorbing Markov Chain

01 Dec 2013 - pp. 1665-1672
TL;DR: The appearance divergence and spatial distribution of salient objects and the background are considered and the equilibrium distribution in an ergodic Markov chain is exploited to reduce the absorbed time in the long-range smooth background regions.
Abstract: In this paper, we formulate saliency detection via absorbing Markov chain on an image graph model. We jointly consider the appearance divergence and spatial distribution of salient objects and the background. The virtual boundary nodes are chosen as the absorbing nodes in a Markov chain and the absorbed time from each transient node to boundary absorbing nodes is computed. The absorbed time of transient node measures its global similarity with all absorbing nodes, and thus salient objects can be consistently separated from the background when the absorbed time is used as a metric. Since the time from transient node to absorbing nodes relies on the weights on the path and their spatial distance, the background region on the center of image may be salient. We further exploit the equilibrium distribution in an ergodic Markov chain to reduce the absorbed time in the long-range smooth background regions. Extensive experiments on four benchmark datasets demonstrate robustness and efficiency of the proposed method against the state-of-the-art methods.

Summary (3 min read)

Jump to: [1. Introduction] – [2. Related Work] – [3.3. Saliency Measure] – [4. Graph Construction] – [5. Saliency Detection] – [6. Experimental Results] – [Evaluation Metrics:] – [ASD:] – [MSRA:] – [SED:] – [SOD:] and [7. Conclusion]

1. Introduction

  • Saliency detection in computer vision aims to find the most informative and interesting region in a scene.
  • All bottom-up saliency methods rely on some prior knowledge about salient objects and backgrounds, such as contrast, compactness, etc.
  • These models still have certain limitations.
  • Hence, the absorbed time starting from object nodes is longer than that from background nodes.

3.3. Saliency Measure

  • Given an input image represented as a Markov chain and some background absorbing states, the saliency of each transient state is defined as the expected number of steps before being absorbed into the absorbing nodes, computed by Eq. 2.
  • Because the authors compute the full resolution saliency map, some virtual nodes are added to the graph as absorbing states, which is detailed in the next section.
  • In conventional absorbing Markov chain problems, the absorbing nodes are manually labelled with the ground truth.
  • As the absorbing nodes for saliency detection are selected by the proposed algorithm, some of them may be incorrect.
  • These have an insignificant effect on the final results, as explained in the following sections.

4. Graph Construction

  • The authors construct a single-layer graph 𝐺(𝑉, 𝐸) with superpixels [3] as nodes 𝑉 and the links between pairs of nodes as edges 𝐸. Because salient objects seldom occupy all image borders [33], they duplicate the boundary superpixels around the image borders as the virtual background absorbing nodes, as shown in Figure 2.
  • In addition, the authors enforce that all the transient nodes around the image borders (i.e., boundary nodes) are fully connected with each other, which can reduce the geodesic distance between similar superpixels.
  • The authors first renumber the nodes so that the first t nodes are transient and the last r nodes are absorbing, then define the affinity matrix A, which represents the relevance between nodes, by Eq. 6, where N(i) denotes the nodes connected to node i.
  • The expected time to move from a transient node to an absorbing node is determined by two major factors. One is the spatial distance between the two nodes: the larger their distance, the longer the expected time. The other is the transition probabilities along the paths between them: large probabilities shorten the expected time.

5. Saliency Detection

  • Most saliency maps generated by the normalized absorbed time y are effective, but some background nodes near the image center may not be adequately suppressed when they lie in a long-range homogeneous region, as shown in Figure 3.
  • Most nodes in this kind of background region have large transition probabilities, which means that the random walk may transfer many times among these nodes before reaching the absorbing nodes.
  • The background regions near the image center may therefore exhibit saliency comparable to that of the salient objects, decreasing the contrast between objects and background in the resulting saliency maps.
  • A larger score means that there are more extensive regions with mid-level saliency in the saliency map.
  • It should be noted that the absorbing nodes may include object nodes when the salient objects touch the image boundaries, as shown in Figure 4 .

6. Experimental Results

  • The authors evaluate the proposed method on four benchmark datasets.
  • The second one is the ASD dataset, a subset of the MSRA dataset, which contains 1,000 images with accurate human-labelled ground truth provided by [2] .
  • The third one is the SED dataset [28], which consists of the single-object sub-dataset SED1 and the two-object sub-dataset SED2.
  • The fourth one is the most challenging SOD dataset, containing 300 images from the Berkeley segmentation dataset [22]; it was first used for salient object segmentation evaluation [23], where seven subjects were asked to label the foreground salient object masks.
  • The authors select and combine the object masks whose consistency scores are higher than 0.7 as the final ground truth, as done in [33].

Evaluation Metrics:

  • The authors evaluate all methods by precision, recall and F-measure.
  • The precision is defined as the ratio of correctly detected salient pixels to all pixels in the extracted regions.
  • The recall is defined as the ratio of detected salient pixels to the number of ground-truth salient pixels.
  • The F-measure is the overall performance measure, computed as the weighted harmonic mean of precision and recall (Eq. 14). As in previous works, two evaluation criteria are used in the experiments.
  • First, the authors bisegment the saliency map using every threshold in the range [0 : 0.05 : 1], and compute precision and recall at each value of the threshold to plot the precision-recall curve.

ASD:

  • The authors evaluate the performance of the proposed method against fifteen state-of-the-art methods.
  • The two evaluation criteria consistently show the proposed method outperforms all the other methods, where the CB [17] , SVO [7] , RC [8] and CA [27] are top-performance methods for saliency detection in a recent benchmark study [5] .
  • Some visual comparison examples are shown in Figure 9 and more results can be found in the supplementary material.
  • The authors note that the proposed method highlights the salient regions more uniformly and suppresses the background more adequately than the other methods.

MSRA:

  • This dataset contains the ground truth of salient region marked as bounding boxes by nine subjects.
  • The authors accumulate the nine ground-truth annotations, choose the pixels with a consistency score higher than 0.5 as the salient region, and fit a bounding box to that region.
  • Figure 7 shows that the proposed method performs better than the other methods on this large dataset.
  • The recalls of some methods under the adaptive threshold are quite high and close to 1.
  • That is because the background is poorly suppressed, so the thresholded saliency map covers almost the entire image, yielding low precision.

SED:

  • As shown in Figure 6, the proposed method performs best on the SED1 dataset, but performs worse than the RC and CB methods at recall values from 0.7 to 1 on the SED2 dataset.
  • That is because the proposed method usually highlights one of the two objects while the other receives low saliency values, due to the appearance difference between the two objects.

SOD:

  • On this most challenging dataset, the authors evaluate the performance of the post-processing step against the map obtained directly from the absorbed time (denoted 'Before') and twelve state-of-the-art methods, as shown in Figure 7.
  • The two evaluation criteria show the proposed method performs equally well or slightly better than the GS [33] method.
  • The authors' approach exploits the boundary prior to determine the absorbing nodes; therefore, small salient objects touching the image boundaries may be incorrectly suppressed.
  • Figure 8 shows the typical failure cases.
  • The authors compare the execution time of different methods.

7. Conclusion

  • Based on the boundary prior, the authors set the virtual boundary nodes as absorbing nodes.
  • The saliency of each node is computed as its absorbed time to absorbing nodes.
  • Furthermore, the authors exploit the equilibrium distribution in an ergodic Markov chain to weight the absorbed time, thereby suppressing the saliency in long-range background regions.
  • Experimental results show that the proposed method outperforms fifteen state-of-the-art methods on the four public datasets and is computationally efficient.


Saliency Detection via Absorbing Markov Chain
Bowen Jiang¹, Lihe Zhang¹, Huchuan Lu¹, Chuan Yang¹, and Ming-Hsuan Yang²
¹Dalian University of Technology   ²University of California at Merced
Abstract
In this paper, we formulate saliency detection via absorbing Markov chain on an image graph model. We jointly consider the appearance divergence and spatial distribution of salient objects and the background. The virtual boundary nodes are chosen as the absorbing nodes in a Markov chain and the absorbed time from each transient node to boundary absorbing nodes is computed. The absorbed time of transient node measures its global similarity with all absorbing nodes, and thus salient objects can be consistently separated from the background when the absorbed time is used as a metric. Since the time from transient node to absorbing nodes relies on the weights on the path and their spatial distance, the background region on the center of image may be salient. We further exploit the equilibrium distribution in an ergodic Markov chain to reduce the absorbed time in the long-range smooth background regions. Extensive experiments on four benchmark datasets demonstrate robustness and efficiency of the proposed method against the state-of-the-art methods.
1. Introduction
Saliency detection in computer vision aims to find the most informative and interesting region in a scene. It has been effectively applied to numerous computer vision tasks such as content-based image retrieval [32], image segmentation [30], object recognition [24] and image adaptation [21]. Existing methods are developed with bottom-up visual cues [19, 10, 26, 34] or top-down models [4, 36].

All bottom-up saliency methods rely on some prior knowledge about salient objects and backgrounds, such as contrast, compactness, etc. Different saliency methods characterize the prior knowledge from different perspectives. Itti et al. [16] extract center-surround contrast at multiple spatial scales to find the prominent region. Bruce et al. [6] exploit Shannon's self-information measure in a local context to compute saliency. However, the local contrast does not consider the global influence and only stands out at object boundaries. Region contrast based methods [8, 17] first segment the image and then compute the global contrast of those segments as saliency, which can usually highlight the entire object. Fourier spectrum analysis has also been used to detect visual saliency [15, 13]. Recently, Perazzi et al. [25] unify the contrast and saliency computation into a single high-dimensional Gaussian filtering framework. Wei et al. [33] exploit background priors and geodesic distance for saliency detection. Yang et al. [35] cast saliency detection into a graph-based ranking problem, which performs label propagation on a sparsely connected graph to characterize the overall differences between salient object and background.

In this work, we reconsider the properties of Markov random walks and their relationship with saliency detection. Existing random walk based methods consistently use the equilibrium distribution in an ergodic Markov chain [9, 14] or its extensions, e.g., the site entropy rate [31] and the hitting time [11], to compute saliency, and have achieved success in their own aspects. However, these models still have certain limitations. Typically, saliency measures using the hitting time often highlight some particular small regions in objects or backgrounds. In addition, equilibrium distribution based saliency models only highlight the boundaries of the salient object while the object interior still has a low saliency value. To address these issues, we investigate the properties of absorbing Markov chains in this work. Given an image graph as a Markov chain and some absorbing nodes, we compute the expected time to absorption (i.e., the absorbed time) for each transient node. The nodes which have similar appearance (i.e., large transition probabilities) and small spatial distance to absorbing nodes can be absorbed faster. As salient objects seldom occupy all four image boundaries [33, 5] and the background regions often have appearance connectivity with the image boundaries, when we use the boundary nodes as absorbing nodes, the random walk starting in background nodes can easily reach the absorbing nodes. In contrast, object regions often have great contrast to the image background, so it is difficult for a random walk starting from object nodes to reach these absorbing nodes (represented by boundary nodes). Hence, the absorbed time starting from object nodes is longer than that from background nodes. In addition, in the long run, the absorbed time of similar starting nodes is roughly the same. Inspired by these observations, we formulate saliency detection as a random walk problem in the absorbing Markov chain.

Figure 1. The time property of the absorbing Markov chain and the ergodic Markov chain. From left to right: input image with superpixels as nodes; the minimum hitting time of each node to all boundary nodes in the ergodic Markov chain; the absorbed time of each node into all boundary nodes in the absorbing Markov chain. Each kind of time is normalized as a saliency map.

The absorbed time is not always effective, especially when there are long-range smooth background regions near the image center. We further explore the effect of the equilibrium probability in saliency detection, and exploit it to regulate the absorbed time, thereby suppressing the saliency of this kind of region.
2. Related Work
Previous works that simulate saliency detection in a random walk model include [9, 14, 11, 31]. Costa et al. [9] identify the salient region based on the frequency of visits to each node at the equilibrium of the random walk. Harel et al. [14] extend the above method by defining a dissimilarity measure to model the transition probability between two nodes. In [31], Wang et al. introduce the entropy rate and incorporate the equilibrium distribution to measure the average information transmitted from a node to the others in one step, which is used to predict visual attention. A major problem of using the equilibrium distribution is that this approach often only highlights the texture and boundary regions rather than the entire object, as the equilibrium probability in a cluttered region is larger than in a homogeneous region when the dissimilarity of two nodes is used to represent their transition probability. Furthermore, the main objectives in [9, 14, 31] are to predict human fixations on natural images as opposed to identifying salient regions that correspond to objects, as illustrated in this paper.

The approach most related to ours is Gopalakrishnan et al. [11], which exploits the hitting time on a fully connected graph and a sparsely connected graph to find the most salient seed, based on which some background seeds are then determined. They then use the difference of the hitting times to the two kinds of seeds to compute the saliency of each node. While they alleviate the problem of using the equilibrium distribution to measure saliency, the identification of the salient seed is difficult, especially for scenes with complex salient objects. More importantly, the hitting time based saliency measure prefers to highlight globally rare regions and does not suppress the background very well, thereby decreasing the overall saliency of objects (see Figure 1). This can be explained as follows. The hitting time is the expected time taken to reach a node if the Markov chain is started in another node. The ergodic Markov chain does not have a mechanism that can jointly consider the relationships between a node and multiple specific nodes (e.g., seed nodes). In [11], to describe the relevance of a node to background seeds, they use the minimum hitting time to take all the background seeds into account. The minimum time itself is sensitive to noise regions in the image.

Different from the above methods, we consider the absorbing Markov random walk, which includes two kinds of nodes (i.e., absorbing nodes and transient nodes), to measure saliency. For an absorbing chain started in a transient node, the probability of absorption in an absorbing node indicates the relationship between the two nodes, and the absorption time therefore reflects the selective relationships between this transient node and all the absorbing nodes. Since the boundary nodes usually contain the global characteristics of the image background, by using them as absorbing nodes, the absorbed time of each transient node can reflect its overall similarity with the background, which helps to distinguish salient nodes from background nodes. Moreover, as the absorbed time is the expected time to all the absorbing nodes, it covers the effect of all the boundary nodes, which can alleviate the influence of particular regions and encourage similar nodes in a local context to have similar saliency, thereby overcoming the defects of using the equilibrium distribution [9, 14, 11, 31]. Different from [9, 14], which directly use the equilibrium distribution to simulate human attention, we exploit it to weight the absorbed time, thereby suppressing the saliency of long-range background regions with homogeneous appearance.
3. Principle of Markov Chain
Given a set of states $S = \{s_1, s_2, \ldots, s_m\}$, a Markov chain can be completely specified by the $m \times m$ transition matrix $P$, in which $p_{ij}$ is the probability of moving from state $s_i$ to state $s_j$. This probability does not depend on which states the chain visited before the current state. The chain starts in some state and moves from one state to another successively.
3.1. Absorbing Markov Chain
The state $s_i$ is absorbing when $p_{ii} = 1$, which means $p_{ij} = 0$ for all $i \neq j$. A Markov chain is absorbing if it has at least one absorbing state and it is possible to go from every transient state to some absorbing state, not necessarily in one step. Considering an absorbing chain with $r$ absorbing states and $t$ transient states, renumber the states so that the transient states come first; then the transition matrix $P$ has the following canonical form,

$$P \rightarrow \begin{bmatrix} Q & R \\ 0 & I \end{bmatrix}, \quad (1)$$

where the first $t$ states are transient and the last $r$ states are absorbing. $Q \in [0,1]^{t \times t}$ contains the transition probabilities between any pair of transient states, while $R \in [0,1]^{t \times r}$ contains the probabilities of moving from any transient state to any absorbing state. $0$ is the $r \times t$ zero matrix and $I$ is the $r \times r$ identity matrix.

For an absorbing chain, we can derive its fundamental matrix $N = (I - Q)^{-1}$, where $n_{ij}$ can be interpreted as the expected number of times that the chain spends in transient state $j$ given that it starts in transient state $i$, and the sum $\sum_j n_{ij}$ gives the expected number of steps before absorption (into any absorbing state). Thus, we can compute the absorbed time for each transient state, that is,

$$y = N \times c, \quad (2)$$

where $c$ is a $t$-dimensional column vector all of whose elements are 1.
3.2. Ergodic Markov Chain
An ergodic Markov chain is one in which it is possible to go from every state to every state, not necessarily in one step. An ergodic chain with any starting state always reaches equilibrium after a certain time, and the equilibrium state is characterized by the equilibrium distribution $\pi$, which satisfies the equation

$$\pi P = \pi, \quad (3)$$

where $P$ is the ergodic transition matrix. $\pi$ is a strictly positive probability vector, where $\pi_i$ describes the expected probability of the chain staying in state $s_i$ in equilibrium. When the chain starts in state $s_i$, the mean recurrent time $h_i$ (i.e., the expected number of steps to return to state $s_i$) can be derived from the equilibrium distribution $\pi$. That is,

$$h_i = \frac{1}{\pi_i}, \quad (4)$$

where $i$ indexes all the states in the ergodic Markov chain. The more states similar to state $s_i$ there are nearby, the smaller $h_i$ is. The derivation details and proofs can be found in [12].
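The sketch below, again only illustrative, estimates the equilibrium distribution of Eq. 3 by power iteration for a hypothetical ergodic chain and derives the mean recurrent times of Eq. 4.

```python
import numpy as np

# Hypothetical ergodic chain on 3 states; each row of P sums to 1.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])

# Equilibrium distribution: left eigenvector of P with eigenvalue 1 (Eq. 3),
# obtained here by simple power iteration on pi P = pi.
pi = np.full(3, 1.0 / 3)
for _ in range(1000):
    pi = pi @ P
pi /= pi.sum()

# Mean recurrent time of each state (Eq. 4).
h = 1.0 / pi
print(pi, h)
```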
3.3. Saliency Measure
Given an input image represented as a Markov chain and some background absorbing states, the saliency of each transient state is defined as the expected number of steps before being absorbed into the absorbing nodes, computed by Eq. 2. In this work, the transition matrix is constructed on a sparsely connected graph, where each node corresponds to a state. Because we compute the full-resolution saliency map, some virtual nodes are added to the graph as absorbing states, which is detailed in the next section.

Figure 2. Illustration of the absorbing nodes. The superpixels outside the yellow bounding box are the duplicated boundary superpixels, which are used as the absorbing nodes.

In conventional absorbing Markov chain problems, the absorbing nodes are manually labelled with the ground truth. However, as the absorbing nodes for saliency detection are selected by the proposed algorithm, some of them may be incorrect. These have an insignificant effect on the final results, as explained in the following sections.
4. Graph Construction
We construct a single-layer graph $G(V, E)$ with superpixels [3] as nodes $V$ and the links between pairs of nodes as edges $E$. Because salient objects seldom occupy all image borders [33], we duplicate the boundary superpixels around the image borders as the virtual background absorbing nodes, as shown in Figure 2. On this graph, each node (transient or absorbing) is connected to the transient nodes which neighbour it or share common boundaries with its neighbouring nodes. That means that any pair of absorbing nodes is unconnected. In addition, we enforce that all the transient nodes around the image borders (i.e., boundary nodes) are fully connected with each other, which can reduce the geodesic distance between similar superpixels. The weights of the edges encode nodal affinity, such that nodes connected by an edge with a high weight are considered strongly connected and edges with low weights represent nearly disconnected nodes. In this work, the weight $w_{ij}$ of the edge $e_{ij}$ between adjacent nodes $i$ and $j$ is defined as

$$w_{ij} = e^{-\frac{\|x_i - x_j\|}{\sigma^2}}, \quad (5)$$

where $x_i$ and $x_j$ are the mean colors of the two nodes in the CIE LAB color space, and $\sigma$ is a constant that controls the strength of the weight. We first renumber the nodes so that the first $t$ nodes are transient nodes and the last $r$ nodes are absorbing nodes, then define the affinity matrix $A$, which represents the relevance of nodes, as

$$a_{ij} = \begin{cases} w_{ij} & j \in N(i),\ 1 \le i \le t \\ 1 & i = j \\ 0 & \text{otherwise,} \end{cases} \quad (6)$$

where $N(i)$ denotes the nodes connected to node $i$. The degree matrix that records the sum of the weights connected to each node is written as

$$D = \mathrm{diag}\Big(\sum_j a_{ij}\Big). \quad (7)$$

Finally, the transition matrix $P$ on the sparsely connected graph is given as

$$P = D^{-1} \times A, \quad (8)$$

which is simply the row-normalized $A$. As the nodes are locally connected, $P$ is a sparse matrix with a small number of nonzero elements.

The sparsely connected graph restricts the random walk to move only within a local region in each step; hence, the expected time to move from transient node $v_t$ to absorbing node $v_a$ is determined by two major factors. One is the spatial distance between the two nodes: the larger their distance, the longer the expected time. The other is the transition probabilities of the nodes along the different paths from $v_t$ to $v_a$: large probabilities shorten the expected time to absorption. Given starting node $v_t$, the shorter the time is, the larger the probability of absorption into node $v_a$ is in the long run.
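A minimal sketch of Eqs. 5–8 is given below, assuming the superpixel segmentation, the duplication of boundary superpixels as absorbing nodes, and the neighbour lists $N(i)$ have already been computed elsewhere (e.g., with SLIC). The function name and arguments are illustrative, not the authors' implementation.

```python
import numpy as np

def transition_matrix(lab_means, neighbors, t, sigma2=0.1):
    """Build P = D^{-1} A (Eqs. 5-8).

    lab_means : (n, 3) array of mean CIE LAB colors; the first t nodes are the
                transient superpixels, the remaining n - t are the duplicated
                boundary (absorbing) nodes.
    neighbors : dict mapping node i to the connected nodes N(i) of Section 4
                (spatial neighbours, neighbours of neighbours, and the fully
                connected boundary transient nodes).
    """
    n = lab_means.shape[0]
    A = np.eye(n)                                   # a_ii = 1 (Eq. 6)
    for i in range(t):                              # only transient rows carry edges
        for j in neighbors.get(i, ()):
            # Eq. 5: color similarity between connected nodes
            A[i, j] = np.exp(-np.linalg.norm(lab_means[i] - lab_means[j]) / sigma2)
    D = A.sum(axis=1)                               # degrees (Eq. 7)
    return A / D[:, None]                           # row normalization = D^{-1} A (Eq. 8)
```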
5. Saliency Detection
Given the transition matrix $P$ by Eq. 8, we can easily extract the matrix $Q$ by Eq. 1, based on which the fundamental matrix $N$ is computed. Then, we obtain the saliency map $S$ by normalizing the absorbed time $y$ computed by Eq. 2 to the range between 0 and 1, that is,

$$S(i) = \bar{y}(i), \quad i = 1, 2, \ldots, t, \quad (9)$$

where $i$ indexes the transient nodes on the graph and $\bar{y}$ denotes the normalized absorbed time vector.

Most saliency maps generated by the normalized absorbed time $\bar{y}$ are effective, but some background nodes near the image center may not be adequately suppressed when they lie in a long-range homogeneous region, as shown in Figure 3. That can be explained as follows. Most nodes in this kind of background region have large transition probabilities, which means that the random walk may transfer many times among these nodes before reaching the absorbing nodes. The sparse connectivity of the graph results in the background nodes near the image center having longer absorbed time than similar nodes near the image boundaries. Consequently, the background regions near the image center may exhibit saliency comparable to that of the salient objects, decreasing the contrast between objects and background in the resulting saliency maps. To alleviate this problem, we update the saliency map by using a weighted absorbed time $y_w$, which can be denoted as

$$y_w = N \times u, \quad (10)$$

where $u$ is the weighting column vector. In this work, we use the normalized recurrent time of an ergodic Markov chain, whose transition matrix is the row-normalized $Q$, as the weight $u$.

Figure 3. Examples showing the benefits of the update processing. From left to right: input images, results without and with the update processing.

The equilibrium distribution $\pi$ for the ergodic Markov chain can be computed from the affinity matrix $A$ as

$$\pi_i = \frac{\sum_j a_{ij}}{\sum_{i,j} a_{ij}}, \quad (11)$$

where $i, j$ index all the transient nodes. Since we define the edge weight $w_{ij}$ as the similarity between two nodes, the nodes within a homogeneous region have a large weighted sum $\sum_j a_{ij}$. This means the recurrent time in this kind of region is small, as shown in Figure 3. For this reason, we use the average recurrent time $h_j$ of each node $j$ to weight the corresponding element $n_{ij}$ (i.e., the expected time spent in node $j$ before absorption given starting node $i$) in each row of the fundamental matrix $N$. Precisely, given the equilibrium distribution $\pi$, $h_j$ is computed by Eq. 4 and the weighting vector $u$ is computed as

$$u_j = \frac{h_j}{\sum_k h_k}, \quad (12)$$

where $j$ and $k$ index all the transient nodes on the graph.
By the update processing, the saliency of the long-range homogeneous regions near the image center can be suppressed, as Figure 3 illustrates. However, if this kind of region belongs to the salient object, its saliency will also be incorrectly suppressed. Therefore, we define a principle to decide which maps need to be further updated. We find that object regions have great global contrast to background regions in good saliency maps, while this is not the case in the defective maps, such as the examples in Figure 3, which consistently contain a number of regions with mid-level saliency. Hence, given a saliency map, we first calculate its gray histogram $g$ with ten bins, and then define a metric $score$ to characterize this kind of tendency as follows:

$$score = \sum_{b=1}^{10} g(b) \times \min(b,\, 11 - b), \quad (13)$$

where $b$ indexes all the bins. A larger $score$ means that there are more extensive regions with mid-level saliency in the saliency map.
It should be noted that the absorbing nodes may include object nodes when the salient objects touch the image boundaries, as shown in Figure 4. These imprecise background absorbing nodes may result in the object regions close to the boundary being suppressed. However, the absorbed time considers the effect of all boundary nodes and depends on two factors, the edge weights on the path and the spatial distance, so the parts of the object which are far from or different from the boundary absorbing nodes can still be highlighted correctly. The main procedure of the proposed method is summarized in Algorithm 1.

Figure 4. Examples in which the salient objects appear at the image boundaries. From top to bottom: input images, our saliency maps.

Algorithm 1 Saliency detection based on Markov random walk
Input: An image and required parameters.
1. Construct a graph $G$ with superpixels as nodes, and use boundary nodes as absorbing nodes;
2. Compute the affinity matrix $A$ by Eq. 6 and the transition matrix $P$ by Eq. 8;
3. Extract the matrix $Q$ from $P$ by Eq. 1, and compute the fundamental matrix $N = (I - Q)^{-1}$ and the map $S$ by Eq. 9;
4. Compute the $score$ by Eq. 13; if $score < \gamma$, output $S$ and return;
5. Compute the recurrent time $h$ by Eqs. 11 and 4 and the weight $u$ by Eq. 12, then compute the saliency map $S$ by Eqs. 10 and 9;
Output: the full-resolution saliency map.
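Putting the pieces together, Algorithm 1 could be wired up roughly as below, reusing the weighted_absorbed_time and mid_level_score sketches from the previous sections; the affinity matrix A of Eq. 6 is assumed to be built as in the graph-construction sketch, and mapping the per-superpixel values back to a full-resolution pixel map is omitted. This is an illustration, not the authors' implementation.

```python
import numpy as np

def detect_saliency(A, t, gamma=2.0):
    """Sketch of Algorithm 1: A is the affinity matrix (Eq. 6) with the first t
    nodes transient and the rest the duplicated boundary absorbing nodes."""
    P = A / A.sum(axis=1, keepdims=True)              # step 2: Eq. 8
    Q = P[:t, :t]                                     # step 3: Eq. 1
    N = np.linalg.inv(np.eye(t) - Q)                  # fundamental matrix
    y = N @ np.ones(t)                                # absorbed time (Eq. 2)
    S = (y - y.min()) / (y.max() - y.min() + 1e-12)   # step 3: Eq. 9

    if mid_level_score(S) >= gamma:                   # step 4: Eq. 13
        yw = weighted_absorbed_time(A, t)             # step 5: Eqs. 10-12
        S = (yw - yw.min()) / (yw.max() - yw.min() + 1e-12)
    return S                                          # per-superpixel saliency
```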
6. Experimental Results
We evaluate the proposed method on four benchmark datasets. The first one is the MSRA dataset [18], which contains 5,000 images with the ground truth marked by bounding boxes. The second one is the ASD dataset, a subset of the MSRA dataset, which contains 1,000 images with accurate human-labelled ground truth provided by [2]. The third one is the SED dataset [28], which consists of the single-object sub-dataset SED1 and the two-object sub-dataset SED2; each sub-dataset contains 100 images with accurate human-labelled ground truth. The fourth one is the most challenging SOD dataset, which contains 300 images from the Berkeley segmentation dataset [22]. This dataset was first used for salient object segmentation evaluation [23], where seven subjects were asked to label the foreground salient object masks. For each object mask of each subject, a consistency score is computed based on the labels of the other six subjects. We select and combine the object masks whose consistency scores are higher than 0.7 as the final ground truth, as done in [33]. We compare our method with fifteen state-of-the-art saliency detection algorithms: the IT [16], MZ [20], LC [37], GB [14], SR [15], AC [1], FT [2], SER [31], CA [27], RC [8], CB [17], SVO [7], SF [25], LR [29] and GS [33] methods.

Experimental Setup: We set the number of superpixel nodes $N = 250$ in all the experiments. There are two parameters in the proposed algorithm: the edge weight $\sigma$ in Eq. 5, which controls the strength of the weight between a pair of nodes, and the threshold $\gamma$ of $score$ in Eq. 13, which indicates the quality of the saliency map. These two parameters are empirically chosen as $\sigma^2 = 0.1$ and $\gamma = 2$ for all the test images in the experiments.

Evaluation Metrics: We evaluate all methods by precision, recall and F-measure. The precision is defined as the ratio of correctly detected salient pixels to all pixels in the extracted regions. The recall is defined as the ratio of detected salient pixels to the number of ground-truth salient pixels. The F-measure is the overall performance measure, computed as the weighted harmonic mean of precision and recall:

$$F_\beta = \frac{(1 + \beta^2)\, Precision \times Recall}{\beta^2\, Precision + Recall}. \quad (14)$$

We set $\beta^2 = 0.3$ to stress precision more than recall, the same as [2, 8, 25]. Similar to previous works, two evaluation criteria are used in our experiments. First, we bisegment the saliency map using every threshold in the range [0 : 0.05 : 1], and compute precision and recall at each threshold to plot the precision-recall curve. Second, we compute the precision, recall and F-measure with an adaptive threshold proposed in [2], which is defined as twice the mean saliency of the image.
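The two evaluation criteria could be computed as in the sketch below (illustrative; sal is a saliency map normalized to [0, 1] and gt a boolean ground-truth mask): a precision-recall curve over the fixed thresholds [0 : 0.05 : 1], and the F-measure of Eq. 14 with β² = 0.3 at the adaptive threshold of twice the mean saliency.

```python
import numpy as np

def precision_recall(sal, gt, thresh):
    """Precision/recall of the binarized map sal >= thresh against a boolean gt mask."""
    pred = sal >= thresh
    tp = np.logical_and(pred, gt).sum()
    prec = tp / max(pred.sum(), 1)
    rec = tp / max(gt.sum(), 1)
    return prec, rec

def pr_curve(sal, gt, thresholds=np.arange(0.0, 1.0001, 0.05)):
    """Precision-recall pairs over the fixed thresholds [0 : 0.05 : 1]."""
    return [precision_recall(sal, gt, th) for th in thresholds]

def adaptive_f_measure(sal, gt, beta2=0.3):
    """F-measure (Eq. 14) at the adaptive threshold of [2]: twice the mean saliency."""
    prec, rec = precision_recall(sal, gt, 2.0 * sal.mean())
    return (1 + beta2) * prec * rec / max(beta2 * prec + rec, 1e-12)
```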
Citations
Journal Article • DOI
TL;DR: It is found that the models designed specifically for salient object detection generally work better than models in closely related areas, which provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems.
Abstract: We extensively compare, qualitatively and quantitatively, 41 state-of-the-art models (29 salient object detection, 10 fixation prediction, 1 objectness, and 1 baseline) over seven challenging data sets for the purpose of benchmarking salient object detection and segmentation methods. From the results obtained so far, our evaluation shows a consistent rapid progress over the last few years in terms of both accuracy and running time. The top contenders in this benchmark significantly outperform the models identified as the best in the previous benchmark conducted three years ago. We find that the models designed specifically for salient object detection generally work better than models in closely related areas, which in turn provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems. In particular, we analyze the influences of center bias and scene complexity in model performance, which, along with the hard cases for the state-of-the-art models, provide useful hints toward constructing more challenging large-scale data sets and better saliency models. Finally, we propose probable solutions for tackling several open problems, such as evaluation scores and data set bias, which also suggest future research directions in the rapidly growing field of salient object detection.

1,372 citations



Proceedings Article • DOI
23 Jun 2013
TL;DR: This paper regards saliency map computation as a regression problem, which is based on multi-level image segmentation, and uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map.
Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions are two-fold. One is that we show our approach, which integrates the regional contrast, regional property and regional backgroundness descriptors together to form the master saliency map, is able to produce superior saliency maps to existing algorithms, most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2].

1,057 citations


Cites background or methods from "Saliency Detection via Absorbing Ma..."

  • ...To save the space, we only consider the top four models ranked in the survey [23]: SVO [51], CA [17], CB [32], and RC [15] and recently-developed methods: SF [21], LRK [78], HS [33], GMR [48], PCA [31], MC [50], DSR [49], RBD [55] that are not covered in [23]....

    [...]

  • ...Object prior, such as connectivity prior [45], concavity context [20], and auto-context cue [46], backgroundness prior [47]–[50], generic objectness prior [51]–[53], and background connectivity prior [38], [54], [55] are also studied for saliency computation....

    [...]

Proceedings Article • DOI
07 Jun 2015
TL;DR: This paper proposes a multi-context deep learning framework for salient object detection that employs deep Convolutional Neural Networks to model saliency of objects in images and investigates different pre-training strategies to provide a better initialization for training the deep neural networks.
Abstract: Low-level saliency cues or priors do not produce good enough saliency detection results especially when the salient object presents in a low-contrast background with confusing visual appearance. This issue raises a serious problem for conventional approaches. In this paper, we tackle this problem by proposing a multi-context deep learning framework for salient object detection. We employ deep Convolutional Neural Networks to model saliency of objects in images. Global context and local context are both taken into account, and are jointly modeled in a unified multi-context deep learning framework. To provide a better initialization for training the deep neural networks, we investigate different pre-training strategies, and a task-specific pre-training scheme is designed to make the multi-context modeling suited for saliency detection. Furthermore, recently proposed contemporary deep models in the ImageNet Image Classification Challenge are tested, and their effectiveness in saliency detection are investigated. Our approach is extensively evaluated on five public datasets, and experimental results show significant and consistent improvements over the state-of-the-art methods.

983 citations


Cites background from "Saliency Detection via Absorbing Ma..."

  • ...A large number of approaches [63, 52, 40, 39, 32, 35, 60, 57, 56, 47, 41, 31, 27, 25, 24, 23, 11, 44, 17, 8, 13, 1, 21] are proposed to capture different saliency cues....

    [...]

Proceedings Article • DOI
07 Jun 2015
TL;DR: This method presents two interesting insights: first, local features learned by a supervised scheme can effectively capture local contrast, texture and shape information for saliency detection and second, the complex relationship between different global saliency cues can be captured by deep networks and exploited principally rather than heuristically.
Abstract: This paper presents a saliency detection algorithm by integrating both local estimation and global search. In the local estimation stage, we detect local saliency by using a deep neural network (DNN-L) which learns local patch features to determine the saliency value of each pixel. The estimated local saliency maps are further refined by exploring the high level object concepts. In the global search stage, the local saliency map together with global contrast and geometric information are used as global features to describe a set of object candidate regions. Another deep neural network (DNN-G) is trained to predict the saliency score of each object region based on the global features. The final saliency map is generated by a weighted sum of salient object regions. Our method presents two interesting insights. First, local features learned by a supervised scheme can effectively capture local contrast, texture and shape information for saliency detection. Second, the complex relationship between different global saliency cues can be captured by deep networks and exploited principally rather than heuristically. Quantitative and qualitative experiments on several benchmark data sets demonstrate that our algorithm performs favorably against the state-of-the-art methods.

690 citations


Additional excerpts

  • ...Saliency cues such as center and object bias [31, 22], contrast information [38] and background prior [33, 15] have been shown to be effective in previous work....

    [...]

Journal Article • DOI
TL;DR: A comprehensive review of recent progress in salient object detection is provided and this field is situate among other closely related areas such as generic scene segmentation, object proposal generation, and saliency for fixation prediction.
Abstract: Detecting and segmenting salient objects from natural scenes, often referred to as salient object detection, has attracted great interest in computer vision. While many models have been proposed and several applications have emerged, a deep understanding of achievements and issues remains lacking. We aim to provide a comprehensive review of recent progress in salient object detection and situate this field among other closely related areas such as generic scene segmentation, object proposal generation, and saliency for fixation prediction. Covering 228 publications, we survey i) roots, key concepts, and tasks, ii) core techniques and main modeling trends, and iii) datasets and evaluation metrics for salient object detection. We also discuss open problems such as evaluation metrics and dataset bias in model performance, and suggest future research directions.

608 citations


Cites background or methods from "Saliency Detection via Absorbing Ma..."

  • ...[85] propose to formulate the saliency detection via absorbing Markov Chain where the transient and absorbing nodes are superpixels around the image center and border, respectively....

    [...]

  • ...To this end, the backgroundness prior is adopted for salient object detection [85, 129, 210, 218], assuming that a narrow border of the image is the background region, i....

    [...]

References
Journal Article • DOI
TL;DR: In this article, a visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system is presented, where multiscale image features are combined into a single topographical saliency map.
Abstract: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.

10,525 citations

01 Jan 1998
TL;DR: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented, which breaks down the complex problem of scene understanding by rapidly selecting conspicuous locations to be analyzed in detail.

8,566 citations


"Saliency Detection via Absorbing Ma..." refers background or methods in this paper

  • ...SED: On this single object and two object dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....

    [...]

  • ...While the CA [27], IT [16], FT [2], SR [15] and LC [37] methods also show the same imbalance....

    [...]

  • ...[16] extract center-surround contrast at multiple spatial scales to find the prominent region....

    [...]

  • ...We compare our method with fifteen state-of-the-art saliency detection algorithms: the IT [16], MZ [20], LC [37], GB [14], SR [15], AC [1], FT [2], SER [31], CA [27], RC [8], CB [17], SVO [7], SF [25], LR [29] and GS [33] methods....

    [...]

  • ...MSRA: On the MSRA dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....

    [...]

Proceedings Article • DOI
07 Jul 2001
TL;DR: In this paper, the authors present a database containing ground truth segmentations produced by humans for images of a wide variety of natural scenes, and define an error measure which quantifies the consistency between segmentations of differing granularities.
Abstract: This paper presents a database containing 'ground truth' segmentations produced by humans for images of a wide variety of natural scenes. We define an error measure which quantifies the consistency between segmentations of differing granularities and find that different human segmentations of the same image are highly consistent. Use of this dataset is demonstrated in two applications: (1) evaluating the performance of segmentation algorithms and (2) measuring probability distributions associated with Gestalt grouping factors as well as statistics of image region properties.

6,505 citations

Proceedings Article • DOI
20 Jun 2009
TL;DR: This paper introduces a method for salient region detection that outputs full resolution saliency maps with well-defined boundaries of salient objects that outperforms the five algorithms both on the ground-truth evaluation and on the segmentation task by achieving both higher precision and better recall.
Abstract: Detection of visually salient image regions is useful for applications like object segmentation, adaptive compression, and object recognition. In this paper, we introduce a method for salient region detection that outputs full resolution saliency maps with well-defined boundaries of salient objects. These boundaries are preserved by retaining substantially more frequency content from the original image than other existing techniques. Our method exploits features of color and luminance, is simple to implement, and is computationally efficient. We compare our algorithm to five state-of-the-art salient region detection methods with a frequency domain analysis, ground truth, and a salient object segmentation application. Our method outperforms the five algorithms both on the ground-truth evaluation and on the segmentation task by achieving both higher precision and better recall.

3,723 citations


"Saliency Detection via Absorbing Ma..." refers background or methods in this paper

  • ...SED: On this single object and two object dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....

    [...]

  • ...While the CA [27], IT [16], FT [2], SR [15] and LC [37] methods also show the same imbalance....

    [...]

  • ...The second one is the ASD dataset, a subset of the MSRA dataset, which contains 1,000 images with accurate human-labelled ground truth provided by [2]....

    [...]

  • ...Second, we compute the precision, recall and F-measure with an adaptive threshold proposed in [2], which is defined as twice the mean saliency of the image....

    [...]

  • ...We compare our method with fifteen state-of-the-art saliency detection algorithms: the IT [16], MZ [20], LC [37], GB [14], SR [15], AC [1], FT [2], SER [31], CA [27], RC [8], CB [17], SVO [7], SF [25], LR [29] and GS [33] methods....

    [...]

Proceedings Article • DOI
20 Jun 2011
TL;DR: This work proposes a regional contrast based saliency extraction algorithm, which simultaneously evaluates global contrast differences and spatial coherence, and consistently outperformed existing saliency detection methods.
Abstract: Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional contrast based salient object detection algorithm, which simultaneously evaluates global contrast differences and spatial weighted coherence scores. The proposed algorithm is simple, efficient, naturally multi-scale, and produces full-resolution, high-quality saliency maps. These saliency maps are further used to initialize a novel iterative version of GrabCut, namely SaliencyCut, for high quality unsupervised salient object segmentation. We extensively evaluated our algorithm using traditional salient object detection datasets, as well as a more challenging Internet image dataset. Our experimental results demonstrate that our algorithm consistently outperforms 15 existing salient object detection and segmentation methods, yielding higher precision and better recall rates. We also show that our algorithm can be used to efficiently extract salient object masks from Internet images, enabling effective sketch-based image retrieval (SBIR) via simple shape comparisons. Despite such noisy internet images, where the saliency regions are ambiguous, our saliency guided image retrieval achieves a superior retrieval rate compared with state-of-the-art SBIR methods, and additionally provides important target object region information.

3,653 citations


"Saliency Detection via Absorbing Ma..." refers background or methods in this paper

  • ...SED: On this single object and two object dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....

    [...]

  • ...Region contrast based methods [8, 17] first segment the image and then compute the global contrast of those segments as saliency, which can usually highlight the entire object....

    [...]

  • ...We compare our method with fifteen state-of-the-art saliency detection algorithms: the IT [16], MZ [20], LC [37], GB [14], SR [15], AC [1], FT [2], SER [31], CA [27], RC [8], CB [17], SVO [7], SF [25], LR [29] and GS [33] methods....

    [...]

  • ...MSRA: On the MSRA dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....

    [...]

  • ...The two evaluation criteria consistently show the proposed method outperforms all the other methods, where the CB [17], SVO [7], RC [8] and CA [27] are top-performance methods for saliency detection in a recent benchmark study [5]....

    [...]

Frequently Asked Questions (15)
Q1. What have the authors contributed in "Saliency detection via absorbing markov chain" ?

In this paper, the authors formulate saliency detection via absorbing Markov chain on an image graph model.Β The authors jointly consider the appearance divergence and spatial distribution of salient objects and the background.Β The authors further exploit the equilibrium distribution in an ergodic Markov chain to reduce the absorbed time in the long-range smooth background regions.Β 

Since the boundary nodes usually contain the global characteristics of the image background, by using them as absorbing nodes, the absorbed time of each transient node can reflect its overall similarity with the background, which helps to distinguish salient nodes from background nodes.Β 

Due to the cluttered backgrounds and heterogeneous foregrounds of most images, and the lack of top-down prior knowledge, the overall performance of the existing bottom-up saliency detection methods is low on this dataset.

as the absorbed time is the expected time to all the absorbing nodes, it covers the effect of all the boundary nodes, which can alleviate the influence of particular regions and encourage the similar nodes in a local context to have the similar saliency, thereby overcoming the defects of using the equilibrium distribution [9, 14, 11, 31].Β 

As salient objects seldom occupy all four image boundaries [33, 5] and the background regions often have appearance connectivity with image boundaries, when the authors use the boundary nodes as absorbing nodes, the random walk starting in background nodes can easily reach the absorbing nodes.Β 

In addition, equilibrium distribution based saliency models only highlight the boundaries of salient object while object interior still has low saliency value.Β 

The sparse connectivity of the graph results in the background nodes near the image center having longer absorbed time than similar nodes near the image boundaries.

the authors bisegment the saliency map using every threshold in the range [0 : 0.05 : 1], and compute precision and recall at each value of the threshold to plot the precision-recall curve.Β 

Because the authors compute the full resolution saliency map, some virtual nodes are added to the graph as absorbing states, which is detailed in the next section.Β 

By the update processing, the saliency of the long-range homogeneous regions near the image center can be suppressed as Figure 3 illustrates.Β 

The authors further explore the effect of the equilibrium probability in saliency detection, and exploit it to regulate the absorbed time, thereby suppressing the saliency of this kind of regions.Β 

Given a set of states $S = \{s_1, s_2, \ldots, s_m\}$, a Markov chain can be completely specified by the $m \times m$ transition matrix $P$, in which $p_{ij}$ is the probability of moving from state $s_i$ to state $s_j$.

In this work, the authors use the normalized recurrent time of an ergodic Markov chain, of which the transition matrix is the row normalized Q, as the weight u.Β 

To alleviate this problem, the authors update the saliency map by using a weighted absorbed time $y_w = N \times u$ (Eq. 10), where $u$ is the weighting column vector.

Given an input image represented as a Markov chain and some background absorbing states, the saliency of each transient state is defined as the expected number of steps before being absorbed into the absorbing nodes, computed by Eq. 2.