Proceedings Article • DOI

Saliency Detection via Absorbing Markov Chain

01 Dec 2013 - pp. 1665-1672
TL;DR: The appearance divergence and spatial distribution of salient objects and the background are considered and the equilibrium distribution in an ergodic Markov chain is exploited to reduce the absorbed time in the long-range smooth background regions.
Abstract: In this paper, we formulate saliency detection via absorbing Markov chain on an image graph model. We jointly consider the appearance divergence and spatial distribution of salient objects and the background. The virtual boundary nodes are chosen as the absorbing nodes in a Markov chain and the absorbed time from each transient node to boundary absorbing nodes is computed. The absorbed time of transient node measures its global similarity with all absorbing nodes, and thus salient objects can be consistently separated from the background when the absorbed time is used as a metric. Since the time from transient node to absorbing nodes relies on the weights on the path and their spatial distance, the background region on the center of image may be salient. We further exploit the equilibrium distribution in an ergodic Markov chain to reduce the absorbed time in the long-range smooth background regions. Extensive experiments on four benchmark datasets demonstrate robustness and efficiency of the proposed method against the state-of-the-art methods.

Summary (3 min read)

Jump to: [1. Introduction] – [2. Related Work] – [3.3. Saliency Measure] – [4. Graph Construction] – [5. Saliency Detection] – [6. Experimental Results] – [Evaluation Metrics:] – [ASD:] – [MSRA:] – [SED:] – [SOD:] and [7. Conclusion]

1. Introduction

  • Saliency detection in computer vision aims to find the most informative and interesting region in a scene.
  • All bottom-up saliency methods rely on some prior knowledge about salient objects and backgrounds, such as contrast, compactness, etc.
  • These models still have certain limitations.
  • Hence, the absorbed time starting from object nodes is longer than that from background nodes.

3.3. Saliency Measure

  • Given an input image represented as a Markov chain and some background absorbing states, the saliency of each transient state is defined as the expected number of steps before being absorbed into the absorbing nodes, computed by Eq. 2.
  • Because the authors compute the full resolution saliency map, some virtual nodes are added to the graph as absorbing states, which is detailed in the next section.
  • In conventional absorbing Markov chain problems, the absorbing nodes are manually labelled with the ground truth.
  • As the absorbing nodes for saliency detection are selected by the proposed algorithm, some of them may be incorrect.
  • These have an insignificant effect on the final results, as explained in the following sections.

4. Graph Construction

  • The authors construct a single-layer graph 𝐺(𝑉, 𝐸) with superpixels [3] as nodes 𝑉 and the links between pairs of nodes as edges 𝐸. Because salient objects seldom occupy all image borders [33], they duplicate the boundary superpixels around the image borders as the virtual background absorbing nodes, as shown in Figure 2.
  • In addition, the authors enforce that all the transient nodes around the image borders (i.e., boundary nodes) are fully connected with each other, which can reduce the geodesic distance between similar superpixels.
  • The authors first renumber the nodes so that the first t nodes are transient and the last r nodes are absorbing, then define the affinity matrix A, which represents the relevance between nodes, by Eq. 6, where N(i) denotes the nodes connected to node i.
  • The expected time to move from a transient node to an absorbing node is determined by two major factors. One is the spatial distance between the two nodes: the larger their distance, the longer the expected time. The other is the transition probabilities along the paths between them: large probabilities shorten the expected time.

5. Saliency Detection

  • Most saliency maps generated by the normalized absorbed time y are effective, but some background nodes near the image center may not be adequately suppressed when they lie in a long-range homogeneous region, as shown in Figure 3.
  • Most nodes in this kind of background region have large transition probabilities, which means that the random walk may transfer many times among these nodes before reaching the absorbing nodes.
  • The background regions near the image center may therefore exhibit saliency comparable to that of the salient objects, decreasing the contrast between objects and background in the resulting saliency maps.
  • A larger score means that there are more extensive regions with mid-level saliency in the saliency map.
  • It should be noted that the absorbing nodes may include object nodes when the salient objects touch the image boundaries, as shown in Figure 4 .

6. Experimental Results

  • The authors evaluate the proposed method on four benchmark datasets.
  • The second one is the ASD dataset, a subset of the MSRA dataset, which contains 1,000 images with accurate human-labelled ground truth provided by [2] .
  • The third one is the SED dataset [28], which consists of the single-object sub-dataset SED1 and the two-object sub-dataset SED2.
  • The fourth one is the most challenging SOD dataset, containing 300 images from the Berkeley segmentation dataset [22]; it was first used for salient object segmentation evaluation [23], where seven subjects were asked to label the foreground salient object masks.
  • The authors select and combine the object masks whose consistency scores are higher than 0.7 as the final ground truth, as done in [33].

Evaluation Metrics:

  • The authors evaluate all methods by precision, recall and F-measure.
  • The precision is defined as the ratio of correctly detected salient pixels to all pixels in the extracted regions.
  • The recall is defined as the ratio of detected salient pixels to the number of ground-truth salient pixels.
  • The F-measure is the overall performance measure, computed as the weighted harmonic mean of precision and recall (Eq. 14). As in previous works, two evaluation criteria are used in the experiments.
  • First, the authors bisegment the saliency map using every threshold in the range [0 : 0.05 : 1], and compute precision and recall at each value of the threshold to plot the precision-recall curve.

ASD:

  • The authors evaluate the performance of the proposed method against fifteen state-of-the-art methods.
  • The two evaluation criteria consistently show the proposed method outperforms all the other methods, where the CB [17] , SVO [7] , RC [8] and CA [27] are top-performance methods for saliency detection in a recent benchmark study [5] .
  • Some visual comparison examples are shown in Figure 9 and more results can be found in the supplementary material.
  • The authors note that the proposed method highlights the salient regions more uniformly and suppresses the background more adequately than the other methods.

MSRA:

  • This dataset contains the ground truth of salient region marked as bounding boxes by nine subjects.
  • The authors accumulate the nine ground-truth annotations, choose the pixels with a consistency score higher than 0.5 as the salient region, and fit a bounding box to that region.
  • Figure 7 shows that the proposed method performs better than the other methods on this large dataset.
  • The recalls of some methods under the adaptive threshold are quite high and close to 1.
  • That is because the background is poorly suppressed, so the thresholded saliency map covers almost the entire image, yielding low precision.

SED:

  • As shown in Figure 6, the proposed method performs best on the SED1 dataset, but performs worse than the RC and CB methods at recall values from 0.7 to 1 on the SED2 dataset.
  • That is because the proposed method usually highlights one of the two objects while the other receives low saliency values, due to the appearance difference between the two objects.

SOD:

  • On this most challenging dataset, the authors evaluate the performance of the post-processing step against the map obtained directly from the absorbed time (denoted 'Before') and twelve state-of-the-art methods, as shown in Figure 7.
  • The two evaluation criteria show the proposed method performs equally well or slightly better than the GS [33] method.
  • The authors' approach exploits the boundary prior to determine the absorbing nodes; therefore, small salient objects touching the image boundaries may be incorrectly suppressed.
  • Figure 8 shows the typical failure cases.
  • The authors compare the execution time of different methods.

7. Conclusion

  • Based on the boundary prior, the authors set the virtual boundary nodes as absorbing nodes.
  • The saliency of each node is computed as its absorbed time to absorbing nodes.
  • Furthermore, the authors exploit the equilibrium distribution in an ergodic Markov chain to weight the absorbed time, thereby suppressing the saliency in long-range background regions.
  • Experimental results show that the proposed method outperforms fifteen state-of-the-art methods on the four public datasets and is computationally efficient.


Saliency Detection via Absorbing Markov Chain
Bowen Jiang¹, Lihe Zhang¹, Huchuan Lu¹, Chuan Yang¹, and Ming-Hsuan Yang²
¹Dalian University of Technology   ²University of California at Merced
Abstract
In this paper, we formulate saliency detection via absorbing Markov chain on an image graph model. We jointly consider the appearance divergence and spatial distribution of salient objects and the background. The virtual boundary nodes are chosen as the absorbing nodes in a Markov chain and the absorbed time from each transient node to boundary absorbing nodes is computed. The absorbed time of transient node measures its global similarity with all absorbing nodes, and thus salient objects can be consistently separated from the background when the absorbed time is used as a metric. Since the time from transient node to absorbing nodes relies on the weights on the path and their spatial distance, the background region on the center of image may be salient. We further exploit the equilibrium distribution in an ergodic Markov chain to reduce the absorbed time in the long-range smooth background regions. Extensive experiments on four benchmark datasets demonstrate robustness and efficiency of the proposed method against the state-of-the-art methods.
1. Introduction
Saliency detection in computer vision aims to find the most informative and interesting region in a scene. It has been effectively applied to numerous computer vision tasks such as content-based image retrieval [32], image segmentation [30], object recognition [24] and image adaptation [21]. Existing methods are developed with bottom-up visual cues [19, 10, 26, 34] or top-down models [4, 36].

All bottom-up saliency methods rely on some prior knowledge about salient objects and backgrounds, such as contrast, compactness, etc. Different saliency methods characterize the prior knowledge from different perspectives. Itti et al. [16] extract center-surround contrast at multiple spatial scales to find the prominent region. Bruce et al. [6] exploit Shannon's self-information measure in a local context to compute saliency. However, the local contrast does not consider the global influence and only stands out at object boundaries. Region contrast based methods [8, 17] first segment the image and then compute the global contrast of those segments as saliency, which can usually highlight the entire object. Fourier spectrum analysis has also been used to detect visual saliency [15, 13]. Recently, Perazzi et al. [25] unify the contrast and saliency computation into a single high-dimensional Gaussian filtering framework. Wei et al. [33] exploit background priors and geodesic distance for saliency detection. Yang et al. [35] cast saliency detection into a graph-based ranking problem, which performs label propagation on a sparsely connected graph to characterize the overall differences between salient object and background.

In this work, we reconsider the properties of Markov random walks and their relationship with saliency detection. Existing random walk based methods consistently use the equilibrium distribution in an ergodic Markov chain [9, 14] or its extensions, e.g., the site entropy rate [31] and the hitting time [11], to compute saliency, and have achieved success in their own aspects. However, these models still have certain limitations. Typically, saliency measures using the hitting time often highlight some particular small regions in objects or backgrounds. In addition, equilibrium distribution based saliency models only highlight the boundaries of the salient object while the object interior still has a low saliency value. To address these issues, we investigate the properties of absorbing Markov chains in this work. Given an image graph as a Markov chain and some absorbing nodes, we compute the expected time to absorption (i.e., the absorbed time) for each transient node. The nodes which have similar appearance (i.e., large transition probabilities) and small spatial distance to absorbing nodes can be absorbed faster. As salient objects seldom occupy all four image boundaries [33, 5] and the background regions often have appearance connectivity with the image boundaries, when we use the boundary nodes as absorbing nodes, the random walk starting in background nodes can easily reach the absorbing nodes. In contrast, object regions often have great contrast to the image background, so it is difficult for a random walk starting from object nodes to reach these absorbing nodes (represented by boundary nodes). Hence, the absorbed time starting from object nodes is longer than that from background nodes. In addition, in the long run, the absorbed time of similar starting nodes is roughly the same. Inspired by these observations, we formulate saliency detection as a random walk problem in the absorbing Markov chain.

Figure 1. The time property of the absorbing Markov chain and the ergodic Markov chain. From left to right: input image with superpixels as nodes; the minimum hitting time of each node to all boundary nodes in the ergodic Markov chain; the absorbed time of each node into all boundary nodes in the absorbing Markov chain. Each kind of time is normalized as a saliency map.

The absorbed time is not always effective, especially when there are long-range smooth background regions near the image center. We further explore the effect of the equilibrium probability in saliency detection, and exploit it to regulate the absorbed time, thereby suppressing the saliency of this kind of region.
2. Related Work
Previous works that simulate saliency detection in a random walk model include [9, 14, 11, 31]. Costa et al. [9] identify the salient region based on the frequency of visits to each node at the equilibrium of the random walk. Harel et al. [14] extend the above method by defining a dissimilarity measure to model the transition probability between two nodes. In [31], Wang et al. introduce the entropy rate and incorporate the equilibrium distribution to measure the average information transmitted from a node to the others in one step, which is used to predict visual attention. A major problem of using the equilibrium distribution is that this approach often only highlights the texture and boundary regions rather than the entire object, as the equilibrium probability in a cluttered region is larger than in a homogeneous region when the dissimilarity of two nodes is used to represent their transition probability. Furthermore, the main objectives in [9, 14, 31] are to predict human fixations on natural images as opposed to identifying salient regions that correspond to objects, as illustrated in this paper.

The approach most related to ours is Gopalakrishnan et al. [11], which exploits the hitting time on a fully connected graph and a sparsely connected graph to find the most salient seed, based on which some background seeds are then determined. They then use the difference of the hitting times to the two kinds of seeds to compute the saliency of each node. While they alleviate the problem of using the equilibrium distribution to measure saliency, the identification of the salient seed is difficult, especially for scenes with complex salient objects. More importantly, the hitting time based saliency measure prefers to highlight globally rare regions and does not suppress the background very well, thereby decreasing the overall saliency of objects (see Figure 1). This can be explained as follows. The hitting time is the expected time taken to reach a node if the Markov chain is started in another node. The ergodic Markov chain does not have a mechanism that can jointly consider the relationships between a node and multiple specific nodes (e.g., seed nodes). In [11], to describe the relevance of a node to background seeds, they use the minimum hitting time to take all the background seeds into account. The minimum time itself is sensitive to noise regions in the image.

Different from the above methods, we consider the absorbing Markov random walk, which includes two kinds of nodes (i.e., absorbing nodes and transient nodes), to measure saliency. For an absorbing chain started in a transient node, the probability of absorption in an absorbing node indicates the relationship between the two nodes, and the absorption time therefore reflects the selective relationships between this transient node and all the absorbing nodes. Since the boundary nodes usually contain the global characteristics of the image background, by using them as absorbing nodes, the absorbed time of each transient node can reflect its overall similarity with the background, which helps to distinguish salient nodes from background nodes. Moreover, as the absorbed time is the expected time to all the absorbing nodes, it covers the effect of all the boundary nodes, which can alleviate the influence of particular regions and encourage similar nodes in a local context to have similar saliency, thereby overcoming the defects of using the equilibrium distribution [9, 14, 11, 31]. Different from [9, 14], which directly use the equilibrium distribution to simulate human attention, we exploit it to weight the absorbed time, thereby suppressing the saliency of long-range background regions with homogeneous appearance.
3. Principle of Markov Chain
Given a set of states $S = \{s_1, s_2, \ldots, s_m\}$, a Markov chain can be completely specified by the $m \times m$ transition matrix $P$, in which $p_{ij}$ is the probability of moving from state $s_i$ to state $s_j$. This probability does not depend on which states the chain visited before the current state. The chain starts in some state and moves from one state to another successively.
3.1. Absorbing Markov Chain
The state $s_i$ is absorbing when $p_{ii} = 1$, which means $p_{ij} = 0$ for all $i \neq j$. A Markov chain is absorbing if it has at least one absorbing state and it is possible to go from every transient state to some absorbing state, not necessarily in one step. Considering an absorbing chain with $r$ absorbing states and $t$ transient states, renumber the states so that the transient states come first; then the transition matrix $P$ has the following canonical form,

$$P \rightarrow \begin{bmatrix} Q & R \\ 0 & I \end{bmatrix}, \quad (1)$$

where the first $t$ states are transient and the last $r$ states are absorbing. $Q \in [0,1]^{t \times t}$ contains the transition probabilities between any pair of transient states, while $R \in [0,1]^{t \times r}$ contains the probabilities of moving from any transient state to any absorbing state. $0$ is the $r \times t$ zero matrix and $I$ is the $r \times r$ identity matrix.

For an absorbing chain, we can derive its fundamental matrix $N = (I - Q)^{-1}$, where $n_{ij}$ can be interpreted as the expected number of times that the chain spends in transient state $j$ given that it starts in transient state $i$, and the sum $\sum_j n_{ij}$ gives the expected number of steps before absorption (into any absorbing state). Thus, we can compute the absorbed time for each transient state, that is,

$$y = N \times c, \quad (2)$$

where $c$ is a $t$-dimensional column vector all of whose elements are 1.
3.2. Ergodic Markov Chain
An ergodic Markov chain is one in which it is possible to go from every state to every state, not necessarily in one step. An ergodic chain with any starting state always reaches equilibrium after a certain time, and the equilibrium state is characterized by the equilibrium distribution $\pi$, which satisfies the equation

$$\pi P = \pi, \quad (3)$$

where $P$ is the ergodic transition matrix. $\pi$ is a strictly positive probability vector, where $\pi_i$ describes the expected probability of the chain staying in state $s_i$ in equilibrium. When the chain starts in state $s_i$, the mean recurrent time $h_i$ (i.e., the expected number of steps to return to state $s_i$) can be derived from the equilibrium distribution $\pi$. That is,

$$h_i = \frac{1}{\pi_i}, \quad (4)$$

where $i$ indexes all the states in the ergodic Markov chain. The more states similar to state $s_i$ there are nearby, the smaller $h_i$ is. The derivation details and proofs can be found in [12].
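The sketch below, again only illustrative, estimates the equilibrium distribution of Eq. 3 by power iteration for a hypothetical ergodic chain and derives the mean recurrent times of Eq. 4.

```python
import numpy as np

# Hypothetical ergodic chain on 3 states; each row of P sums to 1.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])

# Equilibrium distribution: left eigenvector of P with eigenvalue 1 (Eq. 3),
# obtained here by simple power iteration on pi P = pi.
pi = np.full(3, 1.0 / 3)
for _ in range(1000):
    pi = pi @ P
pi /= pi.sum()

# Mean recurrent time of each state (Eq. 4).
h = 1.0 / pi
print(pi, h)
```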
3.3. Saliency Measure
Given an input image represented as a Markov chain and some background absorbing states, the saliency of each transient state is defined as the expected number of steps before being absorbed into the absorbing nodes, computed by Eq. 2. In this work, the transition matrix is constructed on a sparsely connected graph, where each node corresponds to a state. Because we compute the full-resolution saliency map, some virtual nodes are added to the graph as absorbing states, which is detailed in the next section.

Figure 2. Illustration of the absorbing nodes. The superpixels outside the yellow bounding box are the duplicated boundary superpixels, which are used as the absorbing nodes.

In conventional absorbing Markov chain problems, the absorbing nodes are manually labelled with the ground truth. However, as the absorbing nodes for saliency detection are selected by the proposed algorithm, some of them may be incorrect. These have an insignificant effect on the final results, as explained in the following sections.
4. Graph Construction
We construct a single-layer graph $G(V, E)$ with superpixels [3] as nodes $V$ and the links between pairs of nodes as edges $E$. Because salient objects seldom occupy all image borders [33], we duplicate the boundary superpixels around the image borders as the virtual background absorbing nodes, as shown in Figure 2. On this graph, each node (transient or absorbing) is connected to the transient nodes which neighbour it or share common boundaries with its neighbouring nodes. That means that any pair of absorbing nodes is unconnected. In addition, we enforce that all the transient nodes around the image borders (i.e., boundary nodes) are fully connected with each other, which can reduce the geodesic distance between similar superpixels. The weights of the edges encode nodal affinity, such that nodes connected by an edge with a high weight are considered strongly connected and edges with low weights represent nearly disconnected nodes. In this work, the weight $w_{ij}$ of the edge $e_{ij}$ between adjacent nodes $i$ and $j$ is defined as

$$w_{ij} = e^{-\frac{\|x_i - x_j\|}{\sigma^2}}, \quad (5)$$

where $x_i$ and $x_j$ are the mean colors of the two nodes in the CIE LAB color space, and $\sigma$ is a constant that controls the strength of the weight. We first renumber the nodes so that the first $t$ nodes are transient nodes and the last $r$ nodes are absorbing nodes, then define the affinity matrix $A$, which represents the relevance of nodes, as

$$a_{ij} = \begin{cases} w_{ij} & j \in N(i),\ 1 \le i \le t \\ 1 & i = j \\ 0 & \text{otherwise,} \end{cases} \quad (6)$$

where $N(i)$ denotes the nodes connected to node $i$. The degree matrix that records the sum of the weights connected to each node is written as

$$D = \mathrm{diag}\Big(\sum_j a_{ij}\Big). \quad (7)$$

Finally, the transition matrix $P$ on the sparsely connected graph is given as

$$P = D^{-1} \times A, \quad (8)$$

which is simply the row-normalized $A$. As the nodes are locally connected, $P$ is a sparse matrix with a small number of nonzero elements.

The sparsely connected graph restricts the random walk to move only within a local region in each step; hence, the expected time to move from transient node $v_t$ to absorbing node $v_a$ is determined by two major factors. One is the spatial distance between the two nodes: the larger their distance, the longer the expected time. The other is the transition probabilities of the nodes along the different paths from $v_t$ to $v_a$: large probabilities shorten the expected time to absorption. Given starting node $v_t$, the shorter the time is, the larger the probability of absorption into node $v_a$ is in the long run.
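A minimal sketch of Eqs. 5–8 is given below, assuming the superpixel segmentation, the duplication of boundary superpixels as absorbing nodes, and the neighbour lists $N(i)$ have already been computed elsewhere (e.g., with SLIC). The function name and arguments are illustrative, not the authors' implementation.

```python
import numpy as np

def transition_matrix(lab_means, neighbors, t, sigma2=0.1):
    """Build P = D^{-1} A (Eqs. 5-8).

    lab_means : (n, 3) array of mean CIE LAB colors; the first t nodes are the
                transient superpixels, the remaining n - t are the duplicated
                boundary (absorbing) nodes.
    neighbors : dict mapping node i to the connected nodes N(i) of Section 4
                (spatial neighbours, neighbours of neighbours, and the fully
                connected boundary transient nodes).
    """
    n = lab_means.shape[0]
    A = np.eye(n)                                   # a_ii = 1 (Eq. 6)
    for i in range(t):                              # only transient rows carry edges
        for j in neighbors.get(i, ()):
            # Eq. 5: color similarity between connected nodes
            A[i, j] = np.exp(-np.linalg.norm(lab_means[i] - lab_means[j]) / sigma2)
    D = A.sum(axis=1)                               # degrees (Eq. 7)
    return A / D[:, None]                           # row normalization = D^{-1} A (Eq. 8)
```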
5. Saliency Detection
Given the transition matrix $P$ by Eq. 8, we can easily extract the matrix $Q$ by Eq. 1, based on which the fundamental matrix $N$ is computed. Then, we obtain the saliency map $S$ by normalizing the absorbed time $y$ computed by Eq. 2 to the range between 0 and 1, that is,

$$S(i) = \bar{y}(i), \quad i = 1, 2, \ldots, t, \quad (9)$$

where $i$ indexes the transient nodes on the graph and $\bar{y}$ denotes the normalized absorbed time vector.

Most saliency maps generated by the normalized absorbed time $\bar{y}$ are effective, but some background nodes near the image center may not be adequately suppressed when they lie in a long-range homogeneous region, as shown in Figure 3. That can be explained as follows. Most nodes in this kind of background region have large transition probabilities, which means that the random walk may transfer many times among these nodes before reaching the absorbing nodes. The sparse connectivity of the graph results in the background nodes near the image center having longer absorbed time than similar nodes near the image boundaries. Consequently, the background regions near the image center may exhibit saliency comparable to that of the salient objects, decreasing the contrast between objects and background in the resulting saliency maps. To alleviate this problem, we update the saliency map by using a weighted absorbed time $y_w$, which can be denoted as

$$y_w = N \times u, \quad (10)$$

where $u$ is the weighting column vector. In this work, we use the normalized recurrent time of an ergodic Markov chain, whose transition matrix is the row-normalized $Q$, as the weight $u$.

Figure 3. Examples showing the benefits of the update processing. From left to right: input images, results without and with the update processing.

The equilibrium distribution $\pi$ for the ergodic Markov chain can be computed from the affinity matrix $A$ as

$$\pi_i = \frac{\sum_j a_{ij}}{\sum_{i,j} a_{ij}}, \quad (11)$$

where $i, j$ index all the transient nodes. Since we define the edge weight $w_{ij}$ as the similarity between two nodes, the nodes within a homogeneous region have a large weighted sum $\sum_j a_{ij}$. This means the recurrent time in this kind of region is small, as shown in Figure 3. For this reason, we use the average recurrent time $h_j$ of each node $j$ to weight the corresponding element $n_{ij}$ (i.e., the expected time spent in node $j$ before absorption given starting node $i$) in each row of the fundamental matrix $N$. Precisely, given the equilibrium distribution $\pi$, $h_j$ is computed by Eq. 4 and the weighting vector $u$ is computed as

$$u_j = \frac{h_j}{\sum_k h_k}, \quad (12)$$

where $j$ and $k$ index all the transient nodes on the graph.
By the update processing, the saliency of the long-range homogeneous regions near the image center can be suppressed, as Figure 3 illustrates. However, if this kind of region belongs to the salient object, its saliency will also be incorrectly suppressed. Therefore, we define a principle to decide which maps need to be further updated. We find that object regions have great global contrast to background regions in good saliency maps, while this is not the case in the defective maps, such as the examples in Figure 3, which consistently contain a number of regions with mid-level saliency. Hence, given a saliency map, we first calculate its gray histogram $g$ with ten bins, and then define a metric $score$ to characterize this kind of tendency as follows:

$$score = \sum_{b=1}^{10} g(b) \times \min(b,\, 11 - b), \quad (13)$$

where $b$ indexes all the bins. A larger $score$ means that there are more extensive regions with mid-level saliency in the saliency map.
It should be noted that the absorbing nodes may include object nodes when the salient objects touch the image boundaries, as shown in Figure 4. These imprecise background absorbing nodes may result in the object regions close to the boundary being suppressed. However, the absorbed time considers the effect of all boundary nodes and depends on two factors, the edge weights on the path and the spatial distance, so the parts of the object which are far from or different from the boundary absorbing nodes can still be highlighted correctly. The main procedure of the proposed method is summarized in Algorithm 1.

Figure 4. Examples in which the salient objects appear at the image boundaries. From top to bottom: input images, our saliency maps.

Algorithm 1 Saliency detection based on Markov random walk
Input: An image and required parameters.
1. Construct a graph $G$ with superpixels as nodes, and use boundary nodes as absorbing nodes;
2. Compute the affinity matrix $A$ by Eq. 6 and the transition matrix $P$ by Eq. 8;
3. Extract the matrix $Q$ from $P$ by Eq. 1, and compute the fundamental matrix $N = (I - Q)^{-1}$ and the map $S$ by Eq. 9;
4. Compute the $score$ by Eq. 13; if $score < \gamma$, output $S$ and return;
5. Compute the recurrent time $h$ by Eqs. 11 and 4 and the weight $u$ by Eq. 12, then compute the saliency map $S$ by Eqs. 10 and 9;
Output: the full-resolution saliency map.
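Putting the pieces together, Algorithm 1 could be wired up roughly as below, reusing the weighted_absorbed_time and mid_level_score sketches from the previous sections; the affinity matrix A of Eq. 6 is assumed to be built as in the graph-construction sketch, and mapping the per-superpixel values back to a full-resolution pixel map is omitted. This is an illustration, not the authors' implementation.

```python
import numpy as np

def detect_saliency(A, t, gamma=2.0):
    """Sketch of Algorithm 1: A is the affinity matrix (Eq. 6) with the first t
    nodes transient and the rest the duplicated boundary absorbing nodes."""
    P = A / A.sum(axis=1, keepdims=True)              # step 2: Eq. 8
    Q = P[:t, :t]                                     # step 3: Eq. 1
    N = np.linalg.inv(np.eye(t) - Q)                  # fundamental matrix
    y = N @ np.ones(t)                                # absorbed time (Eq. 2)
    S = (y - y.min()) / (y.max() - y.min() + 1e-12)   # step 3: Eq. 9

    if mid_level_score(S) >= gamma:                   # step 4: Eq. 13
        yw = weighted_absorbed_time(A, t)             # step 5: Eqs. 10-12
        S = (yw - yw.min()) / (yw.max() - yw.min() + 1e-12)
    return S                                          # per-superpixel saliency
```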
6. Experimental Results
We evaluate the proposed method on four benchmark datasets. The first one is the MSRA dataset [18], which contains 5,000 images with the ground truth marked by bounding boxes. The second one is the ASD dataset, a subset of the MSRA dataset, which contains 1,000 images with accurate human-labelled ground truth provided by [2]. The third one is the SED dataset [28], which consists of the single-object sub-dataset SED1 and the two-object sub-dataset SED2; each sub-dataset contains 100 images with accurate human-labelled ground truth. The fourth one is the most challenging SOD dataset, which contains 300 images from the Berkeley segmentation dataset [22]. This dataset was first used for salient object segmentation evaluation [23], where seven subjects were asked to label the foreground salient object masks. For each object mask of each subject, a consistency score is computed based on the labels of the other six subjects. We select and combine the object masks whose consistency scores are higher than 0.7 as the final ground truth, as done in [33]. We compare our method with fifteen state-of-the-art saliency detection algorithms: the IT [16], MZ [20], LC [37], GB [14], SR [15], AC [1], FT [2], SER [31], CA [27], RC [8], CB [17], SVO [7], SF [25], LR [29] and GS [33] methods.

Experimental Setup: We set the number of superpixel nodes $N = 250$ in all the experiments. There are two parameters in the proposed algorithm: the edge weight $\sigma$ in Eq. 5, which controls the strength of the weight between a pair of nodes, and the threshold $\gamma$ of $score$ in Eq. 13, which indicates the quality of the saliency map. These two parameters are empirically chosen as $\sigma^2 = 0.1$ and $\gamma = 2$ for all the test images in the experiments.

Evaluation Metrics: We evaluate all methods by precision, recall and F-measure. The precision is defined as the ratio of correctly detected salient pixels to all pixels in the extracted regions. The recall is defined as the ratio of detected salient pixels to the number of ground-truth salient pixels. The F-measure is the overall performance measure, computed as the weighted harmonic mean of precision and recall:

$$F_\beta = \frac{(1 + \beta^2)\, Precision \times Recall}{\beta^2\, Precision + Recall}. \quad (14)$$

We set $\beta^2 = 0.3$ to stress precision more than recall, the same as [2, 8, 25]. Similar to previous works, two evaluation criteria are used in our experiments. First, we bisegment the saliency map using every threshold in the range [0 : 0.05 : 1], and compute precision and recall at each threshold to plot the precision-recall curve. Second, we compute the precision, recall and F-measure with an adaptive threshold proposed in [2], which is defined as twice the mean saliency of the image.
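The two evaluation criteria could be computed as in the sketch below (illustrative; sal is a saliency map normalized to [0, 1] and gt a boolean ground-truth mask): a precision-recall curve over the fixed thresholds [0 : 0.05 : 1], and the F-measure of Eq. 14 with β² = 0.3 at the adaptive threshold of twice the mean saliency.

```python
import numpy as np

def precision_recall(sal, gt, thresh):
    """Precision/recall of the binarized map sal >= thresh against a boolean gt mask."""
    pred = sal >= thresh
    tp = np.logical_and(pred, gt).sum()
    prec = tp / max(pred.sum(), 1)
    rec = tp / max(gt.sum(), 1)
    return prec, rec

def pr_curve(sal, gt, thresholds=np.arange(0.0, 1.0001, 0.05)):
    """Precision-recall pairs over the fixed thresholds [0 : 0.05 : 1]."""
    return [precision_recall(sal, gt, th) for th in thresholds]

def adaptive_f_measure(sal, gt, beta2=0.3):
    """F-measure (Eq. 14) at the adaptive threshold of [2]: twice the mean saliency."""
    prec, rec = precision_recall(sal, gt, 2.0 * sal.mean())
    return (1 + beta2) * prec * rec / max(beta2 * prec + rec, 1e-12)
```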
Citations
Journal Article • DOI
TL;DR: It is found that the models designed specifically for salient object detection generally work better than models in closely related areas, which provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems.
Abstract: We extensively compare, qualitatively and quantitatively, 41 state-of-the-art models (29 salient object detection, 10 fixation prediction, 1 objectness, and 1 baseline) over seven challenging data sets for the purpose of benchmarking salient object detection and segmentation methods. From the results obtained so far, our evaluation shows a consistent rapid progress over the last few years in terms of both accuracy and running time. The top contenders in this benchmark significantly outperform the models identified as the best in the previous benchmark conducted three years ago. We find that the models designed specifically for salient object detection generally work better than models in closely related areas, which in turn provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems. In particular, we analyze the influences of center bias and scene complexity in model performance, which, along with the hard cases for the state-of-the-art models, provide useful hints toward constructing more challenging large-scale data sets and better saliency models. Finally, we propose probable solutions for tackling several open problems, such as evaluation scores and data set bias, which also suggest future research directions in the rapidly growing field of salient object detection.

1,372 citations



Proceedings Article • DOI
23 Jun 2013
TL;DR: This paper regards saliency map computation as a regression problem, which is based on multi-level image segmentation, and uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map.
Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions are two-fold. One is that we show our approach, which integrates the regional contrast, regional property and regional backgroundness descriptors together to form the master saliency map, is able to produce superior saliency maps to existing algorithms, most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2].

1,057 citations


Cites background or methods from "Saliency Detection via Absorbing Ma..."

  • ...To save the space, we only consider the top four models ranked in the survey [23]: SVO [51], CA [17], CB [32], and RC [15] and recently-developed methods: SF [21], LRK [78], HS [33], GMR [48], PCA [31], MC [50], DSR [49], RBD [55] that are not covered in [23]....

    [...]

  • ...Object prior, such as connectivity prior [45], concavity context [20], and auto-context cue [46], backgroundness prior [47]–[50], generic objectness prior [51]–[53], and background connectivity prior [38], [54], [55] are also studied for saliency computation....

    [...]

Proceedings Article • DOI
07 Jun 2015
TL;DR: This paper proposes a multi-context deep learning framework for salient object detection that employs deep Convolutional Neural Networks to model saliency of objects in images and investigates different pre-training strategies to provide a better initialization for training the deep neural networks.
Abstract: Low-level saliency cues or priors do not produce good enough saliency detection results especially when the salient object presents in a low-contrast background with confusing visual appearance. This issue raises a serious problem for conventional approaches. In this paper, we tackle this problem by proposing a multi-context deep learning framework for salient object detection. We employ deep Convolutional Neural Networks to model saliency of objects in images. Global context and local context are both taken into account, and are jointly modeled in a unified multi-context deep learning framework. To provide a better initialization for training the deep neural networks, we investigate different pre-training strategies, and a task-specific pre-training scheme is designed to make the multi-context modeling suited for saliency detection. Furthermore, recently proposed contemporary deep models in the ImageNet Image Classification Challenge are tested, and their effectiveness in saliency detection are investigated. Our approach is extensively evaluated on five public datasets, and experimental results show significant and consistent improvements over the state-of-the-art methods.

983 citations


Cites background from "Saliency Detection via Absorbing Ma..."

  • ...A large number of approaches [63, 52, 40, 39, 32, 35, 60, 57, 56, 47, 41, 31, 27, 25, 24, 23, 11, 44, 17, 8, 13, 1, 21] are proposed to capture different saliency cues....

    [...]

Proceedings Article • DOI
07 Jun 2015
TL;DR: This method presents two interesting insights: first, local features learned by a supervised scheme can effectively capture local contrast, texture and shape information for saliency detection and second, the complex relationship between different global saliency cues can be captured by deep networks and exploited principally rather than heuristically.
Abstract: This paper presents a saliency detection algorithm by integrating both local estimation and global search. In the local estimation stage, we detect local saliency by using a deep neural network (DNN-L) which learns local patch features to determine the saliency value of each pixel. The estimated local saliency maps are further refined by exploring the high level object concepts. In the global search stage, the local saliency map together with global contrast and geometric information are used as global features to describe a set of object candidate regions. Another deep neural network (DNN-G) is trained to predict the saliency score of each object region based on the global features. The final saliency map is generated by a weighted sum of salient object regions. Our method presents two interesting insights. First, local features learned by a supervised scheme can effectively capture local contrast, texture and shape information for saliency detection. Second, the complex relationship between different global saliency cues can be captured by deep networks and exploited principally rather than heuristically. Quantitative and qualitative experiments on several benchmark data sets demonstrate that our algorithm performs favorably against the state-of-the-art methods.

690 citations


Additional excerpts

  • ...Saliency cues such as center and object bias [31, 22], contrast information [38] and background prior [33, 15] have been shown to be effective in previous work....

    [...]

Journal Article • DOI
TL;DR: A comprehensive review of recent progress in salient object detection is provided and this field is situate among other closely related areas such as generic scene segmentation, object proposal generation, and saliency for fixation prediction.
Abstract: Detecting and segmenting salient objects from natural scenes, often referred to as salient object detection, has attracted great interest in computer vision. While many models have been proposed and several applications have emerged, a deep understanding of achievements and issues remains lacking. We aim to provide a comprehensive review of recent progress in salient object detection and situate this field among other closely related areas such as generic scene segmentation, object proposal generation, and saliency for fixation prediction. Covering 228 publications, we survey i) roots, key concepts, and tasks, ii) core techniques and main modeling trends, and iii) datasets and evaluation metrics for salient object detection. We also discuss open problems such as evaluation metrics and dataset bias in model performance, and suggest future research directions.

608 citations


Cites background or methods from "Saliency Detection via Absorbing Ma..."

  • ...[85] propose to formulate the saliency detection via absorbing Markov Chain where the transient and absorbing nodes are superpixels around the image center and border, respectively....

    [...]

  • ...To this end, the backgroundness prior is adopted for salient object detection [85, 129, 210, 218], assuming that a narrow border of the image is the background region, i....

    [...]

References
Journal Article • DOI
TL;DR: In this article, a visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system is presented, where multiscale image features are combined into a single topographical saliency map.
Abstract: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.

10,525 citations

01 Jan 1998
TL;DR: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented, which breaks down the complex problem of scene understanding by rapidly selecting conspicuous locations to be analyzed in detail.

8,566 citations


"Saliency Detection via Absorbing Ma..." refers background or methods in this paper

  • ...SED: On this single object and two object dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....

    [...]

  • ...While the CA [27], IT [16], FT [2], SR [15] and LC [37] methods also show the same imbalance....

    [...]

  • ...[16] extract center-surround contrast at multiple spatial scales to find the prominent region....

    [...]

  • ...We compare our method with fifteen state-of-the-art saliency detection algorithms: the IT [16], MZ [20], LC [37], GB [14], SR [15], AC [1], FT [2], SER [31], CA [27], RC [8], CB [17], SVO [7], SF [25], LR [29] and GS [33] methods....

    [...]

  • ...MSRA: On the MSRA dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....

    [...]

Proceedings Article • DOI
07 Jul 2001
TL;DR: In this paper, the authors present a database containing ground truth segmentations produced by humans for images of a wide variety of natural scenes, and define an error measure which quantifies the consistency between segmentations of differing granularities.
Abstract: This paper presents a database containing 'ground truth' segmentations produced by humans for images of a wide variety of natural scenes. We define an error measure which quantifies the consistency between segmentations of differing granularities and find that different human segmentations of the same image are highly consistent. Use of this dataset is demonstrated in two applications: (1) evaluating the performance of segmentation algorithms and (2) measuring probability distributions associated with Gestalt grouping factors as well as statistics of image region properties.

6,505 citations

Proceedings Article • DOI
20 Jun 2009
TL;DR: This paper introduces a method for salient region detection that outputs full resolution saliency maps with well-defined boundaries of salient objects that outperforms the five algorithms both on the ground-truth evaluation and on the segmentation task by achieving both higher precision and better recall.
Abstract: Detection of visually salient image regions is useful for applications like object segmentation, adaptive compression, and object recognition. In this paper, we introduce a method for salient region detection that outputs full resolution saliency maps with well-defined boundaries of salient objects. These boundaries are preserved by retaining substantially more frequency content from the original image than other existing techniques. Our method exploits features of color and luminance, is simple to implement, and is computationally efficient. We compare our algorithm to five state-of-the-art salient region detection methods with a frequency domain analysis, ground truth, and a salient object segmentation application. Our method outperforms the five algorithms both on the ground-truth evaluation and on the segmentation task by achieving both higher precision and better recall.

3,723 citations


"Saliency Detection via Absorbing Ma..." refers background or methods in this paper

  • ...SED: On this single object and two object dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....

    [...]

  • ...While the CA [27], IT [16], FT [2], SR [15] and LC [37] methods also show the same imbalance....

    [...]

  • ...The second one is the ASD dataset, a subset of the MSRA dataset, which contains 1,000 images with accurate human-labelled ground truth provided by [2]....

    [...]

  • ...Second, we compute the precision, recall and F-measure with an adaptive threshold proposed in [2], which is defined as twice the mean saliency of the image....

    [...]

  • ...We compare our method with fifteen state-of-the-art saliency detection algorithms: the IT [16], MZ [20], LC [37], GB [14], SR [15], AC [1], FT [2], SER [31], CA [27], RC [8], CB [17], SVO [7], SF [25], LR [29] and GS [33] methods....

    [...]

Proceedings Article • DOI
20 Jun 2011
TL;DR: This work proposes a regional contrast based saliency extraction algorithm, which simultaneously evaluates global contrast differences and spatial coherence, and consistently outperformed existing saliency detection methods.
Abstract: Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional contrast based salient object detection algorithm, which simultaneously evaluates global contrast differences and spatial weighted coherence scores. The proposed algorithm is simple, efficient, naturally multi-scale, and produces full-resolution, high-quality saliency maps. These saliency maps are further used to initialize a novel iterative version of GrabCut, namely SaliencyCut, for high quality unsupervised salient object segmentation. We extensively evaluated our algorithm using traditional salient object detection datasets, as well as a more challenging Internet image dataset. Our experimental results demonstrate that our algorithm consistently outperforms 15 existing salient object detection and segmentation methods, yielding higher precision and better recall rates. We also show that our algorithm can be used to efficiently extract salient object masks from Internet images, enabling effective sketch-based image retrieval (SBIR) via simple shape comparisons. Despite such noisy internet images, where the saliency regions are ambiguous, our saliency guided image retrieval achieves a superior retrieval rate compared with state-of-the-art SBIR methods, and additionally provides important target object region information.

3,653 citations


"Saliency Detection via Absorbing Ma..." refers background or methods in this paper

  • ...SED: On this single object and two object dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....

    [...]

  • ...Region contrast based methods [8, 17] first segment the image and then compute the global contrast of those segments as saliency, which can usually highlight the entire object....

    [...]

  • ...We compare our method with fifteen state-of-the-art saliency detection algorithms: the IT [16], MZ [20], LC [37], GB [14], SR [15], AC [1], FT [2], SER [31], CA [27], RC [8], CB [17], SVO [7], SF [25], LR [29] and GS [33] methods....

    [...]

  • ...MSRA: On the MSRA dataset, we compare the proposed method with eleven state-of-the-art methods which are LR [29], CB [17], SVO [7], RC [8], CA [27], SER [31], FT [2], GB [14], SR [15], LC [37] and IT [16]....

    [...]

  • ...The two evaluation criteria consistently show the proposed method outperforms all the other methods, where the CB [17], SVO [7], RC [8] and CA [27] are top-performance methods for saliency detection in a recent benchmark study [5]....

    [...]

Frequently Asked Questions (15)
Q1. What have the authors contributed in "Saliency detection via absorbing markov chain" ?

In this paper, the authors formulate saliency detection via absorbing Markov chain on an image graph model.Β The authors jointly consider the appearance divergence and spatial distribution of salient objects and the background.Β The authors further exploit the equilibrium distribution in an ergodic Markov chain to reduce the absorbed time in the long-range smooth background regions.Β 

Since the boundary nodes usually contain the global characteristics of the image background, by using them as absorbing nodes, the absorbed time of each transient node can reflect its overall similarity with the background, which helps to distinguish salient nodes from background nodes.Β 

Due to the cluttered backgrounds and heterogeneous foregrounds of most images, and the lack of top-down prior knowledge, the overall performance of the existing bottom-up saliency detection methods is low on this dataset.

as the absorbed time is the expected time to all the absorbing nodes, it covers the effect of all the boundary nodes, which can alleviate the influence of particular regions and encourage the similar nodes in a local context to have the similar saliency, thereby overcoming the defects of using the equilibrium distribution [9, 14, 11, 31].Β 

As salient objects seldom occupy all four image boundaries [33, 5] and the background regions often have appearance connectivity with image boundaries, when the authors use the boundary nodes as absorbing nodes, the random walk starting in background nodes can easily reach the absorbing nodes.Β 

In addition, equilibrium distribution based saliency models only highlight the boundaries of salient object while object interior still has low saliency value.Β 

The sparse connectivity of the graph results in the background nodes near the image center having longer absorbed time than similar nodes near the image boundaries.

the authors bisegment the saliency map using every threshold in the range [0 : 0.05 : 1], and compute precision and recall at each value of the threshold to plot the precision-recall curve.Β 

Because the authors compute the full resolution saliency map, some virtual nodes are added to the graph as absorbing states, which is detailed in the next section.Β 

By the update processing, the saliency of the long-range homogeneous regions near the image center can be suppressed as Figure 3 illustrates.Β 

The authors further explore the effect of the equilibrium probability in saliency detection, and exploit it to regulate the absorbed time, thereby suppressing the saliency of this kind of regions.Β 

Given a set of states $S = \{s_1, s_2, \ldots, s_m\}$, a Markov chain can be completely specified by the $m \times m$ transition matrix $P$, in which $p_{ij}$ is the probability of moving from state $s_i$ to state $s_j$.

In this work, the authors use the normalized recurrent time of an ergodic Markov chain, of which the transition matrix is the row normalized Q, as the weight u.Β 

To alleviate this problem, the authors update the saliency map by using a weighted absorbed time $y_w = N \times u$ (Eq. 10), where $u$ is the weighting column vector.

Given an input image represented as a Markov chain and some background absorbing states, the saliency of each transient state is defined as the expected number of steps before being absorbed into the absorbing nodes, computed by Eq. 2.