Proceedings ArticleDOI

Normalized cuts and image segmentation

17 Jun 1997-pp 731-737
TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
Abstract: We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups as well as the total similarity within the groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this criterion. We have applied this approach to segmenting static images and found results very encouraging.

Summary (4 min read)

1 INTRODUCTION

  • The authors present a general framework for this problem, focusing specifically on the case of image segmentation.
  • There are two aspects to be considered here.
  • The authors propose a new graph-theoretic criterion for measuring the goodness of an image partition: the normalized cut.

2 GROUPING AS GRAPH PARTITIONING

  • The degree of dissimilarity between these two pieces can be computed as total weight of the edges that have been removed.
  • Finding the minimum cut of a graph is a well-studied problem and there exist efficient algorithms for solving it.
  • Wu and Leahy [25] proposed a clustering method based on this minimum cut criterion.
  • In fact, any cut that partitions out individual nodes on the right half will have smaller cut value than the cut that partitions the nodes into the left and right halves.
  • In the same spirit, the authors can define a measure for total normalized association within groups for a given partition: $\text{Nassoc}(A, B) = \frac{\text{assoc}(A, A)}{\text{assoc}(A, V)} + \frac{\text{assoc}(B, B)}{\text{assoc}(B, V)}$ (3), where $\text{assoc}(A, A)$ and $\text{assoc}(B, B)$ are total weights of edges connecting nodes within $A$ and $B$, respectively.

2.1 Computing the Optimal Partition

  • Hence, $z_0$ is, in fact, the smallest eigenvector of (7) and all eigenvectors of (7) are perpendicular to each other.
  • Now, recall a simple fact about the Rayleigh quotient [11]: Let A be a real symmetric matrix.
  • Thus, the second smallest eigenvector of the generalized eigensystem (6) is the real valued solution to their normalized cut problem.
  • Roughly speaking, this forces the indicator vector $y$ to take similar values for nodes $i$ and $j$ that are tightly coupled (large $w_{ij}$).

3 THE GROUPING ALGORITHM

  • The authors' grouping algorithm consists of the following steps: 1. Given an image or image sequence, set up a weighted graph $G = (V, E)$ and set the weight on the edge connecting two nodes to be a measure of the similarity between the two nodes.
  • Solve $(D - W)\, x = \lambda D\, x$ for eigenvectors with the smallest eigenvalues.
  • Use the eigenvector with the second smallest eigenvalue to bipartition the graph.
  • Decide if the current partition should be subdivided and recursively repartition the segmented parts if necessary.
  • The grouping algorithm, as well as its computational complexity, can be best illustrated by using the following example.

3.1 Example: Brightness Images

  • The weight on that edge should reflect the likelihood that the two pixels belong to one object.
  • In the ideal case, the eigenvector should only take on two discrete values and the signs of the values can tell us exactly how to partition the graph.
  • One can take 0 or the median value as the splitting point or one can search for the splitting point such that the resulting partition has the best $\text{Ncut}(A, B)$ value.
  • After the graph is broken into two pieces, the authors can recursively run their algorithm on the two partitioned parts.
  • In their experiments, the authors find that simple thresholding on the ratio described above can be used to exclude unstable eigenvectors.

3.2 Recursive Two-Way Ncut

  • Solve $(D - W)\, x = \lambda D\, x$ for eigenvectors with the smallest eigenvalues.
  • Use the eigenvector with the second smallest eigenvalue to bipartition the graph by finding the splitting point such that Ncut is minimized.
  • Decide if the current partition should be subdivided by checking the stability of the cut, and make sure Ncut is below the prespecified value.
  • Recursively repartition the segmented parts if necessary.
  • The number of groups segmented by this method is controlled directly by the maximum allowed Ncut.
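The control loop these bullets describe can be sketched in code. The sketch below is an illustrative reimplementation, not the authors' code: `spectral_split` is a simplified dense-matrix stand-in for the Lanczos-based solver described later, and the default `max_ncut=0.5` is an arbitrary placeholder value.

```python
import numpy as np

def ncut_value(W, mask):
    # Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)
    cut = W[np.ix_(mask, ~mask)].sum()
    return cut / W[mask].sum() + cut / W[~mask].sum()

def spectral_split(W):
    # Bipartition by the sign of the second smallest generalized eigenvector
    # of (D - W) y = lambda D y, computed through the equivalent standard
    # problem on D^(-1/2) (D - W) D^(-1/2); assumes no isolated nodes.
    d = W.sum(axis=1)
    inv_sqrt = np.diag(1.0 / np.sqrt(d))
    _, vecs = np.linalg.eigh(inv_sqrt @ (np.diag(d) - W) @ inv_sqrt)
    return (inv_sqrt @ vecs[:, 1]) > 0  # map z back to y = D^(-1/2) z

def recursive_two_way_ncut(W, split_fn, max_ncut=0.5, nodes=None):
    # Split recursively; a segment is kept whole once the proposed cut's
    # Ncut value exceeds the maximum allowed Ncut.
    if nodes is None:
        nodes = np.arange(W.shape[0])
    if len(nodes) < 2:
        return [nodes]
    sub = W[np.ix_(nodes, nodes)]
    mask = split_fn(sub)
    if mask.all() or not mask.any() or ncut_value(sub, mask) > max_ncut:
        return [nodes]
    return (recursive_two_way_ncut(W, split_fn, max_ncut, nodes[mask]) +
            recursive_two_way_ncut(W, split_fn, max_ncut, nodes[~mask]))
```

On a toy affinity matrix with two tightly connected clusters joined by weak edges, this returns the two clusters and then stops, since any further cut has a large Ncut value; the number of output groups is governed only by `max_ncut`, matching the last bullet.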

3.3 Simultaneous K-Way Cut with Multiple Eigenvectors

  • One drawback of the recursive 2-way cut is its treatment of the oscillatory eigenvectors.
  • Also, the approach is computationally wasteful; only the second eigenvector is used, whereas the next few small eigenvectors also contain useful partitioning information.
  • They exacerbate the oversegmentation, but that will be dealt with subsequently.
  • In the second step, one can proceed in the following two ways: 1. Greedy pruning: Iteratively merge two segments at a time until only k segments are left.
  • The results presented in this paper are all based on the recursive 2-way partitioning algorithm outlined in Section 3.2.

4 EXPERIMENTS

  • The authors have applied their grouping algorithm to image segmentation based on brightness, color, texture, or motion information.
  • Note that the weight $w_{ij} = 0$ for any pair of nodes $i$ and $j$ that are more than $r$ pixels apart.
  • Fig. 5 shows a point set and the segmentation result.
  • The normalized cut criterion is indeed able to partition the point set in a desirable way.
  • In the motion case, the authors will treat the image sequence as a spatiotemporal data set.

4.1 Computation Time

  • On the 100 × 120 test images shown here, the normalized cut algorithm takes about 2 minutes on Intel Pentium 200 MHz machines.
  • A multiresolution implementation can be used to reduce this running time further on larger images.
  • In their current experiments, with this implementation, the running time on a 300 × 400 image can be reduced to about 20 seconds on Intel Pentium 300 MHz machines.
  • In their current implementation, the sparse eigenvalue decomposition is computed using the LASO2 numerical package developed by Scott.

4.2 Choice of Graph Edge Weight

  • The exponential weighting function is chosen here for its relative simplicity, as well as neutrality, since the focus of this paper is on developing a general segmentation procedure, given a feature similarity measure.
  • The authors found that this choice of weight function is quite adequate for typical image and feature spaces.
  • Section 6.1 shows the effect of using different weighting functions and parameters on the output of the normalized cut algorithm.
  • The general problem of defining feature similarity incorporating a variety of cues is not a trivial one.
  • Some of these issues are addressed in [15].

5 RELATIONSHIP TO SPECTRAL GRAPH THEORY

  • The computational approach that the authors have developed for image segmentation is based on concepts from spectral graph theory.
  • This is a rich area of mathematics and the idea of using eigenvectors of the Laplacian for finding partitions of graphs can be traced back to Cheeger [4], Donath and Hoffman [7], and Fiedler [9].
  • Chung [5] points out that the eigenvalues of this "normalized" Laplacian relate more directly to graph invariants.
  • One cannot simultaneously minimize the disassociation across the partitions while maximizing the association within the groups.
  • There are also other explanations of why the normalized cut has better behavior from a graph-theoretic point of view, as pointed out by Chung [5].

5.1 A Physical Interpretation

  • As one might expect, a physical analogy can be set up for the generalized eigenvalue system (6) that the authors used to approximate the solution of normalized cut.
  • The authors can construct a spring-mass system from the weighted graph by taking graph nodes as physical nodes and graph edges as springs connecting each pair of nodes.
  • Nodes that have stronger spring connections among them will likely oscillate together.
  • Eventually, the group will "pop" off from the image plane.
  • In fact, it can be shown that the fundamental modes of oscillation of this spring mass system are exactly the generalized eigenvectors of (6).

6 RELATIONSHIP TO OTHER GRAPH THEORETIC APPROACHES TO IMAGE SEGMENTATION

  • In the computer vision community, there has been some previous work on image segmentation formulated as a graph partition problem.
  • Wu and Leahy [25] use the minimum cut criterion for their segmentation.
  • Cox et al. use an efficient discrete algorithm to solve their optimization problem assuming the graph is planar.
  • Sarkar and Boyer [19] use the eigenvector with the largest eigenvalue of their system. Using a similar derivation as in Section 2.1, the authors can see that the first largest eigenvector of their system approximates $\max_{A \subseteq V} \frac{\text{assoc}(A, A)}{|A|}$ and the second largest eigenvector approximates $\max_{A \subseteq V,\, B \subseteq V} \frac{\text{assoc}(A, A)}{|A|} + \frac{\text{assoc}(B, B)}{|B|}$.
  • As the authors will see later in the section, this situation can happen quite often in practice.

7 CONCLUSION

  • The authors developed a grouping algorithm based on the view that perceptual grouping should be a process that aims to extract global impressions of a scene and provides a hierarchical description of it.
  • By treating the grouping problem as a graph partitioning problem, the authors proposed the normalized cut criteria for segmenting the graph.
  • In finding an efficient algorithm for computing the minimum normalized cut, the authors showed that a generalized eigenvalue system provides a real valued solution to their problem.
  • A computational method based on this idea has been developed and applied to segmentation of brightness, color, and texture images.
  • For all other partitions, the Ncut value will be bounded below by $4an\,\frac{c - 1}{c}$.


Normalized Cuts and Image Segmentation
Jianbo Shi and Jitendra Malik, Member, IEEE
Abstract: We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features
and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image
segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The
normalized cut criterion measures both the total dissimilarity between the different groups as well as the total similarity within the
groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this
criterion. We have applied this approach to segmenting static images, as well as motion sequences, and found the results to be very
encouraging.
Index Terms: Grouping, image segmentation, graph partitioning.
1 INTRODUCTION

Nearly 75 years ago, Wertheimer [24] pointed out the
importance of perceptual grouping and organization
in vision and listed several key factors, such as similarity,
proximity, and good continuation, which lead to visual
grouping. However, even to this day, many of the
computational issues of perceptual grouping have remained unresolved. In this paper, we present a general
framework for this problem, focusing specifically on the
case of image segmentation.
Since there are many possible partitions of the domain I
of an image into subsets, how do we pick the "right" one?
There are two aspects to be considered here. The first is that
there may not be a single correct answer. A Bayesian view is
appropriate: there are several possible interpretations in
the context of prior world knowledge. The difficulty, of
course, is in specifying the prior world knowledge. Some of
it is low level, such as coherence of brightness, color,
texture, or motion, but equally important is mid- or high-level knowledge about symmetries of objects or object
models. The second aspect is that the partitioning is
inherently hierarchical. Therefore, it is more appropriate
to think of returning a tree structure corresponding to a
hierarchical partition instead of a single "flat" partition.
This suggests that image segmentation based on low-
level cues cannot and should not aim to produce a complete
final "correct" segmentation. The objective should instead
be to use the low-level coherence of brightness, color, texture, or
motion attributes to sequentially come up with hierarchical
partitions. Mid- and high-level knowledge can be used to
either confirm these groups or select some for further
attention. This attention could result in further repartitioning or grouping. The key point is that image partitioning is
to be done from the big picture downward, rather like a
painter first marking out the major areas and then filling in
the details.
Prior literature on the related problems of clustering,
grouping and image segmentation is huge. The clustering
community [12] has offered us agglomerative and divisive
algorithms; in image segmentation, we have region-based
merge and split algorithms. The hierarchical divisive
approach that we advocate produces a tree, the dendrogram.
While most of these ideas go back to the 1970s (and earlier),
the 1980s brought in the use of Markov Random Fields [10]
and variational formulations [17], [2], [14]. The MRF and
variational formulations also exposed two basic questions:
1. What is the criterion that one wants to optimize?
2. Is there an efficient algorithm for carrying out the
optimization?
Many an attractive criterion has been doomed by the
inability to find an effective algorithm to find its minimum: greedy or gradient descent type approaches fail to
find global optima for these high-dimensional, nonlinear
problems.
Our approach is most related to the graph theoretic
formulation of grouping. The set of points in an arbitrary
feature space are represented as a weighted undirected
graph GG VV;EE, where the nodes of the graph are the
points in the feature space, and an edge is formed between
every pair of nodes. The weight on each edge, wii; jj,isa
function of the similarity between nodes ii and jj.
In grouping, we seek to partition the set of vertices into
disjoint sets $V_1, V_2, \ldots, V_m$, where by some measure the similarity among the vertices in a set $V_i$ is high and, across different sets $V_i$, $V_j$, is low.
To partition a graph, we need to also ask the following
questions:
1. What is the precise criterion for a good partition?
2. How can such a partition be computed efficiently?
In the image segmentation and data clustering commu-
nity, there has been much previous work using variations of
the minimal spanning tree or limited neighborhood set
approaches. Although those use efficient computational
888 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 22, NO. 8, AUGUST 2000

J. Shi is with the Robotics Institute, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213. E-mail: jshi@cs.cmu.edu.
J. Malik is with the Electrical Engineering and Computer Science Division, University of California at Berkeley, Berkeley, CA 94720. E-mail: malik@cs.berkeley.edu.
Manuscript received 4 Feb. 1998; accepted 16 Nov. 1999. Recommended for acceptance by M. Shah.
For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 107618.
0162-8828/00/$10.00 © 2000 IEEE

methods, the segmentation criteria used in most of them are
based on local properties of the graph. Because perceptual
grouping is about extracting the global impressions of a
scene, as we saw earlier, this partitioning criterion often
falls short of this main goal.
In this paper, we propose a new graph-theoretic criterion
for measuring the goodness of an image partition: the
normalized cut. We introduce and justify this criterion in
Section 2. The minimization of this criterion can be
formulated as a generalized eigenvalue problem. The
eigenvectors can be used to construct good partitions of
the image and the process can be continued recursively as
desired (Section 2.1). Section 3 gives a detailed explanation
of the steps of our grouping algorithm. In Section 4, we
show experimental results. The formulation and minimization of the normalized cut criterion draws on a body of
results from the field of spectral graph theory (Section 5).
Relationship to work in computer vision is discussed in
Section 6 and comparison with related eigenvector based
segmentation methods is presented in Section 6.1. We
conclude in Section 7.
The main results in this paper were first presented in [20].
2 GROUPING AS GRAPH PARTITIONING
A graph G V; E can be partitioned into two disjoint
sets, A; B, A [ B V , A \ B ;, by simply removing edges
connecting the two parts. The degree of dissimilarity
between these two pieces can be computed as total weight
of the edges that have been removed. In graph theoretic
language, it is called the cut:
cutA; B
X
u2A;v2B
wu; v: 1
The optimal bipartitioning of a graph is the one that
minimizes this cut value. Although there are an exponential
number of such partitions, finding the minimum cut of a
graph is a well-studied problem and there exist efficient
algorithms for solving it.
Wu and Leahy [25] proposed a clustering method based
on this minimum cut criterion. In particular, they seek to
partition a graph into k-subgraphs such that the maximum
cut across the subgroups is minimized. This problem can be
efficiently solved by recursively finding the minimum cuts
that bisect the existing segments. As shown in Wu and
Leahy's work, this globally optimal criterion can be used to
produce good segmentation on some of the images.
However, as Wu and Leahy also noticed in their work,
the minimum cut criterion favors cutting small sets of
isolated nodes in the graph. This is not surprising since
the cut defined in (1) increases with the number of edges
going across the two partitioned parts. Fig. 1 illustrates one
such case. Assuming the edge weights are inversely
proportional to the distance between the two nodes, we
see the cut that partitions out node $n_1$ or $n_2$ will have a very
small value. In fact, any cut that partitions out individual
nodes on the right half will have smaller cut value than the
cut that partitions the nodes into the left and right halves.
To avoid this unnatural bias for partitioning out small
sets of points, we propose a new measure of disassociation
between two groups. Instead of looking at the value of total
edge weight connecting the two partitions, our measure
computes the cut cost as a fraction of the total edge
connections to all the nodes in the graph. We call this
disassociation measure the normalized cut (Ncut):
$$\text{Ncut}(A, B) = \frac{\text{cut}(A, B)}{\text{assoc}(A, V)} + \frac{\text{cut}(A, B)}{\text{assoc}(B, V)}, \qquad (2)$$

where $\text{assoc}(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$ is the total connection from nodes in $A$ to all nodes in the graph and $\text{assoc}(B, V)$ is similarly defined. With this definition of the disassociation
between the groups, the cut that partitions out small
isolated points will no longer have small Ncut value, since
the cut value will almost certainly be a large percentage of
the total connection from that small set to all other nodes. In
the case illustrated in Fig. 1, we see that the cut value across node $n_1$ will be 100 percent of the total connection from that node.
In the same spirit, we can define a measure for total
normalized association within groups for a given partition:
$$\text{Nassoc}(A, B) = \frac{\text{assoc}(A, A)}{\text{assoc}(A, V)} + \frac{\text{assoc}(B, B)}{\text{assoc}(B, V)}, \qquad (3)$$

where $\text{assoc}(A, A)$ and $\text{assoc}(B, B)$ are total weights of edges connecting nodes within $A$ and $B$, respectively. We
see again this is an unbiased measure, which reflects how
tightly on average nodes within the group are connected to
each other.
Another important property of this definition of associa-
tion and disassociation of a partition is that they are
naturally related:
$$\begin{aligned}
\text{Ncut}(A, B) &= \frac{\text{cut}(A, B)}{\text{assoc}(A, V)} + \frac{\text{cut}(A, B)}{\text{assoc}(B, V)} \\
&= \frac{\text{assoc}(A, V) - \text{assoc}(A, A)}{\text{assoc}(A, V)} + \frac{\text{assoc}(B, V) - \text{assoc}(B, B)}{\text{assoc}(B, V)} \\
&= 2 - \left( \frac{\text{assoc}(A, A)}{\text{assoc}(A, V)} + \frac{\text{assoc}(B, B)}{\text{assoc}(B, V)} \right) \\
&= 2 - \text{Nassoc}(A, B).
\end{aligned}$$
Hence, the two partition criteria that we seek in our
grouping algorithm, minimizing the disassociation between
the groups and maximizing the association within the
Fig. 1. A case where minimum cut gives a bad partition.

groups, are in fact identical and can be satisfied simulta-
neously. In our algorithm, we will use this normalized cut
as the partition criterion.
Unfortunately, minimizing normalized cut exactly is NP-complete, even for the special case of graphs on grids. The
proof, due to Papadimitriou, can be found in Appendix A.
However, we will show that, when we embed the normal-
ized cut problem in the real value domain, an approximate
discrete solution can be found efficiently.
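The relationship between the two criteria is easy to check numerically; the following quick sketch (illustrative only, with a random symmetric affinity matrix and an arbitrarily chosen partition) confirms the identity $\text{Ncut}(A, B) = 2 - \text{Nassoc}(A, B)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n))
W = (W + W.T) / 2           # symmetric edge weights
np.fill_diagonal(W, 0.0)

A = np.arange(n) < 3        # an arbitrary set A; B is its complement
B = ~A

cut = W[np.ix_(A, B)].sum()                    # cut(A, B)
assoc_AV, assoc_BV = W[A].sum(), W[B].sum()    # assoc(A, V), assoc(B, V)
assoc_AA = W[np.ix_(A, A)].sum()               # assoc(A, A)
assoc_BB = W[np.ix_(B, B)].sum()               # assoc(B, B)

ncut = cut / assoc_AV + cut / assoc_BV                # Eq. (2)
nassoc = assoc_AA / assoc_AV + assoc_BB / assoc_BV    # Eq. (3)
assert np.isclose(ncut, 2.0 - nassoc)                 # Ncut = 2 - Nassoc
```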
2.1 Computing the Optimal Partition
Given a partition of nodes of a graph, $V$, into two sets $A$ and $B$, let $x$ be an $N = |V|$ dimensional indicator vector, $x_i = 1$ if node $i$ is in $A$ and $-1$, otherwise. Let $d(i) = \sum_j w(i, j)$ be the total connection from node $i$ to all other nodes. With the definitions of $x$ and $d$, we can rewrite $\text{Ncut}(A, B)$ as:

$$\text{Ncut}(A, B) = \frac{\text{cut}(A, B)}{\text{assoc}(A, V)} + \frac{\text{cut}(B, A)}{\text{assoc}(B, V)} = \frac{\sum_{x_i > 0,\, x_j < 0} -w_{ij}\, x_i x_j}{\sum_{x_i > 0} d_i} + \frac{\sum_{x_i < 0,\, x_j > 0} -w_{ij}\, x_i x_j}{\sum_{x_i < 0} d_i}.$$
Let $D$ be an $N \times N$ diagonal matrix with $d$ on its diagonal, $W$ be an $N \times N$ symmetrical matrix with $W(i, j) = w_{ij}$,

$$k = \frac{\sum_{x_i > 0} d_i}{\sum_i d_i},$$

and $\mathbf{1}$ be an $N \times 1$ vector of all ones. Using the fact that $\frac{\mathbf{1} + x}{2}$ and $\frac{\mathbf{1} - x}{2}$ are indicator vectors for $x_i > 0$ and $x_i < 0$, respectively, we can rewrite $4\,\text{Ncut}(x)$ as:

$$4\,\text{Ncut}(x) = \frac{(\mathbf{1} + x)^T (D - W) (\mathbf{1} + x)}{k\, \mathbf{1}^T D \mathbf{1}} + \frac{(\mathbf{1} - x)^T (D - W) (\mathbf{1} - x)}{(1 - k)\, \mathbf{1}^T D \mathbf{1}}$$

$$= \frac{x^T (D - W)\, x + \mathbf{1}^T (D - W) \mathbf{1}}{k (1 - k)\, \mathbf{1}^T D \mathbf{1}} + \frac{2 (1 - 2k)\, \mathbf{1}^T (D - W)\, x}{k (1 - k)\, \mathbf{1}^T D \mathbf{1}}.$$
Let

$$\alpha(x) = x^T (D - W)\, x, \qquad \beta(x) = \mathbf{1}^T (D - W)\, x, \qquad \gamma = \mathbf{1}^T (D - W) \mathbf{1}, \qquad M = \mathbf{1}^T D \mathbf{1};$$

we can then further expand the above equation as:

$$= \frac{\alpha(x) + \gamma + 2 (1 - 2k)\, \beta(x)}{k (1 - k) M}$$

$$= \frac{\alpha(x) + \gamma + 2 (1 - 2k)\, \beta(x)}{k (1 - k) M} - \frac{2\, (\alpha(x) + \gamma)}{M} + \frac{2\, \alpha(x)}{M} + \frac{2 \gamma}{M}.$$

Dropping the last constant term, which in this case equals 0, we get

$$= \frac{(1 - 2k + 2k^2)\, (\alpha(x) + \gamma) + 2 (1 - 2k)\, \beta(x)}{k (1 - k) M} + \frac{2\, \alpha(x)}{M}$$

$$= \frac{\frac{1 - 2k + 2k^2}{(1 - k)^2}\, (\alpha(x) + \gamma) + \frac{2 (1 - 2k)}{(1 - k)^2}\, \beta(x)}{\frac{k}{1 - k}\, M} + \frac{2\, \alpha(x)}{M}.$$
Letting $b = \frac{k}{1 - k}$, and since $\gamma = 0$, it becomes

$$= \frac{(1 + b^2)\, (\alpha(x) + \gamma) + 2 (1 - b^2)\, \beta(x)}{b M} + \frac{2 b\, \alpha(x)}{b M}$$

$$= \frac{(1 + b^2)\, (\alpha(x) + \gamma)}{b M} + \frac{2 (1 - b^2)\, \beta(x)}{b M} + \frac{2 b\, \alpha(x)}{b M} - \frac{2 b\, \gamma}{b M}$$

$$= \frac{(1 + b^2)\, \left( x^T (D - W)\, x + \mathbf{1}^T (D - W) \mathbf{1} \right)}{b\, \mathbf{1}^T D \mathbf{1}} + \frac{2 (1 - b^2)\, \mathbf{1}^T (D - W)\, x}{b\, \mathbf{1}^T D \mathbf{1}} + \frac{2 b\, x^T (D - W)\, x}{b\, \mathbf{1}^T D \mathbf{1}} - \frac{2 b\, \mathbf{1}^T (D - W) \mathbf{1}}{b\, \mathbf{1}^T D \mathbf{1}}$$

$$= \frac{(\mathbf{1} + x)^T (D - W) (\mathbf{1} + x)}{b\, \mathbf{1}^T D \mathbf{1}} + \frac{b^2\, (\mathbf{1} - x)^T (D - W) (\mathbf{1} - x)}{b\, \mathbf{1}^T D \mathbf{1}} - \frac{2 b\, (\mathbf{1} - x)^T (D - W) (\mathbf{1} + x)}{b\, \mathbf{1}^T D \mathbf{1}}$$

$$= \frac{\left[ (\mathbf{1} + x) - b (\mathbf{1} - x) \right]^T (D - W) \left[ (\mathbf{1} + x) - b (\mathbf{1} - x) \right]}{b\, \mathbf{1}^T D \mathbf{1}}.$$
Setting yy 1 xxÿb1 ÿ xx, it is easy to see that
yy
T
D1
X
x
i
>0
dd
i
ÿ b
X
x
i
<0
dd
i
0 4
since b
k
1ÿk
P
x
i
>0
dd
i
P
x
i
<0
dd
i
and
yy
T
Dyy
X
x
i
>0
dd
i
b
2
X
x
i
<0
dd
i
b
X
x
i
<0
dd
i
b
2
X
x
i
<0
dd
i
b
X
x
i
<0
dd
i
b
X
x
i
<0
dd
i
b1
T
D1:
Putting everything together we have,

$$\min_x \text{Ncut}(x) = \min_y \frac{y^T (D - W)\, y}{y^T D\, y}, \qquad (5)$$

with the condition $y_i \in \{1, -b\}$ and $y^T D \mathbf{1} = 0$.

Note that the above expression is the Rayleigh quotient [11]. If $y$ is relaxed to take on real values, we can minimize (5) by solving the generalized eigenvalue system,

$$(D - W)\, y = \lambda D\, y. \qquad (6)$$

However, we have two constraints on $y$ which come from the condition on the corresponding indicator vector $x$. First, consider the constraint $y^T D \mathbf{1} = 0$. We can show this constraint on $y$ is automatically satisfied by the solution of the generalized eigensystem. We will do so by first

transforming (6) into a standard eigensystem and showing
the corresponding condition is satisfied there. Rewrite (6) as

$$D^{-\frac{1}{2}} (D - W)\, D^{-\frac{1}{2}}\, z = \lambda z, \qquad (7)$$

where $z = D^{\frac{1}{2}}\, y$. One can easily verify that $z_0 = D^{\frac{1}{2}} \mathbf{1}$ is an eigenvector of (7) with eigenvalue of 0. Furthermore, $D^{-\frac{1}{2}} (D - W)\, D^{-\frac{1}{2}}$ is symmetric positive semidefinite since $D - W$, also called the Laplacian matrix, is known to be positive semidefinite [18]. Hence, $z_0$ is, in fact, the smallest eigenvector of (7) and all eigenvectors of (7) are perpendicular to each other. In particular, $z_1$, the second smallest eigenvector, is perpendicular to $z_0$. Translating this statement back into the general eigensystem (6), we have: 1) $y_0 = \mathbf{1}$ is the smallest eigenvector with eigenvalue of 0 and 2) $0 = z_1^T z_0 = y_1^T D \mathbf{1}$, where $y_1$ is the second smallest eigenvector of (6).
Now, recall a simple fact about the Rayleigh quotient [11]: Let $A$ be a real symmetric matrix. Under the constraint that $x$ is orthogonal to the $j - 1$ smallest eigenvectors $x_1, \ldots, x_{j-1}$, the quotient $\frac{x^T A\, x}{x^T x}$ is minimized by the next smallest eigenvector $x_j$ and its minimum value is the corresponding eigenvalue $\lambda_j$.
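This fact can be verified numerically; a quick illustrative sketch (the random matrix and the choice of constraining against the two smallest eigenvectors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.random((6, 6))
A = (M + M.T) / 2                 # a real symmetric matrix
vals, vecs = np.linalg.eigh(A)    # eigenvalues in ascending order

def rayleigh(x):
    return (x @ A @ x) / (x @ x)

j = 2  # constrain against the j = 2 smallest eigenvectors
# the next smallest eigenvector attains its eigenvalue exactly
assert np.isclose(rayleigh(vecs[:, j]), vals[j])

# no vector orthogonal to the two smallest eigenvectors does better
for _ in range(200):
    x = rng.standard_normal(6)
    x -= vecs[:, :j] @ (vecs[:, :j].T @ x)  # project out the two smallest
    assert rayleigh(x) >= vals[j] - 1e-9
```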
As a result, we obtain:

$$z_1 = \arg \min_{z^T z_0 = 0} \frac{z^T D^{-\frac{1}{2}} (D - W)\, D^{-\frac{1}{2}}\, z}{z^T z} \qquad (8)$$

and, consequently,

$$y_1 = \arg \min_{y^T D \mathbf{1} = 0} \frac{y^T (D - W)\, y}{y^T D\, y}. \qquad (9)$$
Thus, the second smallest eigenvector of the generalized
eigensystem (6) is the real valued solution to our normal-
ized cut problem. The only reason that it is not necessarily
the solution to our original problem is that the second
constraint on $y$, that $y(i)$ takes on two discrete values, is not
automatically satisfied. In fact, relaxing this constraint is
what makes this optimization problem tractable in the first
place. We will show in Section 3 how this real valued
solution can be transformed into a discrete form.
A similar argument can also be made to show that the
eigenvector with the third smallest eigenvalue is the real
valued solution that optimally subpartitions the first two
parts. In fact, this line of argument can be extended to show
that one can subdivide the existing graphs, each time using
the eigenvector with the next smallest eigenvalue. However, in practice, because the approximation error from the real valued solution to the discrete valued solution accumulates with every eigenvector taken and all eigenvectors have to satisfy a global mutual orthogonality
constraint, solutions based on higher eigenvectors become
unreliable. It is best to restart solving the partitioning
problem on each subgraph individually.
It is interesting to note that, while the second smallest eigenvector $y$ of (6) only approximates the optimal normalized cut solution, it exactly minimizes the following problem:

$$\inf_{y^T D \mathbf{1} = 0} \frac{\sum_i \sum_j \left( y(i) - y(j) \right)^2 w_{ij}}{\sum_i y(i)^2\, d(i)}, \qquad (10)$$

in the real-valued domain, where $d(i) = D(i, i)$. Roughly speaking, this forces the indicator vector $y$ to take similar values for nodes $i$ and $j$ that are tightly coupled (large $w_{ij}$).
In summary, we propose using the normalized cut
criterion for graph partitioning and we have shown how
this criterion can be computed efficiently by solving a
generalized eigenvalue problem.
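As a concrete illustration (a sketch using SciPy's dense solver, not the authors' implementation), one can solve (6) directly on a small random graph and check the properties derived above: the smallest eigenvector is the constant vector with eigenvalue 0, and the second smallest automatically satisfies $y^T D \mathbf{1} = 0$.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
n = 10
W = rng.random((n, n))
W = (W + W.T) / 2            # symmetric affinity matrix
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
D = np.diag(d)

# generalized eigensystem (D - W) y = lambda D y of Eq. (6);
# eigh returns eigenvalues in ascending order
vals, vecs = eigh(D - W, D)
y0, y1 = vecs[:, 0], vecs[:, 1]

assert np.isclose(vals[0], 0.0)               # smallest eigenvalue is 0
assert np.allclose(y0 / y0[0], np.ones(n))    # y0 is the constant vector
assert np.isclose(y1 @ D @ np.ones(n), 0.0)   # y1^T D 1 = 0, as derived
```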
3 THE GROUPING ALGORITHM
Our grouping algorithm consists of the following steps:
1. Given an image or image sequence, set up a weighted graph $G = (V, E)$ and set the weight on the edge connecting two nodes to be a measure of the similarity between the two nodes.
2. Solve $(D - W)\, x = \lambda D\, x$ for eigenvectors with the smallest eigenvalues.
3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph.
4. Decide if the current partition should be subdivided and recursively repartition the segmented parts if necessary.
The grouping algorithm, as well as its computational
complexity, can be best illustrated by using the following
example.
3.1 Example: Brightness Images
Fig. 2 shows an image that we would like to segment. The
steps are:
1. Construct a weighted graph $G = (V, E)$ by taking each pixel as a node and connecting each pair of pixels by an edge. The weight on that edge should reflect the likelihood that the two pixels belong to one object. Using just the brightness value of the pixels and their spatial location, we can define the graph edge weight connecting the two nodes $i$ and $j$ as:

$$w_{ij} = e^{-\frac{\|F_i - F_j\|_2^2}{\sigma_I^2}} \cdot \begin{cases} e^{-\frac{\|X_i - X_j\|_2^2}{\sigma_X^2}} & \text{if } \|X_i - X_j\|_2 < r, \\ 0 & \text{otherwise.} \end{cases} \qquad (11)$$
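A direct, unoptimized construction of this weight matrix might look as follows; this is an illustrative sketch, and the parameter values (`sigma_i`, `sigma_x`, `r`) are placeholders, not the settings used in the paper:

```python
import numpy as np

def brightness_weights(img, sigma_i=0.1, sigma_x=4.0, r=5.0):
    # Eq. (11): F(i) is pixel brightness, X(i) its (row, col) position.
    h, w = img.shape
    F = img.reshape(-1).astype(float)
    rows, cols = np.mgrid[0:h, 0:w]
    X = np.stack([rows.reshape(-1), cols.reshape(-1)], 1).astype(float)

    dF2 = (F[:, None] - F[None, :]) ** 2                  # ||F_i - F_j||^2
    dX2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # ||X_i - X_j||^2

    W = np.exp(-dF2 / sigma_i**2) * np.exp(-dX2 / sigma_x**2)
    W[np.sqrt(dX2) >= r] = 0.0   # pixels more than r apart are not connected
    np.fill_diagonal(W, 0.0)     # drop self-edges (a convention chosen here)
    return W
```

For a small image split into a dark and a bright half, same-brightness neighbors get weights close to 1, while edges crossing the brightness boundary are almost zero.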
2. Solve for the eigenvectors with the smallest eigenvalues of the system

$$(D - W)\, y = \lambda D\, y. \qquad (12)$$

As we saw above, the generalized eigensystem in (12) can be transformed into a standard eigenvalue problem of

$$D^{-\frac{1}{2}} (D - W)\, D^{-\frac{1}{2}}\, x = \lambda x. \qquad (13)$$

Solving a standard eigenvalue problem for all eigenvectors takes $O(n^3)$ operations, where $n$ is the number of nodes in the graph. This becomes impractical for image segmentation applications where $n$ is the number of pixels in an image.

Fortunately, our graph partitioning has the following properties: 1) The graphs are often only locally connected and the resulting eigensystems are very sparse, 2) only the top few eigenvectors are needed for graph partitioning, and 3) the precision requirement for the eigenvectors is low, often only the right sign bit is required. These special properties of our problem can be fully exploited by an eigensolver called the Lanczos method. The running time of a Lanczos algorithm is $O(mn) + O(m\, M(n))$ [11], where $m$ is the maximum number of matrix-vector computations required and $M(n)$ is the cost of a matrix-vector computation of $Ax$, where $A = D^{-\frac{1}{2}} (D - W)\, D^{-\frac{1}{2}}$. Note that the sparsity structure of $A$ is identical to that of the weight matrix $W$. Since $W$ is sparse, so is $A$ and the matrix-vector computation is only $O(n)$.

To see why this is the case, we will look at the cost of the inner product of one row of $A$ with a vector $x$. Let $y_i = A_i x = \sum_j A_{ij} x_j$. For a fixed $i$, $A_{ij}$ is only nonzero if node $j$ is in a spatial neighborhood of $i$. Hence, there are only a fixed number of operations required for each $A_i x$ and the total cost of computing $Ax$ is $O(n)$.

The constant factor is determined by the size of the spatial neighborhood of a node. It turns out that we can substantially cut down additional connections from each node to its neighbors by randomly selecting the connections within the neighborhood for the weighted graph. Empirically, we have found that one can remove up to 90 percent of the total connections within each of the neighborhoods when the neighborhoods are large without affecting the eigenvector solution to the system.

Putting everything together, each of the matrix-vector computations costs $O(n)$ operations with a small constant factor. The number $m$ depends on many factors [11]. In our experiments on image segmentation, we observed that $m$ is typically less than $O(n^{\frac{1}{2}})$.
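In current scientific Python, the same combination of ingredients (a sparse $W$, a Lanczos-type iteration, and only the few smallest eigenvectors) is available off the shelf through `scipy.sparse.linalg.eigsh`. The sketch below is illustrative and is not the LASO2-based implementation the authors used; the chain graph and the shift value are arbitrary choices:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# a locally connected graph: a 1D chain of n "pixels" with unit weights
n = 200
off = np.ones(n - 1)
W = sp.diags([off, off], [-1, 1], format="csr")
d = np.asarray(W.sum(axis=1)).ravel()
inv_sqrt = sp.diags(1.0 / np.sqrt(d))
A = inv_sqrt @ (sp.diags(d) - W) @ inv_sqrt  # D^(-1/2)(D - W)D^(-1/2), Eq. (7)

# Lanczos-type iteration for only the k smallest eigenpairs of the sparse
# system, via shift-invert around a point just below the spectrum
vals, vecs = eigsh(A, k=4, sigma=-1e-3, which="LM")
order = np.argsort(vals)

assert abs(vals[order[0]]) < 1e-6        # the smallest eigenvalue is 0
fiedler = vecs[:, order[1]]              # second smallest eigenvector
assert (fiedler > 0).any() and (fiedler < 0).any()  # it bipartitions the chain
```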
Fig. 3 shows the smallest eigenvectors computed
for the generalized eigensystem with the weight
matrix defined above.
3. Once the eigenvectors are computed, we can partition the graph into two pieces using the second smallest eigenvector. In the ideal case, the eigenvector should only take on two discrete values and the signs of the values can tell us exactly how to partition the graph. However, our eigenvectors can take on continuous values and we need to choose a splitting point to partition it into two parts. There are many different ways of choosing such a splitting point. One can take 0 or the median value as the splitting point or one can search for the splitting point such that the resulting partition has the best $\text{Ncut}(A, B)$ value. We take the latter approach in our work. Currently, the search is done by checking $l$ evenly spaced possible splitting points and computing the best Ncut among them. In our experiments, the values in the eigenvectors are usually well separated and this method of choosing a splitting point is very reliable even with a small $l$.
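The search just described can be sketched as follows (an illustrative reimplementation, checking `l` evenly spaced candidate thresholds strictly inside the range of the eigenvector values):

```python
import numpy as np

def best_ncut_split(W, y, l=32):
    # Search l evenly spaced splitting points of eigenvector y; return the
    # (Ncut value, threshold) pair of the best resulting bipartition.
    d = W.sum(axis=1)
    candidates = np.linspace(y.min(), y.max(), l + 2)[1:-1]  # interior points
    best_ncut, best_t = np.inf, None
    for t in candidates:
        A = y > t
        if A.all() or not A.any():
            continue                       # degenerate split, skip
        cut = W[np.ix_(A, ~A)].sum()
        ncut = cut / d[A].sum() + cut / d[~A].sum()
        if ncut < best_ncut:
            best_ncut, best_t = ncut, t
    return best_ncut, best_t
```

On an eigenvector whose values are well separated into two groups, any interior threshold between the groups recovers the same partition, which is why a small `l` suffices.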
4. After the graph is broken into two pieces, we can
recursively run our algorithm on the two partitioned
parts. Or, equivalently, we could take advantage of
the special properties of the other top eigenvectors as
explained in the previous section to subdivide the
graph based on those eigenvectors. The recursion
stops once the Ncut value exceeds a certain limit.
We also impose a stability criterion on the partition. As we saw earlier, and as we see in the eigenvectors with the seventh to ninth smallest eigenvalues (Fig. 3g-h), sometimes an eigenvector can take on the shape of a continuous function, rather than the discrete indicator function that we seek. From the view of segmentation, such an eigenvector is attempting to subdivide an image region where there is no sure way of breaking it. In fact, if we are forced to partition the image based on this eigenvector, we will see there are many different splitting points which have similar Ncut values. Hence, the partition will be highly uncertain and unstable. In our current segmentation scheme, we simply choose to ignore all those eigenvectors which have smoothly varying eigenvector values. We achieve this by imposing a stability criterion which measures the degree of smoothness in the eigenvector values. The simplest measure is based on first computing the histogram of the eigenvector values and then computing the ratio between the minimum and maximum values in the bins. When the eigenvector values are continuously varying, the values in the histogram bins will stay relatively the same and the ratio will be relatively high. In our experiments, we find that simple thresholding on the ratio described above can be used to exclude unstable eigenvectors. We have set that value to be 0.06 in all our experiments.
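The histogram-ratio stability test can be sketched as follows; the bin count is our assumption, while 0.06 matches the threshold quoted above:

```python
import numpy as np

def is_stable(v, bins=10, ratio_thresh=0.06):
    """Histogram the eigenvector values and compare the smallest and
    largest bin counts.  An indicator-like (bimodal) eigenvector leaves
    near-empty bins between its two modes, so min/max is small and the
    vector is accepted; a smoothly varying eigenvector fills all bins
    about evenly, so min/max is large and the vector is rejected."""
    counts, _ = np.histogram(v, bins=bins)
    return counts.min() / counts.max() < ratio_thresh
```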
Fig. 4 shows the final segmentation for the image
shown in Fig. 2.
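The recursive procedure of step 4 can be sketched end-to-end as below. This is a simplified dense version under our own names: it splits on the sign of the second generalized eigenvector rather than searching splitting points, and the stopping threshold is illustrative.

```python
import numpy as np

def recursive_ncut(W, idx=None, max_ncut=0.2):
    """Recursively bipartition the graph with weight matrix W using the
    sign of the second smallest eigenvector of (D - W) x = lambda D x,
    stopping when the proposed split's Ncut exceeds max_ncut.
    Returns a list of index arrays, one per segment."""
    if idx is None:
        idx = np.arange(W.shape[0])
    if len(idx) < 2:
        return [idx]
    Wi = W[np.ix_(idx, idx)]
    d = Wi.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = d_isqrt[:, None] * (np.diag(d) - Wi) * d_isqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)
    mask = (d_isqrt * vecs[:, 1]) > 0        # sign split on 2nd eigenvector
    if mask.all() or not mask.any():
        return [idx]
    cut = Wi[mask][:, ~mask].sum()
    ncut = cut / Wi[mask].sum() + cut / Wi[~mask].sum()
    if ncut > max_ncut:                      # partition too costly: stop
        return [idx]
    return (recursive_ncut(W, idx[mask], max_ncut)
            + recursive_ncut(W, idx[~mask], max_ncut))
```

On a toy graph with two tightly connected groups and weak cross-links, the recursion separates the groups and then stops, since splitting a tight group has a high Ncut.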
3.2 Recursive Two-Way Ncut
In summary, our grouping algorithm consists of the
following steps:
892 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 22, NO. 8, AUGUST 2000
Fig. 2. A gray level image of a baseball game.
Frequently Asked Questions (12)
Q1. What have the authors contributed in "Normalized cuts and image segmentation" ?

The authors propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, their approach aims at extracting the global impression of an image. They treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. They show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this criterion.

A graph G = (V, E) can be partitioned into two disjoint sets, A, B, with A ∪ B = V and A ∩ B = ∅, by simply removing edges connecting the two parts.

The authors can construct a spring-mass system from the weighted graph by taking graph nodes as physical nodes and graph edges as springs connecting each pair of nodes. 

The stability criterion keeps us from cutting oscillatory eigenvectors, but it also prevents us from cutting the subsequent eigenvectors, which might be perfect partitioning vectors.

Under the constraint that x is orthogonal to the j-1 smallest eigenvectors x_1, ..., x_{j-1}, the quotient (x^T A x)/(x^T x) is minimized by the next smallest eigenvector x_j, and its minimum value is the corresponding eigenvalue λ_j.

Imagine what would happen if the authors were to give a hard shake to this spring-mass system, forcing the nodes to oscillate in the direction perpendicular to the image plane. 

Some of it is low level, such as coherence of brightness, color, texture, or motion, but equally important is mid- or high-level knowledge about symmetries of objects or object models.

Spectral graph theory provides us some guidance on the goodness of the approximation to the normalized cut provided by the second eigenvalue of the normalized Laplacian. 

Although there are an exponential number of such partitions, finding the minimum cut of a graph is a well-studied problem and there exist efficient algorithms for solving it. 

In their current experiments, with this implementation, the running time on a 300 × 400 image can be reduced to about 20 seconds on Intel Pentium 300 MHz machines.