Proceedings ArticleDOI

Normalized cuts and image segmentation

17 Jun 1997-pp 731-737
TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
Abstract: We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups as well as the total similarity within the groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this criterion. We have applied this approach to segmenting static images and found results very encouraging.

Summary (4 min read)

1 INTRODUCTION

  • The authors present a general framework for this problem, focusing specifically on the case of image segmentation.
  • There are two aspects to be considered here.
  • The authors propose a new graph-theoretic criterion for measuring the goodness of an image partition: the normalized cut.

2 GROUPING AS GRAPH PARTITIONING

  • The degree of dissimilarity between these two pieces can be computed as total weight of the edges that have been removed.
  • Finding the minimum cut of a graph is a well-studied problem and there exist efficient algorithms for solving it.
  • Wu and Leahy [25] proposed a clustering method based on this minimum cut criterion.
  • In fact, any cut that partitions out individual nodes on the right half will have smaller cut value than the cut that partitions the nodes into the left and right halves.
  • In the same spirit, the authors can define a measure for total normalized association within groups for a given partition: $\text{Nassoc}(A, B) = \frac{\text{assoc}(A, A)}{\text{assoc}(A, V)} + \frac{\text{assoc}(B, B)}{\text{assoc}(B, V)}$ (3), where $\text{assoc}(A, A)$ and $\text{assoc}(B, B)$ are total weights of edges connecting nodes within $A$ and $B$, respectively.

2.1 Computing the Optimal Partition

  • Hence, $z_0$ is, in fact, the smallest eigenvector of (7) and all eigenvectors of (7) are perpendicular to each other.
  • Now, recall a simple fact about the Rayleigh quotient [11]: Let A be a real symmetric matrix.
  • Thus, the second smallest eigenvector of the generalized eigensystem (6) is the real valued solution to their normalized cut problem.
  • Roughly speaking, this forces the indicator vector $y$ to take similar values for nodes $i$ and $j$ that are tightly coupled (large $w_{ij}$).

3 THE GROUPING ALGORITHM

  • The authors' grouping algorithm consists of the following steps: 1. Given an image or image sequence, set up a weighted graph $G = (V, E)$ and set the weight on the edge connecting two nodes to be a measure of the similarity between the two nodes.
  • Solve $(D - W)\, x = \lambda D\, x$ for eigenvectors with the smallest eigenvalues.
  • Use the eigenvector with the second smallest eigenvalue to bipartition the graph.
  • Decide if the current partition should be subdivided and recursively repartition the segmented parts if necessary.
  • The grouping algorithm, as well as its computational complexity, can be best illustrated by using the following example.

3.1 Example: Brightness Images

  • The weight on that edge should reflect the likelihood that the two pixels belong to one object.
  • In the ideal case, the eigenvector should only take on two discrete values and the signs of the values can tell us exactly how to partition the graph.
  • One can take 0 or the median value as the splitting point or one can search for the splitting point such that the resulting partition has the best $\text{Ncut}(A, B)$ value.
  • After the graph is broken into two pieces, the authors can recursively run their algorithm on the two partitioned parts.
  • In their experiments, the authors find that simple thresholding on the ratio described above can be used to exclude unstable eigenvectors.

3.2 Recursive Two-Way Ncut

  • Solve $(D - W)\, x = \lambda D\, x$ for eigenvectors with the smallest eigenvalues.
  • Use the eigenvector with the second smallest eigenvalue to bipartition the graph by finding the splitting point such that Ncut is minimized.
  • Decide if the current partition should be subdivided by checking the stability of the cut, and make sure Ncut is below the prespecified value.
  • Recursively repartition the segmented parts if necessary.
  • The number of groups segmented by this method is controlled directly by the maximum allowed Ncut.
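The control loop these bullets describe can be sketched in code. The sketch below is an illustrative reimplementation, not the authors' code: `spectral_split` is a simplified dense-matrix stand-in for the Lanczos-based solver described later, and the default `max_ncut=0.5` is an arbitrary placeholder value.

```python
import numpy as np

def ncut_value(W, mask):
    # Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)
    cut = W[np.ix_(mask, ~mask)].sum()
    return cut / W[mask].sum() + cut / W[~mask].sum()

def spectral_split(W):
    # Bipartition by the sign of the second smallest generalized eigenvector
    # of (D - W) y = lambda D y, computed through the equivalent standard
    # problem on D^(-1/2) (D - W) D^(-1/2); assumes no isolated nodes.
    d = W.sum(axis=1)
    inv_sqrt = np.diag(1.0 / np.sqrt(d))
    _, vecs = np.linalg.eigh(inv_sqrt @ (np.diag(d) - W) @ inv_sqrt)
    return (inv_sqrt @ vecs[:, 1]) > 0  # map z back to y = D^(-1/2) z

def recursive_two_way_ncut(W, split_fn, max_ncut=0.5, nodes=None):
    # Split recursively; a segment is kept whole once the proposed cut's
    # Ncut value exceeds the maximum allowed Ncut.
    if nodes is None:
        nodes = np.arange(W.shape[0])
    if len(nodes) < 2:
        return [nodes]
    sub = W[np.ix_(nodes, nodes)]
    mask = split_fn(sub)
    if mask.all() or not mask.any() or ncut_value(sub, mask) > max_ncut:
        return [nodes]
    return (recursive_two_way_ncut(W, split_fn, max_ncut, nodes[mask]) +
            recursive_two_way_ncut(W, split_fn, max_ncut, nodes[~mask]))
```

On a toy affinity matrix with two tightly connected clusters joined by weak edges, this returns the two clusters and then stops, since any further cut has a large Ncut value; the number of output groups is governed only by `max_ncut`, matching the last bullet.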

3.3 Simultaneous K-Way Cut with Multiple Eigenvectors

  • One drawback of the recursive 2-way cut is its treatment of the oscillatory eigenvectors.
  • Also, the approach is computationally wasteful; only the second eigenvector is used, whereas the next few small eigenvectors also contain useful partitioning information.
  • They exacerbate the oversegmentation, but that will be dealt with subsequently.
  • In the second step, one can proceed in the following two ways: 1. Greedy pruning: Iteratively merge two segments at a time until only k segments are left.
  • The results presented in this paper are all based on the recursive 2-way partitioning algorithm outlined in Section 3.2.

4 EXPERIMENTS

  • The authors have applied their grouping algorithm to image segmentation based on brightness, color, texture, or motion information.
  • Note that the weight $w_{ij} = 0$ for any pair of nodes $i$ and $j$ that are more than $r$ pixels apart.
  • Fig. 5 shows a point set and the segmentation result.
  • The normalized cut criterion is indeed able to partition the point set in a desirable way.
  • In the motion case, the authors will treat the image sequence as a spatiotemporal data set.

4.1 Computation Time

  • On the 100 × 120 test images shown here, the normalized cut algorithm takes about 2 minutes on Intel Pentium 200 MHz machines.
  • A multiresolution implementation can be used to reduce this running time further on larger images.
  • In their current experiments, with this implementation, the running time on a 300 × 400 image can be reduced to about 20 seconds on Intel Pentium 300 MHz machines.
  • In their current implementation, the sparse eigenvalue decomposition is computed using the LASO2 numerical package developed by Scott.

4.2 Choice of Graph Edge Weight

  • The exponential weighting function is chosen here for its relative simplicity, as well as neutrality, since the focus of this paper is on developing a general segmentation procedure, given a feature similarity measure.
  • The authors found that this choice of weight function is quite adequate for typical image and feature spaces.
  • Section 6.1 shows the effect of using different weighting functions and parameters on the output of the normalized cut algorithm.
  • The general problem of defining feature similarity incorporating a variety of cues is not a trivial one.
  • Some of these issues are addressed in [15].

5 RELATIONSHIP TO SPECTRAL GRAPH THEORY

  • The computational approach that the authors have developed for image segmentation is based on concepts from spectral graph theory.
  • This is a rich area of mathematics and the idea of using eigenvectors of the Laplacian for finding partitions of graphs can be traced back to Cheeger [4], Donath and Hoffman [7], and Fiedler [9].
  • Chung [5] points out that the eigenvalues of this "normalized" Laplacian relate more directly to graph invariants.
  • One cannot simultaneously minimize the disassociation across the partitions while maximizing the association within the groups.
  • There are also other explanations of why the normalized cut has better behavior from a graph-theoretic point of view, as pointed out by Chung [5].

5.1 A Physical Interpretation

  • As one might expect, a physical analogy can be set up for the generalized eigenvalue system (6) that the authors used to approximate the solution of normalized cut.
  • The authors can construct a spring-mass system from the weighted graph by taking graph nodes as physical nodes and graph edges as springs connecting each pair of nodes.
  • Nodes that have stronger spring connections among them will likely oscillate together.
  • Eventually, the group will "pop" off from the image plane.
  • In fact, it can be shown that the fundamental modes of oscillation of this spring mass system are exactly the generalized eigenvectors of (6).

6 RELATIONSHIP TO OTHER GRAPH THEORETIC APPROACHES TO IMAGE SEGMENTATION

  • In the computer vision community, there has been some previous work on image segmentation formulated as a graph partition problem.
  • Wu and Leahy [25] use the minimum cut criterion for their segmentation.
  • Cox et al. use an efficient discrete algorithm to solve their optimization problem assuming the graph is planar.
  • Sarkar and Boyer [19] use the eigenvector with the largest eigenvalue of their system. Using a similar derivation as in Section 2.1, the authors can see that the first largest eigenvector of their system approximates $\max_{A \subseteq V} \frac{\text{assoc}(A, A)}{|A|}$ and the second largest eigenvector approximates $\max_{A \subseteq V,\, B \subseteq V} \frac{\text{assoc}(A, A)}{|A|} + \frac{\text{assoc}(B, B)}{|B|}$.
  • As the authors will see later in the section, this situation can happen quite often in practice.

7 CONCLUSION

  • The authors developed a grouping algorithm based on the view that perceptual grouping should be a process that aims to extract global impressions of a scene and provides a hierarchical description of it.
  • By treating the grouping problem as a graph partitioning problem, the authors proposed the normalized cut criteria for segmenting the graph.
  • In finding an efficient algorithm for computing the minimum normalized cut, the authors showed that a generalized eigenvalue system provides a real valued solution to their problem.
  • A computational method based on this idea has been developed and applied to segmentation of brightness, color, and texture images.
  • For all other partitions, the Ncut value will be bounded below by $4an\,\frac{c - 1}{c}$.


Normalized Cuts and Image Segmentation
Jianbo Shi and Jitendra Malik, Member, IEEE
Abstract: We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features
and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image
segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The
normalized cut criterion measures both the total dissimilarity between the different groups as well as the total similarity within the
groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this
criterion. We have applied this approach to segmenting static images, as well as motion sequences, and found the results to be very
encouraging.
Index Terms: Grouping, image segmentation, graph partitioning.
1 INTRODUCTION

Nearly 75 years ago, Wertheimer [24] pointed out the
importance of perceptual grouping and organization
in vision and listed several key factors, such as similarity,
proximity, and good continuation, which lead to visual
grouping. However, even to this day, many of the
computational issues of perceptual grouping have remained unresolved. In this paper, we present a general
framework for this problem, focusing specifically on the
case of image segmentation.
Since there are many possible partitions of the domain I
of an image into subsets, how do we pick the "right" one?
There are two aspects to be considered here. The first is that
there may not be a single correct answer. A Bayesian view is
appropriate: there are several possible interpretations in
the context of prior world knowledge. The difficulty, of
course, is in specifying the prior world knowledge. Some of
it is low level, such as coherence of brightness, color,
texture, or motion, but equally important is mid- or high-level knowledge about symmetries of objects or object
models. The second aspect is that the partitioning is
inherently hierarchical. Therefore, it is more appropriate
to think of returning a tree structure corresponding to a
hierarchical partition instead of a single "flat" partition.
This suggests that image segmentation based on low-
level cues cannot and should not aim to produce a complete
final "correct" segmentation. The objective should instead
be to use the low-level coherence of brightness, color, texture, or
motion attributes to sequentially come up with hierarchical
partitions. Mid- and high-level knowledge can be used to
either confirm these groups or select some for further
attention. This attention could result in further repartitioning or grouping. The key point is that image partitioning is
to be done from the big picture downward, rather like a
painter first marking out the major areas and then filling in
the details.
Prior literature on the related problems of clustering,
grouping and image segmentation is huge. The clustering
community [12] has offered us agglomerative and divisive
algorithms; in image segmentation, we have region-based
merge and split algorithms. The hierarchical divisive
approach that we advocate produces a tree, the dendrogram.
While most of these ideas go back to the 1970s (and earlier),
the 1980s brought in the use of Markov Random Fields [10]
and variational formulations [17], [2], [14]. The MRF and
variational formulations also exposed two basic questions:
1. What is the criterion that one wants to optimize?
2. Is there an efficient algorithm for carrying out the
optimization?
Many an attractive criterion has been doomed by the
inability to find an effective algorithm to find its minimum: greedy or gradient descent type approaches fail to
find global optima for these high-dimensional, nonlinear
problems.
Our approach is most related to the graph theoretic
formulation of grouping. The set of points in an arbitrary
feature space are represented as a weighted undirected
graph GG VV;EE, where the nodes of the graph are the
points in the feature space, and an edge is formed between
every pair of nodes. The weight on each edge, wii; jj,isa
function of the similarity between nodes ii and jj.
In grouping, we seek to partition the set of vertices into
disjoint sets $V_1, V_2, \ldots, V_m$, where by some measure the similarity among the vertices in a set $V_i$ is high and, across different sets $V_i$, $V_j$, is low.
To partition a graph, we need to also ask the following
questions:
1. What is the precise criterion for a good partition?
2. How can such a partition be computed efficiently?
In the image segmentation and data clustering commu-
nity, there has been much previous work using variations of
the minimal spanning tree or limited neighborhood set
approaches. Although those use efficient computational
888 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 22, NO. 8, AUGUST 2000

J. Shi is with the Robotics Institute, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213. E-mail: jshi@cs.cmu.edu.
J. Malik is with the Electrical Engineering and Computer Science Division, University of California at Berkeley, Berkeley, CA 94720. E-mail: malik@cs.berkeley.edu.
Manuscript received 4 Feb. 1998; accepted 16 Nov. 1999. Recommended for acceptance by M. Shah.
For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 107618.
0162-8828/00/$10.00 © 2000 IEEE

methods, the segmentation criteria used in most of them are
based on local properties of the graph. Because perceptual
grouping is about extracting the global impressions of a
scene, as we saw earlier, this partitioning criterion often
falls short of this main goal.
In this paper, we propose a new graph-theoretic criterion
for measuring the goodness of an image partition: the
normalized cut. We introduce and justify this criterion in
Section 2. The minimization of this criterion can be
formulated as a generalized eigenvalue problem. The
eigenvectors can be used to construct good partitions of
the image and the process can be continued recursively as
desired (Section 2.1). Section 3 gives a detailed explanation
of the steps of our grouping algorithm. In Section 4, we
show experimental results. The formulation and minimization of the normalized cut criterion draws on a body of
results from the field of spectral graph theory (Section 5).
Relationship to work in computer vision is discussed in
Section 6 and comparison with related eigenvector based
segmentation methods is presented in Section 6.1. We
conclude in Section 7.
The main results in this paper were first presented in [20].
2 GROUPING AS GRAPH PARTITIONING
A graph G V; E can be partitioned into two disjoint
sets, A; B, A [ B V , A \ B ;, by simply removing edges
connecting the two parts. The degree of dissimilarity
between these two pieces can be computed as total weight
of the edges that have been removed. In graph theoretic
language, it is called the cut:
cutA; B
X
u2A;v2B
wu; v: 1
The optimal bipartitioning of a graph is the one that
minimizes this cut value. Although there are an exponential
number of such partitions, finding the minimum cut of a
graph is a well-studied problem and there exist efficient
algorithms for solving it.
Wu and Leahy [25] proposed a clustering method based
on this minimum cut criterion. In particular, they seek to
partition a graph into k-subgraphs such that the maximum
cut across the subgroups is minimized. This problem can be
efficiently solved by recursively finding the minimum cuts
that bisect the existing segments. As shown in Wu and
Leahy's work, this globally optimal criterion can be used to
produce good segmentation on some of the images.
However, as Wu and Leahy also noticed in their work,
the minimum cut criterion favors cutting small sets of
isolated nodes in the graph. This is not surprising since
the cut defined in (1) increases with the number of edges
going across the two partitioned parts. Fig. 1 illustrates one
such case. Assuming the edge weights are inversely
proportional to the distance between the two nodes, we
see the cut that partitions out node $n_1$ or $n_2$ will have a very
small value. In fact, any cut that partitions out individual
nodes on the right half will have smaller cut value than the
cut that partitions the nodes into the left and right halves.
To avoid this unnatural bias for partitioning out small
sets of points, we propose a new measure of disassociation
between two groups. Instead of looking at the value of total
edge weight connecting the two partitions, our measure
computes the cut cost as a fraction of the total edge
connections to all the nodes in the graph. We call this
disassociation measure the normalized cut (Ncut):
$$\text{Ncut}(A, B) = \frac{\text{cut}(A, B)}{\text{assoc}(A, V)} + \frac{\text{cut}(A, B)}{\text{assoc}(B, V)}, \qquad (2)$$

where $\text{assoc}(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$ is the total connection from nodes in $A$ to all nodes in the graph and $\text{assoc}(B, V)$ is similarly defined. With this definition of the disassociation
between the groups, the cut that partitions out small
isolated points will no longer have small Ncut value, since
the cut value will almost certainly be a large percentage of
the total connection from that small set to all other nodes. In
the case illustrated in Fig. 1, we see that the cut value across node $n_1$ will be 100 percent of the total connection from that node.
In the same spirit, we can define a measure for total
normalized association within groups for a given partition:
$$\text{Nassoc}(A, B) = \frac{\text{assoc}(A, A)}{\text{assoc}(A, V)} + \frac{\text{assoc}(B, B)}{\text{assoc}(B, V)}, \qquad (3)$$

where $\text{assoc}(A, A)$ and $\text{assoc}(B, B)$ are total weights of edges connecting nodes within $A$ and $B$, respectively. We
see again this is an unbiased measure, which reflects how
tightly on average nodes within the group are connected to
each other.
Another important property of this definition of associa-
tion and disassociation of a partition is that they are
naturally related:
$$\begin{aligned}
\text{Ncut}(A, B) &= \frac{\text{cut}(A, B)}{\text{assoc}(A, V)} + \frac{\text{cut}(A, B)}{\text{assoc}(B, V)} \\
&= \frac{\text{assoc}(A, V) - \text{assoc}(A, A)}{\text{assoc}(A, V)} + \frac{\text{assoc}(B, V) - \text{assoc}(B, B)}{\text{assoc}(B, V)} \\
&= 2 - \left( \frac{\text{assoc}(A, A)}{\text{assoc}(A, V)} + \frac{\text{assoc}(B, B)}{\text{assoc}(B, V)} \right) \\
&= 2 - \text{Nassoc}(A, B).
\end{aligned}$$
Hence, the two partition criteria that we seek in our
grouping algorithm, minimizing the disassociation between
the groups and maximizing the association within the
Fig. 1. A case where minimum cut gives a bad partition.

groups, are in fact identical and can be satisfied simulta-
neously. In our algorithm, we will use this normalized cut
as the partition criterion.
Unfortunately, minimizing normalized cut exactly is NP-complete, even for the special case of graphs on grids. The
proof, due to Papadimitriou, can be found in Appendix A.
However, we will show that, when we embed the normal-
ized cut problem in the real value domain, an approximate
discrete solution can be found efficiently.
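The relationship between the two criteria is easy to check numerically; the following quick sketch (illustrative only, with a random symmetric affinity matrix and an arbitrarily chosen partition) confirms the identity $\text{Ncut}(A, B) = 2 - \text{Nassoc}(A, B)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n))
W = (W + W.T) / 2           # symmetric edge weights
np.fill_diagonal(W, 0.0)

A = np.arange(n) < 3        # an arbitrary set A; B is its complement
B = ~A

cut = W[np.ix_(A, B)].sum()                    # cut(A, B)
assoc_AV, assoc_BV = W[A].sum(), W[B].sum()    # assoc(A, V), assoc(B, V)
assoc_AA = W[np.ix_(A, A)].sum()               # assoc(A, A)
assoc_BB = W[np.ix_(B, B)].sum()               # assoc(B, B)

ncut = cut / assoc_AV + cut / assoc_BV                # Eq. (2)
nassoc = assoc_AA / assoc_AV + assoc_BB / assoc_BV    # Eq. (3)
assert np.isclose(ncut, 2.0 - nassoc)                 # Ncut = 2 - Nassoc
```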
2.1 Computing the Optimal Partition
Given a partition of nodes of a graph, $V$, into two sets $A$ and $B$, let $x$ be an $N = |V|$ dimensional indicator vector, $x_i = 1$ if node $i$ is in $A$ and $-1$, otherwise. Let $d(i) = \sum_j w(i, j)$ be the total connection from node $i$ to all other nodes. With the definitions of $x$ and $d$, we can rewrite $\text{Ncut}(A, B)$ as:

$$\text{Ncut}(A, B) = \frac{\text{cut}(A, B)}{\text{assoc}(A, V)} + \frac{\text{cut}(B, A)}{\text{assoc}(B, V)} = \frac{\sum_{x_i > 0,\, x_j < 0} -w_{ij}\, x_i x_j}{\sum_{x_i > 0} d_i} + \frac{\sum_{x_i < 0,\, x_j > 0} -w_{ij}\, x_i x_j}{\sum_{x_i < 0} d_i}.$$
Let $D$ be an $N \times N$ diagonal matrix with $d$ on its diagonal, $W$ be an $N \times N$ symmetrical matrix with $W(i, j) = w_{ij}$,

$$k = \frac{\sum_{x_i > 0} d_i}{\sum_i d_i},$$

and $\mathbf{1}$ be an $N \times 1$ vector of all ones. Using the fact that $\frac{\mathbf{1} + x}{2}$ and $\frac{\mathbf{1} - x}{2}$ are indicator vectors for $x_i > 0$ and $x_i < 0$, respectively, we can rewrite $4\,\text{Ncut}(x)$ as:

$$4\,\text{Ncut}(x) = \frac{(\mathbf{1} + x)^T (D - W) (\mathbf{1} + x)}{k\, \mathbf{1}^T D \mathbf{1}} + \frac{(\mathbf{1} - x)^T (D - W) (\mathbf{1} - x)}{(1 - k)\, \mathbf{1}^T D \mathbf{1}}$$

$$= \frac{x^T (D - W)\, x + \mathbf{1}^T (D - W) \mathbf{1}}{k (1 - k)\, \mathbf{1}^T D \mathbf{1}} + \frac{2 (1 - 2k)\, \mathbf{1}^T (D - W)\, x}{k (1 - k)\, \mathbf{1}^T D \mathbf{1}}.$$
Let

$$\alpha(x) = x^T (D - W)\, x, \qquad \beta(x) = \mathbf{1}^T (D - W)\, x, \qquad \gamma = \mathbf{1}^T (D - W) \mathbf{1}, \qquad M = \mathbf{1}^T D \mathbf{1};$$

we can then further expand the above equation as:

$$= \frac{\alpha(x) + \gamma + 2 (1 - 2k)\, \beta(x)}{k (1 - k) M}$$

$$= \frac{\alpha(x) + \gamma + 2 (1 - 2k)\, \beta(x)}{k (1 - k) M} - \frac{2\, (\alpha(x) + \gamma)}{M} + \frac{2\, \alpha(x)}{M} + \frac{2 \gamma}{M}.$$

Dropping the last constant term, which in this case equals 0, we get

$$= \frac{(1 - 2k + 2k^2)\, (\alpha(x) + \gamma) + 2 (1 - 2k)\, \beta(x)}{k (1 - k) M} + \frac{2\, \alpha(x)}{M}$$

$$= \frac{\frac{1 - 2k + 2k^2}{(1 - k)^2}\, (\alpha(x) + \gamma) + \frac{2 (1 - 2k)}{(1 - k)^2}\, \beta(x)}{\frac{k}{1 - k}\, M} + \frac{2\, \alpha(x)}{M}.$$
Letting $b = \frac{k}{1 - k}$, and since $\gamma = 0$, it becomes

$$= \frac{(1 + b^2)\, (\alpha(x) + \gamma) + 2 (1 - b^2)\, \beta(x)}{b M} + \frac{2 b\, \alpha(x)}{b M}$$

$$= \frac{(1 + b^2)\, (\alpha(x) + \gamma)}{b M} + \frac{2 (1 - b^2)\, \beta(x)}{b M} + \frac{2 b\, \alpha(x)}{b M} - \frac{2 b\, \gamma}{b M}$$

$$= \frac{(1 + b^2)\, \left( x^T (D - W)\, x + \mathbf{1}^T (D - W) \mathbf{1} \right)}{b\, \mathbf{1}^T D \mathbf{1}} + \frac{2 (1 - b^2)\, \mathbf{1}^T (D - W)\, x}{b\, \mathbf{1}^T D \mathbf{1}} + \frac{2 b\, x^T (D - W)\, x}{b\, \mathbf{1}^T D \mathbf{1}} - \frac{2 b\, \mathbf{1}^T (D - W) \mathbf{1}}{b\, \mathbf{1}^T D \mathbf{1}}$$

$$= \frac{(\mathbf{1} + x)^T (D - W) (\mathbf{1} + x)}{b\, \mathbf{1}^T D \mathbf{1}} + \frac{b^2\, (\mathbf{1} - x)^T (D - W) (\mathbf{1} - x)}{b\, \mathbf{1}^T D \mathbf{1}} - \frac{2 b\, (\mathbf{1} - x)^T (D - W) (\mathbf{1} + x)}{b\, \mathbf{1}^T D \mathbf{1}}$$

$$= \frac{\left[ (\mathbf{1} + x) - b (\mathbf{1} - x) \right]^T (D - W) \left[ (\mathbf{1} + x) - b (\mathbf{1} - x) \right]}{b\, \mathbf{1}^T D \mathbf{1}}.$$
Setting yy 1 xxÿb1 ÿ xx, it is easy to see that
yy
T
D1
X
x
i
>0
dd
i
ÿ b
X
x
i
<0
dd
i
0 4
since b
k
1ÿk
P
x
i
>0
dd
i
P
x
i
<0
dd
i
and
yy
T
Dyy
X
x
i
>0
dd
i
b
2
X
x
i
<0
dd
i
b
X
x
i
<0
dd
i
b
2
X
x
i
<0
dd
i
b
X
x
i
<0
dd
i
b
X
x
i
<0
dd
i
b1
T
D1:
Putting everything together we have,

$$\min_x \text{Ncut}(x) = \min_y \frac{y^T (D - W)\, y}{y^T D\, y}, \qquad (5)$$

with the condition $y_i \in \{1, -b\}$ and $y^T D \mathbf{1} = 0$.

Note that the above expression is the Rayleigh quotient [11]. If $y$ is relaxed to take on real values, we can minimize (5) by solving the generalized eigenvalue system,

$$(D - W)\, y = \lambda D\, y. \qquad (6)$$

However, we have two constraints on $y$ which come from the condition on the corresponding indicator vector $x$. First, consider the constraint $y^T D \mathbf{1} = 0$. We can show this constraint on $y$ is automatically satisfied by the solution of the generalized eigensystem. We will do so by first

transforming (6) into a standard eigensystem and showing
the corresponding condition is satisfied there. Rewrite (6) as

$$D^{-\frac{1}{2}} (D - W)\, D^{-\frac{1}{2}}\, z = \lambda z, \qquad (7)$$

where $z = D^{\frac{1}{2}}\, y$. One can easily verify that $z_0 = D^{\frac{1}{2}} \mathbf{1}$ is an eigenvector of (7) with eigenvalue of 0. Furthermore, $D^{-\frac{1}{2}} (D - W)\, D^{-\frac{1}{2}}$ is symmetric positive semidefinite since $D - W$, also called the Laplacian matrix, is known to be positive semidefinite [18]. Hence, $z_0$ is, in fact, the smallest eigenvector of (7) and all eigenvectors of (7) are perpendicular to each other. In particular, $z_1$, the second smallest eigenvector, is perpendicular to $z_0$. Translating this statement back into the general eigensystem (6), we have: 1) $y_0 = \mathbf{1}$ is the smallest eigenvector with eigenvalue of 0 and 2) $0 = z_1^T z_0 = y_1^T D \mathbf{1}$, where $y_1$ is the second smallest eigenvector of (6).
Now, recall a simple fact about the Rayleigh quotient [11]: Let $A$ be a real symmetric matrix. Under the constraint that $x$ is orthogonal to the $j - 1$ smallest eigenvectors $x_1, \ldots, x_{j-1}$, the quotient $\frac{x^T A\, x}{x^T x}$ is minimized by the next smallest eigenvector $x_j$ and its minimum value is the corresponding eigenvalue $\lambda_j$.
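This fact can be verified numerically; a quick illustrative sketch (the random matrix and the choice of constraining against the two smallest eigenvectors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.random((6, 6))
A = (M + M.T) / 2                 # a real symmetric matrix
vals, vecs = np.linalg.eigh(A)    # eigenvalues in ascending order

def rayleigh(x):
    return (x @ A @ x) / (x @ x)

j = 2  # constrain against the j = 2 smallest eigenvectors
# the next smallest eigenvector attains its eigenvalue exactly
assert np.isclose(rayleigh(vecs[:, j]), vals[j])

# no vector orthogonal to the two smallest eigenvectors does better
for _ in range(200):
    x = rng.standard_normal(6)
    x -= vecs[:, :j] @ (vecs[:, :j].T @ x)  # project out the two smallest
    assert rayleigh(x) >= vals[j] - 1e-9
```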
As a result, we obtain:

$$z_1 = \arg \min_{z^T z_0 = 0} \frac{z^T D^{-\frac{1}{2}} (D - W)\, D^{-\frac{1}{2}}\, z}{z^T z} \qquad (8)$$

and, consequently,

$$y_1 = \arg \min_{y^T D \mathbf{1} = 0} \frac{y^T (D - W)\, y}{y^T D\, y}. \qquad (9)$$
Thus, the second smallest eigenvector of the generalized
eigensystem (6) is the real valued solution to our normal-
ized cut problem. The only reason that it is not necessarily
the solution to our original problem is that the second
constraint on $y$, that $y(i)$ takes on two discrete values, is not
automatically satisfied. In fact, relaxing this constraint is
what makes this optimization problem tractable in the first
place. We will show in Section 3 how this real valued
solution can be transformed into a discrete form.
A similar argument can also be made to show that the
eigenvector with the third smallest eigenvalue is the real
valued solution that optimally subpartitions the first two
parts. In fact, this line of argument can be extended to show
that one can subdivide the existing graphs, each time using
the eigenvector with the next smallest eigenvalue. However, in practice, because the approximation error from the real valued solution to the discrete valued solution accumulates with every eigenvector taken and all eigenvectors have to satisfy a global mutual orthogonality
constraint, solutions based on higher eigenvectors become
unreliable. It is best to restart solving the partitioning
problem on each subgraph individually.
It is interesting to note that, while the second smallest eigenvector $y$ of (6) only approximates the optimal normalized cut solution, it exactly minimizes the following problem:

$$\inf_{y^T D \mathbf{1} = 0} \frac{\sum_i \sum_j \left( y(i) - y(j) \right)^2 w_{ij}}{\sum_i y(i)^2\, d(i)}, \qquad (10)$$

in the real-valued domain, where $d(i) = D(i, i)$. Roughly speaking, this forces the indicator vector $y$ to take similar values for nodes $i$ and $j$ that are tightly coupled (large $w_{ij}$).
In summary, we propose using the normalized cut
criterion for graph partitioning and we have shown how
this criterion can be computed efficiently by solving a
generalized eigenvalue problem.
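As a concrete illustration (a sketch using SciPy's dense solver, not the authors' implementation), one can solve (6) directly on a small random graph and check the properties derived above: the smallest eigenvector is the constant vector with eigenvalue 0, and the second smallest automatically satisfies $y^T D \mathbf{1} = 0$.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
n = 10
W = rng.random((n, n))
W = (W + W.T) / 2            # symmetric affinity matrix
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
D = np.diag(d)

# generalized eigensystem (D - W) y = lambda D y of Eq. (6);
# eigh returns eigenvalues in ascending order
vals, vecs = eigh(D - W, D)
y0, y1 = vecs[:, 0], vecs[:, 1]

assert np.isclose(vals[0], 0.0)               # smallest eigenvalue is 0
assert np.allclose(y0 / y0[0], np.ones(n))    # y0 is the constant vector
assert np.isclose(y1 @ D @ np.ones(n), 0.0)   # y1^T D 1 = 0, as derived
```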
3 THE GROUPING ALGORITHM
Our grouping algorithm consists of the following steps:
1. Given an image or image sequence, set up a weighted graph $G = (V, E)$ and set the weight on the edge connecting two nodes to be a measure of the similarity between the two nodes.
2. Solve $(D - W)\, x = \lambda D\, x$ for eigenvectors with the smallest eigenvalues.
3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph.
4. Decide if the current partition should be subdivided and recursively repartition the segmented parts if necessary.
The grouping algorithm, as well as its computational
complexity, can be best illustrated by using the following
example.
3.1 Example: Brightness Images
Fig. 2 shows an image that we would like to segment. The
steps are:
1. Construct a weighted graph $G = (V, E)$ by taking each pixel as a node and connecting each pair of pixels by an edge. The weight on that edge should reflect the likelihood that the two pixels belong to one object. Using just the brightness value of the pixels and their spatial location, we can define the graph edge weight connecting the two nodes $i$ and $j$ as:

$$w_{ij} = e^{-\frac{\|F_i - F_j\|_2^2}{\sigma_I^2}} \cdot \begin{cases} e^{-\frac{\|X_i - X_j\|_2^2}{\sigma_X^2}} & \text{if } \|X_i - X_j\|_2 < r, \\ 0 & \text{otherwise.} \end{cases} \qquad (11)$$
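A direct, unoptimized construction of this weight matrix might look as follows; this is an illustrative sketch, and the parameter values (`sigma_i`, `sigma_x`, `r`) are placeholders, not the settings used in the paper:

```python
import numpy as np

def brightness_weights(img, sigma_i=0.1, sigma_x=4.0, r=5.0):
    # Eq. (11): F(i) is pixel brightness, X(i) its (row, col) position.
    h, w = img.shape
    F = img.reshape(-1).astype(float)
    rows, cols = np.mgrid[0:h, 0:w]
    X = np.stack([rows.reshape(-1), cols.reshape(-1)], 1).astype(float)

    dF2 = (F[:, None] - F[None, :]) ** 2                  # ||F_i - F_j||^2
    dX2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # ||X_i - X_j||^2

    W = np.exp(-dF2 / sigma_i**2) * np.exp(-dX2 / sigma_x**2)
    W[np.sqrt(dX2) >= r] = 0.0   # pixels more than r apart are not connected
    np.fill_diagonal(W, 0.0)     # drop self-edges (a convention chosen here)
    return W
```

For a small image split into a dark and a bright half, same-brightness neighbors get weights close to 1, while edges crossing the brightness boundary are almost zero.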
2. Solve for the eigenvectors with the smallest eigenvalues of the system

$$(D - W)\, y = \lambda D\, y. \qquad (12)$$

As we saw above, the generalized eigensystem in (12) can be transformed into a standard eigenvalue problem of

$$D^{-\frac{1}{2}} (D - W)\, D^{-\frac{1}{2}}\, x = \lambda x. \qquad (13)$$

Solving a standard eigenvalue problem for all eigenvectors takes $O(n^3)$ operations, where $n$ is the number of nodes in the graph. This becomes impractical for image segmentation applications where $n$ is the number of pixels in an image.

Fortunately, our graph partitioning has the following properties: 1) The graphs are often only locally connected and the resulting eigensystems are very sparse, 2) only the top few eigenvectors are needed for graph partitioning, and 3) the precision requirement for the eigenvectors is low, often only the right sign bit is required. These special properties of our problem can be fully exploited by an eigensolver called the Lanczos method. The running time of a Lanczos algorithm is $O(mn) + O(m\, M(n))$ [11], where $m$ is the maximum number of matrix-vector computations required and $M(n)$ is the cost of a matrix-vector computation of $Ax$, where $A = D^{-\frac{1}{2}} (D - W)\, D^{-\frac{1}{2}}$. Note that the sparsity structure of $A$ is identical to that of the weight matrix $W$. Since $W$ is sparse, so is $A$ and the matrix-vector computation is only $O(n)$.

To see why this is the case, we will look at the cost of the inner product of one row of $A$ with a vector $x$. Let $y_i = A_i x = \sum_j A_{ij} x_j$. For a fixed $i$, $A_{ij}$ is only nonzero if node $j$ is in a spatial neighborhood of $i$. Hence, there are only a fixed number of operations required for each $A_i x$ and the total cost of computing $Ax$ is $O(n)$.

The constant factor is determined by the size of the spatial neighborhood of a node. It turns out that we can substantially cut down additional connections from each node to its neighbors by randomly selecting the connections within the neighborhood for the weighted graph. Empirically, we have found that one can remove up to 90 percent of the total connections within each of the neighborhoods when the neighborhoods are large without affecting the eigenvector solution to the system.

Putting everything together, each of the matrix-vector computations costs $O(n)$ operations with a small constant factor. The number $m$ depends on many factors [11]. In our experiments on image segmentation, we observed that $m$ is typically less than $O(n^{\frac{1}{2}})$.
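In current scientific Python, the same combination of ingredients (a sparse $W$, a Lanczos-type iteration, and only the few smallest eigenvectors) is available off the shelf through `scipy.sparse.linalg.eigsh`. The sketch below is illustrative and is not the LASO2-based implementation the authors used; the chain graph and the shift value are arbitrary choices:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# a locally connected graph: a 1D chain of n "pixels" with unit weights
n = 200
off = np.ones(n - 1)
W = sp.diags([off, off], [-1, 1], format="csr")
d = np.asarray(W.sum(axis=1)).ravel()
inv_sqrt = sp.diags(1.0 / np.sqrt(d))
A = inv_sqrt @ (sp.diags(d) - W) @ inv_sqrt  # D^(-1/2)(D - W)D^(-1/2), Eq. (7)

# Lanczos-type iteration for only the k smallest eigenpairs of the sparse
# system, via shift-invert around a point just below the spectrum
vals, vecs = eigsh(A, k=4, sigma=-1e-3, which="LM")
order = np.argsort(vals)

assert abs(vals[order[0]]) < 1e-6        # the smallest eigenvalue is 0
fiedler = vecs[:, order[1]]              # second smallest eigenvector
assert (fiedler > 0).any() and (fiedler < 0).any()  # it bipartitions the chain
```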
Fig. 3 shows the smallest eigenvectors computed
for the generalized eigensystem with the weight
matrix defined above.
3. Once the eigenvectors are computed, we can partition the graph into two pieces using the second smallest eigenvector. In the ideal case, the eigenvector should only take on two discrete values and the signs of the values can tell us exactly how to partition the graph. However, our eigenvectors can take on continuous values and we need to choose a splitting point to partition it into two parts. There are many different ways of choosing such a splitting point. One can take 0 or the median value as the splitting point or one can search for the splitting point such that the resulting partition has the best $\text{Ncut}(A, B)$ value. We take the latter approach in our work. Currently, the search is done by checking $l$ evenly spaced possible splitting points and computing the best Ncut among them. In our experiments, the values in the eigenvectors are usually well separated and this method of choosing a splitting point is very reliable even with a small $l$.
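The search just described can be sketched as follows (an illustrative reimplementation, checking `l` evenly spaced candidate thresholds strictly inside the range of the eigenvector values):

```python
import numpy as np

def best_ncut_split(W, y, l=32):
    # Search l evenly spaced splitting points of eigenvector y; return the
    # (Ncut value, threshold) pair of the best resulting bipartition.
    d = W.sum(axis=1)
    candidates = np.linspace(y.min(), y.max(), l + 2)[1:-1]  # interior points
    best_ncut, best_t = np.inf, None
    for t in candidates:
        A = y > t
        if A.all() or not A.any():
            continue                       # degenerate split, skip
        cut = W[np.ix_(A, ~A)].sum()
        ncut = cut / d[A].sum() + cut / d[~A].sum()
        if ncut < best_ncut:
            best_ncut, best_t = ncut, t
    return best_ncut, best_t
```

On an eigenvector whose values are well separated into two groups, any interior threshold between the groups recovers the same partition, which is why a small `l` suffices.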
4. After the graph is broken into two pieces, we can
recursively run our algorithm on the two partitioned
parts. Or, equivalently, we could take advantage of
the special properties of the other top eigenvectors as
explained in the previous section to subdivide the
graph based on those eigenvectors. The recursion
stops once the Ncut value exceeds a certain limit.
We also impose a stability criterion on the partition. As we saw earlier, and as we see in the eigenvectors with the seventh to ninth smallest eigenvalues (Fig. 3g-h), sometimes an eigenvector can take on the shape of a continuous function, rather than the discrete indicator function that we seek. From the view of segmentation, such an eigenvector is attempting to subdivide an image region where there is no sure way of breaking it. In fact, if we are forced to partition the image based on this eigenvector, we will see there are many different splitting points which have similar Ncut values. Hence, the partition will be highly uncertain and unstable. In our current segmentation scheme, we simply choose to ignore all those eigenvectors which have smoothly varying eigenvector values. We achieve this by imposing a stability criterion which measures the degree of smoothness in the eigenvector values. The simplest measure is based on first computing the histogram of the eigenvector values and then computing the ratio between the minimum and maximum values in the bins. When the eigenvector values are continuously varying, the values in the histogram bins will stay relatively the same and the ratio will be relatively high. In our experiments, we find that simple thresholding on the ratio described above can be used to exclude unstable eigenvectors. We have set that value to be 0.06 in all our experiments.
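The histogram-ratio stability test can be sketched as follows; the bin count is our assumption, while 0.06 matches the threshold quoted above:

```python
import numpy as np

def is_stable(v, bins=10, ratio_thresh=0.06):
    """Histogram the eigenvector values and compare the smallest and
    largest bin counts.  An indicator-like (bimodal) eigenvector leaves
    near-empty bins between its two modes, so min/max is small and the
    vector is accepted; a smoothly varying eigenvector fills all bins
    about evenly, so min/max is large and the vector is rejected."""
    counts, _ = np.histogram(v, bins=bins)
    return counts.min() / counts.max() < ratio_thresh
```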
Fig. 4 shows the final segmentation for the image
shown in Fig. 2.
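The recursive procedure of step 4 can be sketched end-to-end as below. This is a simplified dense version under our own names: it splits on the sign of the second generalized eigenvector rather than searching splitting points, and the stopping threshold is illustrative.

```python
import numpy as np

def recursive_ncut(W, idx=None, max_ncut=0.2):
    """Recursively bipartition the graph with weight matrix W using the
    sign of the second smallest eigenvector of (D - W) x = lambda D x,
    stopping when the proposed split's Ncut exceeds max_ncut.
    Returns a list of index arrays, one per segment."""
    if idx is None:
        idx = np.arange(W.shape[0])
    if len(idx) < 2:
        return [idx]
    Wi = W[np.ix_(idx, idx)]
    d = Wi.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = d_isqrt[:, None] * (np.diag(d) - Wi) * d_isqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)
    mask = (d_isqrt * vecs[:, 1]) > 0        # sign split on 2nd eigenvector
    if mask.all() or not mask.any():
        return [idx]
    cut = Wi[mask][:, ~mask].sum()
    ncut = cut / Wi[mask].sum() + cut / Wi[~mask].sum()
    if ncut > max_ncut:                      # partition too costly: stop
        return [idx]
    return (recursive_ncut(W, idx[mask], max_ncut)
            + recursive_ncut(W, idx[~mask], max_ncut))
```

On a toy graph with two tightly connected groups and weak cross-links, the recursion separates the groups and then stops, since splitting a tight group has a high Ncut.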
3.2 Recursive Two-Way Ncut
In summary, our grouping algorithm consists of the
following steps:
892 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 22, NO. 8, AUGUST 2000
Fig. 2. A gray level image of a baseball game.
Frequently Asked Questions (12)
Q1. What have the authors contributed in "Normalized cuts and image segmentation" ?

The authors propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, their approach aims at extracting the global impression of an image. They treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. They show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this criterion.

A graph G = (V, E) can be partitioned into two disjoint sets, A, B, with A ∪ B = V and A ∩ B = ∅, by simply removing edges connecting the two parts.

The authors can construct a spring-mass system from the weighted graph by taking graph nodes as physical nodes and graph edges as springs connecting each pair of nodes. 

The stability criterion keeps us from cutting oscillatory eigenvectors, but it also prevents us from cutting the subsequent eigenvectors, which might be perfect partitioning vectors.

Under the constraint that x is orthogonal to the j-1 smallest eigenvectors x_1, ..., x_{j-1}, the quotient (x^T A x)/(x^T x) is minimized by the next smallest eigenvector x_j, and its minimum value is the corresponding eigenvalue λ_j.

Imagine what would happen if the authors were to give a hard shake to this spring-mass system, forcing the nodes to oscillate in the direction perpendicular to the image plane. 

Some of it is low level, such as coherence of brightness, color, texture, or motion, but equally important is mid- or high-level knowledge about symmetries of objects or object models.

Spectral graph theory provides us some guidance on the goodness of the approximation to the normalized cut provided by the second eigenvalue of the normalized Laplacian. 

Although there are an exponential number of such partitions, finding the minimum cut of a graph is a well-studied problem and there exist efficient algorithms for solving it. 

In their current experiments, with this implementation, the running time on a 300 × 400 image can be reduced to about 20 seconds on Intel Pentium 300 MHz machines.