TL;DR: This paper proposes two algorithms that use graph cuts to compute a local minimum even when very large moves are allowed; the second generates a labeling such that no expansion move decreases the energy.
Abstract:
In this paper we address the problem of minimizing a large class of energy functions that occur in early vision. The major restriction is that the energy function's smoothness term must only involve pairs of pixels. We propose two algorithms that use graph cuts to compute a local minimum even when very large moves are allowed. The first move we consider is an α-β-swap: for a pair of labels α, β, this move exchanges the labels between an arbitrary set of pixels labeled α and another arbitrary set labeled β. Our first algorithm generates a labeling such that there is no swap move that decreases the energy. The second move we consider is an α-expansion: for a label α, this move assigns an arbitrary set of pixels the label α. Our second algorithm, which requires the smoothness term to be a metric, generates a labeling such that there is no expansion move that decreases the energy. Moreover, this solution is within a known factor of the global minimum. We experimentally demonstrate the effectiveness of our approach on image restoration, stereo and motion.
TL;DR: Inception, as described in this paper, is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models in the context of machine learning and classification.
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
TL;DR: This paper proposes a “split Bregman” method, which can solve a very broad class of L1-regularized problems, and applies this technique to the Rudin-Osher-Fatemi functional for image denoising and to a compressed sensing problem that arises in magnetic resonance imaging.
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
TL;DR: The analogy between images and statistical mechanics systems is made and the analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations, creating a highly parallel ``relaxation'' algorithm for MAP estimation.
TL;DR: This work uses snakes for interactive interpretation, in which user-imposed constraint forces guide the snake near features of interest, and uses scale-space continuation to enlarge the capture region surrounding a feature.
TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
TL;DR: In this paper, a method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image; an iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences.
Q1. What are the contributions in "Fast approximate energy minimization via graph cuts" ?
In this paper the authors address the problem of minimizing a large class of energy functions that occur in early vision. The authors propose two algorithms that use graph cuts to compute a local minimum even when very large moves are allowed. The first move the authors consider is an α-β-swap: for a pair of labels α, β, this move exchanges the labels between an arbitrary set of pixels labeled α and another arbitrary set labeled β. The second move the authors consider is an α-expansion: for a label α, this move assigns an arbitrary set of pixels the label α. The authors experimentally demonstrate the effectiveness of their approach on image restoration, stereo and motion. In this framework, one seeks the labeling f that minimizes the energy E(f) = Esmooth(f) + Edata(f).
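The energy E(f) = Esmooth(f) + Edata(f) and the move-based minimization can be sketched in a few lines. This is a hypothetical illustration on a 1-D signal with a quadratic data term and a Potts smoothness term; the paper's optimal graph-cut expansion move is replaced here by a naive greedy per-pixel flip, so this is a toy stand-in for the algorithm, not the authors' construction.

```python
def energy(f, obs, lam=1.0):
    """E(f) = Edata(f) + Esmooth(f): quadratic data term plus a Potts
    smoothness term that charges lam for each unequal neighbor pair."""
    data = sum((fp - op) ** 2 for fp, op in zip(f, obs))
    smooth = lam * sum(1 for a, b in zip(f, f[1:]) if a != b)
    return data + smooth

def expansion_move(f, obs, alpha, lam=1.0):
    """Greedy stand-in for the optimal alpha-expansion: assign the label
    alpha to each pixel whose flip alone lowers the energy."""
    f = list(f)
    for p in range(len(f)):
        g = list(f)
        g[p] = alpha
        if energy(g, obs, lam=lam) < energy(f, obs, lam=lam):
            f = g
    return f

def minimize(obs, labels, lam=1.0, sweeps=5):
    """Cycle over labels until no expansion move decreases the energy."""
    f = list(obs)  # start from the observed labels
    for _ in range(sweeps):
        improved = False
        for alpha in labels:
            g = expansion_move(f, obs, alpha, lam=lam)
            if energy(g, obs, lam=lam) < energy(f, obs, lam=lam):
                f, improved = g, True
        if not improved:
            break
    return f

noisy = [0, 0, 2, 0, 1, 1, 1, 3, 1]
print(minimize(noisy, labels=range(4), lam=3.0))  # → [0, 0, 0, 0, 1, 1, 1, 1, 1]
```

With lam=3.0 the two isolated spikes are relabeled to match their neighbors, which is the piecewise-smooth behavior the energy is designed to encourage.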
Q2. What is the problem with simulated annealing?
Minimizing an arbitrary energy function requires exponential time; as a consequence, simulated annealing is very slow.
Q3. What is the definition of a vision problem?
Many early vision problems require estimating some spatially varying quantity (such as intensity or disparity) from noisy measurements.
Q4. What is the goal of the study?
The goal is to find a labeling f that assigns each pixel p ∈ P a label fp ∈ L, where f is both piecewise smooth and consistent with the observed data.
Q5. What is the cost of an elementary cut?
The cost of an elementary cut C is

|C| = ∑_{p∈P} |C ∩ {t_p^α, t_p^ᾱ}| + ∑_{{p,q}∈N, f_p=f_q} |C ∩ e_{p,q}| + ∑_{{p,q}∈N, f_p≠f_q} |C ∩ E_{p,q}|.   (6)
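Eq. (6) can be transcribed almost literally as a sum over the three edge classes of the expansion graph. The edge ids and weights below are invented for illustration; in the paper, an elementary cut comes from a min-cut computation on the expansion graph rather than being listed by hand.

```python
def cut_cost(C, w, t_links, n_links_eq, triplets_neq):
    """Cost |C| per Eq. (6): C is the set of severed edge ids, w maps each
    edge id to its weight."""
    cost = 0.0
    for pair in t_links:       # one pair {t_p^alpha, t_p^alphabar} per pixel p
        cost += sum(w[e] for e in pair if e in C)
    for e in n_links_eq:       # single n-link e_{p,q} when f_p == f_q
        if e in C:
            cost += w[e]
    for trip in triplets_neq:  # triplet E_{p,q} when f_p != f_q
        cost += sum(w[e] for e in trip if e in C)
    return cost

# Tiny illustrative instance: two pixels with equal labels, made-up weights.
w = {"t1a": 2.0, "t1b": 5.0, "t2a": 3.0, "t2b": 1.0, "e12": 4.0}
C = {"t1a", "t2b", "e12"}  # an elementary cut severs exactly one t-link per pixel
print(cut_cost(C, w, [("t1a", "t1b"), ("t2a", "t2b")], ["e12"], []))  # → 7.0
```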
Q6. What is the importance of non-Potts energy functions?
This example demonstrates the need for non-Potts energy functions, as minimizing E2 gives significant “banding” problems (shown in the second image).
Q7. When does a cut C on Gαβ sever an n-link e{p,q}?
It is easy to show that a cut C severs an n-link e{p,q} between neighboring pixels on Gαβ if and only if C leaves the pixels p and q connected to different terminals.
Q8. What approximation guarantee does the expansion move algorithm provide?
The expansion move algorithm produces a labeling f such that E(f∗) ≤ E(f) ≤ 2k·E(f∗), where f∗ is the global minimum and k = max{V(α,β) : α≠β} / min{V(α,β) : α≠β} (see [8]).
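As a concrete check of this bound, the factor k can be computed for a specific metric. The truncated linear term V(α,β) = min(|α−β|, T) used below is a standard example of a metric smoothness term, not a value taken from this page; for it, the minimum over distinct labels is 1 and the maximum is T, so the energy of the expansion-move labeling is within a factor 2T of the global minimum.

```python
def approx_factor(V, labels):
    """k = max V(a,b) / min V(a,b) over all pairs of distinct labels."""
    vals = [V(a, b) for a in labels for b in labels if a != b]
    return max(vals) / min(vals)

T = 4
V = lambda a, b: min(abs(a - b), T)  # truncated linear metric
k = approx_factor(V, range(8))
print(k, 2 * k)  # k = T = 4, so the bound factor is 2k = 8
```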