What energy functions can be minimized via graph cuts

doi:10.1109/TPAMI.2004.1262177

Journal Article

Vladimir Kolmogorov, Ramin Zabih

01 Jan 2004-Vol. 26, Iss: 2, pp 147-159

TL;DR: This work gives a precise characterization of what energy functions can be minimized using graph cuts, among the energy functions that can be written as a sum of terms containing three or fewer binary variables.

Abstract: In the last few years, several new algorithms based on graph cuts have been developed to solve energy minimization problems in computer vision. Each of these techniques constructs a graph such that the minimum cut on the graph also minimizes the energy. Yet, because these graph constructions are complex and highly specific to a particular energy function, graph cuts have seen limited application to date. In this paper, we give a characterization of the energy functions that can be minimized by graph cuts. Our results are restricted to functions of binary variables. However, our work generalizes many previous constructions and is easily applicable to vision problems that involve large numbers of labels, such as stereo, motion, image restoration, and scene reconstruction. We give a precise characterization of what energy functions can be minimized using graph cuts, among the energy functions that can be written as a sum of terms containing three or fewer binary variables. We also provide a general-purpose construction to minimize such an energy function. Finally, we give a necessary condition for any energy function of binary variables to be minimized by graph cuts. Researchers who are considering the use of graph cuts to optimize a particular energy function can use our results to determine if this is possible and then follow our construction to create the appropriate graph. A software implementation is freely available.

Summary

1 Introduction and summary of results

Many of the problems that arise in early vision can be naturally expressed in terms of energy minimization.
Researchers typically use general-purpose global optimization techniques such as simulated annealing [3, 11], which is extremely slow in practice.
The experimental results produced by graph cut algorithms are also quite good; for example, two recent evaluations of stereo algorithms using real imagery with ground truth found that a graph cut method gave the best overall performance [23, 25].
Minimizing an energy function via graph cuts, however, remains a technically difficult problem.
The authors' results provide a significant generalization of the energy minimization methods used in [4-6, 8, 13, 17, 24], and show how to minimize an interesting new class of energy functions.

1.1 Summary of our results

The main result in this paper is a precise characterization of the functions in $\mathcal{F}^3$ that can be minimized using graph cuts, together with a graph construction for minimizing such functions.
Note that in this paper the authors only consider binary-valued variables.
As an example, the authors will show in section 4.1 how to use their results to solve the pixel-labeling problem, even though the pixels have many possible labels.
The authors also identify an interesting class of energy functions that have not yet been minimized using graph cuts.
In the language of Markov Random Fields [11, 19], previous graph cut methods consider first-order MRFs.

1.2 Organization

Section 5 contains their main theorems for other classes.
Detailed proofs of their theorems, together with the graph constructions, are deferred to section 6.

2 Overview of graph cuts

The minimum s-t-cut problem is to find a cut C with the smallest cost.
Due to the theorem of Ford and Fulkerson [10] this is equivalent to computing the maximum flow from the source to sink.
There are many algorithms which solve this problem in polynomial time with small constants [1, 12].

3 Defining graph representability

Each cut on G has some cost; therefore, G represents the energy function mapping from all cuts on G to the set of nonnegative real numbers.
Thus a natural question to ask is what is the class of energy functions for which the authors can construct a graph that represents it.
Above the authors used each node (except the source and the sink) for encoding one binary variable.
The authors will summarize graph constructions that they allow in the following definition.

4.1 Example: pixel-labeling via expansion moves

In this section the authors show how to apply this theorem to solve the pixel-labeling problem.
The authors will show how their method can be used to derive the expansion move algorithm developed in [8].
Note that the key technical step in this algorithm can be naturally expressed as minimizing an energy function involving binary variables.
The graph cut construction of [8] requires the function $V_{p,q}$ to be a metric, and in [8] it is not clear whether this is an accidental property of the construction (i.e., [8] leaves open the possibility that a more clever graph cut construction may overcome this restriction).
Using their results, the authors can easily show this is not the case.

6.2 Proof of theorems 3 and 6: the constructive part

In this section the authors give the constructive part of the proof: given a regular energy function from the class $\mathcal{F}^3$, they show how to construct a graph which represents it.
First the authors consider regular functions of two variables, then regular functions of three variables, and finally regular functions of the form given in Theorem 6.
This also proves the constructive part of Theorem 3.
Indeed, suppose a function is from the class $\mathcal{F}^2$ and each term in the sum satisfies the condition given in Theorem 3 (i.e., is regular).
Then each term is graph-representable (as the authors show in this section) and, hence, the function is graph-representable as well according to Lemma 10.

Functions of two variables

Let $E(x_1, x_2)$ be a function of two variables represented by a table.

Now the authors can easily construct a graph G which represents this function.
Note that the authors did not introduce any additional nodes for representing binary interactions of binary variables.
This is in contrast to the construction in [8], which added auxiliary nodes for representing the energies just considered.
The authors' construction yields a smaller graph and, thus, the minimum cut can potentially be computed faster.
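
The table and the construction itself are not reproduced in this summary, so the following is only a minimal sketch of one way such a graph can be built. It assumes the usual decomposition of a regular two-variable term with entries A = E(0,0), B = E(0,1), C = E(1,0), D = E(1,1) into a constant, two unary terms and one pairwise edge of weight B + C - A - D. The function names and the node names `s`, `t`, `v1`, `v2` are hypothetical, and the convention that x = 0 places a node on the source side follows the definitions used in the paper text.

```python
from itertools import product

def build_two_variable_graph(A, B, C, D):
    """Sketch: build edges for E with A=E(0,0), B=E(0,1), C=E(1,0), D=E(1,1).

    Returns (edges, constant) such that E(x1, x2) = constant + cut cost.
    Assumed decomposition: E = A + (C-A)*x1 + (D-C)*x2 + (B+C-A-D)*(1-x1)*x2.
    """
    if A + D > B + C:
        raise ValueError("term is not regular")
    edges = {}

    def add(u, v, w):
        if w > 0:
            edges[(u, v)] = edges.get((u, v), 0) + w

    def add_unary(node, w):
        # Cost w is paid when the node's variable is 1 (node on the sink side).
        if w >= 0:
            add("s", node, w)
            return 0
        # Negative weight: w*x = w + (-w)*(1-x), so charge -w on the source side.
        add(node, "t", -w)
        return w

    constant = A
    constant += add_unary("v1", C - A)
    constant += add_unary("v2", D - C)
    add("v1", "v2", B + C - A - D)  # cut only when x1 = 0 and x2 = 1
    return edges, constant

def cut_cost(edges, x1, x2):
    """Cost of the cut in which a node is on the source side iff its variable is 0."""
    source_side = {"s"}
    if x1 == 0:
        source_side.add("v1")
    if x2 == 0:
        source_side.add("v2")
    return sum(w for (u, v), w in edges.items()
               if u in source_side and v not in source_side)

# Illustrative regular table (values chosen arbitrarily).
table = {(0, 0): 1, (0, 1): 5, (1, 0): 2, (1, 1): 0}
edges, constant = build_two_variable_graph(1, 5, 2, 0)
for x1, x2 in product((0, 1), repeat=2):
    assert constant + cut_cost(edges, x1, x2) == table[(x1, x2)]
```

The graph in this sketch uses only the two variable nodes and the terminals, which matches the remark above that no auxiliary nodes are needed for pairwise terms.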

Functions of three variables

Now let us consider a regular function $E$ of three variables. Let us represent it as a table.

It's easy to check that these transformations preserve the functional π.
The authors need to show that all terms here are graph-representable, then lemma 10 will imply that E is graph-representable as well.
The first three terms are regular functions depending only on two variables and thus are graph-representable as was shown in the previous section.
The graph G that represents this term can be constructed as follows.

Functions of many variables

Finally, let us consider a regular function $E$.

Each term in the sum need not necessarily be regular.
This can be done using the following lemma and a trivial induction argument.
Therefore the authors did not introduce any nonregular projections for these terms.

6.3 Proof of theorem 7

Hence, all terms E i,j are regular, i.e. they satisfy the condition in the theorem 3.
The following sequence of operations shows one possible way to push the maximum flow through this graph.

What Energy Functions can be Minimized via Graph Cuts?

Vladimir Kolmogorov and Ramin Zabih

Computer Science Department, Cornell University, Ithaca, NY 14853

vnk@cs.cornell.edu, rdz@cs.cornell.edu

Abstract. Many problems in computer vision can be naturally phrased in terms of energy minimization. In the last few years researchers have developed a powerful class of energy minimization methods based on graph cuts. These techniques construct a specialized graph, such that the minimum cut on the graph also minimizes the energy. The minimum cut in turn is efficiently computed by max flow algorithms. Such methods have been successfully applied to a number of important vision problems, including image restoration, motion, stereo, voxel occupancy and medical imaging. However, each graph construction to date has been highly specific for a particular energy function. In this paper we address a much broader problem, by characterizing the class of energy functions that can be minimized by graph cuts, and by giving a general-purpose construction that minimizes any energy function in this class. Our results generalize several previous vision algorithms based on graph cuts, and also show how to minimize an interesting new class of energy functions.

1 Introduction and summary of results

Many of the problems that arise in early vision can be naturally expressed in terms of energy minimization. The computational task of minimizing the energy is usually quite difficult, as it generally requires minimizing a non-convex function in a space with thousands of dimensions. If the functions have a special form they can be solved efficiently using dynamic programming [2]. However, researchers typically use general purpose global optimization techniques such as simulated annealing [3, 11], which is extremely slow in practice.

In the last few years, however, researchers have developed a new approach based on graph cuts. The basic technique is to construct a specialized graph for the energy function to be minimized, such that the minimum cut on the graph in turn minimizes the energy. The minimum cut in turn can be computed very efficiently by max flow algorithms. These methods have been successfully used for a wide variety of vision problems including image restoration [7, 8, 16, 13], stereo and motion [4, 7, 8, 15, 18, 21, 22], voxel occupancy [24] and medical imaging [6, 5, 17]. The output of these algorithms is generally a solution with some interesting theoretical quality guarantee. In some cases [7, 15, 16, 13, 21] it is the global minimum, in other cases a local minimum in a strong sense [8] that is within a known factor of the global minimum. The experimental results produced by these algorithms are also quite good; for example, two recent evaluations of stereo algorithms using real imagery with ground truth found that a graph cut method gave the best overall performance [23, 25].

Minimizing an energy function via graph cuts, however, remains a technically difficult problem. Each paper constructs its own graph specifically for its individual energy function, and in some of these cases (especially [18, 8]) the construction is fairly complex. One consequence is that researchers sometimes use heuristic methods for optimization, even in situations where the exact global minimum can be computed via graph cuts [14, 20, 9]. The goal of this paper is to precisely characterize the class of energy functions that can be minimized via graph cuts, and to give a general-purpose graph construction that minimizes any energy function in this class. Our results provide a significant generalization of the energy minimization methods used in [4–6, 8, 13, 17, 24], and show how to minimize an interesting new class of energy functions.

1.1 Summary of our results

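In this paper we consider two classes of energy functions. Let $\{x_1, \ldots, x_n\}$, $x_i \in \{0, 1\}$, be a set of binary-valued variables. We define the class $\mathcal{F}^2$ to be functions of the form

$$E(x_1, \ldots, x_n) = \sum_i E^i(x_i) + \sum_{i<j} E^{i,j}(x_i, x_j). \tag{1}$$

We define the class $\mathcal{F}^3$ to be functions of the form

$$E(x_1, \ldots, x_n) = \sum_i E^i(x_i) + \sum_{i<j} E^{i,j}(x_i, x_j) + \sum_{i<j<k} E^{i,j,k}(x_i, x_j, x_k). \tag{2}$$

Obviously, the class $\mathcal{F}^2$ is a strict subset of the class $\mathcal{F}^3$.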

The main result in this paper is a precise characterization of the functions in $\mathcal{F}^3$ that can be minimized using graph cuts, together with a graph construction for minimizing such functions. Moreover, we give a necessary condition for all other classes which must be met for a function to be minimized via graph cuts.

Note that in this paper we only consider binary-valued variables. Most of the previous work with graph cuts cited above considers energy functions that involve variables with more than 2 possible values. For example, the work on stereo, motion and image restoration described in [8] addresses the standard pixel-labeling problem in early vision. In these pixel-labeling problems, the variables represent individual pixels, and the possible values for an individual variable represent its possible displacements or intensities. However, many of the graph cut methods that handle multiple possible values actually consider a pair of labels at a time. As a consequence, even though we only address binary-valued variables, our results generalize the algorithms given in [4–6, 8, 13, 17, 24]. As an example, we will show in section 4.1 how to use our results to solve the pixel-labeling problem, even though the pixels have many possible labels.

We also identify an interesting class of energy functions that have not yet been minimized using graph cuts. All of the previous work with graph cuts involves a neighborhood system that is defined on pairs of pixels. In the language of Markov Random Fields [11, 19], these methods consider first-order MRFs. The associated energy functions lie in $\mathcal{F}^2$. Our results allow for the minimization of energy functions in the larger class $\mathcal{F}^3$, and thus for neighborhood systems that involve triples of pixels.

1.2 Organization

The rest of the paper is organized as follows. In section 2 we give an overview of graph cuts. In section 3 we formalize the problem that we want to solve. Section 4 contains our main theorem for the class of functions $\mathcal{F}^2$ and shows how it can be used. Section 5 contains our main theorems for other classes. Detailed proofs of our theorems, together with the graph constructions, are deferred to section 6. A summary of the actual graph constructions is given in the appendix.

2 Overview of graph cuts

Suppose $G = (\mathcal{V}, \mathcal{E})$ is a directed graph with two special vertices (terminals), namely the source $s$ and the sink $t$. An s-t-cut (or just a cut, as we will refer to it later) $C = S, T$ is a partition of the vertices in $\mathcal{V}$ into two disjoint sets $S$ and $T$, such that $s \in S$ and $t \in T$. The cost of the cut is the sum of the costs of all edges that go from $S$ to $T$:

$$c(S, T) = \sum_{u \in S,\; v \in T,\; (u, v) \in \mathcal{E}} c(u, v).$$

The minimum s-t-cut problem is to find a cut $C$ with the smallest cost. Due to the theorem of Ford and Fulkerson [10] this is equivalent to computing the maximum flow from the source to sink. There are many algorithms which solve this problem in polynomial time with small constants [1, 12].

It is convenient to denote a cut $C = S, T$ by a labeling $f$ mapping from the set of the nodes $\mathcal{V} - \{s, t\}$ to $\{0, 1\}$, where $f(v) = 0$ means that $v \in S$, and $f(v) = 1$ means that $v \in T$. We will use this notation later.
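
As a small illustration of the definitions in this section, the sketch below computes a minimum s-t-cut by augmenting along shortest paths (the Edmonds-Karp variant of the Ford-Fulkerson method) and then reads the source set $S$ off the residual graph. It is not the implementation referred to in the paper; the function names and the toy graph are assumptions made for this example.

```python
from collections import deque

def cut_cost(capacity, source_side):
    """c(S, T): sum of capacities of edges going from S to T."""
    return sum(c for u, nbrs in capacity.items() if u in source_side
               for v, c in nbrs.items() if v not in source_side)

def min_cut(capacity, s, t):
    """Return (max flow value, source set S) for capacity given as {u: {v: c}}."""
    # Residual capacities, with reverse edges initialised to 0.
    residual = {s: {}, t: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    def reachable():
        # Breadth-first search over edges with positive residual capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if t not in parent:
            # No augmenting path: nodes still reachable from s form S.
            return flow, set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push

# Toy graph (capacities chosen arbitrarily for illustration).
capacity = {"s": {"a": 3, "b": 1}, "a": {"t": 1}, "b": {"t": 3}}
flow, S = min_cut(capacity, "s", "t")
labeling = {v: 0 if v in S else 1 for v in ("a", "b")}  # f(v) as defined above
```

In this toy graph the maximum flow and the cost of the cut returned agree, as the Ford-Fulkerson theorem requires, and the labeling assigns 0 to nodes in $S$ and 1 to nodes in $T$.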

3 Defining graph representability

Let us consider a graph $G = (\mathcal{V}, \mathcal{E})$ with terminals $s$ and $t$, thus $\mathcal{V} = \{v_1, \ldots, v_n, s, t\}$. Each cut on $G$ has some cost; therefore, $G$ represents the energy function mapping from all cuts on $G$ to the set of nonnegative real numbers. Any cut can be described by $n$ binary variables $x_1, \ldots, x_n$ corresponding to nodes in $G$ (excluding the source and the sink): $x_i = 0$ when $v_i \in S$, and $x_i = 1$ when $v_i \in T$. Therefore, the energy $E$ that $G$ represents can be viewed as a function of $n$ binary variables: $E(x_1, \ldots, x_n)$ is equal to the cost of the cut defined by the configuration $x_1, \ldots, x_n$ ($x_i \in \{0, 1\}$).

We can efficiently minimize $E$ by computing the minimum s-t-cut on $G$. Thus a natural question to ask is what is the class of energy functions for which we can construct a graph that represents it.

We can also generalize our construction. Above we used each node (except the source and the sink) for encoding one binary variable. Instead we can specify a subset $\mathcal{V}_0 = \{v_1, \ldots, v_k\} \subset \mathcal{V} - \{s, t\}$ and introduce variables only for the nodes in this set. Then there may be several cuts corresponding to a configuration $x_1, \ldots, x_k$. If we define the energy $E(x_1, \ldots, x_k)$ as the minimum among costs of all such cuts then the minimum s-t-cut on $G$ will again yield the configuration which minimizes $E$.

Finally, note that the configuration that minimizes $E$ will not change if we add a constant to $E$.

We will summarize graph constructions that we allow in the following definition.

Definition 1. A function $E$ of $n$ binary variables is called graph-representable if there exists a graph $G = (\mathcal{V}, \mathcal{E})$ with terminals $s$ and $t$ and a subset of nodes $\mathcal{V}_0 = \{v_1, \ldots, v_n\} \subset \mathcal{V} - \{s, t\}$ such that for any configuration $x_1, \ldots, x_n$ the value of the energy $E(x_1, \ldots, x_n)$ is equal to a constant plus the cost of the minimum s-t-cut among all cuts $C = S, T$ in which $v_i \in S$, if $x_i = 0$, and $v_i \in T$, if $x_i = 1$ ($1 \le i \le n$). We say that $E$ is exactly represented by $G$, $\mathcal{V}_0$ if this constant is zero.

The following lemma is an obvious consequence of this definition.

Lemma 2. Suppose the energy function $E$ is graph-representable by a graph $G$ and a subset $\mathcal{V}_0$. Then it is possible to find the exact minimum of $E$ in polynomial time by computing the minimum s-t-cut on $G$.
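
To make Definition 1 concrete, the sketch below evaluates the energy that a small graph represents by brute force: the variable nodes are pinned to the side given by the configuration, and the cost is minimized over all placements of the remaining (auxiliary) nodes. The graph, its weights, and the function names are assumptions chosen for illustration, and the enumeration is exponential in the number of auxiliary nodes, so it is only meant for checking tiny examples.

```python
from itertools import product

def cut_cost(edges, source_side):
    """Sum of weights of edges going from the source side to the sink side."""
    return sum(w for (u, v), w in edges.items()
               if u in source_side and v not in source_side)

def represented_energy(edges, variable_nodes, other_nodes, x):
    """E(x): minimum cost over all cuts that agree with x on the variable nodes."""
    pinned = {"s"} | {v for v, xi in zip(variable_nodes, x) if xi == 0}
    costs = []
    for bits in product((0, 1), repeat=len(other_nodes)):
        source_side = pinned | {v for v, b in zip(other_nodes, bits) if b == 0}
        costs.append(cut_cost(edges, source_side))
    return min(costs)

# Hypothetical graph: two variable nodes v1, v2 and one auxiliary node a.
edges = {("s", "v1"): 2, ("v1", "a"): 3, ("a", "t"): 1,
         ("v2", "t"): 4, ("v1", "v2"): 5}
energy = {x: represented_energy(edges, ["v1", "v2"], ["a"], x)
          for x in product((0, 1), repeat=2)}
best = min(energy, key=energy.get)  # configuration selected by the minimum cut
```

Minimizing the tabulated energy over all configurations gives the same value as the minimum s-t-cut of the whole graph, which is the content of Lemma 2 for this toy case.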

In this paper we will give a complete characterization of the classes $\mathcal{F}^2$ and $\mathcal{F}^3$ in terms of graph representability, and show how to construct graphs for minimizing graph-representable energies within these classes. Moreover, we will give a necessary condition for all other classes which must be met for a function to be graph-representable. Note that it would suffice to consider only the class $\mathcal{F}^3$ since $\mathcal{F}^2 \subset \mathcal{F}^3$. However, the condition for $\mathcal{F}^2$ is simpler so we will consider it separately.

4 The class $\mathcal{F}^2$

Our main result for the class $\mathcal{F}^2$ is the following theorem.

Theorem 3. Let $E$ be a function of $n$ binary variables from the class $\mathcal{F}^2$, i.e. it can be written as the sum

$$E(x_1, \ldots, x_n) = \sum_i E^i(x_i) + \sum_{i<j} E^{i,j}(x_i, x_j).$$

Then $E$ is graph-representable if and only if each term $E^{i,j}$ satisfies the inequality

$$E^{i,j}(0, 0) + E^{i,j}(1, 1) \le E^{i,j}(0, 1) + E^{i,j}(1, 0).$$
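
The condition in Theorem 3 is easy to test mechanically. The following sketch checks the inequality for each pairwise term of a function in $\mathcal{F}^2$; the dictionary layout and the function names are assumptions made for this example, and the unary terms are omitted because the theorem places no condition on them.

```python
def is_regular_pairwise(term):
    """term[(a, b)] holds E^{i,j}(a, b); check the inequality of Theorem 3."""
    return term[(0, 0)] + term[(1, 1)] <= term[(0, 1)] + term[(1, 0)]

def is_graph_representable_f2(pairwise_terms):
    """pairwise_terms maps (i, j) with i < j to the 2x2 table of E^{i,j}."""
    return all(is_regular_pairwise(term) for term in pairwise_terms.values())

# Illustrative tables: one term rewards equal labels, the other rewards unequal labels.
smoothing = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
repulsive = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}

ok = is_graph_representable_f2({(1, 2): smoothing, (2, 3): smoothing})
not_ok = is_graph_representable_f2({(1, 2): smoothing, (2, 3): repulsive})
```

A single term that violates the inequality is enough to make the whole function fail the test, since the theorem requires the condition for each $E^{i,j}$.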

4.1 Example: pixel-labeling via expansion moves

In this section we show how to apply this theorem to solve the pixel-labeling problem. In this problem, we are given the set of pixels $\mathcal{P}$ and the set of labels $\mathcal{L}$. The goal is to find a labeling $l$ (i.e. a mapping from the set of pixels to the set of labels) which minimizes the energy

$$E(l) = \sum_{p \in \mathcal{P}} D_p(l_p) + \sum_{(p,q) \in \mathcal{N}} V_{p,q}(l_p, l_q)$$

where $\mathcal{N} \subset \mathcal{P} \times \mathcal{P}$ is a neighborhood system on pixels. Without loss of generality we can assume that $\mathcal{N}$ contains only ordered pairs $(p, q)$ for which $p < q$ (since we can combine two terms $V_{p,q}$ and $V_{q,p}$ into one term). We will show how our method can be used to derive the expansion move algorithm developed in [8].

This problem is NP-hard if $|\mathcal{L}| > 2$ [8]. [8] gives an approximation algorithm for minimizing this energy. A single step of this algorithm is an operation called an $\alpha$-expansion. Suppose that we have some current configuration $l^0$, and we are considering a label $\alpha \in \mathcal{L}$. During the $\alpha$-expansion operation a pixel $p$ is allowed either to keep its old label $l^0_p$ or to switch to a new label $\alpha$: $l_p = l^0_p$ or $l_p = \alpha$. The key step in the approximation algorithm presented in [8] is to find the optimal expansion operation, i.e. the one that leads to the largest reduction in the energy $E$. This step is repeated until there is no choice of $\alpha$ where the optimal expansion operation reduces the energy.

[8] constructs a graph which contains nodes corresponding to pixels in $\mathcal{P}$. The following encoding is used: if $f(p) = 0$ (i.e., the node $p$ is in the source set) then $l_p = l^0_p$; if $f(p) = 1$ (i.e., the node $p$ is in the sink set) then $l_p = \alpha$.

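Note that the key technical step in this algorithm can be naturally expressed as minimizing an energy function involving binary variables. The binary variables correspond to pixels, and the energy we wish to minimize can be written formally as

$$E(x_{p_1}, \ldots, x_{p_n}) = \sum_{p \in \mathcal{P}} D_p(l_p(x_p)) + \sum_{(p,q) \in \mathcal{N}} V_{p,q}(l_p(x_p), l_q(x_q)), \tag{3}$$

where

$$\forall p \in \mathcal{P} \quad l_p(x_p) = \begin{cases} l^0_p, & x_p = 0 \\ \alpha, & x_p = 1. \end{cases}$$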
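
The pairwise terms of the binary energy (3) can be tabulated directly from $V_{p,q}$, the current labels and $\alpha$, and then tested against the inequality of Theorem 3. The sketch below does this for two illustrative choices of $V$, a Potts penalty and a squared difference; both are assumptions picked for this example rather than functions taken from the paper, and the helper names are hypothetical.

```python
def expansion_pairwise_term(V, old_p, old_q, alpha):
    """Table of V(l_p(x_p), l_q(x_q)): x = 0 keeps the old label, x = 1 takes alpha."""
    label_p = (old_p, alpha)
    label_q = (old_q, alpha)
    return {(xp, xq): V(label_p[xp], label_q[xq])
            for xp in (0, 1) for xq in (0, 1)}

def is_regular(term):
    """Inequality of Theorem 3 for a single two-variable term."""
    return term[(0, 0)] + term[(1, 1)] <= term[(0, 1)] + term[(1, 0)]

def potts(a, b):
    """Penalty 1 for different labels, 0 for equal labels (a metric)."""
    return 0 if a == b else 1

def squared_difference(a, b):
    """Squared label difference (violates the triangle inequality)."""
    return (a - b) ** 2

potts_term = expansion_pairwise_term(potts, old_p=1, old_q=2, alpha=3)
squared_term = expansion_pairwise_term(squared_difference, old_p=0, old_q=4, alpha=2)

potts_ok = is_regular(potts_term)
squared_ok = is_regular(squared_term)
```

With old labels $\beta$, $\gamma$ the test reads $V(\beta, \gamma) + V(\alpha, \alpha) \le V(\beta, \alpha) + V(\alpha, \gamma)$, which has the shape of a triangle inequality when $V(\alpha, \alpha) = 0$; in this example the Potts term passes and the squared difference fails.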

We can demonstrate the power of our results by deriving an important restriction on this algorithm. In order for the graph cut construction of [8] to work, the function $V_{p,q}$ is required to be a metric. In their paper, it is not clear whether this is an accidental property of the construction (i.e., they leave open the possibility that a more clever graph cut construction may overcome this restriction).
