Multi-label MRF Optimization via a Least Squares s−t Cut

Ghassan Hamarneh

School of Computing Science, Simon Fraser University, Canada

Abstract. We approximate the k-label Markov random field optimization by a single binary (s−t) graph cut. Each vertex in the original graph is replaced by only ⌈log₂(k)⌉ new vertices, and the new edge weights are obtained via a novel least squares solution approximating the original data and label interaction penalties. The s−t cut produces a binary "Gray" encoding that is unambiguously decoded into any of the original k labels. We analyze the properties of the approximation and present quantitative and qualitative image segmentation results, one of the several computer vision applications of multi-label MRF optimization.

1 Introduction

Many visual computing tasks can be formulated as graph labeling problems, e.g. segmentation and stereo-reconstruction [1], in which one out of k labels is assigned to each graph vertex. This may be formulated as a k-way cut problem: Given a graph G(V,E) with |V| vertices v_j ∈ V and |E| edges e_{v_i,v_j} = e_ij ∈ E ⊆ V × V with weights w(e_ij) = w_ij > 0, find an optimal k-cut C* ⊂ E with minimal cost |C*| = argmin_C |C|, where |C| = Σ_{e_ij ∈ C} w_ij, such that E\C breaks the graph into k groups of labelled vertices. This k-cut formulation encodes the semantics of the problem at hand (e.g. segmentation) into w_ij. However, if the optimal label assigned to a vertex depends on the labels assigned to other vertices (e.g. to regularize the label field), setting w_ij ∀i,j becomes less straightforward. The Markov random field (MRF) formulation captures this desired label interaction via an energy ξ(l) to be minimized with respect to the vertex labels l:

ξ(l) = Σ_{v_i ∈ V} D_i(l_i) + λ Σ_{(v_i,v_j) ∈ E} V_ij(l_i, l_j, d_i, d_j)    (1)

where D_i(l_i) penalizes labeling v_i with l_i, and V_ij, aka the prior, penalizes assigning labels (l_i, l_j) to neighboring vertices¹. V_ij may be influenced by the data value d_i at v_i (e.g. image intensity). λ controls the relative importance of D_i and V_ij.

For labeling a P-pixel image, typically a graph G is constructed with |V| = P. To encode D_i(l_i), G may be augmented with k new terminal vertices {t_j}_{j=1}^k, each representing one of the k labels (Figure 2(a)), and w_{v_i,t_j} set inversely proportional to D_i(l_j). When V_ij = V_ij(d_i, d_j), i.e. independent of l_i and l_j, V_ij may be encoded by w_{v_i,v_j} ∝ V_ij(d_i, d_j). The random walker [2] globally solves a

¹ Higher order priors, e.g. 3rd order V_ijk(l_i, l_j, l_k), are also possible.

G. Bebis et al. (Eds.): ISVC 2009, Part I, LNCS 5875, pp. 1055–1066, 2009.
© Springer-Verlag Berlin Heidelberg 2009


labeling problem of this type, i.e. disregarding label interaction. Solving multi-label MRF optimization for any interaction penalty remains an active research area. In [3], the globally optimal binary (k = 2) labeling is found using min-cut max-flow. For k > 2 with a convex prior, the global minimizer is attained by replacing each single k-label variable with k [4] or k − 1 [5] boolean variables. However, convex priors tend to over-smooth the label field. For k > 2 with metric or semi-metric priors, Boykov et al. performed range moves using binary cuts to expand or swap labels [1]. Other range moves were proposed in [6,7]. More recent approaches to multi-label MRF optimization were proposed based on linear programming relaxation using primal-dual [8], message passing or belief propagation [9], and partial optimality [10] (see [11] for a recent survey).

In this paper, we focus on optimally encoding the k-label MRF energy solely into the edge weights of a graph. We impose no restrictions on k, or on the order (2nd or higher) or type (e.g. non-convex, non-metric, or spatially varying) of the label interaction penalty. The calculated edge weights are optimal in the sense that they minimize the least squares (LS) error when solving a linear system of equations capturing the original MRF penalties. Further, we transform the multi-labelling problem to a binary s−t cut, in which each vertex in the original graph is replaced by the most compact boolean representation; only ⌈log₂(k)⌉ vertices represent each k-label variable. In [12], a general framework for converting multi-label problems to binary ones is presented. In contrast to our work, [12] solved a system of equations to find the boolean encoding function (not the edge weights), did not use LS, and their resulting binary problem can still include label interaction. We perform a single (non-iterative and initialization-independent) s−t cut to obtain a "Gray" binary encoding, which is then unambiguously decoded into the k labels. Besides its optimality features, LS enables offline pre-computation of pseudoinverse matrices that can be re-used for different graphs.

2 Method

2.1 Reformulating the Multi-label MRF as an s − t Cut

Given a graph G(V,E), the objective is to label each vertex v_i ∈ V with a label l_i ∈ L_k = {l_0, l_1, ..., l_{k−1}}. Rather than labeling v_i with l_i ∈ L_k, we replace v_i with b vertices (v_ij)_{j=1}^b and binary-label them with (l_ij)_{j=1}^b, i.e. l_ij ∈ L_2 = {l_0, l_1}. b is chosen such that 2^b ≥ k, or b = ⌈log₂(k)⌉, i.e. a long enough sequence of bits to be decoded into l_i ∈ L_k². To this end, we transform G(V,E) into a new graph G_2(V_2, E_2) with additional source s and sink t nodes, i.e. |V_2| = b|V| + 2. E_2 includes terminal links E_2^tlinks = E_2^t ∪ E_2^s, where |E_2^t| = |E_2^s| = |V_2|; neighborhood links E_2^nlinks = E_2^ns ∪ E_2^nf, where |E_2^nlinks| = b²|E|, |E_2^ns| = b|E|, and |E_2^nf| = (b² − b)|E|; and intra-links E_2^intra, where |E_2^intra| = (b(b−1)/2)|V|. Figure 1 shows these different types of edges. Following an s−t cut on G_2, vertices v_ij that remain connected to s are assigned label 0, and the remaining are connected

² We distinguish between the decimal (base 10) and binary (base 2) encoding of the labels using the notation (l_i)_{10} and (l_i)_2 = (l_i1, l_i2, ..., l_ib)_2, respectively.


Fig. 1. Edge types in the s−t graph. Shown are seven groups of vertex quadruplets, b = 4, and only sample edges from E_2^t, E_2^s, E_2^ns, E_2^nf, and E_2^intra.

Fig. 2. Reformulating the multi-label problem as an s−t cut. (a) Labeling vertices {v_i}_{i=1}^5 with labels {l_j}_{j=0}^3 (only t-links are shown). (b) New graph with 2 terminal nodes {s, t}, b = 2 new vertices (v_i1 and v_i2, inside the dashed circles) replacing each v_i in (a), and 2 terminal edges for each v_ij. An s−t cut on (b) is depicted as the green curve. (c) Labeling v_i in (a) is based on the s−t cut in (b): pairs of (v_i1, v_i2) assigned to (s, s) are labeled with binary string 00, (s, t) with 01, (t, s) with 10, and (t, t) with 11. The binary encodings {00, 01, 10, 11} in turn reflect the original 4 labels {l_j}_{j=0}^3.

to t and assigned label 1. The string of b binary labels l_ij ∈ L_2 assigned to (v_ij)_{j=1}^b is then decoded back into a decimal number indicating the label l_i ∈ L_k assigned to v_i (Figure 2).
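The decoding in Figure 2 (a vertex cut off to the s side carries bit 0, to the t side bit 1, most significant bit first) can be sketched as follows. This is an illustration, not the paper's code; `num_bits`, `encode`, and `decode` are hypothetical helper names.

```python
from math import ceil, log2

def num_bits(k: int) -> int:
    """b = ceil(log2(k)): number of binary vertices replacing each vertex."""
    return ceil(log2(k))

def decode(bits) -> int:
    """(l_i)_2 = (l_i1, ..., l_ib), MSB first, -> decimal label (l_i)_10."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

def encode(label: int, b: int) -> list:
    """Decimal label -> b-bit binary string (MSB first)."""
    return [(label >> (b - 1 - j)) & 1 for j in range(b)]
```

For k = 4 labels, b = 2; a vertex pair cut to (s, t) carries bits (0, 1) and decodes to label l_1, matching Figure 2(c).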

It is important to set the edge weights of E_2 in such a way that decoding the binary labels resulting from the s−t cut of G_2 yields optimal (or close to optimal) labels for the original multi-label problem. To achieve this, we derive a system of linear equations capturing the relation between the original multi-label MRF penalties and the s−t cut cost incurred when generating different label configurations. We then calculate the weights of E_2 as the LS error solution to these equations. The next sections present the details.

2.2 Data Term Penalty: Severing T-Links and Intra-Links

Fig. 3. The 2^b ways of cutting through {v_ij}_{j=1}^b for b = 2 (left) and b = 3 (right), with the resulting binary codes {00, 01, 10, 11} and {000, 001, ..., 111}.

The 1st order penalty D_i(l_i) in (1) is the cost of assigning l_i to v_i in G, which entails assigning a corresponding sequence of binary labels (l_ij)_{j=1}^b to (v_ij)_{j=1}^b in G_2. To assign (l_i)_2 to a string of b vertices, the appropriate terminal links must be cut. To assign a 0 (resp. 1) label to v_ij, the edge connecting v_ij to t (resp. s) must be severed (Figure 3). Therefore, the cost of severing t-links in G_2 to assign l_i to vertex v_i in G is calculated as

D_i^tlinks(l_i) = Σ_{j=1}^b ( l_ij w_{v_ij,s} + l̄_ij w_{v_ij,t} )    (2)

where l̄_ij denotes the unary complement (NOT) of l_ij. The G_2 s−t cut severing the t-links, as per (2), will also result in severing edges in E_2^intra (Figure 1). In particular, e_{im,in} ∈ E_2^intra will be severed iff the s−t cut leaves v_im connected to one terminal, say s (resp. t), while v_in remains connected to the other terminal t (resp. s). If this condition holds, then w_{v_im,v_in} will contribute to the cost. Therefore, the cost of severing intra-links in G_2 to assign l_i to vertex v_i in G is

D_i^intra(l_i) = Σ_{m=1}^b Σ_{n=m+1}^b ( l_im ⊕ l_in ) w_{v_im,v_in}    (3)

where ⊕ denotes binary XOR. The total data penalty is the sum of (2) and (3),

D_i(l_i) = D_i^tlinks(l_i) + D_i^intra(l_i).    (4)
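As a concrete check of (2)-(4), the following sketch (my own illustration; the weight containers `w_s`, `w_t`, `w_intra` are hypothetical names) evaluates the cut cost a binary assignment incurs on the t-links and intra-links of one vertex group:

```python
def data_penalty(bits, w_s, w_t, w_intra):
    """Cut cost of assigning the binary string `bits` = (l_i1, ..., l_ib)
    to the b vertices replacing v_i, per equations (2)-(4).
    w_s[j], w_t[j]: t-link weights w_{v_ij,s}, w_{v_ij,t} (0-based j).
    w_intra[(m, n)]: intra-link weight w_{v_im,v_in}, m < n (0-based)."""
    b = len(bits)
    # (2): the s-link is cut when l_ij = 1, the t-link when l_ij = 0
    tlinks = sum(bits[j] * w_s[j] + (1 - bits[j]) * w_t[j] for j in range(b))
    # (3): an intra-link is severed iff its endpoints get different labels (XOR)
    intra = sum((bits[m] ^ bits[n]) * w_intra[(m, n)]
                for m in range(b) for n in range(m + 1, b))
    return tlinks + intra  # (4)
```

For b = 2 with bits (0, 1), the severed edges are w_{v_i1,t}, w_{v_i2,s}, and the single intra-link, exactly as (2) and (3) prescribe.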

2.3 Prior Term Penalty: Severing N-Links

The interaction penalty V_ij(l_i, l_j, d_i, d_j) for assigning l_i to v_i and l_j to neighboring v_j in G must equal the cost of assigning a sequence of binary labels (l_im)_{m=1}^b to (v_im)_{m=1}^b and (l_jn)_{n=1}^b to (v_jn)_{n=1}^b in G_2. The cost of this cut can be calculated as (Figure 4)

V_ij(l_i, l_j, d_i, d_j) = Σ_{m=1}^b Σ_{n=1}^b ( l_im ⊕ l_jn ) w_{v_im,v_jn}.    (5)

This effectively adds the edge weight between v_im and v_jn to the cut cost iff the cut leaves one vertex of the edge connected to one terminal (s or t) while the other vertex is connected to the other terminal (t or s). Note that we impose no restrictions on the left hand side of (5); e.g., it could reflect non-convex or non-metric priors, spatially-varying penalties, or even higher order label interaction.
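Equation (5) can be evaluated directly from the two binary strings; the sketch below is illustrative (`prior_cut_cost` and the nested-list weight layout `w_n` are my own choices, not the paper's code):

```python
def prior_cut_cost(bits_i, bits_j, w_n):
    """Cut cost over the n-links between the b vertices of v_i and those
    of v_j, per equation (5): w_{v_im,v_jn} is severed iff l_im XOR l_jn = 1.
    w_n[m][n] holds the n-link weight w_{v_im,v_jn} (0-based m, n)."""
    return sum((bm ^ bn) * w_n[m][n]
               for m, bm in enumerate(bits_i)
               for n, bn in enumerate(bits_j))
```

When the two vertices receive identical binary codes, no n-link crosses the cut and the prior cost is zero, consistent with Figure 4.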


Fig. 4. Severing n-links between neighboring vertices v_i and v_j for b = 2 (four examples are shown in the top row) and b = 3 (three examples in the bottom row). The cut is depicted as a red curve. In the last two examples for b = 3, the colored vertices are translated while maintaining the n-links in order to clearly show that the severed n-links for each case follow (5).

2.4 Edge Weight Approximation with Least Squares

Equations (4) and (5) dictate the relationship between the penalty terms (D_i and V_ij) of the original multi-label problem and the severed edge weights w_{ij,mn}, ∀e_{ij,mn} ∈ E_2, of the s−t graph G_2. What remains missing before applying the s−t cut, however, is to find these edge weights.
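The construction of this linear system for the data term can be sketched as follows: each row of the coefficient matrix corresponds to one of the 2^b binary codes, with 0/1 entries given by the t-link indicators of (2) and the XOR indicators of (3), and the pseudoinverse is computed once offline and reused across vertices and graphs. This is a hedged illustration, not the paper's implementation: `data_coeff_matrix`, the column ordering, and the sample `B_i` values are my own choices.

```python
import numpy as np

def data_coeff_matrix(b: int) -> np.ndarray:
    """Coefficient matrix for the data-term system, one row per binary code.
    Columns: [w_{i1,s}, w_{i1,t}, ..., w_{ib,s}, w_{ib,t}],
    then one column per intra-link pair (m, n), m < n."""
    rows = []
    for code in range(2 ** b):
        bits = [(code >> (b - 1 - j)) & 1 for j in range(b)]
        row = []
        for j in range(b):
            row += [bits[j], 1 - bits[j]]       # eq. (2) indicators
        for m in range(b):
            for n in range(m + 1, b):
                row.append(bits[m] ^ bits[n])   # eq. (3) indicators
        rows.append(row)
    return np.array(rows, dtype=float)

# b = 2: a 4 x 5 system; the pseudoinverse is computed once offline.
A2 = data_coeff_matrix(2)
A2_pinv = np.linalg.pinv(A2)
B_i = np.array([3.0, 1.0, 4.0, 2.0])  # hypothetical penalties D_i(l_0..l_3)
X_i = A2_pinv @ B_i                   # LS estimate of the 5 edge weights
```

For b = 2 the unknowns (5) outnumber the label equations (4), so the LS solution reproduces the penalties exactly; for b ≥ 4 the 2^b rows exceed the 2b + b(b−1)/2 columns and the solution becomes a genuine LS approximation.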

Edge weights of t-links and intra-links. For b = 1 (i.e. binary labelling), (3) simplifies to D_i^intra(l_i) = 0 and (4) simplifies to D_i(l_i) = l_i1 w_{v_i1,s} + l̄_i1 w_{v_i1,t}. With l_i = l_i1 for b = 1, substituting the two possible values l_i ∈ {l_0, l_1}, we obtain

l_i = l_0 ⇒ D_i(l_0) = l_0 w_{v_i1,s} + l̄_0 w_{v_i1,t} = 0·w_{v_i1,s} + 1·w_{v_i1,t}
l_i = l_1 ⇒ D_i(l_1) = l_1 w_{v_i1,s} + l̄_1 w_{v_i1,t} = 1·w_{v_i1,s} + 0·w_{v_i1,t}    (6)

which can be written in matrix form A_1 X_1^i = B_1^i as

[ 0 1 ] [ w_{v_i1,s} ]   [ D_i(l_0) ]
[ 1 0 ] [ w_{v_i1,t} ] = [ D_i(l_1) ]

where X_1^i is the vector of unknown edge weights connecting vertex v_i1 to s and t, B_1^i is the data penalty for v_i, and A_1 is the matrix of coefficients. The subscript 1 in A_1, X_1^i, and B_1^i indicates that this matrix equation is for b = 1. Clearly, the solution is trivial and expected: w_{v_i1,s} = D_i(l_1) and w_{v_i1,t} = D_i(l_0).

For b = 2, we address multi-label problems with k ∈ {3, 4}, or 2^{b−1} = 2 < k ≤ 2^b = 4 labels. Substituting the 2^b = 4 possible label values, (0,0), (0,1), (1,0), and (1,1), of (l_i)_2 = (l_i1, l_i2) in (4), we obtain

(0, 0) ⇒ D_i(l_0) = 0·w_{v_i1,s} + 1·w_{v_i1,t} + 0·w_{v_i2,s} + 1·w_{v_i2,t} + 0·w_{v_i1,v_i2}
(0, 1) ⇒ D_i(l_1) = 0·w_{v_i1,s} + 1·w_{v_i1,t} + 1·w_{v_i2,s} + 0·w_{v_i2,t} + 1·w_{v_i1,v_i2}
(1, 0) ⇒ D_i(l_2) = 1·w_{v_i1,s} + 0·w_{v_i1,t} + 0·w_{v_i2,s} + 1·w_{v_i2,t} + 1·w_{v_i1,v_i2}
(1, 1) ⇒ D_i(l_3) = 1·w_{v_i1,s} + 0·w_{v_i1,t} + 1·w_{v_i2,s} + 0·w_{v_i2,t} + 0·w_{v_i1,v_i2}    (7)

which can be written in matrix form A_2 X_2^i = B_2^i as