What future work have the authors mentioned in the paper "Multi-label MRF optimization via a least squares s − t cut"?

More elaborate analysis of the algorithm (e.g. error bounds, value of the minimized energy, computational complexity, running times) and comparison with state-of-the-art approaches on standard benchmarks are left for future work.

What is the effect of the LS error in the edge weights?

The LS error in edge weights induces error in the s − t cut or binary labeling, which is decoded into a suboptimal solution to the multi-label problem.

What is the cost of severing intra-links in $G_2$?

The cost of severing intra-links in $G_2$ to assign $l_i$ to vertex $v_i$ in $G$ is $D_i^{\text{intra}}(l_i) = \sum_{m=1}^{b} \sum_{n=m+1}^{b} (l_{im} \oplus l_{in})\, w_{v_{im},v_{in}}$ (3), where $\oplus$ denotes binary XOR.

What is the solution to the original multi-label problem?

Every sequence of $b$ binary labels $(l_{ij})_{j=1}^{b}$ assigned to $(v_{ij})_{j=1}^{b}$ is decoded to a decimal label $l_i \in L_k = \{l_0, l_1, \dots, l_{k-1}\}$, $\forall v_i \in V$, i.e. the solution to the original multi-label MRF problem.

How is the noisy graph $G_{\text{LSE}}$ constructed?

The authors construct $G_{\text{LSE}}$, a noisy version of $G$, by adding uniformly distributed noise with support [0, noise level] to the edge weights.
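The following is a minimal sketch of this perturbation. It assumes the edge weights are stored in a NumPy array. The function name and the seed handling are illustrative and are not taken from the paper. Note that `Generator.uniform` samples from the half-open interval [low, high).

```python
import numpy as np

def add_uniform_noise(weights, noise_level, seed=None):
    """Return a noisy copy of `weights`: each entry gets independent U[0, noise_level) noise."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    return weights + rng.uniform(0.0, noise_level, size=weights.shape)

# Hypothetical edge weights of a small graph, flattened into one vector.
w = np.array([1.0, 2.0, 3.0, 0.5])
w_noisy = add_uniform_noise(w, noise_level=0.1, seed=0)
```

Because the noise is non-negative, every perturbed weight stays at or above its original value.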

What extensions are the authors currently exploring?

The authors are exploring the use of non-negative least squares (e.g. Chapter 23 in [16]) to guarantee non-negative edge weights, as well as quantifying the benefits of the Gray encoding.
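The sketch below illustrates why non-negativity matters. It uses SciPy's `scipy.optimize.nnls` as a stand-in NNLS solver, which is not necessarily the algorithm the authors have in mind. The system is the b = 2 data-term system from Section 2.4, with hypothetical penalties. For these penalties every exact solution has an intra-link weight of −0.5, so unconstrained LS yields a negative weight. NNLS instead keeps all weights non-negative, at the price of a non-zero residual.

```python
import numpy as np
from scipy.optimize import nnls

# Coefficient matrix of the b = 2 data-term system (7); columns are
# [w_{v_i1,s}, w_{v_i1,t}, w_{v_i2,s}, w_{v_i2,t}, w_{v_i1,v_i2}].
A2 = np.array([[0, 1, 0, 1, 0],
               [0, 1, 1, 0, 1],
               [1, 0, 0, 1, 1],
               [1, 0, 1, 0, 0]], dtype=float)
# Hypothetical data penalties D_i for the codes 00, 01, 10, 11.
B = np.array([0.0, 1.0, 1.0, 3.0])

x_ls = np.linalg.pinv(A2) @ B   # unconstrained LS: the intra-link weight comes out negative
x_nn, rnorm = nnls(A2, B)       # non-negative LS: all weights >= 0, residual norm > 0
```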

How does the MRF formulation label a graph?

The Markov random field (MRF) formulation captures the desired label interaction via an energy $\xi(l)$ to be minimized with respect to the vertex labels $l$: $\xi(l) = \sum_{v_i \in V} D_i(l_i) + \lambda \sum_{(v_i, v_j) \in E} V_{ij}(l_i, l_j, d_i, d_j)$ (1), where $D_i(l_i)$ penalizes labeling $v_i$ with $l_i$, and $V_{ij}$, also known as the prior, penalizes assigning labels $(l_i, l_j)$ to neighboring vertices.

What happens when the prior is label-independent?

If $V_{ij}(l_i, l_j, d_i, d_j) = V_{ij}(d_i, d_j)$, i.e. label-independent, the authors can simply ignore the outcome of $l_i \oplus l_j$ by setting it to a constant.

What must the cost of severing n-links in $G_2$ equal?

The interaction penalty $V_{ij}(l_i, l_j, d_i, d_j)$ for assigning $l_i$ to $v_i$ and $l_j$ to neighboring $v_j$ in $G$ must equal the cost of assigning a sequence of binary labels $(l_{im})_{m=1}^{b}$ to $(v_{im})_{m=1}^{b}$ and $(l_{jn})_{n=1}^{b}$ to $(v_{jn})_{n=1}^{b}$ in $G_2$.

What is the cost of severing t-links in $G_2$?

The cost of severing t-links in $G_2$ to assign $l_i$ to vertex $v_i$ in $G$ is calculated as $D_i^{\text{tlinks}}(l_i) = \sum_{j=1}^{b} \left( l_{ij}\, w_{v_{ij},s} + \bar{l}_{ij}\, w_{v_{ij},t} \right)$ (2), where $\bar{l}_{ij}$ denotes the unary complement (NOT) of $l_{ij}$.

How many pairs of labels can be substituted?

For $b = 2$, (5) simplifies to $V_{ij}(l_i, l_j, d_i, d_j) = (l_{i1} \oplus l_{j1})\, w_{v_{i1},v_{j1}} + (l_{i1} \oplus l_{j2})\, w_{v_{i1},v_{j2}} + (l_{i2} \oplus l_{j1})\, w_{v_{i2},v_{j1}} + (l_{i2} \oplus l_{j2})\, w_{v_{i2},v_{j2}}$ (15). The authors can now substitute all $2^b \cdot 2^b = 2^{2b} = 16$ possible combinations of the pairs of interacting labels $(l_i, l_j) \in \{l_0, l_1, l_2, l_3\} \times \{l_0, l_1, l_2, l_3\}$, or equivalently, $((l_i)_2, (l_j)_2) \in \{00, 01, 10, 11\} \times \{00, 01, 10, 11\}$.
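The sketch below assumes NumPy. It builds these 16 equations from (15) for a given b and fits the four n-link weights by least squares. The Potts target and the helper name `prior_system` are hypothetical. For the Potts target the equations are inconsistent. The rows for the pairs (00, 01) and (00, 10) require $w_{v_{i1},v_{j2}} + w_{v_{i2},v_{j2}} = 1$ and $w_{v_{i1},v_{j1}} + w_{v_{i2},v_{j1}} = 1$. Adding them gives a sum of all four weights equal to 2, while the row for (00, 11) requires that same sum to be 1. The LS fit therefore leaves a non-zero residual.

```python
from itertools import product
import numpy as np

def prior_system(b):
    """Rows: every label pair ((l_i)_2, (l_j)_2); columns: n-link weights
    w_{v_im, v_jn} in order (m, n) = (1,1), (1,2), ..., (b,b); entries l_im XOR l_jn."""
    codes = list(product((0, 1), repeat=b))
    rows = [[ci[m] ^ cj[n] for m in range(b) for n in range(b)]
            for ci in codes for cj in codes]
    return np.array(rows, dtype=float), codes

A, codes = prior_system(2)  # 16 equations, 4 unknown n-link weights

# Hypothetical target prior: Potts, V_ij = 0 if l_i == l_j else 1.
V = np.array([0.0 if ci == cj else 1.0 for ci in codes for cj in codes])

w, *_ = np.linalg.lstsq(A, V, rcond=None)   # LS-optimal n-link weights
ls_error = np.linalg.norm(A @ w - V)         # non-zero: this Potts prior is not exactly representable
```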

Why is a single edge weight insufficient in the general case?

In the general case, when $V_{ij}$ depends on the labels $l_i$ and $l_j$ of the neighboring vertices $v_i$ and $v_j$, a single edge weight is insufficient to capture such elaborate label interactions, intuitively because $w_{ij}$ would need to take on a different value for every pair of labels.

Multi-label MRF Optimization via a Least Squares s − t Cut (2009) | Ghassan Hamarneh

Q: What contributions have the authors mentioned in the paper "Multi-label MRF optimization via a least squares s − t cut"?

The authors analyze the properties of the approximation and present quantitative and qualitative image segmentation results, one of the several computer vision applications of multi-label MRF optimization.

Q: What is the simplest way to decode labels?

The authors perform a single (non-iterative and initialization-independent) s − t cut to obtain a “Gray” binary encoding, which is then unambiguously decoded into the k labels.

Multi-label MRF Optimization via a Least Squares s − t Cut

Ghassan Hamarneh

School of Computing Science, Simon Fraser University, Canada

Abstract. We approximate the k-label Markov random field optimization by a single binary (s − t) graph cut. Each vertex in the original graph is replaced by only $\lceil \log_2 k \rceil$ new vertices, and the new edge weights are obtained via a novel least squares solution approximating the original data and label interaction penalties. The s − t cut produces a binary "Gray" encoding that is unambiguously decoded into any of the original k labels. We analyze the properties of the approximation and present quantitative and qualitative image segmentation results, one of the several computer vision applications of multi-label MRF optimization.

1 Introduction

Many visual computing tasks can be formulated as graph labeling problems, e.g. segmentation and stereo-reconstruction [1], in which one out of k labels is assigned to each graph vertex. This may be formulated as a k-way cut problem. Given a graph $G(V,E)$ with $|V|$ vertices $v_i \in V$ and $|E|$ edges $e_{ij} = (v_i, v_j) \in E \subseteq V \times V$ with weights $w(e_{ij}) = w_{ij} > 0$, find an optimal k-cut $C^* \subset E$ of minimal cost, $C^* = \arg\min_{C} |C|$, where $|C| = \sum_{e_{ij} \in C} w_{ij}$, such that $E \setminus C$ breaks the graph into k groups of labelled vertices. This k-cut formulation encodes the semantics of the problem at hand (e.g. segmentation) into $w_{ij}$. However, if the optimal label assigned to a vertex depends on the labels assigned to other vertices (e.g. to regularize the label field), setting $w_{ij}\ \forall i, j$ becomes less straightforward.

The Markov random field (MRF) formulation captures this desired label interaction via an energy $\xi(l)$ to be minimized with respect to the vertex labels $l$:

$$\xi(l) = \sum_{v_i \in V} D_i(l_i) + \lambda \sum_{(v_i, v_j) \in E} V_{ij}(l_i, l_j, d_i, d_j) \tag{1}$$

where $D_i(l_i)$ penalizes labeling $v_i$ with $l_i$, and $V_{ij}$, also known as the prior, penalizes assigning labels $(l_i, l_j)$ to neighboring vertices¹. $V_{ij}$ may be influenced by the data value $d_i$ at $v_i$ (e.g. image intensity). $\lambda$ controls the relative importance of $D_i$ and $V_{ij}$.
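Here is a minimal Python sketch of evaluating the energy (1) for a given labeling. The function name `mrf_energy`, the dense `unary` table, and the Potts-style prior `potts` are illustrative assumptions and are not part of the paper. Any $V_{ij}$ can be passed as the callable, including one that depends on $d_i$ and $d_j$.

```python
import numpy as np

def mrf_energy(labels, unary, edges, pairwise, lam=1.0):
    """xi(l) from (1): sum of unary penalties D_i(l_i) plus lam times
    the pairwise penalties V_ij(l_i, l_j) over neighboring vertex pairs."""
    data = sum(unary[i, a] for i, a in enumerate(labels))
    prior = sum(pairwise(i, j, labels[i], labels[j]) for i, j in edges)
    return data + lam * prior

# Hypothetical 3-vertex chain with k = 3 labels: unary[i, a] = D_i(l_a).
unary = np.array([[0.0, 2.0, 2.0],
                  [1.0, 0.0, 3.0],
                  [2.0, 2.0, 0.0]])
edges = [(0, 1), (1, 2)]

def potts(i, j, a, b):
    # Hypothetical Potts prior; a real V_ij could also depend on d_i and d_j.
    return 0.0 if a == b else 1.0

e_varied = mrf_energy([0, 1, 2], unary, edges, potts, lam=0.5)   # low data cost, two label changes
e_constant = mrf_energy([1, 1, 1], unary, edges, potts, lam=0.5) # no label changes, higher data cost
```

Increasing `lam` favors the constant labeling, which illustrates how $\lambda$ trades the data term against the prior.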

For labeling a P-pixel image, a graph $G$ is typically constructed with $|V| = P$. To encode $D_i(l_i)$, $G$ may be augmented with k new terminal vertices $\{t_j\}_{j=1}^{k}$, each representing one of the k labels (Figure 2(a)), with the weight of the edge joining $v_i$ to $t_j$ set inversely proportional to the penalty $D_i$ of the label that $t_j$ represents. When $V_{ij}(l_i, l_j, d_i, d_j) = V_{ij}(d_i, d_j)$, i.e. independent of $l_i$ and $l_j$, $V_{ij}$ may be encoded by $w_{ij} \propto V_{ij}(d_i, d_j)$. The random walker [2] globally solves a labeling problem of this type, i.e. one that disregards label interaction. Solving multi-label MRF optimization for an arbitrary interaction penalty remains an active research area. In [3], the globally optimal binary (k = 2) labeling is found using min-cut/max-flow. For k > 2 with a convex prior, the global minimizer is attained by replacing each k-label variable with k [4] or k − 1 [5] boolean variables. However, convex priors tend to over-smooth the label field. For k > 2 with metric or semi-metric priors, Boykov et al. performed range moves using binary cuts to expand or swap labels [1]. Other range moves were proposed in [6,7]. More recent approaches to multi-label MRF optimization are based on linear programming relaxation using primal-dual methods [8], message passing or belief propagation [9], and partial optimality [10] (see [11] for a recent survey).

¹ Higher order priors, e.g. 3rd order $V_{ijk}(l_i, l_j, l_k, d_i, d_j, d_k)$, are also possible.

G. Bebis et al. (Eds.): ISVC 2009, Part I, LNCS 5875, pp. 1055–1066, 2009.
© Springer-Verlag Berlin Heidelberg 2009

In this paper, we focus on optimally encoding the k-label MRF energy solely into the edge weights of a graph. We impose no restrictions on k, or on the order (2nd or higher) or type (e.g. non-convex, non-metric, or spatially varying) of the label interaction penalty. The calculated edge weights are optimal in the sense that they minimize the least squares (LS) error of a linear system of equations capturing the original MRF penalties. Further, we transform the multi-labeling problem into a binary s − t cut, in which each vertex in the original graph is replaced by the most compact boolean representation: only $\lceil \log_2 k \rceil$ vertices represent each k-label variable. In [12], a general framework for converting multi-label problems to binary ones is presented. In contrast to our work, [12] solved a system of equations to find the boolean encoding function (not the edge weights) without using LS, and the resulting binary problem can still include label interaction. We perform a single (non-iterative and initialization-independent) s − t cut to obtain a "Gray" binary encoding, which is then unambiguously decoded into the k labels. Besides its optimality features, LS enables offline pre-computation of pseudoinverse matrices that can be re-used for different graphs.

2 Method

2.1 Reformulating the Multi-label MRF as an s − t Cut

Given a graph $G(V,E)$, the objective is to label each vertex $v_i \in V$ with a label $l_i \in L_k = \{l_0, l_1, \dots, l_{k-1}\}$. Rather than labeling $v_i$ with $l_i \in L_k$, we replace $v_i$ with b vertices $(v_{ij})_{j=1}^{b}$ and binary-label them with $(l_{ij})_{j=1}^{b}$, i.e. $l_{ij} \in L_2 = \{0, 1\}$. b is the smallest integer with $2^b \geq k$, i.e. $b = \lceil \log_2 k \rceil$, which gives a long enough sequence of bits to be decoded into $l_i \in L_k$. To this end, we transform $G(V,E)$ into a new graph $G_2(V_2, E_2)$ with an additional source s and sink t, so that $|V_2| = b|V| + 2$. $E_2$ includes three kinds of edges. The terminal links are $E_{\text{tlinks}} = E_s \cup E_t$, where $|E_s| = |E_t| = b|V|$. The neighborhood links are $E_{\text{nlinks}} = E_{=} \cup E_{\neq}$, where $|E_{\text{nlinks}}| = b^2|E|$; here $|E_{=}| = b|E|$ counts the links $(v_{im}, v_{jm})$ joining vertices with the same bit index, and $|E_{\neq}| = (b^2 - b)|E|$ counts the links $(v_{im}, v_{jn})$ with $m \neq n$. The intra-links $E_{\text{intra}}$ satisfy $|E_{\text{intra}}| = \binom{b}{2}|V|$. Figure 1 shows these different types of edges. Following an s − t cut on $G_2$, vertices $v_{ij}$ that remain connected to s are assigned label 0, and the remaining vertices are connected to t and assigned label 1. The string of b binary labels $l_{ij} \in L_2$ assigned to $v_{ij}$ is then decoded back into a decimal number² indicating the label $l_i \in L_k$ assigned to $v_i$ (Figure 2).

² We distinguish between the decimal (base 10) and binary (base 2) encoding of the labels using the notation $(l_i)_{10}$ and $(l_i)_2 = (l_{i1}, \cdots, l_{ib})_2$, respectively.

Fig. 1. Edge types in the s − t graph. Shown are seven groups of vertex quadruplets (b = 4); only sample edges of each type are drawn.

Fig. 2. Reformulating the multi-label problem as an s − t cut. (a) Labeling vertices $\{v_i\}$ with labels $\{l_j\}_{j=0}^{3}$ (only t-links are shown). (b) New graph with 2 terminal nodes $\{s, t\}$, b = 2 new vertices ($v_{i1}$ and $v_{i2}$ inside the dashed circles) replacing each $v_i$ in (a), and 2 terminal edges for each $v_{ij}$. An s − t cut on (b) is depicted as the green curve. (c) Labeling $v_i$ in (a) is based on the s − t cut in (b): pairs $(v_{i1}, v_{i2})$ assigned to $(s, s)$ are labeled with the binary string 00, $(s, t)$ with 01, $(t, s)$ with 10, and $(t, t)$ with 11. The binary encodings $\{00, 01, 10, 11\}$ in turn reflect the original 4 labels $\{l_j\}_{j=0}^{3}$.
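The bookkeeping of Section 2.1 can be sketched as follows. The helper names are hypothetical. The decoding assumes plain binary with $l_{i1}$ as the most significant bit, which matches the correspondence in Fig. 2 ((s, t) gives 01). The paper calls its code "Gray", and this sketch does not try to reproduce that choice.

```python
from math import comb

def num_bits(k):
    """b = ceil(log2(k)), computed exactly with integers (valid for k >= 2)."""
    return (k - 1).bit_length()

def cut_to_label(sides):
    """Map the terminal each v_ij stays connected to ('s' -> 0, 't' -> 1)
    and read (l_i1, ..., l_ib)_2 as a decimal index, l_i1 most significant."""
    value = 0
    for side in sides:
        value = 2 * value + (1 if side == "t" else 0)
    return value

def g2_sizes(num_vertices, num_edges, k):
    """Vertex and edge counts of G_2 as listed in Section 2.1."""
    b = num_bits(k)
    return {
        "b": b,
        "V2": b * num_vertices + 2,          # b vertices per v_i, plus s and t
        "tlinks": 2 * b * num_vertices,      # |E_s| + |E_t|
        "nlinks": b * b * num_edges,         # b|E| same-index plus (b^2 - b)|E| cross-index links
        "intra": comb(b, 2) * num_vertices,  # one link per pair m < n within each v_i
    }
```

For example, `g2_sizes(7, 10, 16)` reports b = 4, so each 16-label variable costs only four binary vertices.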

It is important to set the edge weights of $E_2$ such that decoding the binary labels resulting from the s − t cut of $G_2$ yields optimal (or close to optimal) labels for the original multi-label problem. To achieve this, we derive a system of linear equations capturing the relation between the original multi-label MRF penalties and the s − t cut cost incurred when generating different label configurations. We then calculate the weights of $E_2$ as the least squares solution of these equations. The next sections present the details.

2.2 Data Term Penalty: Severing T-Links and Intra-Links

The 1st order penalty $D_i(l_i)$ in (1) is the cost of assigning $l_i$ to $v_i$ in $G$, which entails assigning a corresponding sequence of binary labels $(l_{ij})_{j=1}^{b}$ to $(v_{ij})_{j=1}^{b}$ in $G_2$. To assign $(l_i)_2$ to a string of b vertices, the appropriate terminal links must be cut: to assign a 0 (resp. 1) label to $v_{ij}$, the edge connecting $v_{ij}$ to t (resp. s) must be severed (Figure 3). Therefore, the cost of severing t-links in $G_2$ to assign $l_i$ to vertex $v_i$ in $G$ is calculated as

$$D_i^{\text{tlinks}}(l_i) = \sum_{j=1}^{b} \left( l_{ij}\, w_{v_{ij},s} + \bar{l}_{ij}\, w_{v_{ij},t} \right) \tag{2}$$

where $\bar{l}_{ij}$ denotes the unary complement (NOT) of $l_{ij}$. The $G_2$ s − t cut severing the t-links, as per (2), also severs edges in $E_{\text{intra}}$ (Figure 1). In particular, $e_{im,in} \in E_{\text{intra}}$ is severed iff the s − t cut leaves $v_{im}$ connected to one terminal, say s (resp. t), while $v_{in}$ remains connected to the other terminal t (resp. s). If this condition holds, then $w_{v_{im},v_{in}}$ contributes to the cost. Therefore, the cost of severing intra-links in $G_2$ to assign $l_i$ to vertex $v_i$ in $G$ is

$$D_i^{\text{intra}}(l_i) = \sum_{m=1}^{b} \sum_{n=m+1}^{b} (l_{im} \oplus l_{in})\, w_{v_{im},v_{in}} \tag{3}$$

where $\oplus$ denotes binary XOR. The total data penalty is the sum of (2) and (3):

$$D_i(l_i) = D_i^{\text{tlinks}}(l_i) + D_i^{\text{intra}}(l_i). \tag{4}$$

Fig. 3. The $2^b$ ways of cutting through $\{v_{ij}\}_{j=1}^{b}$ for b = 2 (left) and b = 3 (right), with the resulting binary codes $\{00, 01, 10, 11\}$ and $\{000, 001, \cdots, 111\}$.
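Here is a small Python sketch of the data penalty (4), computed as the sum of the t-link cost (2) and the intra-link cost (3). Indices are 0-based in the code. The function name and the weights are hypothetical and were chosen only for illustration.

```python
from itertools import combinations

def data_penalty(bits, w_s, w_t, w_intra):
    """Cut cost D_i(l_i) of eq. (4) for the binary code `bits` = (l_i1, ..., l_ib).

    w_s[j], w_t[j] : weights of the t-links (v_ij, s) and (v_ij, t)
    w_intra[(m, n)]: weight of the intra-link (v_im, v_in), m < n (0-based here)
    """
    # (2): label 1 severs the link to s, label 0 severs the link to t.
    tlinks = sum(l * ws + (1 - l) * wt for l, ws, wt in zip(bits, w_s, w_t))
    # (3): an intra-link is severed iff its endpoints get different labels.
    intra = sum((bits[m] ^ bits[n]) * w_intra[(m, n)]
                for m, n in combinations(range(len(bits)), 2))
    return tlinks + intra

# Hypothetical weights for b = 2.
w_s, w_t, w_intra = [1.0, 2.0], [3.0, 4.0], {(0, 1): 0.5}
costs = {bits: data_penalty(bits, w_s, w_t, w_intra)
         for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

Only the mixed codes 01 and 10 pay the intra-link weight, because only they place $v_{i1}$ and $v_{i2}$ on different sides of the cut.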

2.3 Prior Term Penalty: Severing N-Links

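The interaction penalty $V_{ij}(l_i, l_j, d_i, d_j)$ for assigning $l_i$ to $v_i$ and $l_j$ to neighboring $v_j$ in $G$ must equal the cost of assigning a sequence of binary labels $(l_{im})_{m=1}^{b}$ to $(v_{im})_{m=1}^{b}$ and $(l_{jn})_{n=1}^{b}$ to $(v_{jn})_{n=1}^{b}$ in $G_2$. The cost of this cut can be calculated as (Figure 4)

$$V_{ij}(l_i, l_j, d_i, d_j) = \sum_{m=1}^{b} \sum_{n=1}^{b} (l_{im} \oplus l_{jn})\, w_{v_{im},v_{jn}}. \tag{5}$$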

This effectively adds the weight of the edge between $v_{im}$ and $v_{jn}$ to the cut cost iff the cut leaves one vertex of the edge connected to one terminal (s or t) and the other vertex connected to the other terminal (t or s). Note that we impose no restrictions on the left-hand side of (5); e.g. it could reflect non-convex, non-metric, spatially varying, or even higher order label interactions.

Fig. 4. Severing n-links between neighboring vertices $v_i$ and $v_j$ for b = 2 (four examples are shown in the top row) and b = 3 (three examples in the bottom row). The cut is depicted as a red curve. In the last two examples for b = 3, the colored vertices are translated while maintaining the n-links, to clearly show that the severed n-links in each case follow (5).
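The prior term (5) can be sketched in the same way. The function name and the weights below are hypothetical, and indices are 0-based. Each n-link contributes its weight exactly when the XOR of its endpoint labels is 1.

```python
def prior_penalty(bits_i, bits_j, w_n):
    """Cut cost V_ij of eq. (5) for the codes (l_i1..l_ib) and (l_j1..l_jb).

    w_n[m][n] is the weight of the n-link (v_im, v_jn) (0-based indices here); it
    is severed iff l_im XOR l_jn = 1, i.e. its endpoints end up on different terminals.
    """
    b = len(bits_i)
    return sum((bits_i[m] ^ bits_j[n]) * w_n[m][n]
               for m in range(b) for n in range(b))

w_n = [[1.0, 0.25],   # hypothetical n-link weights for b = 2
       [0.5, 2.0]]
```

With all four weights tied together by a single formula, one set of $b^2$ weights has to serve every label pair. This is why the fit in Section 2.4 is only approximate in general.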

2.4 Edge Weight Approximation with Least Squares

Equations (4) and (5) dictate the relationship between the penalty terms ($D_i$ and $V_{ij}$) of the original multi-label problem and the severed edge weights $w_{ij,mn}$, $\forall e_{ij,mn} \in E_2$, of the s − t graph $G_2$. What remains before applying the s − t cut is to find these edge weights.

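Edge weights of t-links and intra-links. For b = 1 (i.e. binary labeling), (3) simplifies to $D_i^{\text{intra}}(l_i) = 0$, and (4) simplifies to $D_i(l_i) = l_{i1}\, w_{v_{i1},s} + \bar{l}_{i1}\, w_{v_{i1},t}$. With $l_i = l_{i1}$ for b = 1, substituting the two possible values $l_i \in L_2 = \{l_0, l_1\}$, we obtain

$$\begin{aligned} l_i = l_0 &\Rightarrow D_i(l_0) = 0\, w_{v_{i1},s} + 1\, w_{v_{i1},t} \\ l_i = l_1 &\Rightarrow D_i(l_1) = 1\, w_{v_{i1},s} + 0\, w_{v_{i1},t} \end{aligned} \tag{6}$$

which can be written in matrix form $A_1 X_{1,i} = B_{1,i}$:

$$\underbrace{\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}}_{A_1} \underbrace{\begin{pmatrix} w_{v_{i1},s} \\ w_{v_{i1},t} \end{pmatrix}}_{X_{1,i}} = \underbrace{\begin{pmatrix} D_i(l_0) \\ D_i(l_1) \end{pmatrix}}_{B_{1,i}}$$

where $X_{1,i}$ is the vector of unknown edge weights connecting vertex $v_{i1}$ to s and t, $B_{1,i}$ is the data penalty for $v_i$, and $A_1$ is the matrix of coefficients. The subscript 1 in $A_1$, $X_{1,i}$, and $B_{1,i}$ indicates that this matrix equation is for b = 1. Clearly, the solution is trivial and expected: $w_{v_{i1},s} = D_i(l_1)$ and $w_{v_{i1},t} = D_i(l_0)$.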
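Here is a quick numerical check of the b = 1 case, assuming NumPy. The penalty values are hypothetical. The sketch uses `np.linalg.lstsq` rather than `np.linalg.solve`, so the same call also covers the non-square systems that arise for larger b.

```python
import numpy as np

# A_1 from (6): row 1 is l_i = l_0 (t-link severed), row 2 is l_i = l_1 (s-link severed).
A1 = np.array([[0.0, 1.0],
               [1.0, 0.0]])
B1 = np.array([2.5, 0.7])   # hypothetical [D_i(l_0), D_i(l_1)]

X1, *_ = np.linalg.lstsq(A1, B1, rcond=None)   # LS solution; exact here since A_1 is invertible
w_s, w_t = X1                                  # expect w_s = D_i(l_1), w_t = D_i(l_0)
```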

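For b = 2, we address multi-label problems with $k \in \{3, 4\}$, i.e. $2^{b-1} = 2 < k \leq 2^b = 4$ labels. Substituting the $2^b = 4$ possible label values, (0,0), (0,1), (1,0), and (1,1), of $(l_i)_2 = (l_{i1}, l_{i2})_2$ in (4), we obtain

$$\begin{aligned}
(0,0) &\Rightarrow D_i\big((0,0)_2\big) = 0\, w_{v_{i1},s} + 1\, w_{v_{i1},t} + 0\, w_{v_{i2},s} + 1\, w_{v_{i2},t} + 0\, w_{v_{i1},v_{i2}} \\
(0,1) &\Rightarrow D_i\big((0,1)_2\big) = 0\, w_{v_{i1},s} + 1\, w_{v_{i1},t} + 1\, w_{v_{i2},s} + 0\, w_{v_{i2},t} + 1\, w_{v_{i1},v_{i2}} \\
(1,0) &\Rightarrow D_i\big((1,0)_2\big) = 1\, w_{v_{i1},s} + 0\, w_{v_{i1},t} + 0\, w_{v_{i2},s} + 1\, w_{v_{i2},t} + 1\, w_{v_{i1},v_{i2}} \\
(1,1) &\Rightarrow D_i\big((1,1)_2\big) = 1\, w_{v_{i1},s} + 0\, w_{v_{i1},t} + 1\, w_{v_{i2},s} + 0\, w_{v_{i2},t} + 0\, w_{v_{i1},v_{i2}}
\end{aligned} \tag{7}$$

which can be written in matrix form $A_2 X_{2,i} = B_{2,i}$:

$$\underbrace{\begin{pmatrix} 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 \end{pmatrix}}_{A_2} \underbrace{\begin{pmatrix} w_{v_{i1},s} \\ w_{v_{i1},t} \\ w_{v_{i2},s} \\ w_{v_{i2},t} \\ w_{v_{i1},v_{i2}} \end{pmatrix}}_{X_{2,i}} = \underbrace{\begin{pmatrix} D_i\big((0,0)_2\big) \\ D_i\big((0,1)_2\big) \\ D_i\big((1,0)_2\big) \\ D_i\big((1,1)_2\big) \end{pmatrix}}_{B_{2,i}}$$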
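The matrix in (7) extends to any b by enumerating the $2^b$ codes and reading off the coefficients of (2) and (3). The following is a sketch assuming NumPy; `data_system` is a hypothetical helper. `np.linalg.pinv` returns the minimum-norm least squares solution, which is one valid choice when there are more unknowns than equations. Whether the paper selects this exact solution is an assumption. Because $A_b$ depends only on b, its pseudoinverse can be computed once and applied to the penalties of every vertex.

```python
from itertools import combinations, product
import numpy as np

def data_system(b):
    """Coefficient matrix A_b of the data-term equations (4).

    Rows follow the 2^b codes (l_i1, ..., l_ib) in lexicographic order (00...0 first).
    Columns: w_{v_i1,s}, w_{v_i1,t}, ..., w_{v_ib,s}, w_{v_ib,t}, then one
    intra-link weight w_{v_im,v_in} for each pair m < n.
    """
    pairs = list(combinations(range(b), 2))
    rows = []
    for bits in product((0, 1), repeat=b):
        row = []
        for l in bits:
            row += [l, 1 - l]   # s-link severed if l = 1, t-link severed if l = 0
        row += [bits[m] ^ bits[n] for m, n in pairs]
        rows.append(row)
    return np.array(rows, dtype=float)

A2 = data_system(2)            # reproduces the 4 x 5 matrix of (7)
P2 = np.linalg.pinv(A2)        # depends only on b, so it can be computed once, offline

# Hypothetical penalties for three vertices: column i holds B_{2,i} (codes 00, 01, 10, 11).
B = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 2.0, 0.0],
              [3.0, 1.0, 1.0]])
X = P2 @ B                     # 5 x 3: the edge weights for all three vertices at once
```

Nothing in this sketch constrains the weights to be non-negative. A negative weight cannot be used as a capacity by a standard max-flow solver.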


References

Matrix computations

Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images

Normalized cuts and image segmentation

Measures of the Amount of Ecologic Association Between Species

Fast approximate energy minimization via graph cuts
