Proceedings Article

A Multiscale Variable-Grouping Framework for MRF Energy Minimization

07 Dec 2015, pp. 1805-1813

TL;DR: A multiscale approach for minimizing the energy associated with Markov Random Fields with energy functions that include arbitrary pairwise potentials, which is evaluated on real-world datasets, achieving competitive performance in relatively short run-times.

Abstract: We present a multiscale approach for minimizing the energy associated with Markov Random Fields (MRFs) with energy functions that include arbitrary pairwise potentials. The MRF is represented on a hierarchy of successively coarser scales, where the problem on each scale is itself an MRF with suitably defined potentials. These representations are used to construct an efficient multiscale algorithm that seeks a minimal-energy solution to the original problem. The algorithm is iterative and features a bidirectional crosstalk between fine and coarse representations. We use consistency criteria to guarantee that the energy is nonincreasing throughout the iterative process. The algorithm is evaluated on real-world datasets, achieving competitive performance in relatively short run-times.

Summary

1. Introduction

  • Furthermore, it is generally agreed that coarse-to-fine schemes are less sensitive to local minima and can produce higher-quality label assignments [4, 11, 16].
  • The authors' approach uses existing inference algorithms together with a variable-grouping procedure referred to as coarsening, which is aimed at producing a hierarchy of successively coarser representations of the MRF problem, in order to efficiently explore relevant subsets of the space of possible label assignments.
  • The method can efficiently incorporate any initializable (warm-startable) inference algorithm that can deal with general pairwise potentials, e.g., QPBO-I [20] and LSA-TR [10], yielding significantly lower energy values than those obtained with standard use of these methods.
  • Furthermore, the authors suggest grouping variables based on the magnitude of their statistical correlation, regardless of whether the variables are assumed to take the same label at the minimum energy.

2.1. The coarsening procedure

  • The authors denote the MRF (or its graph) whose energy they aim to minimize, and its corresponding search space, by $G^{(0)}(\mathcal{V}^{(0)}, \mathcal{E}^{(0)}, \phi^{(0)})$ and $\mathcal{X}^{(0)}$, respectively, and use the shorthand notation $G^{(0)}$ to refer to these elements.
  • Then, in each such group the authors select one vertex to be the “seed variable” (or seed vertex) of the group.
  • Next, the authors eliminate all but the seed vertex in each group and define the coarser graph, $G^{(t+1)}$, whose vertices correspond to the seed vertices of the fine graph $G^{(t)}$.
  • The first term sums up the unary potentials of variables in $[\tilde{v}]$, and the second term takes into account the energy of pairwise potentials of all internal pairs $u, w \in [\tilde{v}]$.
  • It is readily seen that consistency is satisfied by the coarsening procedure, by substituting a labeling assignment of $G^{(t+1)}$ into Eqs. (4) and (5) to verify that the energy at scale $t$ of the interpolated labeling is equal to the coarse-scale energy for any interpolation rule.

2.2. The multiscale algorithm

  • The key ingredient of this paper is the multiscale algorithm which takes after the classical V-cycle employed in multigrid numerical solvers for partial differential equations [5, 21].
  • This process comprises a single iteration or cycle.
  • Coarsening halts when the number of variables is sufficiently small, say $|\mathcal{V}^{(t)}| < N$, and an exact solution can be easily recovered, e.g., via exhaustive search.
  • Computational complexity and choice of inference module: the inference algorithm should not be run until convergence, because its goal is not to find a global optimum of the search sub-space; rather, a small number of iterations suffices to obtain a label assignment for which the interpolation rule heuristic is useful.

2.3. Monotonicity

  • The multiscale framework described so far is not monotonic, due to the fact that the initial state at a coarse level may incur a higher energy than that of the fine state from which it is derived.
  • To see this, let $x^{(t)}$ denote the state at level $t$, right before the coarsening stage of a V-cycle.
  • As noted above, coarse-scale variables inherit the current state of seed variables.
  • If the energy associated with $x^{(t+1)}$ happens to be higher than the energy associated with $x^{(t)}$ then monotonicity is compromised.
  • To avoid this undesirable behavior the authors modify the interpolation rule such that if $x^{(t+1)}$ was inherited from $x^{(t)}$ then $x^{(t+1)}$ will be mapped back to $x^{(t)}$ by the interpolation.

2.4. Variable-grouping by conditional entropy

  • The authors next describe their approach for variable-grouping and the selection of a seed variable in each group.
  • Heuristically, the authors would like v to be a seed variable, whose labeling determines that of u via the interpolation, if they are relatively confident of what the label of u should be, given just the label of v. Conditional entropy measures the uncertainty in the state of one random variable given the state of another random variable [7].
  • The authors then proceed with the variable-grouping procedure; for each variable they must determine its status, namely whether it is a seed variable or an interpolated variable whose seed must be determined.
  • This is achieved by examining directed edges one-by-one according to the order by which they are stored in the binned-score list.
  • The process terminates when the status of all the variables has been set.

3. Evaluation

  • The algorithm was implemented in the framework of OpenGM [1], a C++ template library that offers several inference algorithms and a collection of datasets to evaluate on.
  • The authors use QPBO-I [20] and LSA-TR [10] for binary models and Swap/Expand-QPBO (αβ-swap/α-expand with a QPBO-I binary step) and Lazy-Flipper with a search depth of 2 [2] for multilabel models.
  • Unless otherwise indicated, 3 V-cycles were applied on “hard” energy models (Sec. 3.1) and a single V-cycle on Potts models (Sec. 3.2).
  • Hence, the authors resort to comparing multiscale to single-scale inference for algorithms which can be applied in their framework without modifications.
  • For each dataset the authors also report the “Ace” inference method for that dataset, where algorithms are ranked according to the percentage of instances on which they achieve the best energy and by their run-time.

3.1. Hard energies

  • Concretely, the datasets are split into 3 categories: those for which (all/some/none) of the instances are solved to optimality.
  • The authors follow these notions when they refer to hard models, with special attention to the type of pairwise interaction.
  • Detailed results are presented in Table 2.
  • The Scribble dataset [3] is an image segmentation task with a user-interactive interface, in which the user is asked to mark boundaries of objects in the scene (see Fig. 4).

4. Discussion

  • The authors have presented a multiscale framework for MRF energy minimization that uses variable grouping to form coarser levels of the problem.
  • The authors demonstrated these concepts with an algorithm that groups variables based on a local approximation of their conditional entropy, namely based on an estimate of their statistical correlation.
  • The algorithm was evaluated on a collection of datasets and results indicate that it is beneficial to apply existing single-scale methods within the presented multiscale algorithm.
  • There are many possible directions for further developments, beginning with the interpolation rule.
  • Indeed, even the set of labels can be expanded on a coarse scale to enrich the coarse search sub-space.


A Multiscale Variable-grouping Framework for MRF Energy Minimization
Omer Meir, Meirav Galun, Stav Yagev, Ronen Basri
Weizmann Institute of Science
Rehovot, Israel
{omerm, meirav.galun, stav, ronen.basri}@weizmann.ac.il
Irad Yavneh
Technion
Haifa, Israel
irad@cs.technion.ac.il
1. Introduction
In recent years Markov random fields (MRFs) have become an increasingly popular tool for image modeling, with applications ranging from image denoising, inpainting and segmentation, to stereo matching and optical flow estimation, and many more. An MRF is commonly constructed by modeling the pixels (or regional “superpixels”) of an image as variables that take values in a discrete label space, and by formulating an energy function that suits the application.
(This research was supported in part by the Israel Science Foundation Grant No. 1265/14.)
An expressive model used often is the pairwise model, which is specified by an energy function $E : \mathcal{X} \to \mathbb{R}$,
$$E(x) = \sum_{v \in \mathcal{V}} \phi_v(x_v) + \sum_{(u,v) \in \mathcal{E}} \phi_{uv}(x_u, x_v), \qquad (1)$$
where $x \in \mathcal{X}$ is a label assignment of all variables $v \in \mathcal{V}$. The first term in this function is the sum of unary potentials, $\phi_v(x_v)$, which reflect the cost of assigning label $x_v$ to variable $v$. The second term is the sum of pairwise potentials, $\phi_{uv}(x_u, x_v)$, which model the interaction between pairs of variables by reflecting the cost of assigning labels $x_u, x_v$ to variables $u, v$, respectively. Here $\mathcal{V}$ denotes the set of variables and $\mathcal{E}$ is the set of pairs of interacting variables. The inference task then is to find a label assignment that minimizes the energy.
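As a concrete illustration of Eq. (1), the following minimal Python sketch evaluates the pairwise energy of a label assignment and finds the minimizer of a two-variable toy model by brute force. The dictionary layout (`unary[v][label]`, `pairwise[(u, v)][label_u][label_v]`) and the toy numbers are assumptions made for this example only; they are not taken from the paper or from OpenGM.

```python
from itertools import product

def energy(unary, pairwise, x):
    """Evaluate Eq. (1): sum of unary terms plus sum of pairwise terms."""
    total = sum(costs[x[v]] for v, costs in unary.items())
    total += sum(table[x[u]][x[v]] for (u, v), table in pairwise.items())
    return total

# Hypothetical toy model: two binary variables whose pairwise term
# penalizes equal labels (a "negative affinity").
unary = {0: [0.0, 1.0], 1: [0.5, 0.0]}
pairwise = {(0, 1): [[2.0, 0.0], [0.0, 2.0]]}

# Exhaustive search over all label assignments (feasible only for tiny models).
best = min(
    ({0: a, 1: b} for a, b in product(range(2), repeat=2)),
    key=lambda x: energy(unary, pairwise, x),
)
```

Exhaustive search is exponential in the number of variables, which is why the rest of the paper relies on approximate inference and on coarsening.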
Considerable research has been reported in the literature on approximating (1) in a coarse-to-fine framework [4, 6, 9, 11, 14, 15, 16, 17, 18]. Coarse-to-fine methods have been shown to be beneficial in terms of running time [6, 9, 16, 18]. Furthermore, it is generally agreed that coarse-to-fine schemes are less sensitive to local minima and can produce higher-quality label assignments [4, 11, 16]. These benefits follow from the fact that, although only local interactions are encoded, the model is global in nature, and by working at multiple scales information is propagated more efficiently [9, 16].
Until recently, coarse-to-fine schemes were confined to geometric structures, i.e., grouping together square patches of variables in a grid [9, 11]. Recent works [4, 14] suggest grouping variables which are likely to end up with the same label in the minimum energy solution, rather than adhering to a geometric structure. Such grouping may lead to over-smoothing [6, 15]. Methods for dealing with the problem of over-smoothing include applying a multi-resolution scheme in areas around boundaries in a segmentation task [15, 18], or pruning the label space of fine scales with a pre-trained classifier [6].
In this paper we present a multiscale framework for solving MRFs with multi-label arbitrary pairwise potentials. The algorithm has been designed with the intention of optimizing “hard” energies which may arise, for example, when the parameters of the model are learned from data or when the model accounts for negative affinities between neighboring variables. Our approach uses existing inference algorithms together with a variable-grouping procedure referred to as coarsening, which is aimed at producing a hierarchy of successively coarser representations of the MRF problem, in order to efficiently explore relevant subsets of the space of possible label assignments. Our method is iterative and monotonic, that is, the energy is guaranteed not to increase at any point during the iterative process. The method can efficiently incorporate any initializable (warm-startable) inference algorithm that can deal with general pairwise potentials, e.g., QPBO-I [20] and LSA-TR [10], yielding significantly lower energy values than those obtained with standard use of these methods.
Unlike existing multiscale methods, which employ only coarse-to-fine strategies, our framework features a bidirectional crosstalk between fine and coarse representations of the optimization problem. Furthermore, we suggest grouping variables based on the magnitude of their statistical correlation, regardless of whether the variables are assumed to take the same label at the minimum energy. The method is evaluated on real-world datasets, yielding promising results in relatively short run-times.
2. The multiscale framework
An inference algorithm is one which seeks a labeling of the variables that minimizes the energy of Eq. (1). We refer to the set of possible label assignments, $\mathcal{X}$, as a search space, and note that this set is of exponential size in the number of variables. In our approach we construct a hierarchy of $n$ additional search sub-spaces of successively lower cardinality by coarsening the graphical model. We denote the complete search space $\mathcal{X}$ by $\mathcal{X}^{(0)}$ and the hierarchy of auxiliary search sub-spaces by $\mathcal{X}^{(1)}, \ldots, \mathcal{X}^{(n)}$, with $|\mathcal{X}^{(t+1)}| < |\mathcal{X}^{(t)}|$ for all $t = 0, 1, \ldots, n-1$. We furthermore associate energies with the search spaces, $E^{(0)}, E^{(1)}, \ldots, E^{(n)}$, with $E^{(t)} : \mathcal{X}^{(t)} \to \mathbb{R}$, and $E^{(0)} = E$. The hierarchy of search spaces is employed to efficiently seek lower energy assignments in $\mathcal{X}$.
2.1. The coarsening procedure
The coarsening procedure is a fundamental module in the construction of coarse scales and we now describe it in detail; see Alg. 1 for an overview. We denote the MRF (or its graph) whose energy we aim to minimize, and its corresponding search space, by $G^{(0)}(\mathcal{V}^{(0)}, \mathcal{E}^{(0)}, \phi^{(0)})$ and $\mathcal{X}^{(0)}$, respectively, and use the shorthand notation $G^{(0)}$ to refer to these elements. Scales of the hierarchy are denoted by $G^{(t)}$, $t = 0, 1, 2, \ldots, n$, such that a larger $t$ corresponds to a coarser scale, with a smaller graph and search space.
We next define the construction of a coarser graph $G^{(t+1)}$ from a finer graph $G^{(t)}$, relate their search spaces, and define the energy on the coarse graph. To simplify notation we use $u, v$ to denote variables (or vertices) at level $t$, i.e., $u, v \in \mathcal{V}^{(t)}$, and $\tilde{u}, \tilde{v}$ to denote variables at level $t+1$. A label assignment to variable $u$ (or $\tilde{u}$) is denoted $x_u$ (respectively $x_{\tilde{u}}$). An assignment to all variables at levels $t$, $t+1$ is denoted by $x \in \mathcal{X}^{(t)}$ and $\tilde{x} \in \mathcal{X}^{(t+1)}$, respectively.
Algorithm 1 $[G^{(t+1)}, x^{(t+1)}]$ = COARSENING($G^{(t)}, x^{(t)}$)
Input: Graphical model $G^{(t)}$, optional initial labels $x^{(t)}$
Output: Coarse-scale graphical model and labels $G^{(t+1)}, x^{(t+1)}$
1: $x^{(t+1)} \leftarrow \emptyset$
2: select a variable-grouping (Sec. 2.4)
3: set an interpolation rule $f^{(t+1)} : \mathcal{X}^{(t+1)} \to \mathcal{X}^{(t)}$ (Eq. (2),(3))
4: if $x^{(t)}$ is initialized then
5:   modify $f^{(t+1)}$ to ensure monotonicity (Sec. 2.3)
6:   $x^{(t+1)}$ inherits¹ $x^{(t)}$
7: end if
8: define the coarse potentials $\phi^{(t+1)}_{\tilde{u}}, \phi^{(t+1)}_{\tilde{u}\tilde{v}}$ (Eq. (4),(5))
9: return $G^{(t+1)}, x^{(t+1)}$
Variable-grouping and graph-coarsening. To derive $G^{(t+1)}$ from $G^{(t)}$, we begin by partitioning the variables of $G^{(t)}$ into a disjoint set of groups. Then, in each such group we select one vertex to be the “seed variable” (or seed vertex) of the group. As explained below, it is necessary that the seed vertex be connected by an edge to each of the other vertices in its group. Next, we eliminate all but the seed vertex in each group and define the coarser graph, $G^{(t+1)}$, whose vertices correspond to the seed vertices of the fine graph $G^{(t)}$. To set the notation, let $\tilde{v} \in \mathcal{V}^{(t+1)}$ represent a variable of $G^{(t+1)}$, and denote by $[\tilde{v}] \subset \mathcal{V}^{(t)}$ the subset of variables of $\mathcal{V}^{(t)}$ which were grouped to form $\tilde{v}$. The collection of subsets $[\tilde{v}] \subset \mathcal{V}^{(t)}$ forms a partitioning of the set $\mathcal{V}^{(t)}$, i.e., $\mathcal{V}^{(t)} = \bigcup_{\tilde{v} \in \mathcal{V}^{(t+1)}} [\tilde{v}]$ and $[\tilde{u}] \cap [\tilde{v}] = \emptyset$, $\forall \tilde{u} \neq \tilde{v}$. In Subsection 2.4, we shall provide details on how the groupings are selected.
Once the groups have been selected, the coarse-scale graph topology is determined as follows. An edge is introduced between vertices $\tilde{u}$ and $\tilde{v}$ in $G^{(t+1)}$ if there exists at least one pair of fine-scale vertices, one in $[\tilde{u}]$ and the other in $[\tilde{v}]$, that are connected by an edge. See Fig. 1 for an illustration of variable grouping.
Figure 1. An illustration of variable-grouping, with seed variables denoted by black disks. Note that each seed variable is connected by an edge to each other variable in its group, as required by the interpolation rule. Right panel: the coarse graph, whose vertices correspond to the fine-scale seed vertices, and their coarse unary potentials account for all the internal energy potentials in their group. Edges connect pairs of coarse-vertices according to topology at the fine scale.
¹ For any coarse-scale variable $\tilde{v}$, $x_{\tilde{v}} \leftarrow x_s$, where $x_s$ is the label of the seed variable of the group $[\tilde{v}]$.
Interpolation rule. Next we define a coarse-to-fine interpolation rule, $f^{(t+1)} : \mathcal{X}^{(t+1)} \to \mathcal{X}^{(t)}$, which maps each labeling assignment of the coarser scale $t+1$ to a labeling of the finer scale $t$. We consider here only a simple interpolation rule, where the label of any fine variable is completely determined by the coarse-scale variable of its group and is independent of all other coarse variables. That is, if $x = f^{(t+1)}(\tilde{x})$ and $x_{\tilde{v}}$ denotes the label associated with coarse variable $\tilde{v}$, then for any fine variable $u \in [\tilde{v}]$, $x_u$ depends only on $x_{\tilde{v}}$. More specifically, if $s \in [\tilde{v}]$ is the seed variable of the group $[\tilde{v}]$, then the interpolation rule is defined as follows:
i. The label assigned to the seed variable $s$ is equal to that of the coarse variable,
$$x_s \leftarrow x_{\tilde{v}}. \qquad (2)$$
ii. Once the seed label has been assigned, the label assigned to any other variable in the group, $u \in [\tilde{v}]$, is that which minimizes the local energy generated by the pair $(u, s)$:
$$x_u \leftarrow \arg\min_{x} \left\{ \phi^{(t)}_u(x) + \phi^{(t)}_{us}(x, x_s) \right\}. \qquad (3)$$
A single exception to this rule is elaborated in Sec. 2.3. As we produce the coarse graph we store these interpolation assignments (3) in a lookup table so that they can be retrieved when we return to the finer scale. Finally, we henceforth use the shorthand notation $x_u | x_{\tilde{v}}$ to denote that the interpolation rule assigns the label $x_u$ to $u \in [\tilde{v}]$ when the label of $\tilde{v}$ is $x_{\tilde{v}}$.
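The lookup table for Eq. (3) and its use during interpolation can be sketched as follows. This is an illustrative sketch only: the function names, the `groups` layout (seed mapped to its interpolated members), and the tie-breaking choice (lowest label wins, which is simply how Python's `min` behaves) are assumptions of the example rather than details given in the paper.

```python
def build_interpolation_table(unary_u, pairwise_us):
    """For each seed label x_s, store argmin_x { unary_u[x] + pairwise_us[x][x_s] } (Eq. (3))."""
    n_u = len(unary_u)
    n_s = len(pairwise_us[0])
    return [
        min(range(n_u), key=lambda x: unary_u[x] + pairwise_us[x][x_s])
        for x_s in range(n_s)
    ]

def interpolate(coarse_labels, groups, tables):
    """Map a coarse labeling to a fine labeling using the stored tables."""
    fine = {}
    for seed, members in groups.items():
        x_s = coarse_labels[seed]
        fine[seed] = x_s                  # Eq. (2): the seed copies the coarse label
        for u in members:
            fine[u] = tables[u][x_s]      # Eq. (3): precomputed local minimizer
    return fine

# Hypothetical toy group: seed 0 with a single interpolated variable 1.
unary_1 = [0.0, 0.3]
pairwise_10 = [[0.0, 1.0], [1.0, 0.0]]    # indexed [x_1][x_0], Potts-like
tables = {1: build_interpolation_table(unary_1, pairwise_10)}
fine = interpolate({0: 1}, {0: [1]}, tables)
```

Because each table has one entry per seed label, building it costs one pass over the label pairs of the edge $(u, s)$, and interpolation afterwards is a constant-time lookup per variable.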
Coarse-scale energy. The energy associated with $G^{(t+1)}$, denoted $E^{(t+1)}(\tilde{x})$, depends on the interpolation rule. We require that, for any coarse-scale labeling, the coarse-scale energy be equal to the fine-scale energy obtained after interpolation, that is, $E^{(t+1)}(\tilde{x}) = E^{(t)}(f^{(t+1)}(\tilde{x}))$. We call this consistency.
To ensure consistency, all fine potentials are accounted for exactly once; see Fig. 1. We define the unary potential of a coarse variable $\tilde{v} \in \mathcal{V}^{(t+1)}$ to reflect the internal fine-scale energy of its group $[\tilde{v}]$, according to the interpolation rule:
$$\phi^{(t+1)}_{\tilde{v}}(x_{\tilde{v}}) = \sum_{u \in [\tilde{v}]} \phi^{(t)}_u(x_u | x_{\tilde{v}}) + \sum_{u,w \in [\tilde{v}]} \phi^{(t)}_{uw}(x_u | x_{\tilde{v}}, x_w | x_{\tilde{v}}). \qquad (4)$$
Note that all the energy potentials of the subgraph induced by $[\tilde{v}]$ are accounted for in Eq. (4). The first term sums up the unary potentials of variables in $[\tilde{v}]$, and the second term takes into account the energy of pairwise potentials of all internal pairs $u, w \in [\tilde{v}]$.
The pairwise potential of a coarse pair $\tilde{u}, \tilde{v} \in \mathcal{V}^{(t+1)}$ accounts for all finer-scale pairwise potentials that have one variable in $[\tilde{u}] \subset \mathcal{V}^{(t)}$ and the other variable in $[\tilde{v}] \subset \mathcal{V}^{(t)}$,
$$\phi^{(t+1)}_{\tilde{u}\tilde{v}}(x_{\tilde{u}}, x_{\tilde{v}}) = \sum_{u \in [\tilde{u}],\, v \in [\tilde{v}]} \phi^{(t)}_{uv}(x_u | x_{\tilde{u}}, x_v | x_{\tilde{v}}). \qquad (5)$$
Note that the definition in Eq. (5) is consistent with the topology of the graph, as was previously defined. The coarse-scale energy $E^{(t+1)}(\tilde{x})$ is obtained by summing the unary (4) and pairwise (5) potentials for all coarse variables $\tilde{v} \in \mathcal{V}^{(t+1)}$ and coarse pairs $\tilde{u}, \tilde{v} \in \mathcal{V}^{(t+1)}$.
It is readily seen that consistency is satisfied by the coarsening procedure, by substituting a labeling assignment of $G^{(t+1)}$ into Eqs. (4) and (5) to verify that the energy at scale $t$ of the interpolated labeling is equal to the coarse-scale energy for any interpolation rule. Consistency guarantees that if we reduce the coarse-scale energy, say by applying an inference algorithm to the coarse-scale MRF, then this reduction is translated to an equal reduction in the fine-scale energy via interpolation. Indeed if we minimize the coarse-scale energy and apply interpolation to this solution, then we will have minimized the fine-scale energy over the subset of fine-scale labeling assignments that are in the range of the interpolation, i.e., all label assignments in $f^{(t+1)}(\mathcal{X}^{(t+1)}) \subset \mathcal{X}^{(t)}$.
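The consistency property can be checked numerically on a small example. The sketch below builds coarse potentials in the spirit of Eqs. (4) and (5) for a hypothetical four-variable cycle and compares the coarse energy with the fine energy of the interpolated labeling for every coarse labeling. The data layout, the helper names, and the arbitrary interpolation tables are assumptions of this example, not the paper's implementation.

```python
from itertools import product

def energy(unary, pairwise, x):
    """Pairwise MRF energy, Eq. (1)."""
    return (sum(c[x[v]] for v, c in unary.items())
            + sum(t[x[u]][x[v]] for (u, v), t in pairwise.items()))

def coarsen(unary, pairwise, groups, tables, n_labels):
    """Build coarse potentials following Eqs. (4) and (5).

    groups: seed -> list of interpolated members (hypothetical layout).
    tables: tables[u][x_seed] -> label of interpolated variable u.
    """
    owner = {s: s for s in groups}
    for s, members in groups.items():
        for u in members:
            owner[u] = s

    def fine_label(u, x_seed):
        # Seeds copy the coarse label (Eq. (2)); others use the lookup table.
        return x_seed if owner[u] == u else tables[u][x_seed]

    c_unary = {s: [0.0] * n_labels for s in groups}
    c_pair = {}
    for u, costs in unary.items():
        for xs in range(n_labels):
            c_unary[owner[u]][xs] += costs[fine_label(u, xs)]
    for (u, v), t in pairwise.items():
        su, sv = owner[u], owner[v]
        if su == sv:  # internal pair: folded into the coarse unary term
            for xs in range(n_labels):
                c_unary[su][xs] += t[fine_label(u, xs)][fine_label(v, xs)]
            continue
        a, b = (su, sv) if su < sv else (sv, su)
        tab = c_pair.setdefault((a, b), [[0.0] * n_labels for _ in range(n_labels)])
        for xa, xb in product(range(n_labels), repeat=2):
            xsu, xsv = (xa, xb) if su == a else (xb, xa)
            tab[xa][xb] += t[fine_label(u, xsu)][fine_label(v, xsv)]
    return c_unary, c_pair

# Toy fine model: a 4-cycle with binary labels, grouped as {0, 1} and {2, 3}.
unary = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [0.25, 0.0], 3: [0.0, 0.75]}
pairwise = {(0, 1): [[0.0, 1.0], [1.0, 0.0]], (1, 2): [[0.0, 2.0], [2.0, 0.0]],
            (2, 3): [[1.0, 0.0], [0.0, 1.0]], (3, 0): [[0.0, 0.5], [0.5, 0.0]]}
groups = {0: [1], 2: [3]}
tables = {1: [0, 1], 3: [1, 0]}   # any table works for the consistency check
c_unary, c_pair = coarsen(unary, pairwise, groups, tables, n_labels=2)

def interpolate(xc):
    return {0: xc[0], 1: tables[1][xc[0]], 2: xc[2], 3: tables[3][xc[2]]}

# Difference between coarse energy and fine energy of the interpolated labeling.
gaps = [abs(energy(c_unary, c_pair, {0: a, 2: b})
            - energy(unary, pairwise, interpolate({0: a, 2: b})))
        for a, b in product(range(2), repeat=2)]
```

Every entry of `gaps` should be zero up to floating-point error, regardless of which interpolation tables are supplied, which mirrors the statement that consistency holds for any interpolation rule.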
2.2. The multiscale algorithm
The key ingredient of this paper is the multiscale algorithm which takes after the classical V-cycle employed in multigrid numerical solvers for partial differential equations [5, 21]. We describe it informally first. Our V-cycle is driven by a standard inference algorithm, which is employed at all scales of the hierarchy, beginning with the finest level ($t = 0$), traversing down to the coarsest level ($t = n$), and back up to the finest level. This process comprises a single iteration or cycle. The cycle begins at level $t = 0$ with a given label assignment. One or more iterations of the inference algorithm are applied. Then, a coarsening step is performed: the variables are partitioned into groups, a seed variable is selected for each group, a coarse graph is defined and its variables inherit¹ the labels of the seed variables. This routine of inference iterations followed by coarsening is repeated on the next-coarser level, and so on. Coarsening halts when the number of variables is sufficiently small, say $|\mathcal{V}^{(t)}| < N$, and an exact solution can be easily recovered, e.g., via exhaustive search. The solution of the coarse scale is interpolated to scale $n-1$, replacing the previous solution of that scale, and some number of inference iterations is performed. This routine of interpolation followed by inference is repeated to the next-finer scale, and so on, until we return to scale 0. As noted above, this completes a single iteration or V-cycle; see illustration in Fig. 2. A formal description in the form of a recursive algorithm appears in Alg. 2. Some remarks follow.
Initialization. The algorithm can be warm-started with any choice of label assignment, $x^{(0)}$, and with the modifications described in Sec. 2.3 it is guaranteed to maintain or improve its energy. If an initial guess is unavailable our algorithm readily computes a labeling in a coarse-to-fine manner, similarly to existing works. This is done by skipping the inference module that precedes a coarsening step in the V-cycle, i.e., by skipping Step 6 of Alg. 2.
Computational complexity and choice of inference module. The complexity of the algorithm is governed largely by the complexity of the method used as the inference module employed in Steps 6 and 11. Note that the inference algorithm should not be run until convergence, because its goal is not to find a global optimum of the (restricted) search sub-space; rather, a small number of iterations suffices in order to obtain a label assignment for which the interpolation rule heuristic is useful and for which a coarsening step is therefore efficient. The inference method must satisfy two requirements. The first is that the method can be warm-started, otherwise each scale would be solved from scratch without utilizing information passed from other scales of the hierarchy, so we would lose the ability to improve iteratively. Second, the method must be applicable to models with general potentials. Even when the potentials of an initial problem are of a specific type (e.g. submodular, semi-metric), it is not guaranteed that this property is conserved in coarser scales due to the construction of the interpolation rule (3) and to the definition of coarse potentials (5). Subject to these limitations we use QPBO-I [20] and LSA-TR [10] for binary models. For multilabel models we use Swap/Expand-QPBO (αβ-swap/α-expand with a QPBO-I binary step) [20] and Lazy-Flipper with a search depth of 2 [2].
Figure 2. The multiscale V-cycle (diagram legend: inference module, coarsening, interpolation; scales run from the finest scale to the coarsest scale). Starting at the finest scale $G^{(0)}$ (top left circle), a label assignment is improved by an inference algorithm (Step 6 in Alg. 2) and the graph is coarsened (denoted by an arrow pointing downwards). This repeats until the number of variables is sufficiently small and an exact solution can be easily recovered. The labeling is then interpolated to the next finer scale (denoted by an arrow pointing upwards), followed by an application of the inference algorithm. Interpolation and inference repeat until the finest scale is reached. This process, referred to as a V-cycle, constitutes a single iteration.
Algorithm 2 $x$ = V-CYCLE($G^{(t)}, x^{(t)}, t$)
Input: Graphical model $G^{(t)}$, optional initial labels $x^{(t)}$, $t \geq 0$
Output: $x^{(t)}$, a label assignment for all $v \in \mathcal{V}^{(t)}$
1: if $|\mathcal{V}^{(t)}| < N$ then
2:   compute minimum-energy solution $x^{(t)}$.
3:   return $x^{(t)}$
4: end if
5: if $x^{(t)}$ is initialized then
6:   $x^{(t)} \leftarrow$ inference on $G^{(t)}, x^{(t)}$
7: end if
8: $G^{(t+1)}, x^{(t+1)} \leftarrow$ COARSENING($G^{(t)}, x^{(t)}$) (Alg. 1)
9: $x^{(t+1)} \leftarrow$ V-CYCLE($G^{(t+1)}, x^{(t+1)}, t+1$) (recursive call)
10: $x^{(t)} \leftarrow$ interpolate $x^{(t+1)}$ (Sec. 2.1)
11: $x^{(t)} \leftarrow$ inference on $G^{(t)}, x^{(t)}$
12: return $x^{(t)}$
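The control flow of the recursive V-cycle (Alg. 2) can be sketched in a few lines of Python. The sketch below is deliberately abstract: `ops` is a hypothetical bundle of callbacks standing in for the inference module, the coarsening procedure, and the exact solver, and the toy callbacks only record the order in which the steps run. It illustrates the recursion, not the authors' OpenGM implementation.

```python
def v_cycle(model, x, ops, n_exact):
    """One V-cycle, mirroring the recursion of Alg. 2.

    ops is a hypothetical dict of callbacks:
      size(model) -> int
      solve_exact(model) -> labels
      infer(model, x) -> labels (a few warm-started iterations)
      coarsen(model, x) -> (coarse_model, coarse_x, interp),
        where interp(coarse_x) returns fine-scale labels.
    """
    if ops["size"](model) < n_exact:            # Steps 1-4
        return ops["solve_exact"](model)
    if x is not None:                           # Steps 5-7
        x = ops["infer"](model, x)
    coarse_model, coarse_x, interp = ops["coarsen"](model, x)    # Step 8
    coarse_x = v_cycle(coarse_model, coarse_x, ops, n_exact)     # Step 9
    x = interp(coarse_x)                        # Step 10
    return ops["infer"](model, x)               # Steps 11-12

# Toy callbacks that only record the order of operations. The "model" is just
# a variable count and coarsening halves it; no real energy is involved.
trace = []

def toy_infer(n, x):
    trace.append(f"infer {n}")
    return x

def toy_exact(n):
    trace.append(f"exact {n}")
    return [0] * n

def toy_coarsen(n, x):
    trace.append(f"coarsen {n}")
    coarse_x = None if x is None else x[::2]
    return n // 2, coarse_x, lambda cx: [cx[i // 2] for i in range(n)]

ops = {"size": lambda n: n, "solve_exact": toy_exact,
       "infer": toy_infer, "coarsen": toy_coarsen}
labels = v_cycle(8, [0] * 8, ops, n_exact=3)
```

Passing `x=None` skips the pre-coarsening inference step, which corresponds to the coarse-to-fine initialization described above; repeated V-cycles would simply call `v_cycle` again with the returned labels.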
Remark. Evidently, the search space of each MRF in the hierarchy corresponds, via the interpolation, to a search sub-space of the next finer MRF. When coarsening, we strive to eliminate the less likely label assignments from the search space, whereas more likely solutions should be represented on the coarser scale. The locally minimal energy (3) is chosen with this purpose in mind. It is assumed heuristically that a neighboring variable of the seed variable is more likely to end up with a label that minimizes the energy associated with this pair than with any other choice. The approximation is exact if the subgraph is a star graph.
2.3. Monotonicity
The multiscale framework described so far is not monotonic, due to the fact that the initial state at a coarse level may incur a higher energy than that of the fine state from which it is derived. To see this, let $x^{(t)}$ denote the state at level $t$, right before the coarsening stage of a V-cycle. As noted above, coarse-scale variables inherit the current state of seed variables. When we interpolate $x^{(t+1)}$ back to level $t$, it may well be the case that the state which we get back to is different from $x^{(t)}$, i.e. $f^{(t+1)}(x^{(t+1)}) \neq x^{(t)}$. If the energy associated with $x^{(t+1)}$ happens to be higher than the energy associated with $x^{(t)}$ then monotonicity is compromised.
To avoid this undesirable behavior we modify the interpolation rule such that if $x^{(t+1)}$ was inherited from $x^{(t)}$ then $x^{(t+1)}$ will be mapped back to $x^{(t)}$ by the interpolation. Specifically, assume we are given a labeling $\hat{x}^{(t)}$ at level $t$. Consider every seed variable $s \in \mathcal{V}^{(t)}$ and let $\tilde{s}$ denote its corresponding variable in $G^{(t+1)}$. Then, for all $u \in [\tilde{s}]$ we reset the interpolation rule (3),
$$x_u \,|\, (x_{\tilde{s}} = \hat{x}_{\tilde{s}}) \leftarrow \hat{x}_u, \qquad (6)$$
and coarse energy potentials are updated accordingly to reflect those changes. Now, consistency ensures that the initial energy at level $t+1$, $E^{(t+1)}(\hat{x}^{(t+1)})$, is equal to the energy of $\hat{x}^{(t)}$ at level $t$. Consequently, assuming that the inference algorithm which is employed at every level of a V-cycle is monotonic, the energy is non-increasing.
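The modification of Eq. (6) amounts to overwriting one entry per interpolated variable in the lookup table of rule (3). The following sketch shows the effect on a hypothetical group of three variables; the table layout and function names are assumptions of this example. In a full implementation the coarse potentials would be computed from the modified tables, as the text notes.

```python
def enforce_monotonicity(tables, groups, current):
    """Return a copy of the tables modified per Eq. (6).

    tables[u][x_seed] -> label of interpolated variable u (lookup of Eq. (3)).
    groups: seed -> list of interpolated members; current: fine labeling x-hat.
    """
    fixed = {u: list(t) for u, t in tables.items()}
    for seed, members in groups.items():
        for u in members:
            # When the seed keeps its current label, u must keep its own too.
            fixed[u][current[seed]] = current[u]
    return fixed

def interpolate(coarse, groups, tables):
    fine = dict(coarse)                       # seeds copy the coarse label
    for seed, members in groups.items():
        for u in members:
            fine[u] = tables[u][coarse[seed]]
    return fine

groups = {0: [1, 2]}
tables = {1: [0, 1], 2: [0, 1]}               # here rule (3) happens to copy the seed label
current = {0: 0, 1: 1, 2: 0}                  # x-hat at the fine level
inherited = {s: current[s] for s in groups}   # coarse labels inherit seed labels
before = interpolate(inherited, groups, tables)
after = interpolate(inherited, groups, enforce_monotonicity(tables, groups, current))
```

With the unmodified tables, interpolating the inherited coarse labeling changes the label of variable 1, so the round trip does not return the fine state; after the reset, the round trip reproduces `current` exactly, which is what lets consistency carry the fine energy over to the coarse level.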
2.4. Variable-grouping by conditional entropy
We next describe our approach for variable-grouping and the selection of a seed variable in each group. Heuristically, we would like $v$ to be a seed variable, whose labeling determines that of $u$ via the interpolation, if we are relatively confident of what the label of $u$ should be, given just the label of $v$.
Conditional entropy measures the uncertainty in the state of one random variable given the state of another random variable [7]. We use conditional entropy to gauge our confidence in the interpolation rule (3). Exact calculation of conditional entropy,
$$H(u|v) = \sum_{x_u, x_v} P_{uv}(x_u, x_v) \cdot \log \frac{P_v(x_v)}{P_{uv}(x_u, x_v)}, \qquad (7)$$
involves having access to the marginal probabilities of the variables and marginalization is in general NP-hard. Instead, we use an approximation of the marginal probabilities for pairs of variables by defining the local energy of two variables $u, v$,
$$E_{uv}(x_u, x_v) = \phi_u(x_u) + \phi_v(x_v) + \phi_{uv}(x_u, x_v). \qquad (8)$$
The local energy is used for the approximation of marginal probabilities by applying the relation $\Pr(x_u, x_v) = \frac{1}{Z_{uv}} \cdot \exp\{-E_{uv}(x_u, x_v)\}$, where $Z_{uv}$ is a normalization factor, ensuring that probabilities sum to 1.
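The local approximation of Eqs. (7) and (8) can be computed per edge as in the sketch below. The function name and argument layout are assumptions of this example; the subtraction of the minimum energy before exponentiation is a standard numerical-stability step added here, and it does not change the normalized probabilities.

```python
import math

def local_conditional_entropy(unary_u, unary_v, pairwise_uv):
    """Approximate H(u|v) with P(x_u, x_v) proportional to exp(-E_uv(x_u, x_v)).

    pairwise_uv is indexed [x_u][x_v]; natural logarithms are used.
    """
    n_u, n_v = len(unary_u), len(unary_v)
    # Local energy of Eq. (8).
    energies = [[unary_u[a] + unary_v[b] + pairwise_uv[a][b] for b in range(n_v)]
                for a in range(n_u)]
    e_min = min(min(row) for row in energies)   # shift for numerical stability
    weights = [[math.exp(-(e - e_min)) for e in row] for row in energies]
    z = sum(sum(row) for row in weights)        # normalization factor Z_uv
    joint = [[w / z for w in row] for row in weights]
    marg_v = [sum(joint[a][b] for a in range(n_u)) for b in range(n_v)]
    # Conditional entropy of Eq. (7); terms with zero probability contribute 0.
    h = 0.0
    for a in range(n_u):
        for b in range(n_v):
            p = joint[a][b]
            if p > 0.0:
                h += p * math.log(marg_v[b] / p)
    return h

zeros = [0.0, 0.0]
h_uninformative = local_conditional_entropy(zeros, zeros, [[0.0, 0.0], [0.0, 0.0]])
h_coupled = local_conditional_entropy(zeros, zeros, [[0.0, 50.0], [50.0, 0.0]])
```

With flat potentials the label of $v$ says nothing about $u$, so the score reaches its maximum of $\log 2$ for binary variables; with a strong coupling the score is close to zero, indicating that $v$ would be a confident seed for $u$.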
Algorithm 3 VARIABLE-GROUPING($G^{(t)}$)
Input: Graphical model $G^{(t)}$ at scale $t$
Output: A variable-grouping of $G^{(t)}$
1: initialize: SCORES = $\emptyset$, SEEDS = $\emptyset$, VARS = $\mathcal{V}^{(t)}$
2: for each edge $(u, v) \in \mathcal{E}^{(t)}$ do
3:   calculate $H(u|v)$ for $(v, u)$, and $H(v|u)$ for $(u, v)$
4:   store the (directed) pair at the respective bin in SCORES
5: end for
6: while VARS $\neq \emptyset$ do
7:   pop the next edge $(u, v) \in$ SCORES
     // check if we can define u to be v's seed
8:   if ($v \in$ VARS & $u \in$ VARS $\cup$ SEEDS) then
9:     set $u$ to be $v$'s seed
10:    SEEDS $\leftarrow$ SEEDS $\cup\ \{u\}$
11:    VARS $\leftarrow$ VARS $\setminus\ \{u, v\}$
12:  end if
13: end while
Our algorithm for selecting subgraphs and their respective seed variables is described below; see also Alg. 3. First, the local conditional entropy is calculated for all edges in both directions. A directed edge, whose direction determines that of the interpolation, is binned in a score-list according to its local conditional entropy score. We then proceed with the variable-grouping procedure; for each variable we must determine its status, namely whether it is a seed variable or an interpolated variable whose seed must be determined. This is achieved by examining directed edges one-by-one according to the order in which they are stored in the binned score-list. For a directed edge $(u, v)$ we verify that the intended-to-be interpolated variable $v$ has not been set as a seed variable nor grouped with a seed variable. Similarly, we ensure that the status of the designated seed variable $u$ is either undetermined or that $u$ has already been declared a seed variable. The process terminates when the status of all the variables has been set. As a remark, we note that the score-list's range is known in advance (it is the range of feasible entropy scores) and that no ordering is maintained within its bins. The motivation for using a binned score-list is twofold: to avoid sorting the score-list and thus maintain a linear complexity in the number of edges, and to introduce randomization into the variable-grouping procedure. In our experiments we fixed the number of bins to 20.
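The grouping pass can be sketched as follows. Several details are assumptions of this example rather than statements from the paper: bins are visited from lowest to highest entropy, edges are shuffled inside each bin to mimic the absence of ordering, and any variable still undetermined once all edges have been examined becomes a singleton seed. The function name and the `(seed_candidate, interpolated_candidate, score)` edge format are likewise hypothetical.

```python
import math
import random

def group_variables(variables, scored_edges, max_score, n_bins=20, rng=None):
    """Greedy variable-grouping over a binned score-list.

    scored_edges: (u, v, score) triples meaning "u may become v's seed",
    where score approximates H(v|u) and lies in [0, max_score].
    Returns a dict seed -> list of interpolated members.
    """
    rng = rng or random.Random(0)
    bins = [[] for _ in range(n_bins)]
    for u, v, score in scored_edges:
        idx = min(max(int(n_bins * score / max_score), 0), n_bins - 1)
        bins[idx].append((u, v))              # binning avoids a full sort

    unset = set(variables)                    # status not yet determined
    seeds = set()
    seed_of = {}
    for bucket in bins:                       # assumption: low entropy first
        rng.shuffle(bucket)                   # no ordering inside a bin
        for u, v in bucket:
            if v in unset and (u in unset or u in seeds):
                seed_of[v] = u
                seeds.add(u)
                unset.discard(u)
                unset.discard(v)
    seeds |= unset                            # assumption: leftovers are singleton seeds

    groups = {s: [] for s in seeds}
    for v, s in seed_of.items():
        groups[s].append(v)
    return groups

# Hypothetical chain 0 - 1 - 2 with binary labels (scores bounded by log 2).
edges = [(0, 1, 0.01), (1, 0, 0.1), (1, 2, 0.6), (2, 1, 0.65)]
groups = group_variables([0, 1, 2], edges, max_score=math.log(2))
```

In this toy run variable 0 becomes the seed of variable 1 because that directed edge has the lowest score, and variable 2 is left as a singleton group since its only neighbor has already been interpolated.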
3. Evaluation
The algorithm was implemented in the framework of OpenGM [1], a C++ template library that offers several inference algorithms and a collection of datasets to evaluate on. We use QPBO-I [20] and LSA-TR [10] for binary models and Swap/Expand-QPBO (αβ-swap/α-expand with a QPBO-I binary step) and Lazy-Flipper with a search depth of 2 [2] for multilabel models.

References

Book
01 Jan 1991
TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
Abstract: Preface to the Second Edition. Preface to the First Edition. Acknowledgments for the Second Edition. Acknowledgments for the First Edition. 1. Introduction and Preview. 1.1 Preview of the Book. 2. Entropy, Relative Entropy, and Mutual Information. 2.1 Entropy. 2.2 Joint Entropy and Conditional Entropy. 2.3 Relative Entropy and Mutual Information. 2.4 Relationship Between Entropy and Mutual Information. 2.5 Chain Rules for Entropy, Relative Entropy, and Mutual Information. 2.6 Jensen's Inequality and Its Consequences. 2.7 Log Sum Inequality and Its Applications. 2.8 Data-Processing Inequality. 2.9 Sufficient Statistics. 2.10 Fano's Inequality. Summary. Problems. Historical Notes. 3. Asymptotic Equipartition Property. 3.1 Asymptotic Equipartition Property Theorem. 3.2 Consequences of the AEP: Data Compression. 3.3 High-Probability Sets and the Typical Set. Summary. Problems. Historical Notes. 4. Entropy Rates of a Stochastic Process. 4.1 Markov Chains. 4.2 Entropy Rate. 4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph. 4.4 Second Law of Thermodynamics. 4.5 Functions of Markov Chains. Summary. Problems. Historical Notes. 5. Data Compression. 5.1 Examples of Codes. 5.2 Kraft Inequality. 5.3 Optimal Codes. 5.4 Bounds on the Optimal Code Length. 5.5 Kraft Inequality for Uniquely Decodable Codes. 5.6 Huffman Codes. 5.7 Some Comments on Huffman Codes. 5.8 Optimality of Huffman Codes. 5.9 Shannon-Fano-Elias Coding. 5.10 Competitive Optimality of the Shannon Code. 5.11 Generation of Discrete Distributions from Fair Coins. Summary. Problems. Historical Notes. 6. Gambling and Data Compression. 6.1 The Horse Race. 6.2 Gambling and Side Information. 6.3 Dependent Horse Races and Entropy Rate. 6.4 The Entropy of English. 6.5 Data Compression and Gambling. 6.6 Gambling Estimate of the Entropy of English. Summary. Problems. Historical Notes. 7. Channel Capacity. 7.1 Examples of Channel Capacity. 7.2 Symmetric Channels. 7.3 Properties of Channel Capacity. 
7.4 Preview of the Channel Coding Theorem. 7.5 Definitions. 7.6 Jointly Typical Sequences. 7.7 Channel Coding Theorem. 7.8 Zero-Error Codes. 7.9 Fano's Inequality and the Converse to the Coding Theorem. 7.10 Equality in the Converse to the Channel Coding Theorem. 7.11 Hamming Codes. 7.12 Feedback Capacity. 7.13 Source-Channel Separation Theorem. Summary. Problems. Historical Notes. 8. Differential Entropy. 8.1 Definitions. 8.2 AEP for Continuous Random Variables. 8.3 Relation of Differential Entropy to Discrete Entropy. 8.4 Joint and Conditional Differential Entropy. 8.5 Relative Entropy and Mutual Information. 8.6 Properties of Differential Entropy, Relative Entropy, and Mutual Information. Summary. Problems. Historical Notes. 9. Gaussian Channel. 9.1 Gaussian Channel: Definitions. 9.2 Converse to the Coding Theorem for Gaussian Channels. 9.3 Bandlimited Channels. 9.4 Parallel Gaussian Channels. 9.5 Channels with Colored Gaussian Noise. 9.6 Gaussian Channels with Feedback. Summary. Problems. Historical Notes. 10. Rate Distortion Theory. 10.1 Quantization. 10.2 Definitions. 10.3 Calculation of the Rate Distortion Function. 10.4 Converse to the Rate Distortion Theorem. 10.5 Achievability of the Rate Distortion Function. 10.6 Strongly Typical Sequences and Rate Distortion. 10.7 Characterization of the Rate Distortion Function. 10.8 Computation of Channel Capacity and the Rate Distortion Function. Summary. Problems. Historical Notes. 11. Information Theory and Statistics. 11.1 Method of Types. 11.2 Law of Large Numbers. 11.3 Universal Source Coding. 11.4 Large Deviation Theory. 11.5 Examples of Sanov's Theorem. 11.6 Conditional Limit Theorem. 11.7 Hypothesis Testing. 11.8 Chernoff-Stein Lemma. 11.9 Chernoff Information. 11.10 Fisher Information and the Cramér-Rao Inequality. Summary. Problems. Historical Notes. 12. Maximum Entropy. 12.1 Maximum Entropy Distributions. 12.2 Examples. 12.3 Anomalous Maximum Entropy Problem. 12.4 Spectrum Estimation. 
12.5 Entropy Rates of a Gaussian Process. 12.6 Burg's Maximum Entropy Theorem. Summary. Problems. Historical Notes. 13. Universal Source Coding. 13.1 Universal Codes and Channel Capacity. 13.2 Universal Coding for Binary Sequences. 13.3 Arithmetic Coding. 13.4 Lempel-Ziv Coding. 13.5 Optimality of Lempel-Ziv Algorithms. Compression. Summary. Problems. Historical Notes. 14. Kolmogorov Complexity. 14.1 Models of Computation. 14.2 Kolmogorov Complexity: Definitions and Examples. 14.3 Kolmogorov Complexity and Entropy. 14.4 Kolmogorov Complexity of Integers. 14.5 Algorithmically Random and Incompressible Sequences. 14.6 Universal Probability. 14.7 Kolmogorov complexity. 14.9 Universal Gambling. 14.10 Occam's Razor. 14.11 Kolmogorov Complexity and Universal Probability. 14.12 Kolmogorov Sufficient Statistic. 14.13 Minimum Description Length Principle. Summary. Problems. Historical Notes. 15. Network Information Theory. 15.1 Gaussian Multiple-User Channels. 15.2 Jointly Typical Sequences. 15.3 Multiple-Access Channel. 15.4 Encoding of Correlated Sources. 15.5 Duality Between Slepian-Wolf Encoding and Multiple-Access Channels. 15.6 Broadcast Channel. 15.7 Relay Channel. 15.8 Source Coding with Side Information. 15.9 Rate Distortion with Side Information. 15.10 General Multiterminal Networks. Summary. Problems. Historical Notes. 16. Information Theory and Portfolio Theory. 16.1 The Stock Market: Some Definitions. 16.2 Kuhn-Tucker Characterization of the Log-Optimal Portfolio. 16.3 Asymptotic Optimality of the Log-Optimal Portfolio. 16.4 Side Information and the Growth Rate. 16.5 Investment in Stationary Markets. 16.6 Competitive Optimality of the Log-Optimal Portfolio. 16.7 Universal Portfolios. 16.8 Shannon-McMillan-Breiman Theorem (General AEP). Summary. Problems. Historical Notes. 17. Inequalities in Information Theory. 17.1 Basic Inequalities of Information Theory. 17.2 Differential Entropy. 17.3 Bounds on Entropy and Relative Entropy. 17.4 Inequalities for Types. 
17.5 Combinatorial Bounds on Entropy. 17.6 Entropy Rates of Subsets. 17.7 Entropy and Fisher Information. 17.8 Entropy Power Inequality and Brunn-Minkowski Inequality. 17.9 Inequalities for Determinants. 17.10 Inequalities for Ratios of Determinants. Summary. Problems. Historical Notes. Bibliography. List of Symbols. Index.

42,928 citations


"A Multiscale Variable-Grouping Fram..." refers background in this paper

  • ...Conditional entropy measures the uncertainty in the state of one random variable given the state of another random variable [7]....

    [...]


Journal ArticleDOI
Abstract: The boundary-value problem is discretized on several grids (or finite-element spaces) of widely different mesh sizes. Interactions between these levels enable us (i) to solve the possibly nonlinear system of n discrete equations in O(n) operations (40n additions and shifts for Poisson problems); (ii) to conveniently adapt the discretization (the local mesh size, local order of approximation, etc.) to the evolving solution in a nearly optimal way, obtaining "∞-order" approximations and low n, even when singularities are present. General theoretical analysis of the numerical process. Numerical experiments with linear and nonlinear, elliptic and mixed-type (transonic flow) problems confirm theoretical predictions. Similar techniques for initial-value problems are briefly

2,923 citations


Journal ArticleDOI
TL;DR: Algorithmic techniques are presented that substantially improve the running time of the loopy belief propagation approach and reduce the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel, which is important for problems such as image restoration that have a large label set.
Abstract: Markov random field models provide a robust and unified framework for early vision problems such as stereo and image restoration. Inference algorithms based on graph cuts and belief propagation have been found to yield accurate results, but despite recent advances are often too slow for practical use. In this paper we present some algorithmic techniques that substantially improve the running time of the loopy belief propagation approach. One of the techniques reduces the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel, which is important for problems such as image restoration that have a large label set. Another technique speeds up and reduces the memory requirements of belief propagation on grid graphs. A third technique is a multi-grid method that makes it possible to obtain good results with a small fixed number of message passing iterations, independent of the size of the input images. Taken together these techniques speed up the standard algorithm by several orders of magnitude. In practice we obtain results that are as accurate as those of other global methods (e.g., using the Middlebury stereo benchmark) while being nearly as fast as purely local methods.

1,540 citations


Proceedings ArticleDOI
19 Jul 2004
TL;DR: New algorithmic techniques are presented that substantially improve the running time of the belief propagation approach and reduce the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel.
Abstract: Markov random field models provide a robust and unified framework for early vision problems such as stereo, optical flow and image restoration. Inference algorithms based on graph cuts and belief propagation yield accurate results, but despite recent advances are often still too slow for practical use. In this paper we present new algorithmic techniques that substantially improve the running time of the belief propagation approach. One of our techniques reduces the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel, which is important for problems such as optical flow or image restoration that have a large label set. A second technique makes it possible to obtain good results with a small fixed number of message passing iterations, independent of the size of the input images. Taken together these techniques speed up the standard algorithm by several orders of magnitude. In practice we obtain stereo, optical flow and image restoration algorithms that are as accurate as other global methods (e.g., using the Middlebury stereo benchmark) while being as fast as local techniques.

860 citations


"A Multiscale Variable-Grouping Fram..." refers background in this paper

  • ...These benefits follow from the fact that, although only local interactions are encoded, the model is global in nature, and by working at multiple scales information is propagated more efficiently [9, 16]....

    [...]

  • ...Considerable research has been reported in the literature on approximating (1) in a coarse-to-fine framework [4, 6, 9, 11, 14, 15, 16, 17, 18]....

    [...]

  • ..., grouping together square patches of variables in a grid [9, 11]....

    [...]

  • ...Coarse-to-fine methods have been shown to be beneficial in terms of running time [6, 9, 16, 18]....

    [...]


Proceedings ArticleDOI
17 Jun 2007
TL;DR: An efficient implementation of the "probing" technique is discussed, which simplifies the MRF while preserving the global optimum, and a new technique which takes an arbitrary input labeling and tries to improve its energy is presented.
Abstract: Many computer vision applications rely on the efficient optimization of challenging, so-called non-submodular, binary pairwise MRFs. A promising graph cut based approach for optimizing such MRFs known as "roof duality" was recently introduced into computer vision. We study two methods which extend this approach. First, we discuss an efficient implementation of the "probing" technique introduced recently by Boros et al. (2006). It simplifies the MRF while preserving the global optimum. Our code is 400-700 times faster on some graphs than the implementation of the work of Boros et al. (2006). Second, we present a new technique which takes an arbitrary input labeling and tries to improve its energy. We give theoretical characterizations of local minima of this procedure. We applied both techniques to many applications, including image segmentation, new view synthesis, super-resolution, diagram recognition, parameter learning, texture restoration, and image deconvolution. For several applications we see that we are able to find the global minimum very efficiently, and considerably outperform the original roof duality approach. In comparison to existing techniques, such as graph cut, TRW, BP, ICM, and simulated annealing, we nearly always find a lower energy.

505 citations


"A Multiscale Variable-Grouping Fram..." refers methods in this paper

  • ...We use QPBO-I [20] and LSA-TR [10] for binary models and Swap/Expand-QPBO (αβ-swap/α-expand with a QPBO-I binary step) and Lazy-Flipper with a search...

    [...]

  • ...Subject to these limitations we use QPBO-I [20] and LSA-TR [10] for binary models....

    [...]

  • ...For multilabel models we use Swap/Expand-QPBO (αβ-swap/α-expand with a QPBO-I binary step) [20] and Lazy-Flipper with a search depth 2 [2]....

    [...]

  • ...The method can efficiently incorporate any initializeable inference algorithm that can deal with general pairwise potentials, e.g., QPBO-I [20] and LSA-TR [10], yielding significantly lower energy values than those obtained with standard use of these methods....

    [...]