TL;DR: A multiscale approach for minimizing the energy of Markov Random Fields with arbitrary pairwise potentials, evaluated on real-world datasets, where it achieves competitive performance in relatively short run-times.
Abstract: We present a multiscale approach for minimizing the energy associated with Markov Random Fields (MRFs) with energy functions that include arbitrary pairwise potentials. The MRF is represented on a hierarchy of successively coarser scales, where the problem on each scale is itself an MRF with suitably defined potentials. These representations are used to construct an efficient multiscale algorithm that seeks a minimal-energy solution to the original problem. The algorithm is iterative and features a bidirectional crosstalk between fine and coarse representations. We use consistency criteria to guarantee that the energy is nonincreasing throughout the iterative process. The algorithm is evaluated on real-world datasets, achieving competitive performance in relatively short run-times.
Furthermore, it is generally agreed that coarse-to-fine schemes are less sensitive to local minima and can produce higher-quality label assignments [4, 11, 16].
The authors' approach uses existing inference algorithms together with a variable-grouping procedure, referred to as coarsening, that produces a hierarchy of successively coarser representations of the MRF problem in order to efficiently explore relevant subsets of the space of possible label assignments.
The method can efficiently incorporate any initializable inference algorithm that can handle general pairwise potentials, e.g., QPBO-I [20] and LSA-TR [10], yielding significantly lower energy values than those obtained with standard use of these methods.
Furthermore, the authors suggest grouping variables according to the magnitude of their statistical correlation, regardless of whether the variables are assumed to take the same label at the minimum energy.
2.1. The coarsening procedure
The authors denote the MRF (or its graph) whose energy they aim to minimize, and its corresponding search space, by G(0) = (V(0), E(0), φ(0)) and X(0), respectively, and use the shorthand notation G(0) to refer to these elements.
Then, in each such group the authors select one vertex to be the “seed variable” (or seed vertex) of the group.
Next, the authors eliminate all but the seed vertex in each group and define the coarser graph, G(t+1), whose vertices correspond to the seed vertices of the fine graph G(t).
The first term sums up the unary potentials of variables in [ṽ], and the second term takes into account the energy of pairwise potentials of all internal pairs u,w ∈ [ṽ].
It is readily seen that the coarsening procedure satisfies consistency: substituting a labeling of G(t+1) into Eqs. (4) and (5) verifies that, for any interpolation rule, the energy at scale t of the interpolated labeling equals the coarse-scale energy.
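To make the construction concrete, the following is a minimal Python sketch of how coarse unary potentials of the kind described above could be assembled from a variable grouping. The data layout and names (unary, pairwise, interp) are assumptions made for the example; the paper's exact definitions in Eqs. (4) and (5) are not reproduced here.

```python
import itertools
import numpy as np

def coarse_unary(group, unary, pairwise, interp):
    """Coarse unary potential of the seed variable of one group.

    group    : list of fine variable ids; group[0] is the seed.
    unary    : dict v -> (L,) array of fine unary potentials.
    pairwise : dict (u, w) -> (L, L) array of fine pairwise potentials.
    interp   : dict v -> (L,) integer array giving v's label for each label
               of the seed (for the seed itself this is the identity map).
    Returns an (L,) array: for each seed label, the energy contributed by the
    group's variables under the labels dictated by the interpolation rule.
    """
    seed = group[0]
    L = unary[seed].shape[0]
    phi = np.zeros(L)
    for l in range(L):
        labels = {v: int(interp[v][l]) for v in group}   # labels induced by seed label l
        # first term: unary potentials of all variables in the group
        phi[l] += sum(unary[v][labels[v]] for v in group)
        # second term: pairwise potentials of all internal pairs of the group
        for u, w in itertools.combinations(group, 2):
            if (u, w) in pairwise:
                phi[l] += pairwise[(u, w)][labels[u], labels[w]]
            elif (w, u) in pairwise:
                phi[l] += pairwise[(w, u)][labels[w], labels[u]]
    return phi
```

Coarse pairwise potentials between two seed variables would be obtained analogously, by summing, for each pair of seed labels, the fine pairwise potentials of the edges that cross between the two groups.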
2.2. The multiscale algorithm
The key ingredient of this paper is the multiscale algorithm which takes after the classical V-cycle employed in multigrid numerical solvers for partial differential equations [5, 21].
This process comprises a single iteration or cycle.
Coarsening halts when the number of variables is sufficiently small, say |V(t)| < N, and an exact solution can be easily recovered, e.g., via exhaustive search.
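The overall control flow of a single cycle can be sketched as follows; the helper callables (refine, coarsen, interpolate, solve_exactly) and the problem interface are assumptions made for illustration, not the paper's API, and the exact relaxation schedule on the way down and up may differ from the one used by the authors.

```python
def v_cycle(problem, state, refine, coarsen, interpolate, solve_exactly, N=8):
    """One V-cycle over the hierarchy of coarsened MRFs (illustrative sketch)."""
    if problem.num_variables() < N:
        # coarsest scale: small enough to solve exactly, e.g. by exhaustive search
        return solve_exactly(problem)

    # a few sweeps of the single-scale inference module on the current scale
    state = refine(problem, state)

    # group variables, build the coarser MRF, and inherit the seeds' current labels
    coarse_problem, coarse_state, interp = coarsen(problem, state)

    # recurse on the coarser scale
    coarse_state = v_cycle(coarse_problem, coarse_state,
                           refine, coarsen, interpolate, solve_exactly, N)

    # map the coarse solution back to the fine scale and refine once more
    state = interpolate(interp, coarse_state)
    return refine(problem, state)
```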
Computational complexity and choice of inference module.
Note that the inference algorithm should not be run until convergence, because its goal is not to find a global optimum of the search sub-space; rather, a small number of iterations suffices.
[Figure 2: schematic of the multiscale algorithm, indicating the inference module, coarsening, and interpolation steps between the finest and coarsest scales.]
2.3. Monotonicity
The multiscale framework described so far is not monotonic, because the initial state at a coarse level may incur a higher energy than that of the fine state from which it is derived.
To see this, let x(t) denote the state at level t, right before the coarsening stage of a V-cycle.
As noted above, coarse-scale variables inherit the current state of seed variables.
If the energy associated with x(t+1) happens to be higher than the energy associated with x(t) then monotonicity is compromised.
To avoid this undesirable behavior the authors modify the interpolation rule such that if x(t+1) was inherited from x(t) then x(t+1) will be mapped back to x(t) by the interpolation.
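One way to realize this safeguard, under the assumption that the interpolation is stored as a per-variable lookup table indexed by the seed's label (as in the sketches above), is to overwrite the entry corresponding to the inherited seed label so that it reproduces the variable's current fine label; the names used below are illustrative.

```python
def make_monotone_interp(interp, groups, fine_state):
    """Adjust interpolation tables so the inherited coarse state maps back to
    the current fine state x(t), which keeps the energy from increasing."""
    for group in groups:
        seed = group[0]
        inherited = fine_state[seed]          # label the coarse variable inherits
        for u in group:
            table = interp[u].copy()
            table[inherited] = fine_state[u]  # interpolating the inherited label recovers u's label
            interp[u] = table
    return interp
```

With this modification the initial coarse state has exactly the energy of x(t), so, by the consistency property above, any energy reduction achieved on the coarse scale carries over to the fine scale.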
2.4. Variable-grouping by conditional entropy
The authors next describe their approach for variable-grouping and the selection of a seed variable in each group.
Heuristically, the authors would like v to be a seed variable, whose labeling determines that of u via the interpolation, if they are relatively confident of what the label of u should be, given just the label of v. Conditional entropy measures the uncertainty in the state of one random variable given the state of another random variable [7].
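As an illustration, a local conditional-entropy score for a directed edge could be computed from the potentials of that edge alone, assuming a Boltzmann-type local distribution; this is a sketch of one plausible local approximation, and the exact construction used by the authors may differ.

```python
import numpy as np

def conditional_entropy_score(unary_u, unary_v, pairwise_uv, beta=1.0):
    """H(X_u | X_v) under the local distribution p(x_u, x_v) ∝ exp(-beta * E_loc).

    unary_u, unary_v : (L,) arrays of unary potentials.
    pairwise_uv      : (L, L) array of pairwise potentials, indexed [x_u, x_v].
    A low score means that knowing x_v leaves little uncertainty about x_u,
    so v is a promising seed for u.
    """
    energy = unary_u[:, None] + unary_v[None, :] + pairwise_uv     # local energy table
    p = np.exp(-beta * (energy - energy.min()))
    p /= p.sum()                                                   # joint p(x_u, x_v)
    p_v = p.sum(axis=0)                                            # marginal p(x_v)
    p_cond = p / np.maximum(p_v[None, :], 1e-300)                  # p(x_u | x_v)
    return float(-(p * np.log(np.maximum(p_cond, 1e-300))).sum())  # H(X_u | X_v)
```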
The authors then proceed with the variable-grouping procedure: the status of each variable must be set, namely whether it is a seed variable or an interpolated variable, in which case its seed must also be chosen.
This is achieved by examining directed edges one by one, in the order in which they are stored in the binned-score list.
The process terminates when the status of all the variables has been set.
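The grouping pass itself can be sketched as a greedy sweep over the scored directed edges; the binning, tie-breaking, and handling of leftover variables below are assumptions made for the example and may differ from the authors' implementation.

```python
def group_variables(variables, scored_edges):
    """Greedy grouping from directed edges scored by local conditional entropy.

    variables    : iterable of variable ids.
    scored_edges : iterable of (score, u, v) meaning 'v is a candidate seed for u';
                   lower scores are processed first.
    Returns a list of groups, each a list whose first element is the seed.
    """
    seed_of = {}        # interpolated variable -> its seed
    is_seed = set()
    for _, u, v in sorted(scored_edges):
        if u in seed_of or u in is_seed:
            continue    # u's status has already been set
        if v in seed_of:
            continue    # v is itself interpolated, so it cannot serve as a seed
        is_seed.add(v)  # v becomes (or remains) a seed
        seed_of[u] = v  # u will be interpolated from v
    # any variable whose status was never set becomes a singleton seed
    for x in variables:
        if x not in seed_of and x not in is_seed:
            is_seed.add(x)
    groups = {s: [s] for s in is_seed}
    for u, s in seed_of.items():
        groups[s].append(u)
    return list(groups.values())
```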
3. Evaluation
The algorithm was implemented in the framework of OpenGM [1], a C++ template library that offers several inference algorithms and a collection of datasets to evaluate on.
The authors use QPBO-I [20] and LSA-TR [10] for binary models and Swap/Expand-QPBO (αβ-swap/α-expand with a QPBO-I binary step) and Lazy-Flipper with a search depth of 2 [2] for multilabel models.
Unless otherwise indicated, 3 V-cycles were applied on “hard” energy models (Sec. 3.1) and a single V-cycle on Potts models (Sec. 3.2).
Hence, the authors resort to comparing multiscale to single-scale inference for algorithms which can be applied in their framework without modifications.
For each dataset the authors also report the "Ace" inference method for that dataset, where algorithms are ranked according to the percentage of instances on which they achieve the best energy and by their run-time.
3.1. Hard energies
Concretely, the datasets are split into three categories: those for which all, some, or none of the instances are solved to optimality.
The authors follow these notions when they refer to hard models, with special attention to the type of pairwise interaction.
Detailed results are presented in Table 2.
The Scribble dataset [3] is an image segmentation task with a user-interactive interface, in which the user is asked to mark boundaries of objects in the scene (see Fig. 4).
4. Discussion
The authors have presented a multiscale framework for MRF energy minimization that uses variable grouping to form coarser levels of the problem.
The authors demonstrated these concepts with an algorithm that groups variables based on a local approximation of their conditional entropy, namely based on an estimate of their statistical correlation.
The algorithm was evaluated on a collection of datasets and results indicate that it is beneficial to apply existing single-scale methods within the presented multiscale algorithm.
There are many possible directions for further developments, beginning with the interpolation rule.
Indeed, even the set of labels can be expanded on a coarse scale to enrich the coarse search sub-space.
TL;DR: An algorithm that alternates between message passing and efficient separation of cycle- and odd-wheel inequalities is defined, which is more efficient than state-of-the-art algorithms based on linear programming.
Abstract: We propose a dual decomposition and linear program relaxation of the NP-hard minimum cost multicut problem. Unlike other polyhedral relaxations of the multicut polytope, it is amenable to efficient optimization by message passing. Like other polyhedral relaxations, it can be tightened efficiently by cutting planes. We define an algorithm that alternates between message passing and efficient separation of cycle- and odd-wheel inequalities. This algorithm is more efficient than state-of-the-art algorithms based on linear programming, including algorithms written in the framework of leading commercial software, as we show in experiments with large instances of the problem from applications in computer vision, biomedical image analysis and data mining.
26 citations
Cites background or methods from "A Multiscale Variable-Grouping Fram..."
...We thank anonymous reviewers for pointing out the references [37, 11]....
[...]
...Also, the multicut problem can be transformed into a Markov random field and solved with primal heuristics there, as done for the “scribbles” dataset in [37, 11]....
TL;DR: The authors examine the role of entropy, inequalities, and randomness in the design and construction of codes.
Abstract: Preface to the Second Edition. Preface to the First Edition. Acknowledgments for the Second Edition. Acknowledgments for the First Edition. 1. Introduction and Preview. 1.1 Preview of the Book. 2. Entropy, Relative Entropy, and Mutual Information. 2.1 Entropy. 2.2 Joint Entropy and Conditional Entropy. 2.3 Relative Entropy and Mutual Information. 2.4 Relationship Between Entropy and Mutual Information. 2.5 Chain Rules for Entropy, Relative Entropy, and Mutual Information. 2.6 Jensen's Inequality and Its Consequences. 2.7 Log Sum Inequality and Its Applications. 2.8 Data-Processing Inequality. 2.9 Sufficient Statistics. 2.10 Fano's Inequality. Summary. Problems. Historical Notes. 3. Asymptotic Equipartition Property. 3.1 Asymptotic Equipartition Property Theorem. 3.2 Consequences of the AEP: Data Compression. 3.3 High-Probability Sets and the Typical Set. Summary. Problems. Historical Notes. 4. Entropy Rates of a Stochastic Process. 4.1 Markov Chains. 4.2 Entropy Rate. 4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph. 4.4 Second Law of Thermodynamics. 4.5 Functions of Markov Chains. Summary. Problems. Historical Notes. 5. Data Compression. 5.1 Examples of Codes. 5.2 Kraft Inequality. 5.3 Optimal Codes. 5.4 Bounds on the Optimal Code Length. 5.5 Kraft Inequality for Uniquely Decodable Codes. 5.6 Huffman Codes. 5.7 Some Comments on Huffman Codes. 5.8 Optimality of Huffman Codes. 5.9 Shannon-Fano-Elias Coding. 5.10 Competitive Optimality of the Shannon Code. 5.11 Generation of Discrete Distributions from Fair Coins. Summary. Problems. Historical Notes. 6. Gambling and Data Compression. 6.1 The Horse Race. 6.2 Gambling and Side Information. 6.3 Dependent Horse Races and Entropy Rate. 6.4 The Entropy of English. 6.5 Data Compression and Gambling. 6.6 Gambling Estimate of the Entropy of English. Summary. Problems. Historical Notes. 7. Channel Capacity. 7.1 Examples of Channel Capacity. 7.2 Symmetric Channels. 7.3 Properties of Channel Capacity. 7.4 Preview of the Channel Coding Theorem. 7.5 Definitions. 7.6 Jointly Typical Sequences. 7.7 Channel Coding Theorem. 7.8 Zero-Error Codes. 7.9 Fano's Inequality and the Converse to the Coding Theorem. 7.10 Equality in the Converse to the Channel Coding Theorem. 7.11 Hamming Codes. 7.12 Feedback Capacity. 7.13 Source-Channel Separation Theorem. Summary. Problems. Historical Notes. 8. Differential Entropy. 8.1 Definitions. 8.2 AEP for Continuous Random Variables. 8.3 Relation of Differential Entropy to Discrete Entropy. 8.4 Joint and Conditional Differential Entropy. 8.5 Relative Entropy and Mutual Information. 8.6 Properties of Differential Entropy, Relative Entropy, and Mutual Information. Summary. Problems. Historical Notes. 9. Gaussian Channel. 9.1 Gaussian Channel: Definitions. 9.2 Converse to the Coding Theorem for Gaussian Channels. 9.3 Bandlimited Channels. 9.4 Parallel Gaussian Channels. 9.5 Channels with Colored Gaussian Noise. 9.6 Gaussian Channels with Feedback. Summary. Problems. Historical Notes. 10. Rate Distortion Theory. 10.1 Quantization. 10.2 Definitions. 10.3 Calculation of the Rate Distortion Function. 10.4 Converse to the Rate Distortion Theorem. 10.5 Achievability of the Rate Distortion Function. 10.6 Strongly Typical Sequences and Rate Distortion. 10.7 Characterization of the Rate Distortion Function. 10.8 Computation of Channel Capacity and the Rate Distortion Function. Summary. Problems. Historical Notes. 11. Information Theory and Statistics. 11.1 Method of Types. 11.2 Law of Large Numbers. 
11.3 Universal Source Coding. 11.4 Large Deviation Theory. 11.5 Examples of Sanov's Theorem. 11.6 Conditional Limit Theorem. 11.7 Hypothesis Testing. 11.8 Chernoff-Stein Lemma. 11.9 Chernoff Information. 11.10 Fisher Information and the Cram-er-Rao Inequality. Summary. Problems. Historical Notes. 12. Maximum Entropy. 12.1 Maximum Entropy Distributions. 12.2 Examples. 12.3 Anomalous Maximum Entropy Problem. 12.4 Spectrum Estimation. 12.5 Entropy Rates of a Gaussian Process. 12.6 Burg's Maximum Entropy Theorem. Summary. Problems. Historical Notes. 13. Universal Source Coding. 13.1 Universal Codes and Channel Capacity. 13.2 Universal Coding for Binary Sequences. 13.3 Arithmetic Coding. 13.4 Lempel-Ziv Coding. 13.5 Optimality of Lempel-Ziv Algorithms. Compression. Summary. Problems. Historical Notes. 14. Kolmogorov Complexity. 14.1 Models of Computation. 14.2 Kolmogorov Complexity: Definitions and Examples. 14.3 Kolmogorov Complexity and Entropy. 14.4 Kolmogorov Complexity of Integers. 14.5 Algorithmically Random and Incompressible Sequences. 14.6 Universal Probability. 14.7 Kolmogorov complexity. 14.9 Universal Gambling. 14.10 Occam's Razor. 14.11 Kolmogorov Complexity and Universal Probability. 14.12 Kolmogorov Sufficient Statistic. 14.13 Minimum Description Length Principle. Summary. Problems. Historical Notes. 15. Network Information Theory. 15.1 Gaussian Multiple-User Channels. 15.2 Jointly Typical Sequences. 15.3 Multiple-Access Channel. 15.4 Encoding of Correlated Sources. 15.5 Duality Between Slepian-Wolf Encoding and Multiple-Access Channels. 15.6 Broadcast Channel. 15.7 Relay Channel. 15.8 Source Coding with Side Information. 15.9 Rate Distortion with Side Information. 15.10 General Multiterminal Networks. Summary. Problems. Historical Notes. 16. Information Theory and Portfolio Theory. 16.1 The Stock Market: Some Definitions. 16.2 Kuhn-Tucker Characterization of the Log-Optimal Portfolio. 16.3 Asymptotic Optimality of the Log-Optimal Portfolio. 16.4 Side Information and the Growth Rate. 16.5 Investment in Stationary Markets. 16.6 Competitive Optimality of the Log-Optimal Portfolio. 16.7 Universal Portfolios. 16.8 Shannon-McMillan-Breiman Theorem (General AEP). Summary. Problems. Historical Notes. 17. Inequalities in Information Theory. 17.1 Basic Inequalities of Information Theory. 17.2 Differential Entropy. 17.3 Bounds on Entropy and Relative Entropy. 17.4 Inequalities for Types. 17.5 Combinatorial Bounds on Entropy. 17.6 Entropy Rates of Subsets. 17.7 Entropy and Fisher Information. 17.8 Entropy Power Inequality and Brunn-Minkowski Inequality. 17.9 Inequalities for Determinants. 17.10 Inequalities for Ratios of Determinants. Summary. Problems. Historical Notes. Bibliography. List of Symbols. Index.
45,034 citations
"A Multiscale Variable-Grouping Fram..." refers background in this paper
...Conditional entropy measures the uncertainty in the state of one random variable given the state of another random variable [7]....
TL;DR: In this paper, the boundary-value problem is discretized on several grids (or finite-element spaces) of widely different mesh sizes, and interactions between these levels enable us to solve the possibly nonlinear system of n discrete equations in O(n) operations (40n additions and shifts for Poisson problems), and to conveniently adapt the discretization (the local mesh size, local order of approximation, etc.) to the evolving solution in a nearly optimal way, obtaining "∞-order" approximations and low n, even when singularities are present.
Abstract: The boundary-value problem is discretized on several grids (or finite-element spaces) of widely different mesh sizes. Interactions between these levels enable us (i) to solve the possibly nonlinear system of n discrete equations in O(n) operations (40n additions and shifts for Poisson problems); (ii) to conveniently adapt the discretization (the local mesh size, local order of approximation, etc.) to the evolving solution in a nearly optimal way, obtaining "∞-order" approximations and low n, even when singularities are present. General theoretical analysis of the numerical process. Numerical experiments with linear and nonlinear, elliptic and mixed-type (transonic flow) problems confirm theoretical predictions. Similar techniques for initial-value problems are briefly discussed.
TL;DR: Algorithmic techniques are presented that substantially improve the running time of the loopy belief propagation approach and reduce the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel, which is important for problems such as image restoration that have a large label set.
Abstract: Markov random field models provide a robust and unified framework for early vision problems such as stereo and image restoration. Inference algorithms based on graph cuts and belief propagation have been found to yield accurate results, but despite recent advances are often too slow for practical use. In this paper we present some algorithmic techniques that substantially improve the running time of the loopy belief propagation approach. One of the techniques reduces the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel, which is important for problems such as image restoration that have a large label set. Another technique speeds up and reduces the memory requirements of belief propagation on grid graphs. A third technique is a multi-grid method that makes it possible to obtain good results with a small fixed number of message passing iterations, independent of the size of the input images. Taken together these techniques speed up the standard algorithm by several orders of magnitude. In practice we obtain results that are as accurate as those of other global methods (e.g., using the Middlebury stereo benchmark) while being nearly as fast as purely local methods.
TL;DR: New algorithmic techniques are presented that substantially improve the running time of the belief propagation approach and reduce the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel.
Abstract: Markov random field models provide a robust and unified framework for early vision problems such as stereo, optical flow and image restoration. Inference algorithms based on graph cuts and belief propagation yield accurate results, but despite recent advances are often still too slow for practical use. In this paper we present new algorithmic techniques that substantially improve the running time of the belief propagation approach. One of our techniques reduces the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel, which is important for problems such as optical flow or image restoration that have a large label set. A second technique makes it possible to obtain good results with a small fixed number of message passing iterations, independent of the size of the input images. Taken together these techniques speed up the standard algorithm by several orders of magnitude. In practice we obtain stereo, optical flow and image restoration algorithms that are as accurate as other global methods (e.g., using the Middlebury stereo benchmark) while being as fast as local techniques.
889 citations
"A Multiscale Variable-Grouping Fram..." refers background in this paper
...These benefits follow from the fact that, although only local interactions are encoded, the model is global in nature, and by working at multiple scales information is propagated more efficiently [9, 16]....
[...]
...Considerable research has been reported in the literature on approximating (1) in a coarse-to-fine framework [4, 6, 9, 11, 14, 15, 16, 17, 18]....
[...]
..., grouping together square patches of variables in a grid [9, 11]....
[...]
...Coarse-to-fine methods have been shown to be beneficial in terms of running time [6, 9, 16, 18]....
TL;DR: An efficient implementation of the "probing" technique is discussed, which simplifies the MRF while preserving the global optimum, and a new technique which takes an arbitrary input labeling and tries to improve its energy is presented.
Abstract: Many computer vision applications rely on the efficient optimization of challenging, so-called non-submodular, binary pairwise MRFs. A promising graph cut based approach for optimizing such MRFs known as "roof duality" was recently introduced into computer vision. We study two methods which extend this approach. First, we discuss an efficient implementation of the "probing" technique introduced recently by Boros et al. (2006). It simplifies the MRF while preserving the global optimum. Our code is 400-700 times faster on some graphs than the implementation of the work of Boros et al. (2006). Second, we present a new technique which takes an arbitrary input labeling and tries to improve its energy. We give theoretical characterizations of local minima of this procedure. We applied both techniques to many applications, including image segmentation, new view synthesis, super-resolution, diagram recognition, parameter learning, texture restoration, and image deconvolution. For several applications we see that we are able to find the global minimum very efficiently, and considerably outperform the original roof duality approach. In comparison to existing techniques, such as graph cut, TRW, BP, ICM, and simulated annealing, we nearly always find a lower energy.
518 citations
"A Multiscale Variable-Grouping Fram..." refers methods in this paper
...We use QPBO-I [20] and LSA-TR [10] for binary models and Swap/Expand-QPBO (αβ-swap/α-expand with a QPBO-I binary step) and Lazy-Flipper with a search...
[...]
...Subject to these limitations we use QPBO-I [20] and LSA-TR [10] for binary models....
[...]
...For multilabel models we use Swap/Expand-QPBO (αβ-swap/αexpand with a QPBO-I binary step) [20] and Lazy-Flipper with a search depth 2 [2]....
[...]
..., QPBO-I [20] and LSA-TR [10], yielding significantly lower energy values than those obtained with standard use of these methods....
[...]
...The method can efficiently incorporate any initializeable inference algorithm that can deal with general pairwise potentials, e.g., QPBO-I [20] and LSA-TR [10], yielding significantly lower energy values than those obtained with standard use of these methods....