Contour Detection and Hierarchical Image Segmentation
Summary
1 INTRODUCTION
- This paper presents a unified approach to contour detection and image segmentation.
- The Berkeley Segmentation Dataset (BSDS) benchmark compares machine-generated contours to human ground-truth data, and it allows segmentations to be evaluated in the same framework by treating region boundaries as contours.
- The authors introduced the gPb and gPb-owt-ucm algorithms in [3] and [4], respectively.
- To produce high-quality image segmentations, the authors link the gPb contour detector with a generic grouping algorithm, described in Section 4, that consists of two steps: the Oriented Watershed Transform (OWT) and the Ultrametric Contour Map (UCM).
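- In the precision-recall comparison of contour detectors on BSDS300 (Figure 1), average agreement between human subjects is indicated by the green dot, and gPb [3] performs significantly better than other algorithms [2], [17]–[28] across almost the entire operating regime.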
2 PREVIOUS WORK
- The problems of contour detection and segmentation are related, but not identical.
- The BSDS300 consists of 200 training and 100 test images, each with multiple ground-truth segmentations.
- Historically, however, there have been different lines of approach to these two problems, which the authors now review.
2.1 Contours
- Early approaches to contour detection aim at quantifying the presence of a boundary at a given image location through local measurements.
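- The Roberts [17], Sobel [18], and Prewitt [19] operators detect edges by convolving a grayscale image with local derivative filters. The Canny detector [22] also models edges as sharp discontinuities in the brightness channel, adding non-maximum suppression and hysteresis thresholding steps.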
- In multiscale detectors, additional localization and relative contrast cues, defined in terms of the multiscale detector output, can also be fed to the boundary classifier.
- Other approaches group local edges into extended contours; the simplest such algorithms link together high-gradient edge fragments to identify extended, smooth contours [40], [41], [42].
- Zhu et al. [24] also start with the output of [2] and create a weighted edgel graph, where the weights measure directed collinearity between neighboring edgels.
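The local derivative filters mentioned above can be illustrated with the Sobel operator. The sketch below is plain Python on a list-of-lists grayscale image; border clamping and the function names (`correlate3x3`, `sobel`) are assumptions made for illustration, not part of the paper.

```python
import math

# 3x3 Sobel kernels: SOBEL_X responds to horizontal intensity change (vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def correlate3x3(img, kernel, r, c):
    """Apply a 3x3 kernel centred at (r, c); out-of-range neighbours are clamped.

    This is correlation, not convolution: flipping these antisymmetric kernels
    only negates the response, which the gradient magnitude ignores.
    """
    h, w = len(img), len(img[0])
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr = min(max(r + dr, 0), h - 1)
            cc = min(max(c + dc, 0), w - 1)
            total += kernel[dr + 1][dc + 1] * img[rr][cc]
    return total


def sobel(img):
    """Return per-pixel gradient magnitude and orientation (radians)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ori = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = correlate3x3(img, SOBEL_X, r, c)
            gy = correlate3x3(img, SOBEL_Y, r, c)
            mag[r][c] = math.hypot(gx, gy)
            ori[r][c] = math.atan2(gy, gx)
    return mag, ori
```

On a vertical step edge, the magnitude peaks on the two columns adjacent to the step; thresholding or thinning such a response is what detectors like Canny add on top.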
2.2 Regions
- A broad family of approaches to segmentation involves integrating features such as brightness, color, or texture over local image patches and then clustering those features based on, e.g., fitting mixture models [7], [44], mode-finding [34], or graph partitioning [32], [45], [46], [47].
- The graph based region merging algorithm advocated by Felzenszwalb and Huttenlocher (Felz-Hutt) [32] attempts to partition image pixels into components such that the resulting segmentation is neither too coarse nor too fine.
- In Normalized Cuts, the affinity matrix W must be sparse to avoid a prohibitively expensive computation, which limits the naive implementation to using only local pixel affinities.
- Cour et al. [33] address this limitation by computing sparse affinity matrices at multiple scales, setting up cross-scale constraints, and deriving a new eigenproblem for this constrained multiscale cut.
- Recently, Pock et al. [60] proposed to solve a convex relaxation of (4), thus obtaining robustness to initialization.
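The Felz-Hutt region merging described above starts with every pixel in its own component and merges components along edges of increasing dissimilarity. The sketch below follows the published Felzenszwalb-Huttenlocher merge criterion (an edge merges two components only if its weight does not exceed either component's largest internal MST edge plus k divided by its size); that criterion is not spelled out in this summary, and the function name and the scale parameter `k` are illustrative.

```python
def felz_hutt(num_nodes, edges, k):
    """Graph-based merging in the style of Felzenszwalb-Huttenlocher.

    edges: list of (weight, u, v), weight = dissimilarity between nodes u and v.
    k: scale parameter; larger k favours larger components.
    Returns a component id (the union-find root) for every node.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    internal = [0.0] * num_nodes  # Int(C): largest MST edge inside the component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Process edges in order of increasing dissimilarity.
    for w, u, v in sorted(edges):
        a, b = find(u), find(v)
        if a == b:
            continue
        # Merge only if the edge is no larger than the minimum internal difference.
        threshold = min(internal[a] + k / size[a], internal[b] + k / size[b])
        if w <= threshold:
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a
            size[a] += size[b]
            internal[a] = w  # edges arrive sorted, so w is the new largest MST edge
    return [find(i) for i in range(num_nodes)]
```

For a 1D chain with intensities [0, 1, 2, 10, 11, 12] and edge weights equal to absolute differences, `k = 2` yields two components, while a very small `k` leaves every pixel on its own, showing how the criterion trades off too-coarse against too-fine segmentations.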
2.3 Benchmarks
- The standard for evaluating segmentation algorithms is less clear.
- One option is to regard the segment boundaries as contours and evaluate them as such.
- A methodology that directly measures the quality of the segments is also desirable.
- The authors therefore also consider various region-based metrics.
2.3.1 Variation of Information
- The Variation of Information metric was introduced for the purpose of clustering comparison [6].
- It measures the distance between two segmentations in terms of their average conditional entropy: $\mathrm{VI}(S, S') = H(S) + H(S') - 2I(S, S')$ (5), where $H$ and $I$ denote, respectively, the entropy and the mutual information of two clusterings $S$ and $S'$ of the data.
- Here, the two clusterings are the test and ground-truth segmentations.
- Its perceptual meaning and its applicability in the presence of several ground-truth segmentations remain unclear.
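Equation (5) can be evaluated directly from the label counts of two segmentations. This minimal sketch treats each segmentation as a flat list of pixel labels and uses natural logarithms (nats); the choice of log base and the function name are assumptions.

```python
import math
from collections import Counter


def variation_of_information(s, g):
    """VI(S, S') = H(S) + H(S') - 2 I(S, S') for two label lists of equal length."""
    n = len(s)
    p_s = Counter(s)
    p_g = Counter(g)
    p_joint = Counter(zip(s, g))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information from the joint and marginal label distributions.
    mutual = sum(
        (c / n) * math.log((c / n) / ((p_s[a] / n) * (p_g[b] / n)))
        for (a, b), c in p_joint.items()
    )
    return entropy(p_s) + entropy(p_g) - 2.0 * mutual
```

Two segmentations that differ only by a relabeling have VI of 0, and VI grows as the partitions disagree; note that it is a distance, so lower is better.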
2.3.2 Rand Index
- Originally, the Rand Index [62] was introduced for general clustering evaluation.
- The Rand Index between test and groundtruth segmentations S and G is given by the sum of the number of pairs of pixels that have the same label in S and G and those that have different labels in both segmentations, divided by the total number of pairs of pixels.
- Variants of the Rand Index have been proposed [5], [7] for dealing with the case of multiple ground-truth segmentations.
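- One such variant, the Probabilistic Rand Index (PRI), compares a test segmentation $S$ with a set of ground truths $\{G_k\}$: $\mathrm{PRI}(S, \{G_k\}) = \frac{1}{T}\sum_{i<j}\left[c_{ij}\,p_{ij} + (1 - c_{ij})(1 - p_{ij})\right]$ (6), where $c_{ij}$ is the event that pixels $i$ and $j$ have the same label in $S$, $p_{ij}$ is its probability, and $T$ is the total number of pixel pairs. Using the sample mean to estimate $p_{ij}$, (6) amounts to averaging the Rand Index among different ground-truth segmentations.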
- The PRI has been reported to suffer from a small dynamic range [5], [7], and its values across images and algorithms are often similar.
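The pairwise definitions above translate almost literally into code. This O(n²) sketch over flat label lists is only meant for small examples; the function names are illustrative. It also makes the equivalence noted above concrete: with $p_{ij}$ estimated as a sample mean, the PRI equals the mean of the per-ground-truth Rand Indices.

```python
from itertools import combinations


def rand_index(s, g):
    """Fraction of pixel pairs on which S and G agree (same in both, or different in both)."""
    pairs = list(combinations(range(len(s)), 2))
    agree = sum((s[i] == s[j]) == (g[i] == g[j]) for i, j in pairs)
    return agree / len(pairs)


def probabilistic_rand_index(s, ground_truths):
    """PRI with p_ij estimated as the fraction of ground truths labelling i and j alike."""
    pairs = list(combinations(range(len(s)), 2))
    total = 0.0
    for i, j in pairs:
        p_ij = sum(gt[i] == gt[j] for gt in ground_truths) / len(ground_truths)
        c_ij = 1.0 if s[i] == s[j] else 0.0
        total += c_ij * p_ij + (1.0 - c_ij) * (1.0 - p_ij)
    return total / len(pairs)
```

For example, with `s = [0, 0, 1, 1]` and ground truths `[0, 0, 1, 1]` and `[0, 1, 1, 1]`, the two Rand Indices are 1.0 and 0.5 and the PRI is their mean, 0.75.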
2.3.3 Segmentation Covering
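- The overlap between two regions $R$ and $R'$ is $O(R, R') = \frac{|R \cap R'|}{|R \cup R'|}$, and the covering of a segmentation $S$ by a segmentation $S'$ is $C(S' \to S) = \frac{1}{N}\sum_{R \in S} |R| \cdot \max_{R' \in S'} O(R, R')$, where $N$ is the total number of pixels in the image.
- Similarly, the covering of a machine segmentation S by a family of ground-truth segmentations {Gi} is defined by first covering S separately with each human segmentation Gi, and then averaging over the different humans.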
- To achieve perfect covering the machine segmentation must explain all of the human data.
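The covering definitions above can be sketched directly, with segmentations given as flat label lists; the helper names (`regions`, `overlap`, `covering`, `covering_by_family`) are illustrative rather than taken from the benchmark code.

```python
from collections import defaultdict


def regions(labels):
    """Group pixel indices by label: returns a list of pixel-index sets."""
    groups = defaultdict(set)
    for idx, lab in enumerate(labels):
        groups[lab].add(idx)
    return list(groups.values())


def overlap(r1, r2):
    """O(R, R') = |R intersect R'| / |R union R'|."""
    return len(r1 & r2) / len(r1 | r2)


def covering(s_prime, s):
    """C(S' -> S) = (1/N) * sum over R in S of |R| * max over R' in S' of O(R, R')."""
    candidates = regions(s_prime)
    return sum(len(r) * max(overlap(r, r2) for r2 in candidates) for r in regions(s)) / len(s)


def covering_by_family(s, ground_truths):
    """Cover S separately with each ground truth G_i, then average over the humans."""
    return sum(covering(g, s) for g in ground_truths) / len(ground_truths)
```

Each region of the covered segmentation is matched to its best-overlapping region in the other one, weighted by size, so a score of 1 requires an exact match region for region.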
3 CONTOUR DETECTION
- As a starting point for contour detection, the authors consider the work of Martin et al. [2], who define a function Pb(x, y, θ) that predicts the posterior probability of a boundary with orientation θ at each image pixel (x, y) by measuring the difference in local image brightness, color, and texture channels.
- The authors review these cues, introduce their own multiscale version of the Pb detector, and describe the new globalization method they run on top of this multiscale local detector.
3.1 Brightness, Color, Texture Gradients
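- The basic building block of Pb is an oriented gradient signal $G(x, y, \theta)$: a disc centred at $(x, y)$ is split into two half-discs by a diameter at angle $\theta$, a histogram of channel values is computed in each half, and $G$ is the $\chi^2$ distance between the two histograms, $\chi^2(g, h) = \frac{1}{2}\sum_i \frac{(g(i) - h(i))^2}{g(i) + h(i)}$.
- Second-order Savitzky-Golay filtering [63] is then applied to enhance local maxima and smooth out multiple detection peaks in the direction orthogonal to $\theta$. This is equivalent to fitting a cylindrical parabola, whose axis is oriented along direction $\theta$, to a local 2D window surrounding each pixel and replacing the response at the pixel with that estimated by the fit.
- Pb uses four feature channels. The first three correspond to the channels of the CIE Lab colorspace, which the authors refer to as the brightness, color a, and color b channels; the fourth is a texture channel.
- For the texture channel, each pixel is associated with a 17-dimensional vector of filter bank responses, containing one entry for each filter.
- These vectors are clustered with K-means; the cluster centers define a set of image-specific textons, and each pixel is assigned the integer id in [1, K] of the closest cluster center.
- On the resulting texton map, the authors compute differences of histograms in oriented half-discs in the same manner as for the brightness and color channels.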
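The half-disc computation above can be sketched for a single pixel and orientation. This assumes the channel has already been quantized to integer bin ids in `[0, nbins)` (for the texture channel, texton ids shifted to start at 0); normalizing each half to a distribution, ignoring out-of-image pixels, and dropping pixels that lie exactly on the diameter are choices made for this sketch. The Savitzky-Golay smoothing step is omitted.

```python
import math


def chi_squared(g, h):
    """chi^2(g, h) = 1/2 * sum_i (g_i - h_i)^2 / (g_i + h_i), skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(g, h) if a + b > 0)


def oriented_gradient(binned, x, y, theta, radius, nbins):
    """G(x, y, theta) for an image whose pixels are already quantized to bin ids."""
    h_img, w_img = len(binned), len(binned[0])
    left, right = [0.0] * nbins, [0.0] * nbins
    nx, ny = -math.sin(theta), math.cos(theta)  # normal to the dividing diameter
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy > radius * radius:
                continue  # outside the disc
            px, py = x + dx, y + dy
            if not (0 <= px < w_img and 0 <= py < h_img):
                continue  # ignore pixels outside the image
            side = dx * nx + dy * ny
            if abs(side) < 1e-9:
                continue  # on the dividing diameter: assigned to neither half
            if side > 0:
                left[binned[py][px]] += 1
            else:
                right[binned[py][px]] += 1
    # Normalize each half-disc histogram to a distribution before comparing.
    sl, sr = sum(left) or 1.0, sum(right) or 1.0
    return chi_squared([v / sl for v in left], [v / sr for v in right])
```

On an image whose left half is bin 0 and right half is bin 1, the response at the step is maximal (1.0) when the diameter is aligned with the vertical boundary and zero when it is horizontal, which is why Pb evaluates several orientations per pixel.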
3.2 Multiscale Cue Combination
- The authors now introduce their own multiscale extension of the Pb detector reviewed above.
- Note that Ren [28] introduces a different, more complicated, and similarly performing multiscale extension in work contemporaneous with their own [3], and also suggests possible reasons why Martin et al. [2] did not see performance improvements in their original multiscale experiments, including Martin et al.'s use of smaller images and their choice of scales.
- Figure 6 shows an example of the oriented gradients obtained for each channel.
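- The multiscale detector combines the oriented gradients linearly: $mPb(x, y, \theta) = \sum_s \sum_i \alpha_{i,s}\, G_{i,\sigma(i,s)}(x, y, \theta)$ (10), where $s$ indexes scales, $i$ indexes feature channels, and $G_{i,\sigma(i,s)}(x, y, \theta)$ measures the histogram difference in channel $i$ between the two halves of a disc of radius $\sigma(i, s)$. The parameters $\alpha_{i,s}$ weight the relative contribution of each gradient signal.
- Taking the maximum response over orientations yields a measure of boundary strength at each pixel: $mPb(x, y) = \max_\theta \{mPb(x, y, \theta)\}$ (11).
- An optional non-maximum suppression step [22] produces thinned, real-valued contours.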
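Equations (10) and (11) amount to a weighted sum followed by a maximum over orientations. This sketch works on the values at a single pixel; the nested-list layout (`gradients[i][s]`, one such table per orientation) and the weights used in the usage example are hypothetical, since the real weights are learned on the BSDS training images.

```python
def mpb_oriented(gradients, alphas):
    """mPb(x, y, theta) = sum_s sum_i alpha[i][s] * G[i][s](x, y, theta).

    gradients[i][s]: oriented gradient for channel i at scale s, at one pixel
    and one orientation. alphas[i][s]: the corresponding weight.
    """
    return sum(
        alphas[i][s] * gradients[i][s]
        for i in range(len(gradients))
        for s in range(len(gradients[i]))
    )


def mpb(per_orientation_gradients, alphas):
    """mPb(x, y) = max over theta of mPb(x, y, theta); also returns the best orientation index."""
    scores = [mpb_oriented(g, alphas) for g in per_orientation_gradients]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best
```

Keeping the argmax orientation alongside the strength is useful later, since the segmentation stage consumes the oriented signal $E(x, y, \theta)$ rather than only its maximum.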
3.3 Globalization
- Spectral clustering lies at the heart of their globalization machinery.
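- The globalization step builds a sparse affinity matrix $W$ from mPb, with $W_{ij}$ small when a strong contour intervenes between pixels $i$ and $j$, defines the diagonal matrix $D$ by $D_{ii} = \sum_j W_{ij}$, and solves the generalized eigenproblem $(D - W)\mathbf{v} = \lambda D\mathbf{v}$ for the eigenvectors with the smallest eigenvalues. At this point, the standard Normalized Cuts approach associates with each pixel a length-n descriptor formed from the entries of the n eigenvectors and uses a clustering algorithm such as K-means to create a hard partition of the image.
- This can break up large uniform regions in which the eigenvectors vary smoothly. To circumvent this difficulty, the authors observe that the eigenvectors themselves carry contour information.
- Treating each eigenvector $\mathbf{v}_k$ as an image and convolving it with Gaussian directional derivative filters at multiple orientations gives the spectral signal $sPb(x, y, \theta) = \sum_k \frac{1}{\sqrt{\lambda_k}}\, \nabla_\theta \mathbf{v}_k(x, y)$. Taking derivatives in this manner ignores the smooth variations that previously led to errors.
- The final detector combines both signals: $gPb(x, y, \theta) = \sum_s \sum_i \beta_{i,s}\, G_{i,\sigma(i,s)}(x, y, \theta) + \gamma \cdot sPb(x, y, \theta)$. As with mPb (10), the weights $\beta_{i,s}$ and $\gamma$ are learned by gradient ascent on the F-measure using the BSDS training images.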
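The affinities that feed the eigenproblem come from an intervening-contour cue: two pixels are weakly connected if a strong mPb response lies on the straight line between them. The sketch below assumes the form $W_{ij} = \exp(-\max_{p \in \overline{ij}} mPb(p) / \rho)$ with $\rho = 0.1$ and rounded line sampling; the exact constants, the neighbourhood radius within which pixel pairs are connected, and the helper names are assumptions for illustration.

```python
import math


def segment_pixels(p, q):
    """Pixels on the straight segment from p to q, sampled by rounding (row, col)."""
    (r0, c0), (r1, c1) = p, q
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        return [p]
    return [
        (round(r0 + (r1 - r0) * t / steps), round(c0 + (c1 - c0) * t / steps))
        for t in range(steps + 1)
    ]


def intervening_contour_affinity(mpb_map, p, q, rho=0.1):
    """W_pq = exp(-max over the segment pq of mPb / rho): small if a strong edge intervenes."""
    peak = max(mpb_map[r][c] for r, c in segment_pixels(p, q))
    return math.exp(-peak / rho)
```

Only pixel pairs within a small radius would be connected in practice, which keeps $W$ sparse as required for the eigenvector computation.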
3.4 Results
- Qualitatively, the combination of the multiscale cues with their globalization machinery translates into a reduction of clutter edges and completion of contours in the output, as shown in Figure 9.
- Figure 10 breaks down the contributions of the multiscale and spectral signals to the performance of gPb.
- These precision-recall curves show that the reduction of false positives due to the use of global information in sPb is concentrated in the high thresholds, while gPb takes the best of both worlds, relying on sPb in the high precision regime and on mPb in the high recall regime.
- Looking again at the comparison of contour detectors on the BSDS300 benchmark in Figure 1, the mean improvement in precision of gPb with respect to the single scale Pb is 10% in the recall range [0.1, 0.9].
4 SEGMENTATION
- The nonmax suppressed gPb contours produced in the previous section are often not closed and hence do not partition the image into regions.
- Regions come with their own scale estimates and provide natural domains for computing features used in recognition.
- The authors show how to recover closed contours, while preserving the gains in boundary quality achieved in the previous section.
- The authors' algorithm, first reported in [4], builds a hierarchical segmentation by exploiting the information in the contour signal.
4.1 Oriented Watershed Transform
- Using the contour signal, the authors first construct a finest partition for the hierarchy, an over-segmentation whose regions determine the highest level of detail considered.
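- Specifically, they compute $E(x, y) = \max_\theta E(x, y, \theta)$ from the contour detector output and apply the watershed transform to it. The catchment basins of the regional minima, denoted $P_0$, provide the regions of the finest partition, and the corresponding watershed arcs, $K_0$, the possible locations of the boundaries.
- Weighting each arc by the average of $E(x, y)$ along it can give weak arcs artificially high strength: a pixel on a horizontal watershed arc could lie near, but not on, a strong vertical contour and inherit its response. The Oriented Watershed Transform (OWT) therefore approximates each arc by straight line segments, estimates the orientation $o(x, y)$ of each segment, and reweights each arc pixel by $E(x, y, o(x, y))$.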
- Several such cases can be seen in Figure 11.
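The reweighting step of the OWT can be sketched for one arc segment, assuming the arc has already been split into approximately straight pieces and that the oriented contour signal is available as a stack of maps at discrete orientations. The `(row, col)` convention with rows increasing downward and the function name are assumptions; orientation bins must use the same convention as `edge_stack`.

```python
import math


def owt_arc_weight(arc_pixels, edge_stack, orientations):
    """Re-weight one (approximately straight) watershed arc segment.

    arc_pixels: list of (row, col) along the segment, first and last are endpoints.
    edge_stack[k][r][c]: E(x, y, theta_k) for theta_k = orientations[k].
    Each pixel takes E at the orientation bin closest to the segment's orientation.
    """
    (r0, c0), (r1, c1) = arc_pixels[0], arc_pixels[-1]
    # Orientation of the segment, folded into [0, pi) since contours are undirected.
    o = math.atan2(r1 - r0, c1 - c0) % math.pi

    def angular_distance(a, b):
        d = abs(a - b) % math.pi
        return min(d, math.pi - d)

    k = min(range(len(orientations)), key=lambda i: angular_distance(orientations[i], o))
    return [edge_stack[k][r][c] for r, c in arc_pixels]
```

In the example case above, a horizontal arc next to a strong vertical contour reads the weak horizontal response rather than the strong vertical one, so its spurious strength is suppressed.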
4.2 Ultrametric Contour Map
- One can interpret the boundary strength assigned to an arc by the Oriented Watershed Transform (OWT) of the previous section as an estimate of the probability of that arc being a true contour.
- One possibility, which the authors exploit here, is the Ultrametric Contour Map (UCM) [35], which defines a duality between closed, non-self-intersecting weighted contours and a hierarchy of regions.
- Upper levels of the hierarchy respect only strong contours, resulting in an under-segmentation; conversely, lower levels respect even weak contours, resulting in an over-segmentation.
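- Specifically, the hierarchy is built by greedy graph-based region merging, where the dissimilarity $W(C)$ between two adjacent regions is the average OWT strength of their common boundary $C$: 1) Select the minimum-weight contour: $C^* = \arg\min_{C \in K_0} W(C)$. 2) Let $R_1, R_2 \in P_0$ be the regions separated by $C^*$ and set $R = R_1 \cup R_2$; update $P_0 \leftarrow P_0 \setminus \{R_1, R_2\} \cup \{R\}$ and $K_0 \leftarrow K_0 \setminus \{C^*\}$. 3) Stop if $K_0$ is empty; otherwise, update the weights $W(K_0)$ and repeat.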
- Hence, the constructed region tree has the structure of an indexed hierarchy and can be described by a dendrogram, where the height H(R) of each region R is the value of the dissimilarity at which it first appears.
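The merging loop above can be sketched on a region adjacency graph. Here each boundary is summarized by the list of OWT strengths of its pixels, and running totals make the average $W(C)$ exact after merges; the data layout and the function name are assumptions, and each boundary list is assumed to be non-empty.

```python
def build_ucm_hierarchy(num_regions, boundaries):
    """Greedy merging behind the UCM, sketched on a region adjacency graph.

    boundaries: dict mapping frozenset({a, b}) -> non-empty list of boundary
    strengths (e.g. OWT values) of the pixels on the common boundary of a and b.
    Returns merges as (height, a, b, new_region) in the order they happen.
    """
    # Keep total strength and length so the average W(C) stays exact after merging.
    stats = {pair: (sum(v), len(v)) for pair, v in boundaries.items()}
    alive = set(range(num_regions))
    next_id = num_regions
    merges = []
    while stats:
        # 1) Select the minimum-weight contour C* (weight = average strength).
        pair = min(stats, key=lambda p: stats[p][0] / stats[p][1])
        total, length = stats.pop(pair)
        a, b = sorted(pair)
        new = next_id
        next_id += 1
        merges.append((total / length, a, b, new))
        # 2) Replace R1, R2 by R = R1 u R2, pooling their boundaries with each neighbour.
        alive -= {a, b}
        for n in list(alive):
            s, l = 0.0, 0
            for old in (a, b):
                key = frozenset((old, n))
                if key in stats:
                    ds, dl = stats.pop(key)
                    s, l = s + ds, l + dl
            if l:
                stats[frozenset((new, n))] = (s, l)
        alive.add(new)
        # 3) Loop until no contours remain.
    return merges
```

Because a pooled boundary's average is never below the smallest average it pools, the recorded heights never decrease, which is the property that makes the region tree an indexed hierarchy.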
4.3 Results
- While the OWT-UCM algorithm can use any source of contours for the input E(x, y, θ) signal (e.g. the Canny edge detector before thresholding), the authors obtain best results by employing the gPb detector [3] introduced in Section 3.
- The authors report experiments using both gPb as well as the baseline Canny detector, and refer to the resulting segmentation algorithms as gPb-owt-ucm and Canny-owt-ucm, respectively.
- Since the OWT-UCM algorithm produces hierarchical region trees, obtaining a single segmentation as output involves a choice of scale.
- One possibility is to use a fixed threshold for all images in the dataset, calibrated to provide optimal performance on the training set.
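Choosing a scale means thresholding the UCM: boundaries weaker than the threshold disappear, and the remaining closed contours delimit the regions. This sketch works on a pixel-grid boundary map (boundary pixels carry their UCM value, interior pixels 0), which is a simplification of how a UCM is usually stored; the function name is illustrative.

```python
from collections import deque


def segment_at_scale(ucm, k):
    """Cut a UCM-style boundary map at threshold k.

    Pixels with ucm >= k are treated as boundary (label 0); the remaining pixels
    are grouped into 4-connected regions labelled 1, 2, ...
    Returns (labels, number_of_regions).
    """
    h, w = len(ucm), len(ucm[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for r in range(h):
        for c in range(w):
            if ucm[r][c] >= k or labels[r][c]:
                continue
            current += 1  # start a new region and flood-fill it
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not labels[ny][nx] and ucm[ny][nx] < k:
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels, current
```

A single dataset-wide `k` corresponds to the fixed-threshold option above; picking `k` separately for each image is the alternative.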
4.4 Evaluation
- To provide a basis of comparison for the OWT-UCM algorithm, the authors make use of the region merging [32], Mean Shift [34], Multiscale NCuts [33], and SWA [31] segmentation methods reviewed in Section 2.2.
- The authors evaluate each method using the boundary-based precision-recall framework of [2], as well as the Variation of Information, Probabilistic Rand Index, and segment covering criteria discussed in Section 2.3.
- The BSDS serves as ground-truth for both the boundary and region quality measures, since the human-drawn boundaries are closed and hence are also segmentations.
4.4.1 Boundary Quality
- The evaluation methodology developed by [2] measures detector performance in terms of precision, the fraction of detections that are true positives, and recall, the fraction of ground-truth boundary pixels detected.
- The global F-measure, $F = 2PR/(P + R)$, the harmonic mean of precision $P$ and recall $R$ at the optimal detector threshold, provides a summary score.
- Figures 2 and 17 display the full precision-recall curves on the BSDS300 and BSDS500 datasets, respectively.
- The authors find retraining on the BSDS500 to be unnecessary and use the same parameters learned on the BSDS300.
- Of particular note in Figure 17 are pairs of curves corresponding to contour detector output and regions produced by running the OWT-UCM algorithm on that output.
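Once detections have been matched to ground-truth boundary pixels, the summary score is a small computation. This sketch assumes the matching has already produced, for each threshold, counts summed over the dataset; the input layout, function names, and the convention of precision 1 when nothing is detected are assumptions.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def best_f_over_thresholds(counts_per_threshold):
    """counts_per_threshold: {threshold: (matched_det, total_det, matched_gt, total_gt)}.

    Counts are assumed to be summed over the whole dataset, so the result uses
    one global threshold. Returns (best_threshold, precision, recall, F).
    """
    best = None
    for t, (m_det, n_det, m_gt, n_gt) in counts_per_threshold.items():
        p = m_det / n_det if n_det else 1.0  # convention: no detections -> precision 1
        r = m_gt / n_gt if n_gt else 0.0
        f = f_measure(p, r)
        if best is None or f > best[3]:
            best = (t, p, r, f)
    return best
```

Low thresholds trade precision for recall and high thresholds do the opposite; the optimal threshold is simply where their harmonic mean peaks.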
4.4.2 Region Quality
- Table 2 presents region benchmarks on the BSDS.
- For a family of machine segmentations {Si}, associated with different scales of a hierarchical algorithm or different sets of parameters, the authors report three scores for the covering of the ground-truth by segments in {Si}.
- These correspond to selecting covering regions from the segmentation at a universal fixed scale (ODS), a fixed scale per image (OIS), or from any level of the hierarchy or collection {Si} (Best).
- The authors also report the Probabilistic Rand Index and Variation of Information benchmarks.
- While the relative ranking of segmentation algorithms remains fairly consistent across different benchmark criteria, the boundary benchmark appears most capable of discriminating performance.
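The ODS and OIS scores described above differ only in where the scale is chosen. This sketch starts from a hypothetical table of per-image scores at each scale and reports both as plain means over images, which is a simplifying assumption (a benchmark may aggregate differently); the "Best" score, which picks covering regions from any level, needs region-level data and is not shown.

```python
def ods_ois(scores):
    """scores[img][k]: quality score (e.g. covering) of image img at scale k.

    ODS: one scale shared by the whole dataset; OIS: best scale chosen per image.
    Returns (ods_scale, ods_score, ois_score).
    """
    n_images = len(scores)
    n_scales = len(scores[0])
    mean_at = [sum(img[k] for img in scores) / n_images for k in range(n_scales)]
    ods_scale = max(range(n_scales), key=mean_at.__getitem__)
    ois = sum(max(img) for img in scores) / n_images
    return ods_scale, mean_at[ods_scale], ois
```

Since OIS may pick a different scale for every image, it is always at least as high as ODS; the gap indicates how much a per-image scale selection could help.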
4.4.4 Summary
- The gPb-owt-ucm segmentation algorithm offers the best performance on every dataset and for every benchmark criterion the authors tested.
- In addition, it is straightforward, fast, has no parameters to tune, and, as discussed in the following sections, can be adapted for use with top-down knowledge sources.
5 INTERACTIVE SEGMENTATION
- Until now, the authors have only discussed fully automatic image segmentation.
- Human-assisted segmentation is relevant for many applications, and recent approaches rely on the graph-cuts formalism [72], [73], [74] or other energy minimization procedures [75] to extract foreground regions.
- In the graph-cuts formulation, the unary potentials encode agreement with estimated foreground or background region models, and the pairwise potentials bias neighboring pixels not separated by a strong boundary to have the same label.
- User-specified hard labeling constraints are enforced by connecting a pixel to the source or sink with sufficiently large weight.
- In the authors' UCM-based approach, user labels propagate through the region hierarchy: each unlabeled region receives the label of the first labeled region merged with it.
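The propagation rule above can be sketched by replaying the merges of a region hierarchy in order. The merge-list format, the hypothetical `MIXED` marker, and the choice to stop propagating once two different user labels meet are assumptions for illustration, not a description of the authors' implementation.

```python
MIXED = object()  # hypothetical marker: node contains leaves with different user labels


def propagate_labels(num_leaves, merges, seeds):
    """Propagate user labels up a merge tree (a hypothetical helper).

    merges: list of (a, b, new_node) in merge order, e.g. from a UCM hierarchy.
    seeds: {leaf_region: user_label}. Unlabeled leaves receive the label of the
    first labeled node they are merged with.
    """
    label = {i: seeds.get(i) for i in range(num_leaves)}
    final = dict(label)
    leaves = {i: [i] for i in range(num_leaves)}
    for a, b, new in merges:
        la, lb = label[a], label[b]
        leaves[new] = leaves[a] + leaves[b]
        if la is None and lb is not None and lb is not MIXED:
            for leaf in leaves[a]:
                final[leaf] = lb
            label[new] = lb
        elif lb is None and la is not None and la is not MIXED:
            for leaf in leaves[b]:
                final[leaf] = la
            label[new] = la
        elif la == lb:
            label[new] = la  # both unlabeled, or both carry the same label
        else:
            label[new] = MIXED  # conflicting labels: assumed to stop propagating
    return [final[i] for i in range(num_leaves)]
```

With seeds on the first and last of four regions, each unlabeled region adopts the label of the side it joins first in the hierarchy.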
6 MULTISCALE FOR OBJECT ANALYSIS
- The authors' contour detection and segmentation algorithms capture multiscale information by combining local gradient cues computed at three different scales, as described in Section 3.2.
- Note that this procedure does not prevent the object detector itself from using multiscale information, but rather provides the correct central scale.
- Computing the oriented histogram gradients directly is expensive. Martin et al. [2] suggest ways to speed up this computation, including incremental updating of the histograms as the disc is swept across the image.
- Moreover, in this case, no approximation is required as these operations are equivalent up to the numerical accuracy of the interpolation done when rotating the image.
- Catanzaro et al. [77] have created a parallel GPU implementation of their gPb contour detector.
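The incremental-update idea mentioned above can be illustrated with a square window swept along one image row: instead of recounting every pixel, the column leaving the window is subtracted and the entering column is added. Square windows instead of half-discs, border clipping, and the function name are simplifications assumed for this sketch.

```python
def sliding_histograms_row(binned, row, half, nbins):
    """Histograms of a (2*half+1)^2 square window centred on each pixel of one row.

    The window is swept left to right; each step subtracts the column leaving
    the window and adds the column entering it. Windows are clipped at the border.
    """
    h, w = len(binned), len(binned[0])
    rows = range(max(0, row - half), min(h, row + half + 1))

    def column_counts(col, sign, hist):
        if 0 <= col < w:
            for r in rows:
                hist[binned[r][col]] += sign

    hist = [0] * nbins
    for col in range(0, min(w, half + 1)):  # window centred on column 0
        column_counts(col, +1, hist)
    out = [list(hist)]
    for center in range(1, w):
        column_counts(center - half - 1, -1, hist)  # column leaving the window
        column_counts(center + half, +1, hist)      # column entering the window
        out.append(list(hist))
    return out
```

Each step costs one column of updates rather than a full window recount, which is the source of the speed-up.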
Frequently Asked Questions
Q2. What is the basic building block of the Pb contour detector?
The basic building block of the Pb contour detector is the computation of an oriented gradient signal G(x, y, θ) from an intensity image I.
Q3. What is the advantage of this approach?
An advantage of this approach is that it may be possible to handle cues such as parallelism and completion in the initial classification stage.
Q4. What is the way to find the optimal line segment labeling?
Finding the optimal line segment labeling then translates into a general weighted min-cover problem in which the elements being covered are the line segments themselves and the objects covering them are drawn from the set of all possible curves and all possible background line segments.
Q6. What is the weight of the contour currently being removed?
Since at every step of the algorithm all remaining contours must have strength greater than or equal to those previously removed, the weight of the contour currently being removed cannot decrease during the merging process.
Q7. What is the simplest way to partition pixels into components?
Given a graph in which pixels are nodes and edge weights measure the dissimilarity between nodes (e.g. color differences), each node is initially placed in its own component.
Q8. What is the argument that the boundary benchmark favors contour detectors over segmentation methods?
One might argue that the boundary benchmark favors contour detectors over segmentation methods, since the former are not burdened with the constraint of producing closed curves.
Q11. What is the procedure for extracting closed contours?
This procedure extracts both closed contours and smooth curves, as edgel chains are allowed to loop back at their termination points.
Q12. What are the recent local approaches?
More recent local approaches take into account color and texture information and make use of learning techniques for cue combination [2], [26], [27].
Q13. What is the general setting for the contour detector?
To describe their algorithm in the most general setting, the authors now consider an arbitrary contour detector, whose output E(x, y, θ) predicts the probability of an image boundary at location (x, y) and orientation θ.
Q14. What is the process of generating a tree of regions?
This process produces a tree of regions, where the leaves are the initial elements of P0, the root is the entire image, and the regions are ordered by the inclusion relation.