
Contour Detection and Hierarchical Image Segmentation

Abstract: This paper investigates two fundamental problems in computer vision: contour detection and image segmentation. We present state-of-the-art algorithms for both of these tasks. Our contour detector combines multiple local cues into a globalization framework based on spectral clustering. Our segmentation algorithm consists of generic machinery for transforming the output of any contour detector into a hierarchical region tree. In this manner, we reduce the problem of image segmentation to that of contour detection. Extensive experimental evaluation demonstrates that both our contour detection and segmentation methods significantly outperform competing algorithms. The automatically generated hierarchical segmentations can be interactively refined by user-specified annotations. Computation at multiple image resolutions provides a means of coupling our system to recognition applications.

Summary (6 min read)

1 INTRODUCTION

  • This paper presents a unified approach to contour detection and image segmentation.
  • This benchmark operates by comparing machine generated contours to human ground-truth data and allows evaluation of segmentations in the same framework by regarding region boundaries as contours.
  • The authors introduced the gPb and gPb-owt-ucm algorithms in [3] and [4], respectively.
  • To produce high-quality image segmentations, the authors link this contour detector with a generic grouping algorithm described in Section 4 and consisting of two steps.
  • Average agreement between human subjects is indicated by the green dot.

2 PREVIOUS WORK

  • The problems of contour detection and segmentation are related, but not identical.
  • The BSDS300 consists of 200 training and 100 test images, each with multiple ground-truth segmentations.
  • Historically, however, there have been different lines of approach to these two problems, which the authors now review.

2.1 Contours

  • Early approaches to contour detection aim at quantifying the presence of a boundary at a given image location through local measurements.
  • The Roberts [17], Sobel [18], and Prewitt [19] operators detect edges by convolving a grayscale image with local derivative filters.
  • Additional localization and relative contrast cues, defined in terms of the multiscale detector output, are fed to the boundary classifier.
  • The simplest such algorithms link together high-gradient edge fragments in order to identify extended, smooth contours [40], [41], [42].
  • Zhu et al. [24] also start with the output of [2] and create a weighted edgel graph, where the weights measure directed collinearity between neighboring edgels.

2.2 Regions

  • A broad family of approaches to segmentation involve integrating features such as brightness, color, or texture over local image patches and then clustering those features based on, e.g., fitting mixture models [7], [44], mode-finding [34], or graph partitioning [32], [45], [46], [47].
  • The graph based region merging algorithm advocated by Felzenszwalb and Huttenlocher (Felz-Hutt) [32] attempts to partition image pixels into components such that the resulting segmentation is neither too coarse nor too fine.
  • The fact that W must be sparse, in order to avoid a prohibitively expensive computation, limits the naive implementation to using only local pixel affinities.
  • Cour et al. solve this limitation by computing sparse affinity matrices at multiple scales, setting up cross-scale constraints, and deriving a new eigenproblem for this constrained multiscale cut.
  • Recently, Pock et al. [60] proposed to solve a convex relaxation of (4), thus obtaining robustness to initialization.

2.3 Benchmarks

  • The standard for evaluating segmentation algorithms is less clear.
  • One option is to regard the segment boundaries as contours and evaluate them as such.
  • A methodology that directly measures the quality of the segments is also desirable.
  • The authors therefore also consider various region-based metrics.

2.3.1 Variation of Information

  • The Variation of Information metric was introduced for the purpose of clustering comparison [6].
  • It measures the distance between two segmentations in terms of their average conditional entropy: VI(S, S′) = H(S) + H(S′) − 2I(S, S′) (5), where H and I denote, respectively, the entropies of and the mutual information between the two clusterings of data S and S′.
  • In this case, these clusterings are the test and ground-truth segmentations.
  • Its perceptual meaning and its applicability in the presence of several ground-truth segmentations remain unclear.

2.3.2 Rand Index

  • Originally, the Rand Index [62] was introduced for general clustering evaluation.
  • The Rand Index between test and ground-truth segmentations S and G is given by the sum of the number of pixel pairs that have the same label in both S and G and the number of pairs that have different labels in both segmentations, divided by the total number of pixel pairs.
  • Variants of the Rand Index have been proposed [5], [7] for dealing with the case of multiple ground-truth segmentations.
  • Using the sample mean to estimate pij , (6) amounts to averaging the Rand Index among different ground-truth segmentations.
  • The PRI has been reported to suffer from a small dynamic range [5], [7], and its values across images and algorithms are often similar.
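The pairwise definition above can be sketched directly, again assuming flat integer label arrays (a deliberately O(n²) illustration for clarity, not an implementation suited to full-size images; the function name is ours).

```python
import numpy as np
from itertools import combinations

def rand_index(s, g):
    """Rand Index between two segmentations: the fraction of pixel
    pairs on which the labelings agree, i.e. pairs with the same label
    in both, plus pairs with different labels in both."""
    s, g = np.asarray(s).ravel(), np.asarray(g).ravel()
    agree = 0
    pairs = 0
    for i, j in combinations(range(s.size), 2):
        same_s = s[i] == s[j]
        same_g = g[i] == g[j]
        agree += (same_s == same_g)
        pairs += 1
    return agree / pairs
```

Averaging this quantity over several ground-truth segmentations gives the Probabilistic Rand Index (PRI) described above.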

2.3.3 Segmentation Covering

  • Similarly, the covering of a machine segmentation S by a family of ground-truth segmentations {Gi} is defined by first covering S separately with each human segmentation Gi, and then averaging over the different humans.
  • To achieve perfect covering the machine segmentation must explain all of the human data.
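A minimal sketch of the covering computation, assuming segmentations as flat integer label arrays and using intersection-over-union as the region overlap measure; function names are ours.

```python
import numpy as np

def covering(s, s_prime):
    """Covering of segmentation s by s_prime: for each region R of s,
    take the best overlap (intersection over union) with any region R'
    of s_prime, and average weighted by region size |R|."""
    s, sp = np.asarray(s).ravel(), np.asarray(s_prime).ravel()
    n = s.size
    total = 0.0
    for r in np.unique(s):
        mask_r = s == r
        best = 0.0
        for rp in np.unique(sp):
            mask_rp = sp == rp
            inter = np.logical_and(mask_r, mask_rp).sum()
            union = np.logical_or(mask_r, mask_rp).sum()
            best = max(best, inter / union)
        total += mask_r.sum() * best
    return total / n

def covering_by_family(s, ground_truths):
    """Cover s separately with each human segmentation, then average."""
    return float(np.mean([covering(s, g) for g in ground_truths]))
```

A perfect covering of 1.0 requires every machine region to coincide exactly with a ground-truth region, i.e. the machine segmentation must explain all of the human data.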

3 CONTOUR DETECTION

  • As a starting point for contour detection, the authors consider the work of Martin et al. [2], who define a function Pb(x, y, θ) that predicts the posterior probability of a boundary with orientation θ at each image pixel (x, y) by measuring the difference in local image brightness, color, and texture channels.
  • The authors review these cues, introduce their own multiscale version of the Pb detector, and describe the new globalization method they run on top of this multiscale local detector.

3.1 Brightness, Color, Texture Gradients

  • This is equivalent to fitting a cylindrical parabola, whose axis is oriented along direction θ, to a local 2D window surrounding each pixel and replacing the response at the pixel with that estimated by the fit.
  • The first three correspond to the channels of the CIE Lab colorspace, which the authors refer to as the brightness, color a, and color b channels.
  • Each pixel is associated with a (17-dimensional) vector of responses, containing one entry for each filter.
  • The cluster centers define a set of image-specific textons and each pixel is assigned the integer id in [1, K] of the closest cluster center.
  • On this image, the authors compute differences of histograms in oriented half-discs in the same manner as for the brightness and color channels.
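The texton step above (cluster filter responses, then assign each pixel the id of its nearest center) can be sketched with a minimal K-means. This is a simplified illustration with a hand-rolled clustering loop; the function name and parameters are ours.

```python
import numpy as np

def assign_textons(responses, k, n_iter=10, seed=0):
    """Cluster per-pixel filter-bank responses (shape: n_pixels x
    n_filters, e.g. 17 filters as in the text) into k image-specific
    textons, then return each pixel's integer texton id in [0, k)."""
    responses = np.asarray(responses, dtype=float)
    rng = np.random.default_rng(seed)
    centers = responses[rng.choice(len(responses), k, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to its nearest cluster center.
        d = np.linalg.norm(responses[:, None, :] - centers[None, :, :], axis=2)
        ids = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned pixels.
        for j in range(k):
            if np.any(ids == j):
                centers[j] = responses[ids == j].mean(axis=0)
    return ids
```

The resulting texton-id image is what the oriented half-disc histogram differences are then computed on.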

3.2 Multiscale Cue Combination

  • The authors now introduce their own multiscale extension of the Pb detector reviewed above.
  • Note that Ren [28] introduces a different, more complicated, and similarly performing multiscale extension in work contemporaneous with their own [3], and also suggests possible reasons Martin et al. [2] did not see performance improvements in their original multiscale experiments, including their use of smaller images and their choice of scales.
  • Figure 6 shows an example of the oriented gradients obtained for each channel.
  • The parameters αi,s weight the relative contribution of each gradient signal.
  • Taking the maximum response over orientations yields a measure of boundary strength at each pixel: mPb(x, y) = max_θ {mPb(x, y, θ)} (11). An optional non-maximum suppression step [22] produces thinned, real-valued contours.
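The cue combination described above reduces to a weighted sum over cues and scales followed by a max over orientations. A sketch, assuming the oriented gradient maps and the learned weights α are given (the data layout here is our own illustration):

```python
import numpy as np

def mpb(gradients, alphas):
    """Multiscale boundary strength. `gradients[(i, s)]` holds the
    oriented gradient signal for cue i at scale s as an array of shape
    (n_orientations, H, W); `alphas[(i, s)]` is its scalar weight.
    Returns the per-pixel maximum of the weighted sum over theta."""
    combined = sum(alphas[k] * gradients[k] for k in gradients)
    return combined.max(axis=0)  # max over orientations -> (H, W)
```

In the paper the weights α are learned by gradient ascent on the F-measure over the BSDS training images; here they are simply inputs.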

3.3 Globalization

  • Spectral clustering lies at the heart of their globalization machinery.
  • At this point, the standard Normalized Cuts approach associates with each pixel a length n descriptor formed from entries of the n eigenvectors and uses a clustering algorithm such as K-means to create a hard partition of the image.
  • To circumvent this difficulty, the authors observe that the eigenvectors themselves carry contour information.
  • Taking derivatives in this manner ignores the smooth variations that previously led to errors.
  • As with mPb (10), the weights βi,s and γ are learned by gradient ascent on the F-measure using the BSDS training images.
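The idea of reading contours off the eigenvectors, rather than clustering them, can be sketched as follows. This is a simplified illustration: it uses plain x/y image gradients in place of the paper's oriented derivative filters, and assumes the eigenvectors have already been reshaped to image size.

```python
import numpy as np

def spectral_pb(eigvecs, eigvals):
    """Spectral contour signal: treat each eigenvector as an image,
    differentiate it, and sum gradient magnitudes weighted by
    1/sqrt(lambda), so smoother (small-lambda) eigenvectors
    contribute more."""
    spb = np.zeros_like(eigvecs[0], dtype=float)
    for v, lam in zip(eigvecs, eigvals):
        gy, gx = np.gradient(v)
        spb += np.hypot(gx, gy) / np.sqrt(lam)
    return spb
```

A spatially constant eigenvector contributes nothing, which is exactly how this construction ignores the smooth variations that break K-means-style partitioning.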

3.4 Results

  • Qualitatively, the combination of the multiscale cues with their globalization machinery translates into a reduction of clutter edges and completion of contours in the output, as shown in Figure 9.
  • Figure 10 breaks down the contributions of the multiscale and spectral signals to the performance of gPb.
  • These precision-recall curves show that the reduction of false positives due to the use of global information in sPb is concentrated in the high thresholds, while gPb takes the best of both worlds, relying on sPb in the high precision regime and on mPb in the high recall regime.
  • Looking again at the comparison of contour detectors on the BSDS300 benchmark in Figure 1, the mean improvement in precision of gPb with respect to the single scale Pb is 10% in the recall range [0.1, 0.9].

4 SEGMENTATION

  • The nonmax suppressed gPb contours produced in the previous section are often not closed and hence do not partition the image into regions.
  • Regions come with their own scale estimates and provide natural domains for computing features used in recognition.
  • The authors show how to recover closed contours, while preserving the gains in boundary quality achieved in the previous section.
  • The authors' algorithm, first reported in [4], builds a hierarchical segmentation by exploiting the information in the contour signal.

4.1 Oriented Watershed Transform

  • Using the contour signal, the authors first construct a finest partition for the hierarchy, an over-segmentation whose regions determine the highest level of detail considered.
  • The catchment basins of the minima, denoted P0, provide the regions of the finest partition and the corresponding watershed arcs, K0, the possible locations of the boundaries.
  • A pixel could lie near but not on a strong vertical contour.
  • Several such cases can be seen in Figure 11.

4.2 Ultrametric Contour Map

  • One can interpret the boundary strength assigned to an arc by the Oriented Watershed Transform (OWT) of the previous section as an estimate of the probability of that arc being a true contour.
  • One possibility, which the authors exploit here, is the Ultrametric Contour Map (UCM) [35] which defines a duality between closed, non-selfintersecting weighted contours and a hierarchy of regions.
  • Upper levels of the hierarchy respect only strong contours, resulting in an under-segmentation.
  • Specifically: 1) Select the minimum weight contour: C* = argmin_{C ∈ K0} W(C).
  • Hence, the constructed region tree has the structure of an indexed hierarchy and can be described by a dendrogram, where the height H(R) of each region R is the value of the dissimilarity at which it first appears.
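The greedy merging behind the hierarchy can be sketched as repeatedly removing the minimum-weight arc and merging the two regions it separates, recording the weight at which each merge happens (the dendrogram height). A simplified illustration with union-find bookkeeping; the real UCM construction also updates the weights of arcs between newly merged regions, which is omitted here.

```python
import heapq

def build_hierarchy(arcs):
    """Greedy agglomeration: `arcs` is a list of (weight, region_a,
    region_b) from the finest partition. Returns merge events as
    (weight, root_a, root_b, new_region_id), in merge order."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    heap = list(arcs)
    heapq.heapify(heap)
    merges = []
    next_id = 1 + max(max(a, b) for _, a, b in arcs)
    while heap:
        w, a, b = heapq.heappop(heap)
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # arc became internal after earlier merges
        parent[next_id] = next_id
        parent[ra] = parent[rb] = next_id
        merges.append((w, ra, rb, next_id))
        next_id += 1
    return merges
```

Thresholding the resulting tree at any height yields a single segmentation, which is the duality between the weighted contour map and the region hierarchy.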

4.3 Results

  • While the OWT-UCM algorithm can use any source of contours for the input E(x, y, θ) signal (e.g. the Canny edge detector before thresholding), the authors obtain best results by employing the gPb detector [3] introduced in Section 3.
  • The authors report experiments using both gPb as well as the baseline Canny detector, and refer to the resulting segmentation algorithms as gPb-owt-ucm and Canny-owt-ucm, respectively.
  • Since the OWT-UCM algorithm produces hierarchical region trees, obtaining a single segmentation as output involves a choice of scale.
  • One possibility is to use a fixed threshold for all images in the dataset, calibrated to provide optimal performance on the training set.

4.4 Evaluation

  • To provide a basis of comparison for the OWT-UCM algorithm, the authors make use of the region merging [32], Mean Shift [34], Multiscale NCuts [33], and SWA [31] segmentation methods reviewed in Section 2.2.
  • The authors evaluate each method using the boundary-based precision-recall framework of [2], as well as the Variation of Information, Probabilistic Rand Index, and segment covering criteria discussed in Section 2.3.
  • The BSDS serves as ground-truth for both the boundary and region quality measures, since the human-drawn boundaries are closed and hence are also segmentations.

4.4.1 Boundary Quality

  • Recall that the evaluation methodology developed by [2] measures detector performance in terms of precision, the fraction of detections that are true positives, and recall, the fraction of ground-truth boundary pixels detected.
  • The global F-measure, or harmonic mean of precision and recall at the optimal detector threshold, provides a summary score.
  • Figures 2 and 17 display the full precision-recall curves on the BSDS300 and BSDS500 datasets, respectively.
  • The authors find retraining on the BSDS500 to be unnecessary and use the same parameters learned on the BSDS300.
  • Of particular note in Figure 17 are pairs of curves corresponding to contour detector output and regions produced by running the OWT-UCM algorithm on that output.
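The summary score described above is just the harmonic mean of precision and recall; the benchmark reports it at the optimal detector threshold. A minimal sketch:

```python
def f_measure(precision, recall):
    """F = 2 * P * R / (P + R), the harmonic mean of precision and
    recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The harmonic mean punishes imbalance: a detector with perfect precision but zero recall still scores 0.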

4.4.2 Region Quality

  • Table 2 presents region benchmarks on the BSDS.
  • For a family of machine segmentations {Si}, associated with different scales of a hierarchical algorithm or different sets of parameters, the authors report three scores for the covering of the ground-truth by segments in {Si}.
  • These correspond to selecting covering regions from the segmentation at a universal fixed scale (ODS), a fixed scale per image (OIS), or from any level of the hierarchy or collection {Si} (Best).
  • The authors also report the Probabilistic Rand Index and Variation of Information benchmarks.
  • While the relative ranking of segmentation algorithms remains fairly consistent across different benchmark criteria, the boundary benchmark appears most capable of discriminating performance.

4.4.4 Summary

  • The gPb-owt-ucm segmentation algorithm offers the best performance on every dataset and for every benchmark criterion the authors tested.
  • In addition, it is straightforward, fast, has no parameters to tune, and, as discussed in the following sections, can be adapted for use with top-down knowledge sources.

5 INTERACTIVE SEGMENTATION

  • Until now, the authors have only discussed fully automatic image segmentation.
  • Human assisted segmentation is relevant for many applications, and recent approaches rely on the graph-cuts formalism [72], [73], [74] or other energy minimization procedure [75] to extract foreground regions.
  • The unary potentials encode agreement with estimated foreground or background region models and the pairwise potentials bias neighboring pixels not separated by a strong boundary to have the same label.
  • User-specified hard labeling constraints are enforced by connecting a pixel to the source or sink with sufficiently large weight.
  • Each unlabeled region receives the label of the first labeled region merged with it.
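The label propagation rule in the last bullet can be sketched over the merge tree. This is our own illustration of the stated rule, assuming merge events ordered by increasing weight and user seed labels on some leaf regions; region ids and names are hypothetical.

```python
def propagate_labels(merges, seeds):
    """Interactive labeling on the region tree: `merges` is a list of
    (child_a, child_b, parent) triples in merge order, `seeds` maps
    user-labeled regions to labels. Each unlabeled region takes the
    label of the first labeled region it is merged with."""
    labels = dict(seeds)
    for a, b, parent in merges:
        la, lb = labels.get(a), labels.get(b)
        label = la if la is not None else lb
        if label is not None:
            # First labeled partner wins; never overwrite a user seed.
            labels.setdefault(a, label)
            labels.setdefault(b, label)
            labels.setdefault(parent, label)
    return labels
```

Because merges occur in order of increasing boundary strength, a region inherits the label of the seed it is most weakly separated from.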

6 MULTISCALE FOR OBJECT ANALYSIS

  • The authors' contour detection and segmentation algorithms capture multiscale information by combining local gradient cues computed at three different scales, as described in Section 3.2.
  • Note that this procedure does not prevent the object detector itself from using multiscale information, but rather provides the correct central scale.
  • Martin et al. [2] suggest ways to speed up this computation, including incremental updating of the histograms as the disc is swept across the image.
  • Moreover, in this case, no approximation is required as these operations are equivalent up to the numerical accuracy of the interpolation done when rotating the image.
  • Catanzaro et al. [77] have created a parallel GPU implementation of their gPb contour detector.


Contour Detection and Hierarchical Image Segmentation
Pablo Arbeláez, Member, IEEE, Michael Maire, Member, IEEE,
Charless Fowlkes, Member, IEEE, and Jitendra Malik, Fellow, IEEE.
Abstract—This paper investigates two fundamental problems in computer vision: contour detection and image segmentation. We present state-of-the-art algorithms for both of these tasks. Our contour detector combines multiple local cues into a globalization framework based on spectral clustering. Our segmentation algorithm consists of generic machinery for transforming the output of any contour detector into a hierarchical region tree. In this manner, we reduce the problem of image segmentation to that of contour detection. Extensive experimental evaluation demonstrates that both our contour detection and segmentation methods significantly outperform competing algorithms. The automatically generated hierarchical segmentations can be interactively refined by user-specified annotations. Computation at multiple image resolutions provides a means of coupling our system to recognition applications.
1 INTRODUCTION
This paper presents a unified approach to contour detection and image segmentation. Contributions include:
A high performance contour detector, combining local and global image information.
A method to transform any contour signal into a hierarchy of regions while preserving contour quality.
Extensive quantitative evaluation and the release of a new annotated dataset.
Figures 1 and 2 summarize our main results. The two Figures represent the evaluation of multiple contour detection (Figure 1) and image segmentation (Figure 2) algorithms on the Berkeley Segmentation Dataset (BSDS300) [1], using the precision-recall framework introduced in [2]. This benchmark operates by comparing machine generated contours to human ground-truth data (Figure 3) and allows evaluation of segmentations in the same framework by regarding region boundaries as contours.
Especially noteworthy in Figure 1 is the contour detector gPb, which compares favorably with other leading techniques, providing equal or better precision for most choices of recall. In Figure 2, gPb-owt-ucm provides universally better performance than alternative segmentation algorithms. We introduced the gPb and gPb-owt-ucm algorithms in [3] and [4], respectively. This paper offers comprehensive versions of these algorithms, motivation behind their design, and additional experiments which support our basic claims.
We begin with a review of the extensive literature on
contour detection and image segmentation in Section 2.
P. Arbeláez and J. Malik are with the Department of Electrical Engineering and Computer Science, University of California at Berkeley, Berkeley, CA 94720. E-mail: {arbelaez,malik}@eecs.berkeley.edu
M. Maire is with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125. E-mail: mmaire@caltech.edu
C. Fowlkes is with the Department of Computer Science, University of California at Irvine, Irvine, CA 92697. E-mail: fowlkes@ics.uci.edu
Section 3 covers the development of the gPb contour detector. We couple multiscale local brightness, color, and texture cues to a powerful globalization framework using spectral clustering. The local cues, computed by applying oriented gradient operators at every location in the image, define an affinity matrix representing the similarity between pixels. From this matrix, we derive a generalized eigenproblem and solve for a fixed number of eigenvectors which encode contour information. Using a classifier to recombine this signal with the local cues, we obtain a large improvement over alternative globalization schemes built on top of similar cues.
To produce high-quality image segmentations, we link this contour detector with a generic grouping algorithm described in Section 4 and consisting of two steps. First, we introduce a new image transformation called the Oriented Watershed Transform for constructing a set of initial regions from an oriented contour signal. Second, using an agglomerative clustering procedure, we form these regions into a hierarchy which can be represented by an Ultrametric Contour Map, the real-valued image obtained by weighting each boundary by its scale of disappearance. We provide experiments on the BSDS300 as well as the BSDS500, a superset newly released here.
Although the precision-recall framework [2] has found widespread use for evaluating contour detectors, considerable effort has also gone into developing metrics to directly measure the quality of regions produced by segmentation algorithms. Noteworthy examples include the Probabilistic Rand Index, introduced in this context by [5], the Variation of Information [6], [7], and the Segmentation Covering criteria used in the PASCAL challenge [8]. We consider all of these metrics and demonstrate that gPb-owt-ucm delivers an across-the-board improvement over existing algorithms.
Sections 5 and 6 explore ways of connecting our purely bottom-up contour and segmentation machinery
Digital Object Indentifier 10.1109/TPAMI.2010.161 0162-8828/10/$26.00 © 2010 IEEE
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

Fig. 1. Evaluation of contour detectors on the Berkeley Segmentation Dataset (BSDS300) Benchmark [2]. Leading contour detection approaches are ranked according to their maximum F-measure (2·Precision·Recall / (Precision+Recall)) with respect to human ground-truth boundaries. Iso-F curves are shown in green. Our gPb detector [3] performs significantly better than other algorithms [2], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28] across almost the entire operating regime. Average agreement between human subjects is indicated by the green dot. Legend (maximum F-measure): [F = 0.79] Human; [F = 0.70] gPb; [F = 0.68] Multiscale, Ren (2008); [F = 0.66] BEL, Dollar, Tu, Belongie (2006); [F = 0.66] Mairal, Leordeanu, Bach, Herbert, Ponce (2008); [F = 0.65] Min Cover, Felzenszwalb, McAllester (2006); [F = 0.65] Pb, Martin, Fowlkes, Malik (2004); [F = 0.64] Untangling Cycles, Zhu, Song, Shi (2007); [F = 0.64] CRF, Ren, Fowlkes, Malik (2005); [F = 0.58] Canny (1986); [F = 0.56] Perona, Malik (1990); [F = 0.50] Hildreth, Marr (1980); [F = 0.48] Prewitt (1970); [F = 0.48] Sobel (1968); [F = 0.47] Roberts (1965). [Precision-recall axes omitted.]
to sources of top-down knowledge. In Section 5, this knowledge source is a human. Our hierarchical region trees serve as a natural starting point for interactive segmentation. With minimal annotation, a user can correct errors in the automatic segmentation and pull out objects of interest from the image. In Section 6, we target top-down object detection algorithms and show how to create multiscale contour and region output tailored to match the scales of interest to the object detector.
Though much remains to be done to take full advantage of segmentation as an intermediate processing layer, recent work has produced payoffs from this endeavor [9], [10], [11], [12], [13]. In particular, our gPb-owt-ucm segmentation algorithm has found use in optical flow [14] and object recognition [15], [16] applications.
2 PREVIOUS WORK
The problems of contour detection and segmentation are related, but not identical. In general, contour detectors offer no guarantee that they will produce closed contours and hence do not necessarily provide a partition of the image into regions. But, one can always recover closed contours from regions in the form of their boundaries. As an accomplishment here, Section 4 shows how to do the reverse and recover regions from a contour detector.
Fig. 2. Evaluation of segmentation algorithms on the BSDS300 Benchmark. Paired with our gPb contour detector as input, our hierarchical segmentation algorithm gPb-owt-ucm [4] produces regions whose boundaries match ground-truth better than those produced by other methods [7], [29], [30], [31], [32], [33], [34], [35]. Legend (maximum F-measure): [F = 0.79] Human; [F = 0.71] gPb-owt-ucm; [F = 0.67] UCM, Arbelaez (2006); [F = 0.63] Mean Shift, Comaniciu, Meer (2002); [F = 0.62] Normalized Cuts, Cour, Benezit, Shi (2005); [F = 0.58] Canny-owt-ucm; [F = 0.58] Felzenszwalb, Huttenlocher (2004); [F = 0.58] Av. Diss., Bertelli, Sumengen, Manjunath, Gibou (2008); [F = 0.56] SWA, Alpert, Galun, Basri, Brandt (2007); [F = 0.55] Chan-Vese, Bertelli, Sumengen, Manjunath, Gibou (2008); [F = 0.55] Donoser, Urschler, Hirzer, Bischof (2009); [F = 0.53] Yang, Wright, Ma, Sastry (2007). [Precision-recall axes omitted.]
Fig. 3. Berkeley Segmentation Dataset [1]. Top to Bottom: Image and ground-truth segment boundaries hand-drawn by three different human subjects. The BSDS300 consists of 200 training and 100 test images, each with multiple ground-truth segmentations. The BSDS500 uses the BSDS300 as training and adds 200 new test images.
Historically, however, there have been different lines of
approach to these two problems, which we now review.

2.1 Contours
Early approaches to contour detection aim at quantifying the presence of a boundary at a given image location through local measurements. The Roberts [17], Sobel [18], and Prewitt [19] operators detect edges by convolving a grayscale image with local derivative filters. Marr and Hildreth [20] use zero crossings of the Laplacian of Gaussian operator. The Canny detector [22] also models edges as sharp discontinuities in the brightness channel, adding non-maximum suppression and hysteresis thresholding steps. A richer description can be obtained by considering the response of the image to a family of filters of different scales and orientations. An example is the Oriented Energy approach [21], [36], [37], which uses quadrature pairs of even and odd symmetric filters. Lindeberg [38] proposes a filter-based method with an automatic scale selection mechanism.
More recent local approaches take into account color and texture information and make use of learning techniques for cue combination [2], [26], [27]. Martin et al. [2] define gradient operators for brightness, color, and texture channels, and use them as input to a logistic regression classifier for predicting edge strength. Rather than rely on such hand-crafted features, Dollar et al. [27] propose a Boosted Edge Learning (BEL) algorithm which attempts to learn an edge classifier in the form of a probabilistic boosting tree [39] from thousands of simple features computed on image patches. An advantage of this approach is that it may be possible to handle cues such as parallelism and completion in the initial classification stage. Mairal et al. [26] create both generic and class-specific edge detectors by learning discriminative sparse representations of local image patches. For each class, they learn a discriminative dictionary and use the reconstruction error obtained with each dictionary as feature input to a final classifier.
The large range of scales at which objects may appear in the image remains a concern for these modern local approaches. Ren [28] finds benefit in combining information from multiple scales of the local operators developed by [2]. Additional localization and relative contrast cues, defined in terms of the multiscale detector output, are fed to the boundary classifier. For each scale, the localization cue captures the distance from a pixel to the nearest peak response. The relative contrast cue normalizes each pixel in terms of the local neighborhood.
An orthogonal line of work in contour detection focuses primarily on another level of processing, globalization, that utilizes local detector output. The simplest such algorithms link together high-gradient edge fragments in order to identify extended, smooth contours [40], [41], [42]. More advanced globalization stages are the distinguishing characteristics of several of the recent high-performance methods benchmarked in Figure 1, including our own, which share as a common feature their use of the local edge detection operators of [2].
Ren et al. [23] use the Conditional Random Fields (CRF) framework to enforce curvilinear continuity of contours. They compute a constrained Delaunay triangulation (CDT) on top of locally detected contours, yielding a graph consisting of the detected contours along with the new "completion" edges introduced by the triangulation. The CDT is scale-invariant and tends to fill short gaps in the detected contours. By associating a random variable with each contour and each completion edge, they define a CRF with edge potentials in terms of detector response and vertex potentials in terms of junction type and continuation smoothness. They use loopy belief propagation [43] to compute expectations.
Felzenszwalb and McAllester [25] use a different strategy for extracting salient smooth curves from the output of a local contour detector. They consider the set of short oriented line segments that connect pixels in the image to their neighboring pixels. Each such segment is either part of a curve or is a background segment. They assume curves are drawn from a Markov process, the prior distribution on curves favors few per scene, and detector responses are conditionally independent given the labeling of line segments. Finding the optimal line segment labeling then translates into a general weighted min-cover problem in which the elements being covered are the line segments themselves and the objects covering them are drawn from the set of all possible curves and all possible background line segments. Since this problem is NP-hard, an approximate solution is found using a greedy "cost per pixel" heuristic.
Zhu et al. [24] also start with the output of [2] and create a weighted edgel graph, where the weights measure directed collinearity between neighboring edgels. They propose detecting closed topological cycles in this graph by considering the complex eigenvectors of the normalized random walk matrix. This procedure extracts both closed contours and smooth curves, as edgel chains are allowed to loop back at their termination points.
2.2 Regions
A broad family of approaches to segmentation involve integrating features such as brightness, color, or texture over local image patches and then clustering those features based on, e.g., fitting mixture models [7], [44], mode-finding [34], or graph partitioning [32], [45], [46], [47]. Three algorithms in this category appear to be the most widely used as sources of image segments in recent applications, due to a combination of reasonable performance and publicly available implementations.
The graph based region merging algorithm advocated
by Felzenszwalb and Huttenlocher (Felz-Hutt) [32] at-
tempts to partition image pixels into components such
that the resulting segmentation is neither too coarse nor
too fine. Given a graph in which pixels are nodes and
edge weights measure the dissimilarity between nodes
(e.g. color differences), each node is initially placed in
its own component. Define the internal difference of a
component, Int(R), as the largest weight in the minimum
spanning tree of R.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

Considering edges in non-decreasing
order by weight, each step of the algorithm merges
components R_1 and R_2 connected by the current edge if
the edge weight is less than:

min(Int(R_1) + τ(R_1), Int(R_2) + τ(R_2))    (1)

where τ(R) = k/|R| and k is a scale parameter that sets a
preference for component size.
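The merging rule of (1) can be sketched with a small union-find structure; because edges are processed in non-decreasing weight order, the weight of the merging edge becomes the new internal difference of the merged component. This is an illustrative toy implementation, not the authors' released code; the names and the example graph are ours.

```python
class Components:
    """Union-find over pixels, tracking Int(R) and component size."""

    def __init__(self, n, k):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # Int(R): largest weight in the component's MST
        self.k = k

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def try_merge(self, i, j, w):
        a, b = self.find(i), self.find(j)
        if a == b:
            return False
        tau = lambda r: self.k / self.size[r]
        # merge iff w <= min(Int(a) + tau(a), Int(b) + tau(b))   (Eq. 1)
        if w <= min(self.internal[a] + tau(a), self.internal[b] + tau(b)):
            self.parent[b] = a
            self.size[a] += self.size[b]
            # edges arrive in non-decreasing order, so w is the new Int
            self.internal[a] = w
            return True
        return False


def segment(n, edges, k):
    """edges: (weight, i, j) tuples; processed in non-decreasing weight order."""
    comps = Components(n, k)
    for w, i, j in sorted(edges):
        comps.try_merge(i, j, w)
    return [comps.find(i) for i in range(n)]
```

On a 4-pixel chain with two cheap edges and one expensive edge, the cheap edges merge their endpoints while the expensive edge exceeds the threshold, leaving two components.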
The Mean Shift algorithm [34] offers an alternative
clustering framework. Here, pixels are represented in
the joint spatial-range domain by concatenating their
spatial coordinates and color values into a single vector.
Applying mean shift filtering in this domain yields a
convergence point for each pixel. Regions are formed by
grouping together all pixels whose convergence points
are closer than h_s in the spatial domain and h_r in the
range domain, where h_s and h_r are the respective bandwidth
parameters. Additional merging can also be performed
to enforce a constraint on minimum region area.
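The filtering step can be illustrated with a minimal flat-kernel mean shift on a set of feature vectors. A full implementation would operate on the concatenated spatial-range vectors with separate bandwidths h_s and h_r; the single-bandwidth sketch below, with names and toy data of our choosing, only shows the iteration to a convergence point.

```python
import numpy as np

def mean_shift_points(points, h, iters=50, tol=1e-6):
    """Flat-kernel mean shift: move each point to the mean of all points
    within distance h of it, until convergence. Toy sketch, not the full
    joint spatial-range filtering described above."""
    pts = np.asarray(points, dtype=float)
    modes = pts.copy()
    for i in range(len(modes)):
        x = modes[i]
        for _ in range(iters):
            nbrs = pts[np.linalg.norm(pts - x, axis=-1) <= h]
            new_x = nbrs.mean(axis=0)  # shift toward the local mean
            if np.linalg.norm(new_x - x) < tol:
                break
            x = new_x
        modes[i] = x
    return modes
```

Pixels whose returned convergence points fall within the bandwidths of one another would then be grouped into one region.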
Spectral graph theory [48], and in particular the Nor-
malized Cuts criterion [45], [46], provides a way of
integrating global image information into the grouping
process. In this framework, given an affinity matrix W
whose entries encode the similarity between pixels, one
defines the diagonal matrix D_ii = Σ_j W_ij and solves for the
generalized eigenvectors of the linear system:

(D − W)v = λDv    (2)
Traditionally, after this step, K-means clustering is
applied to obtain a segmentation into regions. This ap-
proach often breaks uniform regions where the eigenvec-
tors have smooth gradients. One solution is to reweight
the affinity matrix [47]; others have proposed alternative
graph partitioning formulations [49], [50], [51].
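For intuition, the generalized eigenproblem of (2) can be reduced to a standard symmetric one via the routine substitution D^{-1/2}(D − W)D^{-1/2}. The dense sketch below (our code, toy sizes only) omits the sparse eigensolvers used in practice.

```python
import numpy as np

def ncut_eigvecs(W, k):
    """Solve (D - W) v = lambda D v (Eq. 2) by the symmetric reduction:
    with L_sym = D^{-1/2} (D - W) D^{-1/2} and L_sym u = lambda u,
    the generalized eigenvectors are v = D^{-1/2} u."""
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = d_isqrt[:, None] * (np.diag(d) - W) * d_isqrt[None, :]
    vals, U = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    V = d_isqrt[:, None] * U[:, :k]       # back-transform to generalized vectors
    return vals[:k], V
```

On a toy affinity matrix with two tightly connected pairs and weak cross links, the smallest eigenvalue is zero (the constant vector) and the second eigenvector changes sign exactly between the two groups.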
A recent variant of Normalized Cuts for image seg-
mentation is the Multiscale Normalized Cuts (NCuts)
approach of Cour et al. [33]. The fact that W must
be sparse, in order to avoid a prohibitively expensive
computation, limits the naive implementation to using
only local pixel affinities. Cour et al. address this limitation
by computing sparse affinity matrices at multiple scales,
setting up cross-scale constraints, and deriving a new
eigenproblem for this constrained multiscale cut.
Sharon et al. [52] propose an alternative to improve
the computational efficiency of Normalized Cuts. This
approach, inspired by algebraic multigrid, iteratively
coarsens the original graph by selecting a subset of nodes
such that each variable on the fine level is strongly
coupled to one on the coarse level. The same merging
strategy is adopted in [31], where the strong coupling of
a subset S of the graph nodes V is formalized as:

Σ_{j∈S} p_ij / Σ_{j∈V} p_ij > ψ    ∀ i ∈ V \ S    (3)

where ψ is a constant and p_ij is the probability of merging
i and j, estimated from brightness and texture similarity.
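The coarsening condition of (3) can be checked directly on a small matrix of merge probabilities; this helper is an illustrative sketch with names of our choosing.

```python
import numpy as np

def strongly_coupled(P, S, psi):
    """Check Eq. (3): every node i outside the coarse subset S must send
    more than a fraction psi of its total merge probability into S.
    P: symmetric matrix of merge probabilities p_ij."""
    P = np.asarray(P, dtype=float)
    V = range(P.shape[0])
    return all(P[i, list(S)].sum() / P[i, list(V)].sum() > psi
               for i in V if i not in S)
```

With three nodes where node 2 splits its probability evenly between the subset and the rest, the condition holds for ψ = 0.4 but fails for ψ = 0.6.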
Many approaches to image segmentation fall into a
different category than those covered so far, relying on
the formulation of the problem in a variational frame-
work. An example is the model proposed by Mumford
and Shah [53], where the segmentation of an observed
image u_0 is given by the minimization of the functional:

F(u, C) = ∫_Ω (u − u_0)² dx + μ ∫_{Ω\C} |∇u|² dx + ν|C|    (4)

where u is piecewise smooth in Ω\C and μ, ν are weighting
parameters. Theoretical properties of this model can
be found in, e.g. [53], [54]. Several algorithms have been
developed to minimize the energy (4) or its simplified
version, where u is piecewise constant in Ω\C. Koepfler
et al. [55] proposed a region merging method for this
purpose. Chan and Vese [56], [57] follow a different
approach, expressing (4) in the level set formalism of
Osher and Sethian [58], [59]. Bertelli et al. [30] extend
this approach to more general cost functions based on
pairwise pixel similarities. Recently, Pock et al. [60] pro-
posed to solve a convex relaxation of (4), thus obtaining
robustness to initialization. Donoser et al. [29] subdivide
the problem into several figure/ground segmentations,
each initialized using low-level saliency and solved by
minimizing an energy based on Total Variation.
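As a one-dimensional illustration of the piecewise-constant simplification of (4): given candidate jump points, the optimal u on each segment is the segment mean, so the energy reduces to a fidelity term plus ν times the number of jumps. The function below is a toy analogue of our own devising, not any of the cited minimization algorithms.

```python
import numpy as np

def pc_ms_energy(u0, breaks, nu):
    """Energy of the piecewise-constant Mumford-Shah simplification for a
    1-D signal: fidelity with u set to each segment's mean, plus nu times
    the number of jump points."""
    u0 = np.asarray(u0, dtype=float)
    edges = [0, *sorted(breaks), len(u0)]
    # optimal piecewise-constant u is the mean on each segment
    fidelity = sum(((u0[a:b] - u0[a:b].mean()) ** 2).sum()
                   for a, b in zip(edges, edges[1:]))
    return fidelity + nu * len(breaks)
```

For a clean step signal, placing a single jump at the true discontinuity yields a strictly lower energy than the constant (no-jump) approximation.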
2.3 Benchmarks
Though much of the extensive literature on contour
detection predates its development, the BSDS [2] has
since found wide acceptance as a benchmark for this task
[23], [24], [25], [26], [27], [28], [35], [61]. The standard for
evaluating segmentation algorithms is less clear.
One option is to regard the segment boundaries
as contours and evaluate them as such. However, a
methodology that directly measures the quality of the
segments is also desirable. Some types of errors, e.g. a
missing pixel in the boundary between two regions, may
not be reflected in the boundary benchmark, but can
have substantial consequences for segmentation quality,
e.g. incorrectly merging large regions. One might argue
that the boundary benchmark favors contour detectors
over segmentation methods, since the former are not
burdened with the constraint of producing closed curves.
We therefore also consider various region-based metrics.
2.3.1 Variation of Information
The Variation of Information metric was introduced for
the purpose of clustering comparison [6]. It measures the
distance between two segmentations in terms of their
average conditional entropy, given by:

VI(S, S') = H(S) + H(S') − 2I(S, S')    (5)

where H and I represent, respectively, the entropies and
mutual information between two clusterings of data S
and S'. In our case, these clusterings are test and ground-truth
segmentations. Although VI possesses some interesting
theoretical properties [6], its perceptual meaning
and applicability in the presence of several ground-truth
segmentations remains unclear.
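Equation (5) can be computed directly from the joint histogram of the two label maps. A minimal sketch (our names; it assumes labels are small non-negative integers):

```python
import numpy as np

def variation_of_information(a, b):
    """VI between two labelings of the same pixels (Eq. 5):
    VI = H(A) + H(B) - 2 I(A, B), from the joint label histogram."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    n = a.size
    # joint distribution over (label_a, label_b) pairs
    pairs, counts = np.unique(np.stack([a, b]), axis=1, return_counts=True)
    pij = counts / n
    pi = np.bincount(a) / n  # marginal of a (labels must be small ints >= 0)
    pj = np.bincount(b) / n
    h_a = -np.sum(pi[pi > 0] * np.log(pi[pi > 0]))
    h_b = -np.sum(pj[pj > 0] * np.log(pj[pj > 0]))
    i_ab = np.sum(pij * np.log(pij / (pi[pairs[0]] * pj[pairs[1]])))
    return h_a + h_b - 2 * i_ab
```

Identical (or merely relabeled) segmentations give VI = 0, while two independent binary partitions of four pixels give VI = 2 log 2.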
2.3.2 Rand Index
Originally, the Rand Index [62] was introduced for gen-
eral clustering evaluation. It operates by comparing the
compatibility of assignments between pairs of elements
in the clusters. The Rand Index between test and ground-
truth segmentations S and G is given by the sum of the
number of pairs of pixels that have the same label in
S and G and those that have different labels in both
segmentations, divided by the total number of pairs of
pixels. Variants of the Rand Index have been proposed
[5], [7] for dealing with the case of multiple ground-truth
segmentations. Given a set of ground-truth segmentations
{G_k}, the Probabilistic Rand Index is defined as:

PRI(S, {G_k}) = (1/T) Σ_{i<j} [c_ij p_ij + (1 − c_ij)(1 − p_ij)]    (6)

where c_ij is the event that pixels i and j have the same
label and p_ij its probability. T is the total number of
pixel pairs. Using the sample mean to estimate p_ij, (6)
amounts to averaging the Rand Index among different
ground-truth segmentations. The PRI has been reported
to suffer from a small dynamic range [5], [7], and its
values across images and algorithms are often similar.
In [5], this drawback is addressed by normalization with
an empirical estimation of its expected value.
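With p_ij estimated by the sample mean, the PRI of (6) is exactly the Rand Index averaged over the ground-truth set. A brute-force sketch over all pixel pairs (our code, toy sizes only):

```python
import numpy as np
from itertools import combinations

def rand_index(s, g):
    """Rand Index between two labelings: fraction of pixel pairs on which
    they agree (same label in both, or different labels in both)."""
    s, g = np.ravel(s), np.ravel(g)
    agree = sum((s[i] == s[j]) == (g[i] == g[j])
                for i, j in combinations(range(s.size), 2))
    return agree / (s.size * (s.size - 1) / 2)

def probabilistic_rand_index(s, gts):
    """PRI (Eq. 6) with p_ij estimated by the sample mean over the
    ground-truth set, which reduces to averaging the Rand Index."""
    return float(np.mean([rand_index(s, g) for g in gts]))
```

A perfect match scores 1, an independent binary partition of four pixels scores 1/3, and the PRI over both ground truths is their mean.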
2.3.3 Segmentation Covering
The overlap between two regions R and R', defined as:

O(R, R') = |R ∩ R'| / |R ∪ R'|    (7)

has been used for the evaluation of the pixel-wise classification
task in recognition [8], [11]. We define the
covering of a segmentation S by a segmentation S' as:

C(S' → S) = (1/N) Σ_{R∈S} |R| · max_{R'∈S'} O(R, R')    (8)
where N denotes the total number of pixels in the image.
Similarly, the covering of a machine segmentation S by
a family of ground-truth segmentations {G_i} is defined
by first covering S separately with each human segmentation
G_i, and then averaging over the different humans.
To achieve perfect covering the machine segmentation
must explain all of the human data. We can then define
two quality descriptors for regions: the covering of S by
{G_i} and the covering of {G_i} by S.
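The covering of (8) weights each region's best overlap (7) by its area. A direct sketch over flat label maps (our names, not the benchmark implementation):

```python
import numpy as np

def covering(S_prime, S):
    """Covering of segmentation S by S' (Eq. 8): each region of S is matched
    with its best-overlapping region of S', weighted by region size."""
    S_prime, S = np.ravel(S_prime), np.ravel(S)
    N = S.size
    total = 0.0
    for r in np.unique(S):
        mask = S == r
        best = max(
            np.logical_and(mask, S_prime == rp).sum()
            / np.logical_or(mask, S_prime == rp).sum()
            for rp in np.unique(S_prime)
        )
        total += mask.sum() * best  # |R| * max overlap
    return total / N
```

Covering a segmentation by itself gives 1; covering a two-region partition by a single all-image region gives 0.5, since each region of S overlaps that region with O = 1/2.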
3 CONTOUR DETECTION
As a starting point for contour detection, we consider
the work of Martin et al. [2], who define a function
Pb(x, y, θ) that predicts the posterior probability of a
boundary with orientation θ at each image pixel (x, y)
by measuring the difference in local image brightness,
color, and texture channels. In this section, we review
these cues, introduce our own multiscale version of the
Pb detector, and describe the new globalization method
we run on top of this multiscale local detector.
[Figure 4 graphic: panels titled "Upper Half-Disc Histogram" and "Lower Half-Disc Histogram"]

Fig. 4. Oriented gradient of histograms. Given an
intensity image, consider a circular disc centered at each
pixel and split by a diameter at angle θ. We compute
histograms of intensity values in each half-disc and output
the χ² distance between them as the gradient magnitude.
The blue and red distributions shown in the middle panel
are the histograms of the pixel brightness values in the
blue and red regions, respectively, in the left image. The
right panel shows an example result for a disc of radius
5 pixels at orientation θ = π/4 after applying a second-order
Savitzky-Golay smoothing filter to the raw histogram
difference output. Note that the left panel displays a larger
disc (radius 50 pixels) for illustrative purposes.
3.1 Brightness, Color, Texture Gradients
The basic building block of the Pb contour detector is
the computation of an oriented gradient signal G(x, y, θ)
from an intensity image I. This computation proceeds
by placing a circular disc at location (x, y) split into two
half-discs by a diameter at angle θ. For each half-disc, we
histogram the intensity values of the pixels of I covered
by it. The gradient magnitude G at location (x, y) is
defined by the χ² distance between the two half-disc
histograms g and h:

χ²(g, h) = (1/2) Σ_i (g(i) − h(i))² / (g(i) + h(i))    (9)

We then apply second-order Savitzky-Golay filtering
[63] to enhance local maxima and smooth out multiple
detection peaks in the direction orthogonal to θ. This is
equivalent to fitting a cylindrical parabola, whose axis
is oriented along direction θ, to a local 2D window
surrounding each pixel and replacing the response at the
pixel with that estimated by the fit.
Figure 4 shows an example. This computation is moti-
vated by the intuition that contours correspond to image
discontinuities and histograms provide a robust mech-
anism for modeling the content of an image region. A
strong oriented gradient response means a pixel is likely
to lie on the boundary between two distinct regions.
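The half-disc construction and the χ² distance of (9) can be sketched as follows. This omits the Savitzky-Golay smoothing and image-boundary handling, and the bin count and function names are our choices, not the authors'.

```python
import numpy as np

def chi2(g, h):
    """Chi-squared distance between two histograms (Eq. 9)."""
    denom = g + h
    valid = denom > 0
    return 0.5 * np.sum((g[valid] - h[valid]) ** 2 / denom[valid])

def oriented_gradient(img, x, y, theta, radius, bins=8):
    """G(x, y, theta): chi-squared distance between intensity histograms of
    the two half-discs around (x, y), split by a diameter at angle theta.
    Assumes the disc fits inside img and intensities lie in [0, 1]."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    disc = xs ** 2 + ys ** 2 <= radius ** 2
    # signed distance from the dividing diameter selects the half-disc
    side = xs * np.sin(theta) - ys * np.cos(theta)
    patch = img[y - radius:y + radius + 1, x - radius:x + radius + 1]
    upper = patch[disc & (side > 0)]
    lower = patch[disc & (side < 0)]
    edges = np.linspace(0.0, 1.0, bins + 1)
    g = np.histogram(upper, bins=edges)[0] / max(upper.size, 1)
    h = np.histogram(lower, bins=edges)[0] / max(lower.size, 1)
    return chi2(g, h)
```

On a vertical step edge, the response with the diameter aligned along the edge is large, while the response with the diameter across it vanishes, since both half-discs then contain identical intensity distributions.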
The Pb detector combines the oriented gradient sig-
nals obtained from transforming an input image into
four separate feature channels and processing each chan-
nel independently. The first three correspond to the
channels of the CIE Lab colorspace, which we refer to
References

J. Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986.

A. Savitzky and M. J. E. Golay, "Smoothing and Differentiation of Data by Simplified Least Squares Procedures," Analytical Chemistry, 1964.

J. Shi and J. Malik, "Normalized Cuts and Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.

R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. Wiley, 1973.