
Contour Detection and Hierarchical Image Segmentation

Abstract: This paper investigates two fundamental problems in computer vision: contour detection and image segmentation. We present state-of-the-art algorithms for both of these tasks. Our contour detector combines multiple local cues into a globalization framework based on spectral clustering. Our segmentation algorithm consists of generic machinery for transforming the output of any contour detector into a hierarchical region tree. In this manner, we reduce the problem of image segmentation to that of contour detection. Extensive experimental evaluation demonstrates that both our contour detection and segmentation methods significantly outperform competing algorithms. The automatically generated hierarchical segmentations can be interactively refined by user-specified annotations. Computation at multiple image resolutions provides a means of coupling our system to recognition applications.

Summary (6 min read)

1 INTRODUCTION

  • This paper presents a unified approach to contour detection and image segmentation.
  • This benchmark operates by comparing machine generated contours to human ground-truth data and allows evaluation of segmentations in the same framework by regarding region boundaries as contours.
  • The authors introduced the gPb and gPb-owt-ucm algorithms in [3] and [4], respectively.
  • To produce high-quality image segmentations, the authors link this contour detector with a generic grouping algorithm described in Section 4 and consisting of two steps.
  • Average agreement between human subjects is indicated by the green dot.

2 PREVIOUS WORK

  • The problems of contour detection and segmentation are related, but not identical.
  • The BSDS300 consists of 200 training and 100 test images, each with multiple ground-truth segmentations.
  • Historically, however, there have been different lines of approach to these two problems, which the authors now review.

2.1 Contours

  • Early approaches to contour detection aim at quantifying the presence of a boundary at a given image location through local measurements.
  • The Roberts [17], Sobel [18], and Prewitt [19] operators detect edges by convolving a grayscale image with local derivative filters.
  • Additional localization and relative contrast cues, defined in terms of the multiscale detector output, are fed to the boundary classifier.
  • The simplest such algorithms link together high-gradient edge fragments in order to identify extended, smooth contours [40], [41], [42].
  • Zhu et al. [24] also start with the output of [2] and create a weighted edgel graph, where the weights measure directed collinearity between neighboring edgels.

2.2 Regions

  • A broad family of approaches to segmentation involve integrating features such as brightness, color, or texture over local image patches and then clustering those features based on, e.g., fitting mixture models [7], [44], mode-finding [34], or graph partitioning [32], [45], [46], [47].
  • The graph based region merging algorithm advocated by Felzenszwalb and Huttenlocher (Felz-Hutt) [32] attempts to partition image pixels into components such that the resulting segmentation is neither too coarse nor too fine.
  • The fact that W must be sparse, in order to avoid a prohibitively expensive computation, limits the naive implementation to using only local pixel affinities.
  • Cour et al. solve this limitation by computing sparse affinity matrices at multiple scales, setting up cross-scale constraints, and deriving a new eigenproblem for this constrained multiscale cut.
  • Recently, Pock et al. [60] proposed to solve a convex relaxation of (4), thus obtaining robustness to initialization.

2.3 Benchmarks

  • The standard for evaluating segmentation algorithms is less clear.
  • One option is to regard the segment boundaries as contours and evaluate them as such.
  • A methodology that directly measures the quality of the segments is also desirable.
  • The authors therefore also consider various region-based metrics.

2.3.1 Variation of Information

  • The Variation of Information metric was introduced for the purpose of clustering comparison [6].
  • It measures the distance between two segmentations in terms of their average conditional entropy: VI(S, S′) = H(S) + H(S′) − 2I(S, S′) (5), where H and I denote, respectively, the entropies of and the mutual information between the two clusterings of data S and S′.
  • In this case, these clusterings are the test and ground-truth segmentations.
  • Its perceptual meaning and its applicability in the presence of several ground-truth segmentations remain unclear.

2.3.2 Rand Index

  • Originally, the Rand Index [62] was introduced for general clustering evaluation.
  • The Rand Index between test and ground-truth segmentations S and G is given by the sum of the number of pixel pairs that have the same label in both S and G and the number of pairs that have different labels in both segmentations, divided by the total number of pixel pairs.
  • Variants of the Rand Index have been proposed [5], [7] for dealing with the case of multiple ground-truth segmentations.
  • Using the sample mean to estimate pij , (6) amounts to averaging the Rand Index among different ground-truth segmentations.
  • The PRI has been reported to suffer from a small dynamic range [5], [7], and its values across images and algorithms are often similar.
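The pairwise definition above can be sketched directly, again assuming flat integer label arrays (a deliberately O(n²) illustration for clarity, not an implementation suited to full-size images; the function name is ours).

```python
import numpy as np
from itertools import combinations

def rand_index(s, g):
    """Rand Index between two segmentations: the fraction of pixel
    pairs on which the labelings agree, i.e. pairs with the same label
    in both, plus pairs with different labels in both."""
    s, g = np.asarray(s).ravel(), np.asarray(g).ravel()
    agree = 0
    pairs = 0
    for i, j in combinations(range(s.size), 2):
        same_s = s[i] == s[j]
        same_g = g[i] == g[j]
        agree += (same_s == same_g)
        pairs += 1
    return agree / pairs
```

Averaging this quantity over several ground-truth segmentations gives the Probabilistic Rand Index (PRI) described above.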

2.3.3 Segmentation Covering

  • Similarly, the covering of a machine segmentation S by a family of ground-truth segmentations {Gi} is defined by first covering S separately with each human segmentation Gi, and then averaging over the different humans.
  • To achieve perfect covering the machine segmentation must explain all of the human data.
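A minimal sketch of the covering computation, assuming segmentations as flat integer label arrays and using intersection-over-union as the region overlap measure; function names are ours.

```python
import numpy as np

def covering(s, s_prime):
    """Covering of segmentation s by s_prime: for each region R of s,
    take the best overlap (intersection over union) with any region R'
    of s_prime, and average weighted by region size |R|."""
    s, sp = np.asarray(s).ravel(), np.asarray(s_prime).ravel()
    n = s.size
    total = 0.0
    for r in np.unique(s):
        mask_r = s == r
        best = 0.0
        for rp in np.unique(sp):
            mask_rp = sp == rp
            inter = np.logical_and(mask_r, mask_rp).sum()
            union = np.logical_or(mask_r, mask_rp).sum()
            best = max(best, inter / union)
        total += mask_r.sum() * best
    return total / n

def covering_by_family(s, ground_truths):
    """Cover s separately with each human segmentation, then average."""
    return float(np.mean([covering(s, g) for g in ground_truths]))
```

A perfect covering of 1.0 requires every machine region to coincide exactly with a ground-truth region, i.e. the machine segmentation must explain all of the human data.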

3 CONTOUR DETECTION

  • As a starting point for contour detection, the authors consider the work of Martin et al. [2], who define a function Pb(x, y, θ) that predicts the posterior probability of a boundary with orientation θ at each image pixel (x, y) by measuring the difference in local image brightness, color, and texture channels.
  • The authors review these cues, introduce their own multiscale version of the Pb detector, and describe the new globalization method they run on top of this multiscale local detector.

3.1 Brightness, Color, Texture Gradients

  • This is equivalent to fitting a cylindrical parabola, whose axis is oriented along direction θ, to a local 2D window surrounding each pixel and replacing the response at the pixel with that estimated by the fit.
  • The first three correspond to the channels of the CIE Lab colorspace, which the authors refer to as the brightness, color a, and color b channels.
  • Each pixel is associated with a (17-dimensional) vector of responses, containing one entry for each filter.
  • The cluster centers define a set of image-specific textons and each pixel is assigned the integer id in [1, K] of the closest cluster center.
  • On this image, the authors compute differences of histograms in oriented half-discs in the same manner as for the brightness and color channels.
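The texton step above (cluster filter responses, then assign each pixel the id of its nearest center) can be sketched with a minimal K-means. This is a simplified illustration with a hand-rolled clustering loop; the function name and parameters are ours.

```python
import numpy as np

def assign_textons(responses, k, n_iter=10, seed=0):
    """Cluster per-pixel filter-bank responses (shape: n_pixels x
    n_filters, e.g. 17 filters as in the text) into k image-specific
    textons, then return each pixel's integer texton id in [0, k)."""
    responses = np.asarray(responses, dtype=float)
    rng = np.random.default_rng(seed)
    centers = responses[rng.choice(len(responses), k, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to its nearest cluster center.
        d = np.linalg.norm(responses[:, None, :] - centers[None, :, :], axis=2)
        ids = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned pixels.
        for j in range(k):
            if np.any(ids == j):
                centers[j] = responses[ids == j].mean(axis=0)
    return ids
```

The resulting texton-id image is what the oriented half-disc histogram differences are then computed on.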

3.2 Multiscale Cue Combination

  • The authors now introduce their own multiscale extension of the Pb detector reviewed above.
  • Note that Ren [28] introduces a different, more complicated, and similarly performing multiscale extension in work contemporaneous with their own [3], and also suggests possible reasons Martin et al. [2] did not see performance improvements in their original multiscale experiments, including their use of smaller images and their choice of scales.
  • Figure 6 shows an example of the oriented gradients obtained for each channel.
  • The parameters αi,s weight the relative contribution of each gradient signal.
  • Taking the maximum response over orientations yields a measure of boundary strength at each pixel: mPb(x, y) = max_θ {mPb(x, y, θ)} (11). An optional non-maximum suppression step [22] produces thinned, real-valued contours.
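The cue combination described above reduces to a weighted sum over cues and scales followed by a max over orientations. A sketch, assuming the oriented gradient maps and the learned weights α are given (the data layout here is our own illustration):

```python
import numpy as np

def mpb(gradients, alphas):
    """Multiscale boundary strength. `gradients[(i, s)]` holds the
    oriented gradient signal for cue i at scale s as an array of shape
    (n_orientations, H, W); `alphas[(i, s)]` is its scalar weight.
    Returns the per-pixel maximum of the weighted sum over theta."""
    combined = sum(alphas[k] * gradients[k] for k in gradients)
    return combined.max(axis=0)  # max over orientations -> (H, W)
```

In the paper the weights α are learned by gradient ascent on the F-measure over the BSDS training images; here they are simply inputs.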

3.3 Globalization

  • Spectral clustering lies at the heart of their globalization machinery.
  • At this point, the standard Normalized Cuts approach associates with each pixel a length n descriptor formed from entries of the n eigenvectors and uses a clustering algorithm such as K-means to create a hard partition of the image.
  • To circumvent this difficulty, the authors observe that the eigenvectors themselves carry contour information.
  • Taking derivatives in this manner ignores the smooth variations that previously led to errors.
  • As with mPb (10), the weights βi,s and γ are learned by gradient ascent on the F-measure using the BSDS training images.
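The idea of reading contours off the eigenvectors, rather than clustering them, can be sketched as follows. This is a simplified illustration: it uses plain x/y image gradients in place of the paper's oriented derivative filters, and assumes the eigenvectors have already been reshaped to image size.

```python
import numpy as np

def spectral_pb(eigvecs, eigvals):
    """Spectral contour signal: treat each eigenvector as an image,
    differentiate it, and sum gradient magnitudes weighted by
    1/sqrt(lambda), so smoother (small-lambda) eigenvectors
    contribute more."""
    spb = np.zeros_like(eigvecs[0], dtype=float)
    for v, lam in zip(eigvecs, eigvals):
        gy, gx = np.gradient(v)
        spb += np.hypot(gx, gy) / np.sqrt(lam)
    return spb
```

A spatially constant eigenvector contributes nothing, which is exactly how this construction ignores the smooth variations that break K-means-style partitioning.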

3.4 Results

  • Qualitatively, the combination of the multiscale cues with their globalization machinery translates into a reduction of clutter edges and completion of contours in the output, as shown in Figure 9.
  • Figure 10 breaks down the contributions of the multiscale and spectral signals to the performance of gPb.
  • These precision-recall curves show that the reduction of false positives due to the use of global information in sPb is concentrated in the high thresholds, while gPb takes the best of both worlds, relying on sPb in the high precision regime and on mPb in the high recall regime.
  • Looking again at the comparison of contour detectors on the BSDS300 benchmark in Figure 1, the mean improvement in precision of gPb with respect to the single scale Pb is 10% in the recall range [0.1, 0.9].

4 SEGMENTATION

  • The nonmax suppressed gPb contours produced in the previous section are often not closed and hence do not partition the image into regions.
  • Regions come with their own scale estimates and provide natural domains for computing features used in recognition.
  • The authors show how to recover closed contours, while preserving the gains in boundary quality achieved in the previous section.
  • The authors' algorithm, first reported in [4], builds a hierarchical segmentation by exploiting the information in the contour signal.

4.1 Oriented Watershed Transform

  • Using the contour signal, the authors first construct a finest partition for the hierarchy, an over-segmentation whose regions determine the highest level of detail considered.
  • The catchment basins of the minima, denoted P0, provide the regions of the finest partition and the corresponding watershed arcs, K0, the possible locations of the boundaries.
  • A pixel could lie near but not on a strong vertical contour.
  • Several such cases can be seen in Figure 11.

4.2 Ultrametric Contour Map

  • One can interpret the boundary strength assigned to an arc by the Oriented Watershed Transform (OWT) of the previous section as an estimate of the probability of that arc being a true contour.
  • One possibility, which the authors exploit here, is the Ultrametric Contour Map (UCM) [35] which defines a duality between closed, non-selfintersecting weighted contours and a hierarchy of regions.
  • Upper levels of the hierarchy respect only strong contours, resulting in an under-segmentation.
  • Specifically: 1) Select the minimum weight contour: C* = argmin_{C ∈ K0} W(C).
  • Hence, the constructed region tree has the structure of an indexed hierarchy and can be described by a dendrogram, where the height H(R) of each region R is the value of the dissimilarity at which it first appears.
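The greedy merging behind the hierarchy can be sketched as repeatedly removing the minimum-weight arc and merging the two regions it separates, recording the weight at which each merge happens (the dendrogram height). A simplified illustration with union-find bookkeeping; the real UCM construction also updates the weights of arcs between newly merged regions, which is omitted here.

```python
import heapq

def build_hierarchy(arcs):
    """Greedy agglomeration: `arcs` is a list of (weight, region_a,
    region_b) from the finest partition. Returns merge events as
    (weight, root_a, root_b, new_region_id), in merge order."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    heap = list(arcs)
    heapq.heapify(heap)
    merges = []
    next_id = 1 + max(max(a, b) for _, a, b in arcs)
    while heap:
        w, a, b = heapq.heappop(heap)
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # arc became internal after earlier merges
        parent[next_id] = next_id
        parent[ra] = parent[rb] = next_id
        merges.append((w, ra, rb, next_id))
        next_id += 1
    return merges
```

Thresholding the resulting tree at any height yields a single segmentation, which is the duality between the weighted contour map and the region hierarchy.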

4.3 Results

  • While the OWT-UCM algorithm can use any source of contours for the input E(x, y, θ) signal (e.g. the Canny edge detector before thresholding), the authors obtain best results by employing the gPb detector [3] introduced in Section 3.
  • The authors report experiments using both gPb as well as the baseline Canny detector, and refer to the resulting segmentation algorithms as gPb-owt-ucm and Canny-owt-ucm, respectively.
  • Since the OWT-UCM algorithm produces hierarchical region trees, obtaining a single segmentation as output involves a choice of scale.
  • One possibility is to use a fixed threshold for all images in the dataset, calibrated to provide optimal performance on the training set.

4.4 Evaluation

  • To provide a basis of comparison for the OWT-UCM algorithm, the authors make use of the region merging [32], Mean Shift [34], Multiscale NCuts [33], and SWA [31] segmentation methods reviewed in Section 2.2.
  • The authors evaluate each method using the boundary-based precision-recall framework of [2], as well as the Variation of Information, Probabilistic Rand Index, and segment covering criteria discussed in Section 2.3.
  • The BSDS serves as ground-truth for both the boundary and region quality measures, since the human-drawn boundaries are closed and hence are also segmentations.

4.4.1 Boundary Quality

  • Recall that the evaluation methodology developed by [2] measures detector performance in terms of precision, the fraction of detections that are true positives, and recall, the fraction of ground-truth boundary pixels detected.
  • The global F-measure, or harmonic mean of precision and recall at the optimal detector threshold, provides a summary score.
  • Figures 2 and 17 display the full precision-recall curves on the BSDS300 and BSDS500 datasets, respectively.
  • The authors find retraining on the BSDS500 to be unnecessary and use the same parameters learned on the BSDS300.
  • Of particular note in Figure 17 are pairs of curves corresponding to contour detector output and regions produced by running the OWT-UCM algorithm on that output.
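The summary score described above is just the harmonic mean of precision and recall; the benchmark reports it at the optimal detector threshold. A minimal sketch:

```python
def f_measure(precision, recall):
    """F = 2 * P * R / (P + R), the harmonic mean of precision and
    recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The harmonic mean punishes imbalance: a detector with perfect precision but zero recall still scores 0.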

4.4.2 Region Quality

  • Table 2 presents region benchmarks on the BSDS.
  • For a family of machine segmentations {Si}, associated with different scales of a hierarchical algorithm or different sets of parameters, the authors report three scores for the covering of the ground-truth by segments in {Si}.
  • These correspond to selecting covering regions from the segmentation at a universal fixed scale (ODS), a fixed scale per image (OIS), or from any level of the hierarchy or collection {Si} (Best).
  • The authors also report the Probabilistic Rand Index and Variation of Information benchmarks.
  • While the relative ranking of segmentation algorithms remains fairly consistent across different benchmark criteria, the boundary benchmark appears most capable of discriminating performance.

4.4.4 Summary

  • The gPb-owt-ucm segmentation algorithm offers the best performance on every dataset and for every benchmark criterion the authors tested.
  • In addition, it is straightforward, fast, has no parameters to tune, and, as discussed in the following sections, can be adapted for use with top-down knowledge sources.

5 INTERACTIVE SEGMENTATION

  • Until now, the authors have only discussed fully automatic image segmentation.
  • Human assisted segmentation is relevant for many applications, and recent approaches rely on the graph-cuts formalism [72], [73], [74] or other energy minimization procedure [75] to extract foreground regions.
  • The unary potentials encode agreement with estimated foreground or background region models and the pairwise potentials bias neighboring pixels not separated by a strong boundary to have the same label.
  • User-specified hard labeling constraints are enforced by connecting a pixel to the source or sink with sufficiently large weight.
  • Each unlabeled region receives the label of the first labeled region merged with it.
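The label propagation rule in the last bullet can be sketched over the merge tree. This is our own illustration of the stated rule, assuming merge events ordered by increasing weight and user seed labels on some leaf regions; region ids and names are hypothetical.

```python
def propagate_labels(merges, seeds):
    """Interactive labeling on the region tree: `merges` is a list of
    (child_a, child_b, parent) triples in merge order, `seeds` maps
    user-labeled regions to labels. Each unlabeled region takes the
    label of the first labeled region it is merged with."""
    labels = dict(seeds)
    for a, b, parent in merges:
        la, lb = labels.get(a), labels.get(b)
        label = la if la is not None else lb
        if label is not None:
            # First labeled partner wins; never overwrite a user seed.
            labels.setdefault(a, label)
            labels.setdefault(b, label)
            labels.setdefault(parent, label)
    return labels
```

Because merges occur in order of increasing boundary strength, a region inherits the label of the seed it is most weakly separated from.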

6 MULTISCALE FOR OBJECT ANALYSIS

  • The authors' contour detection and segmentation algorithms capture multiscale information by combining local gradient cues computed at three different scales, as described in Section 3.2.
  • Note that this procedure does not prevent the object detector itself from using multiscale information, but rather provides the correct central scale.
  • Martin et al. [2] suggest ways to speed up this computation, including incremental updating of the histograms as the disc is swept across the image.
  • Moreover, in this case, no approximation is required as these operations are equivalent up to the numerical accuracy of the interpolation done when rotating the image.
  • Catanzaro et al. [77] have created a parallel GPU implementation of their gPb contour detector.


Contour Detection and Hierarchical Image Segmentation
Pablo Arbeláez, Member, IEEE, Michael Maire, Member, IEEE,
Charless Fowlkes, Member, IEEE, and Jitendra Malik, Fellow, IEEE.
Abstract—This paper investigates two fundamental problems in computer vision: contour detection and image segmentation. We present state-of-the-art algorithms for both of these tasks. Our contour detector combines multiple local cues into a globalization framework based on spectral clustering. Our segmentation algorithm consists of generic machinery for transforming the output of any contour detector into a hierarchical region tree. In this manner, we reduce the problem of image segmentation to that of contour detection. Extensive experimental evaluation demonstrates that both our contour detection and segmentation methods significantly outperform competing algorithms. The automatically generated hierarchical segmentations can be interactively refined by user-specified annotations. Computation at multiple image resolutions provides a means of coupling our system to recognition applications.
1 INTRODUCTION
This paper presents a unified approach to contour detection and image segmentation. Contributions include:
A high performance contour detector, combining local and global image information.
A method to transform any contour signal into a hierarchy of regions while preserving contour quality.
Extensive quantitative evaluation and the release of a new annotated dataset.
Figures 1 and 2 summarize our main results. The two Figures represent the evaluation of multiple contour detection (Figure 1) and image segmentation (Figure 2) algorithms on the Berkeley Segmentation Dataset (BSDS300) [1], using the precision-recall framework introduced in [2]. This benchmark operates by comparing machine generated contours to human ground-truth data (Figure 3) and allows evaluation of segmentations in the same framework by regarding region boundaries as contours.
Especially noteworthy in Figure 1 is the contour detector gPb, which compares favorably with other leading techniques, providing equal or better precision for most choices of recall. In Figure 2, gPb-owt-ucm provides universally better performance than alternative segmentation algorithms. We introduced the gPb and gPb-owt-ucm algorithms in [3] and [4], respectively. This paper offers comprehensive versions of these algorithms, motivation behind their design, and additional experiments which support our basic claims.
We begin with a review of the extensive literature on
contour detection and image segmentation in Section 2.
P. Arbeláez and J. Malik are with the Department of Electrical Engineering and Computer Science, University of California at Berkeley, Berkeley, CA 94720. E-mail: {arbelaez,malik}@eecs.berkeley.edu
M. Maire is with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125. E-mail: mmaire@caltech.edu
C. Fowlkes is with the Department of Computer Science, University of California at Irvine, Irvine, CA 92697. E-mail: fowlkes@ics.uci.edu
Section 3 covers the development of the gPb contour detector. We couple multiscale local brightness, color, and texture cues to a powerful globalization framework using spectral clustering. The local cues, computed by applying oriented gradient operators at every location in the image, define an affinity matrix representing the similarity between pixels. From this matrix, we derive a generalized eigenproblem and solve for a fixed number of eigenvectors which encode contour information. Using a classifier to recombine this signal with the local cues, we obtain a large improvement over alternative globalization schemes built on top of similar cues.
To produce high-quality image segmentations, we link this contour detector with a generic grouping algorithm described in Section 4 and consisting of two steps. First, we introduce a new image transformation called the Oriented Watershed Transform for constructing a set of initial regions from an oriented contour signal. Second, using an agglomerative clustering procedure, we form these regions into a hierarchy which can be represented by an Ultrametric Contour Map, the real-valued image obtained by weighting each boundary by its scale of disappearance. We provide experiments on the BSDS300 as well as the BSDS500, a superset newly released here.
Although the precision-recall framework [2] has found widespread use for evaluating contour detectors, considerable effort has also gone into developing metrics to directly measure the quality of regions produced by segmentation algorithms. Noteworthy examples include the Probabilistic Rand Index, introduced in this context by [5], the Variation of Information [6], [7], and the Segmentation Covering criteria used in the PASCAL challenge [8]. We consider all of these metrics and demonstrate that gPb-owt-ucm delivers an across-the-board improvement over existing algorithms.
Sections 5 and 6 explore ways of connecting our purely bottom-up contour and segmentation machinery
Digital Object Indentifier 10.1109/TPAMI.2010.161 0162-8828/10/$26.00 © 2010 IEEE
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

Fig. 1. Evaluation of contour detectors on the Berkeley Segmentation Dataset (BSDS300) Benchmark [2]. Leading contour detection approaches are ranked according to their maximum F-measure (2·Precision·Recall / (Precision+Recall)) with respect to human ground-truth boundaries. Iso-F curves are shown in green. Our gPb detector [3] performs significantly better than other algorithms [2], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28] across almost the entire operating regime. Average agreement between human subjects is indicated by the green dot. Legend (maximum F-measure): [F = 0.79] Human; [F = 0.70] gPb; [F = 0.68] Multiscale, Ren (2008); [F = 0.66] BEL, Dollar, Tu, Belongie (2006); [F = 0.66] Mairal, Leordeanu, Bach, Herbert, Ponce (2008); [F = 0.65] Min Cover, Felzenszwalb, McAllester (2006); [F = 0.65] Pb, Martin, Fowlkes, Malik (2004); [F = 0.64] Untangling Cycles, Zhu, Song, Shi (2007); [F = 0.64] CRF, Ren, Fowlkes, Malik (2005); [F = 0.58] Canny (1986); [F = 0.56] Perona, Malik (1990); [F = 0.50] Hildreth, Marr (1980); [F = 0.48] Prewitt (1970); [F = 0.48] Sobel (1968); [F = 0.47] Roberts (1965). [Precision-recall axes omitted.]
to sources of top-down knowledge. In Section 5, this knowledge source is a human. Our hierarchical region trees serve as a natural starting point for interactive segmentation. With minimal annotation, a user can correct errors in the automatic segmentation and pull out objects of interest from the image. In Section 6, we target top-down object detection algorithms and show how to create multiscale contour and region output tailored to match the scales of interest to the object detector.
Though much remains to be done to take full advantage of segmentation as an intermediate processing layer, recent work has produced payoffs from this endeavor [9], [10], [11], [12], [13]. In particular, our gPb-owt-ucm segmentation algorithm has found use in optical flow [14] and object recognition [15], [16] applications.
2 PREVIOUS WORK
The problems of contour detection and segmentation are related, but not identical. In general, contour detectors offer no guarantee that they will produce closed contours and hence do not necessarily provide a partition of the image into regions. But, one can always recover closed contours from regions in the form of their boundaries. As an accomplishment here, Section 4 shows how to do the reverse and recover regions from a contour detector.
Fig. 2. Evaluation of segmentation algorithms on the BSDS300 Benchmark. Paired with our gPb contour detector as input, our hierarchical segmentation algorithm gPb-owt-ucm [4] produces regions whose boundaries match ground-truth better than those produced by other methods [7], [29], [30], [31], [32], [33], [34], [35]. Legend (maximum F-measure): [F = 0.79] Human; [F = 0.71] gPb-owt-ucm; [F = 0.67] UCM, Arbelaez (2006); [F = 0.63] Mean Shift, Comaniciu, Meer (2002); [F = 0.62] Normalized Cuts, Cour, Benezit, Shi (2005); [F = 0.58] Canny-owt-ucm; [F = 0.58] Felzenszwalb, Huttenlocher (2004); [F = 0.58] Av. Diss., Bertelli, Sumengen, Manjunath, Gibou (2008); [F = 0.56] SWA, Alpert, Galun, Basri, Brandt (2007); [F = 0.55] Chan-Vese, Bertelli, Sumengen, Manjunath, Gibou (2008); [F = 0.55] Donoser, Urschler, Hirzer, Bischof (2009); [F = 0.53] Yang, Wright, Ma, Sastry (2007). [Precision-recall axes omitted.]
Fig. 3. Berkeley Segmentation Dataset [1]. Top to Bottom: Image and ground-truth segment boundaries hand-drawn by three different human subjects. The BSDS300 consists of 200 training and 100 test images, each with multiple ground-truth segmentations. The BSDS500 uses the BSDS300 as training and adds 200 new test images.
Historically, however, there have been different lines of
approach to these two problems, which we now review.

2.1 Contours
Early approaches to contour detection aim at quantifying the presence of a boundary at a given image location through local measurements. The Roberts [17], Sobel [18], and Prewitt [19] operators detect edges by convolving a grayscale image with local derivative filters. Marr and Hildreth [20] use zero crossings of the Laplacian of Gaussian operator. The Canny detector [22] also models edges as sharp discontinuities in the brightness channel, adding non-maximum suppression and hysteresis thresholding steps. A richer description can be obtained by considering the response of the image to a family of filters of different scales and orientations. An example is the Oriented Energy approach [21], [36], [37], which uses quadrature pairs of even and odd symmetric filters. Lindeberg [38] proposes a filter-based method with an automatic scale selection mechanism.
More recent local approaches take into account color and texture information and make use of learning techniques for cue combination [2], [26], [27]. Martin et al. [2] define gradient operators for brightness, color, and texture channels, and use them as input to a logistic regression classifier for predicting edge strength. Rather than rely on such hand-crafted features, Dollar et al. [27] propose a Boosted Edge Learning (BEL) algorithm which attempts to learn an edge classifier in the form of a probabilistic boosting tree [39] from thousands of simple features computed on image patches. An advantage of this approach is that it may be possible to handle cues such as parallelism and completion in the initial classification stage. Mairal et al. [26] create both generic and class-specific edge detectors by learning discriminative sparse representations of local image patches. For each class, they learn a discriminative dictionary and use the reconstruction error obtained with each dictionary as feature input to a final classifier.
The large range of scales at which objects may appear in the image remains a concern for these modern local approaches. Ren [28] finds benefit in combining information from multiple scales of the local operators developed by [2]. Additional localization and relative contrast cues, defined in terms of the multiscale detector output, are fed to the boundary classifier. For each scale, the localization cue captures the distance from a pixel to the nearest peak response. The relative contrast cue normalizes each pixel in terms of the local neighborhood.
An orthogonal line of work in contour detection focuses primarily on another level of processing, globalization, that utilizes local detector output. The simplest such algorithms link together high-gradient edge fragments in order to identify extended, smooth contours [40], [41], [42]. More advanced globalization stages are the distinguishing characteristics of several of the recent high-performance methods benchmarked in Figure 1, including our own, which share as a common feature their use of the local edge detection operators of [2].
Ren et al. [23] use the Conditional Random Fields (CRF) framework to enforce curvilinear continuity of contours. They compute a constrained Delaunay triangulation (CDT) on top of locally detected contours, yielding a graph consisting of the detected contours along with the new "completion" edges introduced by the triangulation. The CDT is scale-invariant and tends to fill short gaps in the detected contours. By associating a random variable with each contour and each completion edge, they define a CRF with edge potentials in terms of detector response and vertex potentials in terms of junction type and continuation smoothness. They use loopy belief propagation [43] to compute expectations.
Felzenszwalb and McAllester [25] use a different strategy for extracting salient smooth curves from the output of a local contour detector. They consider the set of short oriented line segments that connect pixels in the image to their neighboring pixels. Each such segment is either part of a curve or is a background segment. They assume curves are drawn from a Markov process, the prior distribution on curves favors few per scene, and detector responses are conditionally independent given the labeling of line segments. Finding the optimal line segment labeling then translates into a general weighted min-cover problem in which the elements being covered are the line segments themselves and the objects covering them are drawn from the set of all possible curves and all possible background line segments. Since this problem is NP-hard, an approximate solution is found using a greedy "cost per pixel" heuristic.
Zhu et al. [24] also start with the output of [2] and create a weighted edgel graph, where the weights measure directed collinearity between neighboring edgels. They propose detecting closed topological cycles in this graph by considering the complex eigenvectors of the normalized random walk matrix. This procedure extracts both closed contours and smooth curves, as edgel chains are allowed to loop back at their termination points.
2.2 Regions
A broad family of approaches to segmentation involve integrating features such as brightness, color, or texture over local image patches and then clustering those features based on, e.g., fitting mixture models [7], [44], mode-finding [34], or graph partitioning [32], [45], [46], [47]. Three algorithms in this category appear to be the most widely used as sources of image segments in recent applications, due to a combination of reasonable performance and publicly available implementations.
The graph based region merging algorithm advocated
by Felzenszwalb and Huttenlocher (Felz-Hutt) [32] at-
tempts to partition image pixels into components such
that the resulting segmentation is neither too coarse nor
too fine. Given a graph in which pixels are nodes and
edge weights measure the dissimilarity between nodes
(e.g. color differences), each node is initially placed in
its own component. Define the internal difference of a
component, Int(R), as the largest weight in the minimum
spanning tree of R.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

Considering edges in non-decreasing
order by weight, each step of the algorithm merges
components R_1 and R_2 connected by the current edge if
the edge weight is less than:

min(Int(R_1) + τ(R_1), Int(R_2) + τ(R_2))    (1)

where τ(R) = k/|R| and k is a scale parameter that sets a
preference for component size.
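The merging rule of (1) can be sketched with a small union-find structure; because edges are processed in non-decreasing weight order, the weight of the merging edge becomes the new internal difference of the merged component. This is an illustrative toy implementation, not the authors' released code; the names and the example graph are ours.

```python
class Components:
    """Union-find over pixels, tracking Int(R) and component size."""

    def __init__(self, n, k):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # Int(R): largest weight in the component's MST
        self.k = k

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def try_merge(self, i, j, w):
        a, b = self.find(i), self.find(j)
        if a == b:
            return False
        tau = lambda r: self.k / self.size[r]
        # merge iff w <= min(Int(a) + tau(a), Int(b) + tau(b))   (Eq. 1)
        if w <= min(self.internal[a] + tau(a), self.internal[b] + tau(b)):
            self.parent[b] = a
            self.size[a] += self.size[b]
            # edges arrive in non-decreasing order, so w is the new Int
            self.internal[a] = w
            return True
        return False


def segment(n, edges, k):
    """edges: (weight, i, j) tuples; processed in non-decreasing weight order."""
    comps = Components(n, k)
    for w, i, j in sorted(edges):
        comps.try_merge(i, j, w)
    return [comps.find(i) for i in range(n)]
```

On a 4-pixel chain with two cheap edges and one expensive edge, the cheap edges merge their endpoints while the expensive edge exceeds the threshold, leaving two components.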
The Mean Shift algorithm [34] offers an alternative
clustering framework. Here, pixels are represented in
the joint spatial-range domain by concatenating their
spatial coordinates and color values into a single vector.
Applying mean shift filtering in this domain yields a
convergence point for each pixel. Regions are formed by
grouping together all pixels whose convergence points
are closer than h_s in the spatial domain and h_r in the
range domain, where h_s and h_r are the respective bandwidth
parameters. Additional merging can also be performed
to enforce a constraint on minimum region area.
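The filtering step can be illustrated with a minimal flat-kernel mean shift on a set of feature vectors. A full implementation would operate on the concatenated spatial-range vectors with separate bandwidths h_s and h_r; the single-bandwidth sketch below, with names and toy data of our choosing, only shows the iteration to a convergence point.

```python
import numpy as np

def mean_shift_points(points, h, iters=50, tol=1e-6):
    """Flat-kernel mean shift: move each point to the mean of all points
    within distance h of it, until convergence. Toy sketch, not the full
    joint spatial-range filtering described above."""
    pts = np.asarray(points, dtype=float)
    modes = pts.copy()
    for i in range(len(modes)):
        x = modes[i]
        for _ in range(iters):
            nbrs = pts[np.linalg.norm(pts - x, axis=-1) <= h]
            new_x = nbrs.mean(axis=0)  # shift toward the local mean
            if np.linalg.norm(new_x - x) < tol:
                break
            x = new_x
        modes[i] = x
    return modes
```

Pixels whose returned convergence points fall within the bandwidths of one another would then be grouped into one region.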
Spectral graph theory [48], and in particular the Nor-
malized Cuts criterion [45], [46], provides a way of
integrating global image information into the grouping
process. In this framework, given an affinity matrix W
whose entries encode the similarity between pixels, one
defines the diagonal matrix D_ii = Σ_j W_ij and solves for the
generalized eigenvectors of the linear system:

(D − W)v = λDv    (2)
Traditionally, after this step, K-means clustering is
applied to obtain a segmentation into regions. This ap-
proach often breaks uniform regions where the eigenvec-
tors have smooth gradients. One solution is to reweight
the affinity matrix [47]; others have proposed alternative
graph partitioning formulations [49], [50], [51].
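For intuition, the generalized eigenproblem of (2) can be reduced to a standard symmetric one via the routine substitution D^{-1/2}(D − W)D^{-1/2}. The dense sketch below (our code, toy sizes only) omits the sparse eigensolvers used in practice.

```python
import numpy as np

def ncut_eigvecs(W, k):
    """Solve (D - W) v = lambda D v (Eq. 2) by the symmetric reduction:
    with L_sym = D^{-1/2} (D - W) D^{-1/2} and L_sym u = lambda u,
    the generalized eigenvectors are v = D^{-1/2} u."""
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = d_isqrt[:, None] * (np.diag(d) - W) * d_isqrt[None, :]
    vals, U = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    V = d_isqrt[:, None] * U[:, :k]       # back-transform to generalized vectors
    return vals[:k], V
```

On a toy affinity matrix with two tightly connected pairs and weak cross links, the smallest eigenvalue is zero (the constant vector) and the second eigenvector changes sign exactly between the two groups.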
A recent variant of Normalized Cuts for image seg-
mentation is the Multiscale Normalized Cuts (NCuts)
approach of Cour et al. [33]. The fact that W must
be sparse, in order to avoid a prohibitively expensive
computation, limits the naive implementation to using
only local pixel affinities. Cour et al. address this limitation
by computing sparse affinity matrices at multiple scales,
setting up cross-scale constraints, and deriving a new
eigenproblem for this constrained multiscale cut.
Sharon et al. [52] propose an alternative to improve
the computational efficiency of Normalized Cuts. This
approach, inspired by algebraic multigrid, iteratively
coarsens the original graph by selecting a subset of nodes
such that each variable on the fine level is strongly
coupled to one on the coarse level. The same merging
strategy is adopted in [31], where the strong coupling of
a subset S of the graph nodes V is formalized as:

Σ_{j∈S} p_ij / Σ_{j∈V} p_ij > ψ    ∀ i ∈ V \ S    (3)

where ψ is a constant and p_ij is the probability of merging
i and j, estimated from brightness and texture similarity.
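The coarsening condition of (3) can be checked directly on a small matrix of merge probabilities; this helper is an illustrative sketch with names of our choosing.

```python
import numpy as np

def strongly_coupled(P, S, psi):
    """Check Eq. (3): every node i outside the coarse subset S must send
    more than a fraction psi of its total merge probability into S.
    P: symmetric matrix of merge probabilities p_ij."""
    P = np.asarray(P, dtype=float)
    V = range(P.shape[0])
    return all(P[i, list(S)].sum() / P[i, list(V)].sum() > psi
               for i in V if i not in S)
```

With three nodes where node 2 splits its probability evenly between the subset and the rest, the condition holds for ψ = 0.4 but fails for ψ = 0.6.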
Many approaches to image segmentation fall into a
different category than those covered so far, relying on
the formulation of the problem in a variational frame-
work. An example is the model proposed by Mumford
and Shah [53], where the segmentation of an observed
image u_0 is given by the minimization of the functional:

F(u, C) = ∫_Ω (u − u_0)² dx + μ ∫_{Ω\C} |∇u|² dx + ν|C|    (4)

where u is piecewise smooth in Ω\C and μ, ν are weighting
parameters. Theoretical properties of this model can
be found in, e.g. [53], [54]. Several algorithms have been
developed to minimize the energy (4) or its simplified
version, where u is piecewise constant in Ω\C. Koepfler
et al. [55] proposed a region merging method for this
purpose. Chan and Vese [56], [57] follow a different
approach, expressing (4) in the level set formalism of
Osher and Sethian [58], [59]. Bertelli et al. [30] extend
this approach to more general cost functions based on
pairwise pixel similarities. Recently, Pock et al. [60] pro-
posed to solve a convex relaxation of (4), thus obtaining
robustness to initialization. Donoser et al. [29] subdivide
the problem into several figure/ground segmentations,
each initialized using low-level saliency and solved by
minimizing an energy based on Total Variation.
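As a one-dimensional illustration of the piecewise-constant simplification of (4): given candidate jump points, the optimal u on each segment is the segment mean, so the energy reduces to a fidelity term plus ν times the number of jumps. The function below is a toy analogue of our own devising, not any of the cited minimization algorithms.

```python
import numpy as np

def pc_ms_energy(u0, breaks, nu):
    """Energy of the piecewise-constant Mumford-Shah simplification for a
    1-D signal: fidelity with u set to each segment's mean, plus nu times
    the number of jump points."""
    u0 = np.asarray(u0, dtype=float)
    edges = [0, *sorted(breaks), len(u0)]
    # optimal piecewise-constant u is the mean on each segment
    fidelity = sum(((u0[a:b] - u0[a:b].mean()) ** 2).sum()
                   for a, b in zip(edges, edges[1:]))
    return fidelity + nu * len(breaks)
```

For a clean step signal, placing a single jump at the true discontinuity yields a strictly lower energy than the constant (no-jump) approximation.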
2.3 Benchmarks
Though much of the extensive literature on contour
detection predates its development, the BSDS [2] has
since found wide acceptance as a benchmark for this task
[23], [24], [25], [26], [27], [28], [35], [61]. The standard for
evaluating segmentation algorithms is less clear.
One option is to regard the segment boundaries
as contours and evaluate them as such. However, a
methodology that directly measures the quality of the
segments is also desirable. Some types of errors, e.g. a
missing pixel in the boundary between two regions, may
not be reflected in the boundary benchmark, but can
have substantial consequences for segmentation quality,
e.g. incorrectly merging large regions. One might argue
that the boundary benchmark favors contour detectors
over segmentation methods, since the former are not
burdened with the constraint of producing closed curves.
We therefore also consider various region-based metrics.
2.3.1 Variation of Information
The Variation of Information metric was introduced for
the purpose of clustering comparison [6]. It measures the
distance between two segmentations in terms of their
average conditional entropy, given by:

VI(S, S') = H(S) + H(S') − 2I(S, S')    (5)

where H and I represent, respectively, the entropies and
mutual information between two clusterings of data S
and S'. In our case, these clusterings are test and ground-truth
segmentations. Although VI possesses some interesting
theoretical properties [6], its perceptual meaning
and applicability in the presence of several ground-truth
segmentations remains unclear.
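Equation (5) can be computed directly from the joint histogram of the two label maps. A minimal sketch (our names; it assumes labels are small non-negative integers):

```python
import numpy as np

def variation_of_information(a, b):
    """VI between two labelings of the same pixels (Eq. 5):
    VI = H(A) + H(B) - 2 I(A, B), from the joint label histogram."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    n = a.size
    # joint distribution over (label_a, label_b) pairs
    pairs, counts = np.unique(np.stack([a, b]), axis=1, return_counts=True)
    pij = counts / n
    pi = np.bincount(a) / n  # marginal of a (labels must be small ints >= 0)
    pj = np.bincount(b) / n
    h_a = -np.sum(pi[pi > 0] * np.log(pi[pi > 0]))
    h_b = -np.sum(pj[pj > 0] * np.log(pj[pj > 0]))
    i_ab = np.sum(pij * np.log(pij / (pi[pairs[0]] * pj[pairs[1]])))
    return h_a + h_b - 2 * i_ab
```

Identical (or merely relabeled) segmentations give VI = 0, while two independent binary partitions of four pixels give VI = 2 log 2.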
2.3.2 Rand Index
Originally, the Rand Index [62] was introduced for gen-
eral clustering evaluation. It operates by comparing the
compatibility of assignments between pairs of elements
in the clusters. The Rand Index between test and ground-
truth segmentations S and G is given by the sum of the
number of pairs of pixels that have the same label in
S and G and those that have different labels in both
segmentations, divided by the total number of pairs of
pixels. Variants of the Rand Index have been proposed
[5], [7] for dealing with the case of multiple ground-truth
segmentations. Given a set of ground-truth segmentations
{G_k}, the Probabilistic Rand Index is defined as:

PRI(S, {G_k}) = (1/T) Σ_{i<j} [c_ij p_ij + (1 − c_ij)(1 − p_ij)]    (6)

where c_ij is the event that pixels i and j have the same
label and p_ij its probability. T is the total number of
pixel pairs. Using the sample mean to estimate p_ij, (6)
amounts to averaging the Rand Index among different
ground-truth segmentations. The PRI has been reported
to suffer from a small dynamic range [5], [7], and its
values across images and algorithms are often similar.
In [5], this drawback is addressed by normalization with
an empirical estimation of its expected value.
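With p_ij estimated by the sample mean, the PRI of (6) is exactly the Rand Index averaged over the ground-truth set. A brute-force sketch over all pixel pairs (our code, toy sizes only):

```python
import numpy as np
from itertools import combinations

def rand_index(s, g):
    """Rand Index between two labelings: fraction of pixel pairs on which
    they agree (same label in both, or different labels in both)."""
    s, g = np.ravel(s), np.ravel(g)
    agree = sum((s[i] == s[j]) == (g[i] == g[j])
                for i, j in combinations(range(s.size), 2))
    return agree / (s.size * (s.size - 1) / 2)

def probabilistic_rand_index(s, gts):
    """PRI (Eq. 6) with p_ij estimated by the sample mean over the
    ground-truth set, which reduces to averaging the Rand Index."""
    return float(np.mean([rand_index(s, g) for g in gts]))
```

A perfect match scores 1, an independent binary partition of four pixels scores 1/3, and the PRI over both ground truths is their mean.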
2.3.3 Segmentation Covering
The overlap between two regions R and R', defined as:

O(R, R') = |R ∩ R'| / |R ∪ R'|    (7)

has been used for the evaluation of the pixel-wise classification
task in recognition [8], [11]. We define the
covering of a segmentation S by a segmentation S' as:

C(S' → S) = (1/N) Σ_{R∈S} |R| · max_{R'∈S'} O(R, R')    (8)
where N denotes the total number of pixels in the image.
Similarly, the covering of a machine segmentation S by
a family of ground-truth segmentations {G_i} is defined
by first covering S separately with each human segmentation
G_i, and then averaging over the different humans.
To achieve perfect covering the machine segmentation
must explain all of the human data. We can then define
two quality descriptors for regions: the covering of S by
{G_i} and the covering of {G_i} by S.
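The covering of (8) weights each region's best overlap (7) by its area. A direct sketch over flat label maps (our names, not the benchmark implementation):

```python
import numpy as np

def covering(S_prime, S):
    """Covering of segmentation S by S' (Eq. 8): each region of S is matched
    with its best-overlapping region of S', weighted by region size."""
    S_prime, S = np.ravel(S_prime), np.ravel(S)
    N = S.size
    total = 0.0
    for r in np.unique(S):
        mask = S == r
        best = max(
            np.logical_and(mask, S_prime == rp).sum()
            / np.logical_or(mask, S_prime == rp).sum()
            for rp in np.unique(S_prime)
        )
        total += mask.sum() * best  # |R| * max overlap
    return total / N
```

Covering a segmentation by itself gives 1; covering a two-region partition by a single all-image region gives 0.5, since each region of S overlaps that region with O = 1/2.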
3 CONTOUR DETECTION
As a starting point for contour detection, we consider
the work of Martin et al. [2], who define a function
Pb(x, y, θ) that predicts the posterior probability of a
boundary with orientation θ at each image pixel (x, y)
by measuring the difference in local image brightness,
color, and texture channels. In this section, we review
these cues, introduce our own multiscale version of the
Pb detector, and describe the new globalization method
we run on top of this multiscale local detector.
[Figure 4 graphic: panels titled "Upper Half-Disc Histogram" and "Lower Half-Disc Histogram"]

Fig. 4. Oriented gradient of histograms. Given an
intensity image, consider a circular disc centered at each
pixel and split by a diameter at angle θ. We compute
histograms of intensity values in each half-disc and output
the χ² distance between them as the gradient magnitude.
The blue and red distributions shown in the middle panel
are the histograms of the pixel brightness values in the
blue and red regions, respectively, in the left image. The
right panel shows an example result for a disc of radius
5 pixels at orientation θ = π/4 after applying a second-order
Savitzky-Golay smoothing filter to the raw histogram
difference output. Note that the left panel displays a larger
disc (radius 50 pixels) for illustrative purposes.
3.1 Brightness, Color, Texture Gradients
The basic building block of the Pb contour detector is
the computation of an oriented gradient signal G(x, y, θ)
from an intensity image I. This computation proceeds
by placing a circular disc at location (x, y) split into two
half-discs by a diameter at angle θ. For each half-disc, we
histogram the intensity values of the pixels of I covered
by it. The gradient magnitude G at location (x, y) is
defined by the χ² distance between the two half-disc
histograms g and h:

χ²(g, h) = (1/2) Σ_i (g(i) − h(i))² / (g(i) + h(i))    (9)

We then apply second-order Savitzky-Golay filtering
[63] to enhance local maxima and smooth out multiple
detection peaks in the direction orthogonal to θ. This is
equivalent to fitting a cylindrical parabola, whose axis
is oriented along direction θ, to a local 2D window
surrounding each pixel and replacing the response at the
pixel with that estimated by the fit.
Figure 4 shows an example. This computation is moti-
vated by the intuition that contours correspond to image
discontinuities and histograms provide a robust mech-
anism for modeling the content of an image region. A
strong oriented gradient response means a pixel is likely
to lie on the boundary between two distinct regions.
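The half-disc construction and the χ² distance of (9) can be sketched as follows. This omits the Savitzky-Golay smoothing and image-boundary handling, and the bin count and function names are our choices, not the authors'.

```python
import numpy as np

def chi2(g, h):
    """Chi-squared distance between two histograms (Eq. 9)."""
    denom = g + h
    valid = denom > 0
    return 0.5 * np.sum((g[valid] - h[valid]) ** 2 / denom[valid])

def oriented_gradient(img, x, y, theta, radius, bins=8):
    """G(x, y, theta): chi-squared distance between intensity histograms of
    the two half-discs around (x, y), split by a diameter at angle theta.
    Assumes the disc fits inside img and intensities lie in [0, 1]."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    disc = xs ** 2 + ys ** 2 <= radius ** 2
    # signed distance from the dividing diameter selects the half-disc
    side = xs * np.sin(theta) - ys * np.cos(theta)
    patch = img[y - radius:y + radius + 1, x - radius:x + radius + 1]
    upper = patch[disc & (side > 0)]
    lower = patch[disc & (side < 0)]
    edges = np.linspace(0.0, 1.0, bins + 1)
    g = np.histogram(upper, bins=edges)[0] / max(upper.size, 1)
    h = np.histogram(lower, bins=edges)[0] / max(lower.size, 1)
    return chi2(g, h)
```

On a vertical step edge, the response with the diameter aligned along the edge is large, while the response with the diameter across it vanishes, since both half-discs then contain identical intensity distributions.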
The Pb detector combines the oriented gradient sig-
nals obtained from transforming an input image into
four separate feature channels and processing each chan-
nel independently. The first three correspond to the
channels of the CIE Lab colorspace, which we refer to
References

J. Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986.

A. Savitzky and M. J. E. Golay, "Smoothing and Differentiation of Data by Simplified Least Squares Procedures," Analytical Chemistry, 1964.

J. Shi and J. Malik, "Normalized Cuts and Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.

R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. Wiley, 1973.