Shape recognition with edge-based features
K. Mikolajczyk A. Zisserman C. Schmid
Dept. of Engineering Science Dept. of Engineering Science INRIA Rhˆone-Alpes
Oxford, OX1 3PJ Oxford, OX1 3PJ 38330 Montbonnot
United Kingdom United Kingdom France
   !"#%$
Abstract
In this paper we describe an approach to recognizing poorly textured ob-
jects, that may contain holes and tubular parts, in cluttered scenes under ar-
bitrary viewing conditions. To this end we develop a number of novel com-
ponents. First, we introduce a new edge-based local feature detector that is
invariant to similarity transformations. The features are localized on edges
and a neighbourhood is estimated in a scale invariant manner. Second, the
neighbourhood descriptor computed for foreground features is not affected
by background clutter, even if the feature is on an object boundary. Third,
the descriptor generalizes Lowe’s SIFT method [12] to edges.
An object model is learnt from a single training image. The object is
then recognized in new images in a series of steps which apply progressively
tighter geometric restrictions. A final contribution of this work is to allow
sufficient flexibility in the geometric representation that objects in the same
visual class can be recognized. Results are demonstrated for various object
classes including bikes and rackets.
1 Introduction
Numerous recent approaches to object recognition [2, 12, 14, 15, 13, 20, 24] represent the object by a set of colour or grey-level textured local patches. They obtain excellent results for objects which are locally planar and have a distinctive texture [21]. However, there are many common objects where texture or colour cannot be used as a cue for recognition (cf. figure 1). The distinctive features of such objects are edges and the geometric relations between them. In this paper we present a recognition approach based on local edge features invariant to scale changes. Our goal is to recognize classes of roughly planar objects composed of wiry components, such as bikes, chairs and ladders, against a cluttered background.
A very important property of our recognition approach is scale invariance [12, 14].
This enables the recognition of an object viewed from a different distance or with different camera settings. The scale invariance can locally approximate affine deformations, thereby additionally providing some immunity to out-of-plane rotations for planar objects.
A second problem area is occlusions and background clutter. These can significantly
change the appearance of features localized on object boundaries. Therefore, it is crucial
to separate the foreground from the background. Since strong edges often appear on the
boundaries they can be used to split the support regions before computing the descriptors.

1.1 Background
Our approach builds on recent object recognition methods. The idea of representing an
object by a collection of local invariant patches (to avoid occlusion problems) can be
traced back to Schmid and Mohr [21], where the patches were based on interest points
and were invariant to rotations. Lowe [12] developed an efficient object recognition ap-
proach based on scale invariant features (SIFT). This approach was recently extended
to sub-pixel/sub-scale feature localization [5]. In the context of scale invariant features
Mikolajczyk and Schmid [14] developed a scale invariant interest point detector.
Recently, many authors developed affine invariant features based on the second mo-
ment matrix [2, 15, 20] or other methods [13, 24]. However, affine invariant features
provide better results than scale invariant features only for significant affine deforma-
tions [15], and are not used here.
The invariance to affine geometric (and photometric) transformations reduces the already limited information content of local features. Therefore, many authors also use
geometric relations between features to correctly resolve ambiguous matches. A common
approach is to require that the neighbouring matches are consistent with a local estimate
of a geometric transformation [18, 20, 21, 22]. This method has proved very good at
rejecting false matches, and is adopted here.
Edge-based methods with affine [10] or projective [19] invariance were successful
in the early nineties, but fell out of favour partly because of the difficulties of correctly
segmenting long edge curves. More recently recognition methods based on the statis-
tics of local edges have been developed by Amit and Geman [1], and Carmichael and
Hebert [7, 8]. The latter successfully detect objects with wiry components in cluttered
backgrounds. However, many positive and negative examples are required to learn the
object shape and background appearance, and there is no invariance to scale. We adopt
a local edge description and incorporate the scale invariance previously only available
to methods based on local appearance patches. The problem of background clutter was
also handled, although manually, in the patch approach proposed by Borenstein and Ull-
man [4] for object segmentation.
Other related approaches using edge information are those of Belongie et al. [3], who use 2D shape signatures based on edges in the context of shape matching (although scale invariance and background clutter are not addressed in their work), and the projectively invariant shape descriptor used by Sullivan and Carlsson [25].
1.2 Overview
Section 2 presents the new feature detector and local edge descriptor. Section 3 describes
the two stages of the recognition system: first clustering on a local transformation to re-
duce ambiguity, and then estimating a global (affine) transformation to detect the object
in an image. In more detail, we combine an appearance distance between feature de-
scriptors and local geometric consistency to compute the scores for point matches. The
best matches with relatively few outliers are then used to vote in the Hough space of local
affine transformations. The distinctive clusters in this space are used to detect and localize
the objects. Section 4 gives experimental results.
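The voting step outlined above can be sketched as follows. This is an illustrative reconstruction rather than the authors' implementation: the match-tuple layout (model point, image point, model/image scales, model/image reference angles) and the bin widths `t_bin` and `r_bin` are assumptions.

```python
import numpy as np
from collections import defaultdict

def similarity_from_match(pm, pi, sm, si, am, ai):
    """Similarity transform (scale, rotation, translation) implied by one
    match between a model feature and an image feature."""
    scale = si / sm
    rot = (ai - am) % (2 * np.pi)
    c, s = np.cos(rot), np.sin(rot)
    t = np.asarray(pi) - scale * np.array([[c, -s], [s, c]]) @ np.asarray(pm)
    return scale, rot, t

def hough_cluster(matches, t_bin=40.0, r_bin=np.pi / 6):
    """Vote each match into a coarse bin over (log scale, rotation, translation)
    and return the indices of the matches in the dominant bin."""
    acc = defaultdict(list)
    for idx, (pm, pi, sm, si, am, ai) in enumerate(matches):
        scale, rot, t = similarity_from_match(pm, pi, sm, si, am, ai)
        key = (int(np.round(np.log2(scale))),
               int(rot // r_bin),
               int(t[0] // t_bin),
               int(t[1] // t_bin))
        acc[key].append(idx)
    return max(acc.values(), key=len)
```

Matches that agree on a common object pose fall into the same coarse bin, so a distinctive cluster survives even when many individual matches are outliers.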

2 Local features
In the following we describe our feature detector. Our objective is to determine the edge
neighbourhood that is related to the scale of the object. We then show how we deal with
occlusions and background clutter. Finally we present the descriptor that represents the
edge shape in the point neighbourhood.
2.1 Support regions
Edge features. In our task edges of low curvature and their spatial relations are very
characteristic of the object. The widely used Harris [9] and DoG [12] detectors are not
suitable for our purpose as the first one detects corner-like structures and the second one
mostly blobs. Moreover these points are rarely localized on edges, and only accidentally
on straight edges. It is well known that edge features are present at various scales and can change their appearance at different scales. Figure 1 shows the object and the edges detected with Gaussian derivatives at σ = 1 and σ = 3. The edges change their locations due to blurring, and new edges appear at different scales (cf. figure 1(b)(c)). Therefore it is crucial to build a scale-space representation to capture the possible edge appearances. To find the local features we first extract edges with a multi-scale Canny edge detector [6], using Gaussian derivatives at several pre-selected scales with a scale interval of 1.4.
Figure 1: (a) Object model. (b) Edges detected at scale σ = 1. (c) Edges detected at scale σ = 3.
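The multi-scale extraction step can be sketched as follows, substituting a thresholded gradient-magnitude map for the full Canny detector (non-maximum suppression and hysteresis are omitted); the threshold and the number of scales are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at roughly 3 sigma."""
    r = np.arange(-int(3 * sigma) - 1, int(3 * sigma) + 2)
    g = np.exp(-r**2 / (2 * sigma**2))
    return g / g.sum()

def smooth(img, sigma):
    """Separable Gaussian smoothing (zero padding at the borders)."""
    g = gaussian_kernel1d(sigma)
    out = np.apply_along_axis(lambda m: np.convolve(m, g, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, g, mode="same"), 1, out)

def multiscale_edges(img, sigma0=1.0, n_scales=4, factor=1.4, thresh=0.3):
    """Binary edge maps from gradient magnitudes at scales sigma0 * factor**k."""
    maps = []
    for k in range(n_scales):
        sm = smooth(img.astype(float), sigma0 * factor**k)
        gy, gx = np.gradient(sm)          # derivatives of the smoothed image
        mag = np.hypot(gx, gy)
        maps.append(mag > thresh * mag.max())
    return maps
```

Each map corresponds to one pre-selected scale, mirroring the way edges are computed at several scales before scale selection.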
Scale invariance. Having computed edges at multiple scales, our goal is now to deter-
mine the size of the neighbourhood of the edge point that will be used to compute the
descriptor. Several authors use the Laplacian operator for this purpose [11, 12, 14, 20].
Given a point we compute the Laplacian responses for several scales. We then select the
scales for which the response attains an extremum. For a perfect step-edge the scale pa-
rameter for which the Laplacian attains an extremum is in fact equal to the distance to
the step-edge. This is a well known property of the Laplacian and can be proved analyti-
cally [11]. Figure 2(a) shows an example of a step-edge and a sketch of a 2D Laplacian
operator centred on a point near the edge. Figure 2(b) shows the responses of the scale
normalized Laplacian for different parameters σ. The scale trace attains a minimum for σ equal to the distance to the step-edge. There are several advantages to this approach. The first is that we obtain a characteristic scale for the edge points. This scale is related to
the object scale and determines the point neighbourhood within which we capture more
signal changes [14]. Figure 3 shows a few examples of point neighbourhoods selected
by the Laplacian operator applied to images of different scale. Note that the feature is
centred on one edge and the selected scale corresponds to the distance from the point to
a neighbouring edge tangent to the circle. The edge neighbourhood is correctly detected

Figure 2: Scale trace of the Laplacian localized on a 2D ridge. (a) 2D ridge. (b) Sketch of the 2D Laplacian operator. (c) Laplacian localized on one edge. (d) Responses of the scale-normalized Laplacian operator at the given location. The scale of the extremum response corresponds to the distance to the other edge.
despite the scale change and different background. A second advantage of this approach is that points with a homogeneous neighbourhood can easily be identified and rejected, since they do not attain a distinctive extremum over scale. In this manner many of the edge points computed over the multiple scales are discarded.
An alternative straightforward method would be to search for tangent neighbouring
edges but we found this approach less stable than the Laplacian scale selection.
Figure 3: A few points selected by the Laplacian measure centred at the edge points. (a)(c) Images related by a scale factor of 2. (b)(d) Edges with corresponding regions. Note that the Laplacian attains an extremum when it finds another edge. The radius of the circles is equal to the selected σ.
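The step-edge property quoted above can be checked numerically in one dimension: for a point at distance d from a step edge, the scale-normalized second-derivative response attains its extremum near σ = d. The kernel truncation and the scale range are assumptions of this sketch, not values from the paper.

```python
import numpy as np

def laplacian_scale_trace(signal, x, sigmas):
    """Scale-normalized second-derivative responses of `signal` at sample `x`."""
    responses = []
    for s in sigmas:
        r = np.arange(-int(6 * s) - 1, int(6 * s) + 2)
        # second derivative of a Gaussian with standard deviation s
        g2 = (r**2 - s**2) / s**4 * np.exp(-r**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)
        resp = np.convolve(signal, g2, mode="same")
        responses.append(s**2 * resp[x])        # sigma^2 scale normalization
    return np.array(responses)

# step edge at sample 200; query point at distance d = 15 from it
sig = np.zeros(400)
sig[200:] = 1.0
d = 15
sigmas = np.arange(2.0, 30.0, 0.5)
trace = laplacian_scale_trace(sig, 200 + d, sigmas)
best_sigma = sigmas[np.argmax(np.abs(trace))]   # extremal scale, close to d
```

The extremal scale recovered from the trace approximates the distance to the step edge, which is the characteristic-scale property exploited by the detector.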
Foreground-background segmentation. In the following we describe a new method
for separating foreground and background. In the context of recognition of objects with
holes and tubular components the background texture can significantly affect the descrip-
tors such that recognition becomes impossible. To reduce the background influence, the
point neighbourhood is divided into two parts separated by a chain of dominant edges,
and descriptors are computed separately for each part as described below. The domi-
nant edges are selected by locally fitting a line to the extracted edges using RANSAC.
Figure 4(a) shows an example of corresponding edge points on different backgrounds.
Figure 4(b) displays the gradient images and figure 4(c)(d) the selected principal edge
with the neighbourhood. The tangent angle φ is used to obtain rotation invariance for the descriptors.
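The dominant-edge selection described above can be sketched with a minimal RANSAC line fit, assuming the edge points of a neighbourhood are given as 2-D coordinates; the iteration count and the inlier tolerance `tol` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ransac_line(points, iters=200, tol=1.0):
    """Fit the dominant line to 2-D edge points with RANSAC; returns the unit
    direction, a point on the line, and the boolean inlier mask."""
    best = None
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        v = q - p
        n = np.linalg.norm(v)
        if n < 1e-9:
            continue
        v = v / n
        normal = np.array([-v[1], v[0]])
        inliers = np.abs((points - p) @ normal) < tol   # perpendicular distance
        if best is None or inliers.sum() > best[2].sum():
            best = (v, p, inliers)
    return best

def split_neighbourhood(points, line_dir, line_pt):
    """Split edge points into the two half-planes either side of the dominant edge."""
    normal = np.array([-line_dir[1], line_dir[0]])
    side = (points - line_pt) @ normal
    return points[side >= 0], points[side < 0]
```

The angle of the returned direction plays the role of the reference angle φ, and the two half-plane point sets are then described separately so that background clutter on one side does not corrupt the foreground descriptor.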
2.2 Edge Descriptors
A descriptor that captures the shape of the edges and is robust to small geometric and
photometric transformations is needed for this approach. A comparative evaluation of
descriptors in [16] showed that SIFT descriptors [12] perform significantly better than
many other local descriptors recently proposed in the literature. Inspired by this result

Figure 4: Background-foreground segmentation. (a) Point neighbourhood. (b) Gradient edges. (c)(d) Region parts separated by the dominant edge. φ is the reference angle for the descriptor.
we extend the SIFT descriptor to represent the edges in the point neighbourhood. For each region part (cf. figure 5(a)) we build a 3D histogram of gradient values, for which the dimensions are the edge point coordinates (x, y) and the gradient orientation. The histogram bins are incremented by the gradient values at the edge points. The values are weighted by a Gaussian window centred on the region. The descriptor is built from two histograms. To compute the first we use a 2×2 location grid and 4 orientation planes (vertical, horizontal and two diagonals, cf. figure 5(b)). The dimension of this descriptor is 16. For the second histogram we use a 4×4 location grid and 8 orientation planes (cf. figure 5(c)). The dimension is 128. These two histograms are used in our coarse-to-fine matching strategy discussed in the next section. To obtain rotation invariance, the gradient orientation and the coordinates are relative to the principal line separating the region (cf. figure 4(c)(d)). The descriptor of each region part also contains the points on the dominant edge. Each region part is described separately, but we also use the joint descriptor to represent the whole region. To compensate for affine illumination changes we normalize each description vector by the square root of the sum of its squared components. The similarity between descriptors is measured with the Euclidean distance.
Figure 5: Edge-based local descriptor. (a) Support region and location grid. (b) Four orientation planes (horizontal, vertical and the two diagonals) for the 2×2 grid. (c) Eight orientation planes (0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°) for the 4×4 grid.
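The descriptor construction can be sketched as follows. This is a simplified reconstruction, not the authors' code: the normalization of coordinates to [−1, 1], the Gaussian window width `sigma_w`, and hard (rather than interpolated) binning are assumptions.

```python
import numpy as np

def edge_sift(coords, grads, orients, phi, grid=4, n_orient=8, sigma_w=0.5):
    """SIFT-like 3-D histogram over edge points: grid x grid spatial bins times
    n_orient orientation bins, expressed relative to the reference angle phi."""
    # rotate coordinates into the frame of the dominant edge (rotation invariance)
    c, s = np.cos(-phi), np.sin(-phi)
    xy = coords @ np.array([[c, -s], [s, c]]).T
    theta = (orients - phi) % (2 * np.pi)
    # gradient values weighted by a Gaussian window centred on the region
    # (coordinates are assumed normalized to [-1, 1])
    w = grads * np.exp(-(xy**2).sum(axis=1) / (2 * sigma_w**2))
    xb = np.clip(((xy + 1) / 2 * grid).astype(int), 0, grid - 1)
    ob = (theta / (2 * np.pi) * n_orient).astype(int) % n_orient
    hist = np.zeros((grid, grid, n_orient))
    np.add.at(hist, (xb[:, 0], xb[:, 1], ob), w)   # accumulate weighted votes
    desc = hist.ravel()
    norm = np.sqrt((desc**2).sum())   # square root of the sum of squared components
    return desc / norm if norm > 0 else desc
```

With `grid=2, n_orient=4` this yields the 16-dimensional coarse descriptor, and with the defaults the 128-dimensional fine descriptor, matching the two-stage coarse-to-fine scheme.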

References (titles as extracted)
- A Computational Approach to Edge Detection
- Object recognition from local scale-invariant features
- A Combined Corner and Edge Detector
- A performance evaluation of local descriptors
- Shape matching and object recognition using shape contexts