Shape recognition with edge-based features
K. Mikolajczyk A. Zisserman C. Schmid
Dept. of Engineering Science Dept. of Engineering Science INRIA Rhˆone-Alpes
Oxford, OX1 3PJ Oxford, OX1 3PJ 38330 Montbonnot
United Kingdom United Kingdom France
   !"#%$
Abstract
In this paper we describe an approach to recognizing poorly textured ob-
jects, that may contain holes and tubular parts, in cluttered scenes under ar-
bitrary viewing conditions. To this end we develop a number of novel com-
ponents. First, we introduce a new edge-based local feature detector that is
invariant to similarity transformations. The features are localized on edges
and a neighbourhood is estimated in a scale invariant manner. Second, the
neighbourhood descriptor computed for foreground features is not affected
by background clutter, even if the feature is on an object boundary. Third,
the descriptor generalizes Lowe’s SIFT method [12] to edges.
An object model is learnt from a single training image. The object is
then recognized in new images in a series of steps which apply progressively
tighter geometric restrictions. A final contribution of this work is to allow
sufficient flexibility in the geometric representation that objects in the same
visual class can be recognized. Results are demonstrated for various object
classes including bikes and rackets.
1 Introduction
Numerous recent approaches to object recognition [2, 12, 14, 15, 13, 20, 24] represent the object by a set of colour or grey-level textured local patches. They obtain excellent results for objects which are locally planar and have a distinctive texture [21]. However, there are many common objects where texture or colour cannot be used as a cue for recognition (cf. figure 1). The distinctive features of such objects are edges and the geometric relations between them. In this paper we present a recognition approach based on local edge features invariant to scale changes. Our goal is to recognize classes of roughly planar objects composed of wiry components, such as bikes, chairs and ladders, against a cluttered background.
A very important property of our recognition approach is scale invariance [12, 14].
This enables the recognition of an object viewed from a different distance or with different camera settings. The scale invariance can locally approximate affine deformations, thereby additionally providing some immunity to out-of-plane rotations for planar objects.
A second problem area is occlusions and background clutter. These can significantly
change the appearance of features localized on object boundaries. Therefore, it is crucial
to separate the foreground from the background. Since strong edges often appear on the
boundaries they can be used to split the support regions before computing the descriptors.

1.1 Background
Our approach builds on recent object recognition methods. The idea of representing an
object by a collection of local invariant patches (to avoid occlusion problems) can be
traced back to Schmid and Mohr [21], where the patches were based on interest points
and were invariant to rotations. Lowe [12] developed an efficient object recognition ap-
proach based on scale invariant features (SIFT). This approach was recently extended
to sub-pixel/sub-scale feature localization [5]. In the context of scale invariant features
Mikolajczyk and Schmid [14] developed a scale invariant interest point detector.
Recently, many authors developed affine invariant features based on the second mo-
ment matrix [2, 15, 20] or other methods [13, 24]. However, affine invariant features
provide better results than scale invariant features only for significant affine deforma-
tions [15], and are not used here.
The invariance to affine geometric (and photometric) transformations reduces the already limited information content of local features. Therefore, many authors also use
geometric relations between features to correctly resolve ambiguous matches. A common
approach is to require that the neighbouring matches are consistent with a local estimate
of a geometric transformation [18, 20, 21, 22]. This method has proved very good at
rejecting false matches, and is adopted here.
Edge-based methods with affine [10] or projective [19] invariance were successful
in the early nineties, but fell out of favour partly because of the difficulties of correctly
segmenting long edge curves. More recently recognition methods based on the statis-
tics of local edges have been developed by Amit and Geman [1], and Carmichael and
Hebert [7, 8]. The latter successfully detect objects with wiry components in cluttered
backgrounds. However, many positive and negative examples are required to learn the
object shape and background appearance, and there is no invariance to scale. We adopt
a local edge description and incorporate the scale invariance previously only available
to methods based on local appearance patches. The problem of background clutter was
also handled, although manually, in the patch approach proposed by Borenstein and Ull-
man [4] for object segmentation.
Other related approaches using edge information are those of Belongie et al. [3], who use 2D shape signatures based on edges in the context of shape matching (although scale invariance and background clutter are not addressed in their work), and the projectively invariant shape descriptor used by Sullivan and Carlsson [25].
1.2 Overview
Section 2 presents the new feature detector and local edge descriptor. Section 3 describes
the two stages of the recognition system: first clustering on a local transformation to re-
duce ambiguity, and then estimating a global (affine) transformation to detect the object
in an image. In more detail, we combine an appearance distance between feature de-
scriptors and local geometric consistency to compute the scores for point matches. The
best matches with relatively few outliers are then used to vote in the Hough space of local
affine transformations. The distinctive clusters in this space are used to detect and localize
the objects. Section 4 gives experimental results.
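The voting step outlined above can be sketched as follows. This is an illustrative reconstruction rather than the authors' implementation: the match-tuple layout (model point, image point, model/image scales, model/image reference angles) and the bin widths `t_bin` and `r_bin` are assumptions.

```python
import numpy as np
from collections import defaultdict

def similarity_from_match(pm, pi, sm, si, am, ai):
    """Similarity transform (scale, rotation, translation) implied by one
    match between a model feature and an image feature."""
    scale = si / sm
    rot = (ai - am) % (2 * np.pi)
    c, s = np.cos(rot), np.sin(rot)
    t = np.asarray(pi) - scale * np.array([[c, -s], [s, c]]) @ np.asarray(pm)
    return scale, rot, t

def hough_cluster(matches, t_bin=40.0, r_bin=np.pi / 6):
    """Vote each match into a coarse bin over (log scale, rotation, translation)
    and return the indices of the matches in the dominant bin."""
    acc = defaultdict(list)
    for idx, (pm, pi, sm, si, am, ai) in enumerate(matches):
        scale, rot, t = similarity_from_match(pm, pi, sm, si, am, ai)
        key = (int(np.round(np.log2(scale))),
               int(rot // r_bin),
               int(t[0] // t_bin),
               int(t[1] // t_bin))
        acc[key].append(idx)
    return max(acc.values(), key=len)
```

Matches that agree on a common object pose fall into the same coarse bin, so a distinctive cluster survives even when many individual matches are outliers.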

2 Local features
In the following we describe our feature detector. Our objective is to determine the edge
neighbourhood that is related to the scale of the object. We then show how we deal with
occlusions and background clutter. Finally we present the descriptor that represents the
edge shape in the point neighbourhood.
2.1 Support regions
Edge features. In our task edges of low curvature and their spatial relations are very
characteristic of the object. The widely used Harris [9] and DoG [12] detectors are not
suitable for our purpose as the first one detects corner-like structures and the second one
mostly blobs. Moreover these points are rarely localized on edges, and only accidentally
on straight edges. It is well known that edge features are present at various scales and can change their appearance at different scales. Figure 1 shows the object and the edges detected with Gaussian derivatives at σ = 1 and σ = 3. The edges change their locations due to blurring, and new edges appear at different scales (cf. figure 1(b)(c)). Therefore it is crucial to build a scale-space representation to capture the possible edge appearances. To find the local features we first extract edges with a multi-scale Canny edge detector [6], using Gaussian derivatives at several pre-selected scales with a scale interval of 1.4.
Figure 1: (a) Object model. (b) Edges detected at scale σ = 1. (c) Edges detected at scale σ = 3.
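The multi-scale extraction step can be sketched as follows, substituting a thresholded gradient-magnitude map for the full Canny detector (non-maximum suppression and hysteresis are omitted); the threshold and the number of scales are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at roughly 3 sigma."""
    r = np.arange(-int(3 * sigma) - 1, int(3 * sigma) + 2)
    g = np.exp(-r**2 / (2 * sigma**2))
    return g / g.sum()

def smooth(img, sigma):
    """Separable Gaussian smoothing (zero padding at the borders)."""
    g = gaussian_kernel1d(sigma)
    out = np.apply_along_axis(lambda m: np.convolve(m, g, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, g, mode="same"), 1, out)

def multiscale_edges(img, sigma0=1.0, n_scales=4, factor=1.4, thresh=0.3):
    """Binary edge maps from gradient magnitudes at scales sigma0 * factor**k."""
    maps = []
    for k in range(n_scales):
        sm = smooth(img.astype(float), sigma0 * factor**k)
        gy, gx = np.gradient(sm)          # derivatives of the smoothed image
        mag = np.hypot(gx, gy)
        maps.append(mag > thresh * mag.max())
    return maps
```

Each map corresponds to one pre-selected scale, mirroring the way edges are computed at several scales before scale selection.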
Scale invariance. Having computed edges at multiple scales, our goal is now to deter-
mine the size of the neighbourhood of the edge point that will be used to compute the
descriptor. Several authors use the Laplacian operator for this purpose [11, 12, 14, 20].
Given a point we compute the Laplacian responses for several scales. We then select the
scales for which the response attains an extremum. For a perfect step-edge the scale pa-
rameter for which the Laplacian attains an extremum is in fact equal to the distance to
the step-edge. This is a well known property of the Laplacian and can be proved analyti-
cally [11]. Figure 2(a) shows an example of a step-edge and a sketch of a 2D Laplacian
operator centred on a point near the edge. Figure 2(b) shows the responses of the scale
normalized Laplacian for different parameters σ. The scale trace attains a minimum for σ equal to the distance to the step-edge. There are several advantages to this approach. The first is that we obtain a characteristic scale for the edge points. This scale is related to
the object scale and determines the point neighbourhood within which we capture more
signal changes [14]. Figure 3 shows a few examples of point neighbourhoods selected
by the Laplacian operator applied to images of different scale. Note that the feature is
centred on one edge and the selected scale corresponds to the distance from the point to
a neighbouring edge tangent to the circle. The edge neighbourhood is correctly detected

Figure 2: Scale trace of the Laplacian localized on a 2D ridge. (a) 2D ridge. (b) Sketch of the 2D Laplacian operator. (c) Laplacian localized on one edge. (d) Responses of the scale-normalized Laplacian operator at the given location. The scale of the extremum response corresponds to the distance to the other edge.
despite the scale change and different background. A second advantage of this approach is that points with a homogeneous neighbourhood can easily be identified and rejected, since they do not attain a distinctive extremum over scale. In this manner many of the edge points computed over the multiple scales are discarded.
An alternative straightforward method would be to search for tangent neighbouring
edges but we found this approach less stable than the Laplacian scale selection.
Figure 3: A few points selected by the Laplacian measure centred at the edge points. (a)(c) Images related by a scale factor of 2. (b)(d) Edges with corresponding regions. Note that the Laplacian attains an extremum when it finds another edge. The radius of the circles is equal to the selected σ.
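The step-edge property quoted above can be checked numerically in one dimension: for a point at distance d from a step edge, the scale-normalized second-derivative response attains its extremum near σ = d. The kernel truncation and the scale range are assumptions of this sketch, not values from the paper.

```python
import numpy as np

def laplacian_scale_trace(signal, x, sigmas):
    """Scale-normalized second-derivative responses of `signal` at sample `x`."""
    responses = []
    for s in sigmas:
        r = np.arange(-int(6 * s) - 1, int(6 * s) + 2)
        # second derivative of a Gaussian with standard deviation s
        g2 = (r**2 - s**2) / s**4 * np.exp(-r**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)
        resp = np.convolve(signal, g2, mode="same")
        responses.append(s**2 * resp[x])        # sigma^2 scale normalization
    return np.array(responses)

# step edge at sample 200; query point at distance d = 15 from it
sig = np.zeros(400)
sig[200:] = 1.0
d = 15
sigmas = np.arange(2.0, 30.0, 0.5)
trace = laplacian_scale_trace(sig, 200 + d, sigmas)
best_sigma = sigmas[np.argmax(np.abs(trace))]   # extremal scale, close to d
```

The extremal scale recovered from the trace approximates the distance to the step edge, which is the characteristic-scale property exploited by the detector.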
Foreground-background segmentation. In the following we describe a new method
for separating foreground and background. In the context of recognition of objects with
holes and tubular components the background texture can significantly affect the descrip-
tors such that recognition becomes impossible. To reduce the background influence, the
point neighbourhood is divided into two parts separated by a chain of dominant edges,
and descriptors are computed separately for each part as described below. The domi-
nant edges are selected by locally fitting a line to the extracted edges using RANSAC.
Figure 4(a) shows an example of corresponding edge points on different backgrounds.
Figure 4(b) displays the gradient images and figure 4(c)(d) the selected principal edge
with the neighbourhood. The tangent angle φ is used to obtain rotation invariance for the descriptors.
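The dominant-edge selection described above can be sketched with a minimal RANSAC line fit, assuming the edge points of a neighbourhood are given as 2-D coordinates; the iteration count and the inlier tolerance `tol` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ransac_line(points, iters=200, tol=1.0):
    """Fit the dominant line to 2-D edge points with RANSAC; returns the unit
    direction, a point on the line, and the boolean inlier mask."""
    best = None
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        v = q - p
        n = np.linalg.norm(v)
        if n < 1e-9:
            continue
        v = v / n
        normal = np.array([-v[1], v[0]])
        inliers = np.abs((points - p) @ normal) < tol   # perpendicular distance
        if best is None or inliers.sum() > best[2].sum():
            best = (v, p, inliers)
    return best

def split_neighbourhood(points, line_dir, line_pt):
    """Split edge points into the two half-planes either side of the dominant edge."""
    normal = np.array([-line_dir[1], line_dir[0]])
    side = (points - line_pt) @ normal
    return points[side >= 0], points[side < 0]
```

The angle of the returned direction plays the role of the reference angle φ, and the two half-plane point sets are then described separately so that background clutter on one side does not corrupt the foreground descriptor.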
2.2 Edge Descriptors
A descriptor that captures the shape of the edges and is robust to small geometric and
photometric transformations is needed for this approach. A comparative evaluation of
descriptors in [16] showed that SIFT descriptors [12] perform significantly better than
many other local descriptors recently proposed in the literature. Inspired by this result

Figure 4: Background-foreground segmentation. (a) Point neighbourhood. (b) Gradient edges. (c)(d) Region parts separated by the dominant edge. φ is the reference angle for the descriptor.
we extend the SIFT descriptor to represent the edges in the point neighbourhood. For each region part (cf. figure 5(a)) we build a 3D histogram of gradient values, for which the dimensions are the edge point coordinates (x, y) and the gradient orientation. The histogram bins are incremented by the gradient values at the edge points. The values are weighted by a Gaussian window centred on the region. The descriptor is built from two histograms. To compute the first we use a 2×2 location grid and 4 orientation planes (vertical, horizontal and two diagonals, cf. figure 5(b)). The dimension of this descriptor is 16. For the second histogram we use a 4×4 location grid and 8 orientation planes (cf. figure 5(c)). The dimension is 128. These two histograms are used in our coarse-to-fine matching strategy discussed in the next section. To obtain rotation invariance, the gradient orientation and the coordinates are relative to the principal line separating the region (cf. figure 4(c)(d)). The descriptor of each region part also contains the points on the dominant edge. Each region part is described separately, but we also use the joint descriptor to represent the whole region. To compensate for affine illumination changes we normalize each description vector by the square root of the sum of its squared components. The similarity between descriptors is measured with the Euclidean distance.
Figure 5: Edge-based local descriptor. (a) Support region and location grid. (b) Four orientation planes (horizontal, vertical and the two diagonals) for the 2×2 grid. (c) Eight orientation planes (0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°) for the 4×4 grid.
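The descriptor construction can be sketched as follows. This is a simplified reconstruction, not the authors' code: the normalization of coordinates to [−1, 1], the Gaussian window width `sigma_w`, and hard (rather than interpolated) binning are assumptions.

```python
import numpy as np

def edge_sift(coords, grads, orients, phi, grid=4, n_orient=8, sigma_w=0.5):
    """SIFT-like 3-D histogram over edge points: grid x grid spatial bins times
    n_orient orientation bins, expressed relative to the reference angle phi."""
    # rotate coordinates into the frame of the dominant edge (rotation invariance)
    c, s = np.cos(-phi), np.sin(-phi)
    xy = coords @ np.array([[c, -s], [s, c]]).T
    theta = (orients - phi) % (2 * np.pi)
    # gradient values weighted by a Gaussian window centred on the region
    # (coordinates are assumed normalized to [-1, 1])
    w = grads * np.exp(-(xy**2).sum(axis=1) / (2 * sigma_w**2))
    xb = np.clip(((xy + 1) / 2 * grid).astype(int), 0, grid - 1)
    ob = (theta / (2 * np.pi) * n_orient).astype(int) % n_orient
    hist = np.zeros((grid, grid, n_orient))
    np.add.at(hist, (xb[:, 0], xb[:, 1], ob), w)   # accumulate weighted votes
    desc = hist.ravel()
    norm = np.sqrt((desc**2).sum())   # square root of the sum of squared components
    return desc / norm if norm > 0 else desc
```

With `grid=2, n_orient=4` this yields the 16-dimensional coarse descriptor, and with the defaults the 128-dimensional fine descriptor, matching the two-stage coarse-to-fine scheme.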

References (titles as extracted)
- A Computational Approach to Edge Detection
- Object recognition from local scale-invariant features
- A Combined Corner and Edge Detector
- A performance evaluation of local descriptors
- Shape matching and object recognition using shape contexts