Proceedings ArticleDOI

Shape recognition with edge-based features

TL;DR: An approach is described for recognizing poorly textured objects that may contain holes and tubular parts in cluttered scenes under arbitrary viewing conditions, and a new edge-based local feature detector that is invariant to similarity transformations is introduced.
Abstract: In this paper we describe an approach to recognizing poorly textured objects, that may contain holes and tubular parts, in cluttered scenes under arbitrary viewing conditions. To this end we develop a number of novel components. First, we introduce a new edge-based local feature detector that is invariant to similarity transformations. The features are localized on edges and a neighbourhood is estimated in a scale invariant manner. Second, the neighbourhood descriptor computed for foreground features is not affected by background clutter, even if the feature is on an object boundary. Third, the descriptor generalizes Lowe's SIFT method to edges. An object model is learnt from a single training image. The object is then recognized in new images in a series of steps which apply progressively tighter geometric restrictions. A final contribution of this work is to allow sufficient flexibility in the geometric representation that objects in the same visual class can be recognized. Results are demonstrated for various object classes including bikes and rackets.

Summary (2 min read)

1 Introduction

  • They obtain excellent results for objects which are locally planar and have a distinctive texture [21].
  • The authors' goal is to recognize classes of roughly planar objects with wiry components against a cluttered background.
  • A very important property of their recognition approach is scale invariance [12, 14].
  • A second problem area is occlusions and background clutter.
  • These can significantly change the appearance of features localized on object boundaries.

1.1 Background

  • The authors' approach builds on recent object recognition methods.
  • In the context of scale invariant features Mikolajczyk and Schmid [14] developed a scale invariant interest point detector.
  • Therefore, many authors also use geometric relations between features to correctly resolve ambiguous matches.
  • The latter successfully detect objects with wiry components in cluttered backgrounds.
  • Other related approaches using edge information are those of Belongie et al. [3] who use 2D shape signatures based on edges in the context of shape matching, although scale invariance and background clutter problems are not addressed in their work, and the projectively invariant shape descriptor used by Sullivan and Carlsson [25].

1.2 Overview

  • Section 2 presents the new feature detector and local edge descriptor.
  • Section 3 describes the two stages of the recognition system: first clustering on a local transformation to reduce ambiguity, and then estimating a global transformation to detect the object in an image.
  • In more detail, the authors combine an appearance distance between feature descriptors and local geometric consistency to compute the scores for point matches.
  • The best matches with relatively few outliers are then used to vote in the Hough space of local affine transformations.
  • The distinctive clusters in this space are used to detect and localize the objects.

2 Local features

  • In the following the authors describe their feature detector.
  • The authors' objective is to determine the edge neighbourhood that is related to the scale of the object.
  • The authors then show how they deal with occlusions and background clutter.
  • Finally the authors present the descriptor that represents the edge shape in the point neighbourhood.

2.1 Support regions

  • In their task edges of low curvature and their spatial relations are very characteristic of the object.
  • It is well known that edge features are present at various scales and can change their appearance at different scales.
  • Given a point the authors compute the Laplacian responses for several scales.
  • There are several advantages to this approach.
  • To reduce the background influence, the point neighbourhood is divided into two parts separated by a chain of dominant edges, and descriptors are computed separately for each part as described below.

2.2 Edge Descriptors

  • A descriptor that captures the shape of the edges and is robust to small geometric and photometric transformations is needed for this approach.
  • A comparative evaluation of descriptors in [16] showed that SIFT descriptors [12] perform significantly better than many other local descriptors recently proposed in the literature.
  • For each region part (cf. figure (a)) the authors build a 3D histogram of gradient values, for which the dimensions are the edge point coordinates (x, y) and the gradient orientation.
  • The descriptor is built from two histograms.
  • The descriptor of each region part also contains the points on the dominant edge.

3 Coarse-to-fine geometric consistency

  • The recognition strategy consists of two main stages aimed at establishing matches between the model and target image.
  • The second stage is clustering the pose of the whole object in a coarsely partitioned affine space.
  • This geometric consistency is used to weight the descriptor distance of every neighbouring point pair.
  • The matched points xa ↔ xb give a hypothesis of the local similarity transformation between the images, where the scale change is σab = σa/σb and the rotation is φab = φa − φb (cf. figure 6).
  • The votes in the transformation space are weighted by the scores obtained in the previous stage.
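The voting stage above can be sketched in Python with NumPy. This is an illustration, not the authors' implementation: the bin widths, the toy matches, and the vote weighting are assumptions. Each tentative match between features with scales σa, σb and orientations φa, φb votes for the scale change σa/σb and rotation φa − φb in a coarsely binned transformation space, and a distinctive peak signals a consistent object pose:

```python
import numpy as np

def hough_similarity_votes(matches, scale_bins, angle_bins, weights=None):
    """Accumulate weighted votes over (log2 scale ratio, rotation) bins.

    matches: iterable of (sigma_a, phi_a, sigma_b, phi_b) tuples taken from
    tentative feature correspondences between model and target image.
    """
    H = np.zeros((len(scale_bins) - 1, len(angle_bins) - 1))
    if weights is None:
        weights = np.ones(len(matches))
    for (sa, pa, sb, pb), w in zip(matches, weights):
        s_ab = np.log2(sa / sb)                            # scale change
        phi_ab = (pa - pb + np.pi) % (2 * np.pi) - np.pi   # rotation in [-pi, pi)
        i = np.searchsorted(scale_bins, s_ab) - 1
        j = np.searchsorted(angle_bins, phi_ab) - 1
        if 0 <= i < H.shape[0] and 0 <= j < H.shape[1]:
            H[i, j] += w
    return H

# Toy data: 8 matches consistent with one pose (scale ratio 2.2, rotation
# 0.3 rad) plus 2 outliers; the consistent matches pile up in a single bin.
rng = np.random.default_rng(0)
good = [(2.2 * s, 0.3 + p, s, p)
        for s, p in zip(rng.uniform(1, 4, 8), rng.uniform(-1, 1, 8))]
bad = [(1.0, 0.0, 3.0, 1.2), (5.0, -0.9, 1.1, 0.4)]
scale_bins = np.linspace(-3, 3, 13)          # half-octave bins in log2 units
angle_bins = np.linspace(-np.pi, np.pi, 13)  # 30-degree rotation bins
H = hough_similarity_votes(good + bad, scale_bins, angle_bins)
peak = np.unravel_index(np.argmax(H), H.shape)
```

In the paper the votes are further weighted by the match scores from the first stage; here the outliers simply fail to accumulate in any single cell.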

4 Results

  • To validate their approach the authors detect bicycles in cluttered outdoor scenes under wide viewpoint changes.
  • Other clusters in the space are insignificant.
  • Figures 7(e) and (f) present examples of multiple objects of different scales and small changes of aspect ratio.
  • The local edge descriptors convey information about the shape of the edges and not about their exact appearance in the image.
  • The authors can learn, for example, the variation of the descriptor computed on the same parts of similar objects.


Shape recognition with edge-based features
K. Mikolajczyk A. Zisserman C. Schmid
Dept. of Engineering Science Dept. of Engineering Science INRIA Rhˆone-Alpes
Oxford, OX1 3PJ Oxford, OX1 3PJ 38330 Montbonnot
United Kingdom United Kingdom France
   !"#%$
Abstract
In this paper we describe an approach to recognizing poorly textured ob-
jects, that may contain holes and tubular parts, in cluttered scenes under ar-
bitrary viewing conditions. To this end we develop a number of novel com-
ponents. First, we introduce a new edge-based local feature detector that is
invariant to similarity transformations. The features are localized on edges
and a neighbourhood is estimated in a scale invariant manner. Second, the
neighbourhood descriptor computed for foreground features is not affected
by background clutter, even if the feature is on an object boundary. Third,
the descriptor generalizes Lowe’s SIFT method [12] to edges.
An object model is learnt from a single training image. The object is
then recognized in new images in a series of steps which apply progressively
tighter geometric restrictions. A final contribution of this work is to allow
sufficient flexibility in the geometric representation that objects in the same
visual class can be recognized. Results are demonstrated for various object
classes including bikes and rackets.
1 Introduction
Numerous recent approaches to object recognition [2, 12, 14, 15, 13, 20, 24] represent the
object by a set of colour or grey-level textured local patches. They obtain excellent results
for objects which are locally planar and have a distinctive texture [21]. However there are
many common objects where texture or colour cannot be used as a cue for recognition (cf.
figure 1). The distinctive features of such objects are edges and the geometric relations
between them. In this paper we present a recognition approach based on local edge fea-
tures invariant to scale changes. Our goal is to recognize classes of roughly planar objects
of wiry components against a cluttered background. For example, bikes, chairs, ladders
etc.
A very important property of our recognition approach is scale invariance [12, 14].
This enables the recognition of an object viewed from a different distance or with dif-
ferent camera settings. The scale invariance can locally approximate affine deformations,
thereby additionally providing some immunity to out-of-plane rotations for planar objects.
A second problem area is occlusions and background clutter. These can significantly
change the appearance of features localized on object boundaries. Therefore, it is crucial
to separate the foreground from the background. Since strong edges often appear on the
boundaries they can be used to split the support regions before computing the descriptors.

1.1 Background
Our approach builds on recent object recognition methods. The idea of representing an
object by a collection of local invariant patches (to avoid occlusion problems) can be
traced back to Schmid and Mohr [21], where the patches were based on interest points
and were invariant to rotations. Lowe [12] developed an efficient object recognition ap-
proach based on scale invariant features (SIFT). This approach was recently extended
to sub-pixel/sub-scale feature localization [5]. In the context of scale invariant features
Mikolajczyk and Schmid [14] developed a scale invariant interest point detector.
Recently, many authors developed affine invariant features based on the second mo-
ment matrix [2, 15, 20] or other methods [13, 24]. However, affine invariant features
provide better results than scale invariant features only for significant affine deforma-
tions [15], and are not used here.
The invariance to affine geometric (and photometric) transformations reduces the already limited information content of local features. Therefore, many authors also use
geometric relations between features to correctly resolve ambiguous matches. A common
approach is to require that the neighbouring matches are consistent with a local estimate
of a geometric transformation [18, 20, 21, 22]. This method has proved very good at
rejecting false matches, and is adopted here.
Edge-based methods with affine [10] or projective [19] invariance were successful
in the early nineties, but fell out of favour partly because of the difficulties of correctly
segmenting long edge curves. More recently recognition methods based on the statis-
tics of local edges have been developed by Amit and Geman [1], and Carmichael and
Hebert [7, 8]. The latter successfully detect objects with wiry components in cluttered
backgrounds. However, many positive and negative examples are required to learn the
object shape and background appearance, and there is no invariance to scale. We adopt
a local edge description and incorporate the scale invariance previously only available
to methods based on local appearance patches. The problem of background clutter was
also handled, although manually, in the patch approach proposed by Borenstein and Ull-
man [4] for object segmentation.
Other related approaches using edge information are those of Belongie et al. [3] who
use 2D shape signatures based on edges in the context of shape matching, although scale
invariance and background clutter problems are not addressed in their work, and the pro-
jectively invariant shape descriptor used by Sullivan and Carlsson [25].
1.2 Overview
Section 2 presents the new feature detector and local edge descriptor. Section 3 describes
the two stages of the recognition system: first clustering on a local transformation to re-
duce ambiguity, and then estimating a global (affine) transformation to detect the object
in an image. In more detail, we combine an appearance distance between feature de-
scriptors and local geometric consistency to compute the scores for point matches. The
best matches with relatively few outliers are then used to vote in the Hough space of local
affine transformations. The distinctive clusters in this space are used to detect and localize
the objects. Section 4 gives experimental results.
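A minimal sketch of how an appearance distance and local geometric consistency might be combined into a match score follows. The Gaussian appearance weighting, the tolerances, and the median-based consistency test are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def match_score(d_app, neigh_log_scales, neigh_rots,
                sigma_d=0.5, tol_s=0.25, tol_r=0.3):
    """Appearance similarity weighted by local geometric consistency.

    d_app: Euclidean distance between the two descriptors.
    neigh_log_scales / neigh_rots: log scale ratios and rotations estimated
    from neighbouring candidate matches; a neighbour is 'consistent' when it
    agrees with the median local transformation within the tolerances.
    """
    appearance = np.exp(-d_app ** 2 / (2 * sigma_d ** 2))
    s_med = np.median(neigh_log_scales)
    r_med = np.median(neigh_rots)
    consistent = np.sum((np.abs(neigh_log_scales - s_med) < tol_s)
                        & (np.abs(neigh_rots - r_med) < tol_r))
    return appearance * consistent

# A match whose neighbours agree on the local transformation outscores one
# surrounded by mutually inconsistent neighbours, at equal appearance distance.
good = match_score(0.2, np.array([1.00, 1.02, 0.98, 1.01]),
                   np.array([0.30, 0.31, 0.29, 0.30]))
bad = match_score(0.2, np.array([1.0, -2.0, 0.3, 2.5]),
                  np.array([0.3, -1.0, 2.0, 0.5]))
```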

2 Local features
In the following we describe our feature detector. Our objective is to determine the edge
neighbourhood that is related to the scale of the object. We then show how we deal with
occlusions and background clutter. Finally we present the descriptor that represents the
edge shape in the point neighbourhood.
2.1 Support regions
Edge features. In our task edges of low curvature and their spatial relations are very
characteristic of the object. The widely used Harris [9] and DoG [12] detectors are not
suitable for our purpose as the first one detects corner-like structures and the second one
mostly blobs. Moreover these points are rarely localized on edges, and only accidentally
on straight edges. It is well known that edge features are present at various scales and
can change their appearance at different scales. Figure 1 shows the object and the edges
detected with Gaussian derivatives at σ = 1 and σ = 3. The edges change their locations
due to blurring, and new edges appear at different scales (cf. figure 1(b)(c)). Therefore
it is crucial to build a scale-space representation to capture the possible edge appearance.
To find the local features we first extract edges with a multi-scale Canny edge detector [6]
using Gaussian derivatives at several pre-selected scales, with a scale interval of 1.4.
Figure 1: (a) Object model. (b) Edges detected at scale σ = 1. (c) Edges detected at scale σ = 3.
Scale invariance. Having computed edges at multiple scales, our goal is now to deter-
mine the size of the neighbourhood of the edge point that will be used to compute the
descriptor. Several authors use the Laplacian operator for this purpose [11, 12, 14, 20].
Given a point we compute the Laplacian responses for several scales. We then select the
scales for which the response attains an extremum. For a perfect step-edge the scale pa-
rameter for which the Laplacian attains an extremum is in fact equal to the distance to
the step-edge. This is a well known property of the Laplacian and can be proved analyti-
cally [11]. Figure 2(a) shows an example of a step-edge and a sketch of a 2D Laplacian
operator centred on a point near the edge. Figure 2(b) shows the responses of the scale-normalized
Laplacian for different parameters σ. The scale trace attains a minimum for σ equal
to the distance to the step-edge. There are several advantages to this approach. The
first one is that we obtain characteristic scale for the edge points. This scale is related to
the object scale and determines the point neighbourhood within which we capture more
signal changes [14]. Figure 3 shows a few examples of point neighbourhoods selected
by the Laplacian operator applied to images of different scale. Note that the feature is
centred on one edge and the selected scale corresponds to the distance from the point to
a neighbouring edge tangent to the circle. The edge neighbourhood is correctly detected

Figure 2: Scale trace of the Laplacian localized on a 2D ridge. (a) 2D ridge. (b) Sketch of 2D Laplacian operator. (c) Laplacian localized on one edge. (d) Responses of the scaled Laplacian operator for the given location. The scale of the extremum response corresponds to the distance to the other edge.
despite the scale change and different background. A second advantage of this approach
is that points which have homogeneous neighbourhood can easily be identified and re-
jected since they do not attain a distinctive extremum over scale. In this manner many of
the edge points computed over the multiple scales are discarded.
An alternative straightforward method would be to search for tangent neighbouring
edges but we found this approach less stable than the Laplacian scale selection.
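The step-edge property above is easy to verify numerically. The sketch below (a 1-D illustration, not the authors' code) computes the magnitude of the scale-normalized second derivative σ²·d²/dx² of a Gaussian-smoothed step edge at a point 20 pixels from the step; the response peaks at σ roughly equal to that distance:

```python
import numpy as np

def laplacian_scale_trace(signal, x0, sigmas):
    """|sigma^2 * d^2/dx^2 (G_sigma * signal)| at position x0, per sigma."""
    trace = []
    for s in sigmas:
        r = int(np.ceil(4 * s))
        x = np.arange(-r, r + 1, dtype=float)
        g = np.exp(-x ** 2 / (2 * s ** 2))
        g /= g.sum()
        smoothed = np.convolve(signal, g, mode='same')
        d2 = np.gradient(np.gradient(smoothed))  # discrete second derivative
        trace.append(abs(s ** 2 * d2[x0]))       # scale-normalized response
    return np.array(trace)

# Step edge at x = 200, observed from a point 20 pixels away: the
# scale-normalized response attains its extremum at sigma close to
# the distance to the step, as stated in the text.
signal = np.zeros(400)
signal[200:] = 1.0
sigmas = np.arange(4.0, 41.0)
trace = laplacian_scale_trace(signal, 220, sigmas)
best_sigma = sigmas[np.argmax(trace)]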
Figure 3: A few points selected by the Laplacian measure centred at the edge points. (a)(c) Images related by a scale factor of 2. (b)(d) Edges with corresponding regions. Note that the Laplacian attains an extremum when it finds another edge. The radius of the circles is equal to the selected σ.
Foreground-background segmentation. In the following we describe a new method
for separating foreground and background. In the context of recognition of objects with
holes and tubular components the background texture can significantly affect the descrip-
tors such that recognition becomes impossible. To reduce the background influence, the
point neighbourhood is divided into two parts separated by a chain of dominant edges,
and descriptors are computed separately for each part as described below. The domi-
nant edges are selected by locally fitting a line to the extracted edges using RANSAC.
Figure 4(a) shows an example of corresponding edge points on different backgrounds.
Figure 4(b) displays the gradient images and figure 4(c)(d) the selected principal edge
with the neighbourhood. The tangent angle φ is used to obtain rotation invariance for the descriptors.
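A minimal RANSAC line fit of the kind described could look as follows. This is a generic sketch with illustrative tolerances and synthetic data, not the authors' implementation:

```python
import numpy as np

def ransac_line(points, n_iter=200, tol=1.0, seed=0):
    """Fit a dominant line to 2-D edge points with RANSAC.

    Returns ((point, unit direction), inlier mask) for the best hypothesis.
    """
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, None
    for _ in range(n_iter):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, d = points[i], points[j] - points[i]
        norm = np.linalg.norm(d)
        if norm < 1e-9:
            continue
        d = d / norm
        # Perpendicular distance of every point to the line through p along d.
        rel = points - p
        dist = np.abs(rel[:, 0] * d[1] - rel[:, 1] * d[0])
        inliers = dist < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (p, d), inliers
    return best_model, best_inliers

# 40 points near the line y = 0.5 x + 3 plus 10 scattered outliers:
# the dominant line should be recovered despite the clutter.
rng = np.random.default_rng(1)
x = rng.uniform(0, 50, 40)
line_pts = np.stack([x, 0.5 * x + 3 + rng.normal(0, 0.2, 40)], axis=1)
clutter = rng.uniform(0, 50, (10, 2))
points = np.concatenate([line_pts, clutter])
(p0, direction), inliers = ransac_line(points)
```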
2.2 Edge Descriptors
A descriptor that captures the shape of the edges and is robust to small geometric and
photometric transformations is needed for this approach. A comparative evaluation of
descriptors in [16] showed that SIFT descriptors [12] perform significantly better than
many other local descriptors recently proposed in the literature. Inspired by this result

Figure 4: Background-foreground segmentation. (a) Point neighbourhood. (b) Gradient edges. (c)(d) Region parts separated by the dominant edge. φ is the reference angle for the descriptor.
we extend the SIFT descriptor to represent the edges in the point neighbourhood. For
each region part (cf. figure (a)) we build a 3D histogram of gradient values, for which
the dimensions are the edge point coordinates (x, y) and the gradient orientation. The
histogram bins are incremented by the gradient values at the edge points. The values are
weighted by a Gaussian window centred on the region. The descriptor is built from two
histograms. To compute the first we use a 2×2 location grid and 4 orientation planes
(vertical, horizontal and two diagonals, cf. figure (b)). The dimension of this descriptor
is 16. For the second histogram we use a 4×4 location grid and 8 orientation planes
(cf. figure (c)). The dimension is 128. These two histograms are used in our coarse-to-fine
matching strategy discussed in the next section. To obtain rotation invariance the
gradient orientation and the coordinates are relative to the principal line separating the
region (cf. figure 3(c)(d)). The descriptor of each region part also contains the points on
the dominant edge. Each region part is described separately but we also use the joint
descriptor to represent the whole region. To compensate for affine illumination changes
we normalize each description vector by the square root of the sum of squared vector
components. The similarity between the descriptors is measured with Euclidean distance.
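The descriptor construction can be sketched as a 3-D histogram over (x, y, orientation). The Gaussian width, the assumption that edge-point coordinates are pre-normalized to the unit square (already rotated by the dominant-edge angle φ), and the random test data below are illustrative:

```python
import numpy as np

def edge_descriptor(xy, grad_mag, grad_ori, grid=4, n_ori=8):
    """3-D histogram over edge-point location (x, y) and gradient orientation.

    xy: (N, 2) edge-point coordinates normalized to [0, 1). Bins are
    incremented by gradient magnitude, weighted by a Gaussian centred on the
    region, and the vector is normalized to unit Euclidean length (the
    affine-illumination normalization described in the text).
    """
    hist = np.zeros((grid, grid, n_ori))
    centre = np.array([0.5, 0.5])
    for p, m, o in zip(xy, grad_mag, grad_ori):
        w = m * np.exp(-np.sum((p - centre) ** 2) / (2 * 0.25 ** 2))
        ix, iy = np.clip((p * grid).astype(int), 0, grid - 1)
        io = int((o % (2 * np.pi)) / (2 * np.pi) * n_ori) % n_ori
        hist[ix, iy, io] += w
    v = hist.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Random edge points: the fine descriptor is 4x4x8 = 128-dimensional and the
# coarse one 2x2x4 = 16-dimensional, matching the two-histogram scheme.
rng = np.random.default_rng(2)
xy = rng.uniform(0, 1, (50, 2))
mags = rng.uniform(0.5, 1.0, 50)
oris = rng.uniform(0, 2 * np.pi, 50)
desc128 = edge_descriptor(xy, mags, oris)
desc16 = edge_descriptor(xy, mags, oris, grid=2, n_ori=4)
```

Two descriptors are then compared with the Euclidean distance between their unit-normalized vectors, e.g. `np.linalg.norm(desc128 - other)`.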
Figure 5: Edge-based local descriptor. (a) Support region and location grid. (b) Four 2×2 orientation planes (horizontal, vertical and two diagonals). (c) Eight 4×4 orientation planes (0°, 45°, …, 315°).

Citations
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations


Cites background from "Shape recognition with edge-based f..."

  • ...Mikolajczyk et al. (2003) have developed a new descriptor that uses local edges while ignoring unrelated nearby edges, providing the ability to find stable features even near the boundaries of narrow shapes superimposed on background clutter....


01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.

14,708 citations

Journal ArticleDOI
TL;DR: A comparative evaluation of different detectors is presented and it is shown that the proposed approach for detecting interest points invariant to scale and affine transformations provides better results than existing methods.
Abstract: In this paper we propose a novel approach for detecting interest points invariant to scale and affine transformations. Our scale and affine invariant detectors are based on the following recent results: (1) Interest points extracted with the Harris detector can be adapted to affine transformations and give repeatable results (geometrically stable). (2) The characteristic scale of a local structure is indicated by a local extremum over scale of normalized derivatives (the Laplacian). (3) The affine shape of a point neighborhood is estimated based on the second moment matrix. Our scale invariant detector computes a multi-scale representation for the Harris interest point detector and then selects points at which a local measure (the Laplacian) is maximal over scales. This provides a set of distinctive points which are invariant to scale, rotation and translation as well as robust to illumination changes and limited changes of viewpoint. The characteristic scale determines a scale invariant region for each point. We extend the scale invariant detector to affine invariance by estimating the affine shape of a point neighborhood. An iterative algorithm modifies location, scale and neighborhood of each point and converges to affine invariant points. This method can deal with significant affine transformations including large scale changes. The characteristic scale and the affine shape of neighborhood determine an affine invariant region for each point. We present a comparative evaluation of different detectors and show that our approach provides better results than existing methods. The performance of our detector is also confirmed by excellent matching resultss the image is described by a set of scale/affine invariant descriptors computed on the regions associated with our points.

4,107 citations


Cites background or methods from "Shape recognition with edge-based f..."

  • ...An additional post-processing method can be used to separate the foreground from the background (Borenstein and Ullman, 2002; Mikolajczyk and Schmid, 2003b)....


  • ...The accuracy of the feature localization and shape is critical for local descriptors, for example, differential descriptors fail if this error is significant (Mikolajczyk and Schmid, 2003a)....


  • ...The reader is referred to Mikolajczyk and Schmid (2003a), for a detailed evaluation of different descriptors computed on scale and affine invariant regions....


  • ...The evaluation of interest point detectors presented in Schmid et al. (2000) demonstrate an excellent performance of the Harris detector compared to other existing approaches (Cottier, 1994; Forstner, 1994; Heitger et al....


  • ...This can be achieved by using (i) more distinctive descriptors (see Mikolajczyk and Schmid, 2003a for a performance evaluation of different descriptors computed for affine-invariant regions) or (ii) semi-local ge- ometric consistency (Dufournaud et al., 2000; Pritchett and Zisserman, 1998; Tell and…...


Journal ArticleDOI
TL;DR: A snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions to establish a reference test set of images and performance software so that future detectors can be evaluated in the same framework.
Abstract: The paper gives a snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions. Six types of detectors are included: detectors based on affine normalization around Harris (Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002) and Hessian points (Mikolajczyk and Schmid, 2002), a detector of `maximally stable extremal regions', proposed by Matas et al. (2002); an edge-based region detector (Tuytelaars and Van Gool, 1999) and a detector based on intensity extrema (Tuytelaars and Van Gool, 2000), and a detector of `salient regions', proposed by Kadir, Zisserman and Brady (2004). The performance is measured against changes in viewpoint, scale, illumination, defocus and image compression. The objective of this paper is also to establish a reference test set of images and performance software, so that future detectors can be evaluated in the same framework.

3,359 citations


Cites background from "Shape recognition with edge-based f..."

  • ...…regions which are covariant only to similarity transformations (i.e., in particular scale), such as (Lowe, 1999, 2004; Mikolajczyk and Schmid, 2001; Mikolajczyk et al., 2003), or other methods of computing affine invariant descriptors, such as image lines connecting interest points (Matas et al.,…...


  • ..., in particular scale), such as (Lowe, 1999, 2004; Mikolajczyk and Schmid, 2001; Mikolajczyk et al., 2003), or other methods of computing affine invariant descriptors, such as image lines connecting interest points (Matas et al....


Book
16 Jun 2008
TL;DR: An overview of invariant interest point detectors can be found in this paper, where an overview of the literature over the past four decades organized in different categories of feature extraction methods is presented.
Abstract: In this survey, we give an overview of invariant interest point detectors, how they evolved over time, how they work, and what their respective strengths and weaknesses are. We begin with defining the properties of the ideal local feature detector. This is followed by an overview of the literature over the past four decades organized in different categories of feature extraction methods. We then provide a more detailed analysis of a selection of methods which had a particularly significant impact on the research field. We conclude with a summary and promising future research directions.

1,144 citations

References
Journal ArticleDOI
TL;DR: There is a natural uncertainty principle between detection and localization performance, which are the two main goals, and with this principle a single operator shape is derived which is optimal at any scale.
Abstract: This paper describes a computational approach to edge detection. The success of the approach depends on the definition of a comprehensive set of goals for the computation of edge points. These goals must be precise enough to delimit the desired behavior of the detector while making minimal assumptions about the form of the solution. We define detection and localization criteria for a class of edges, and present mathematical forms for these criteria as functionals on the operator impulse response. A third criterion is then added to ensure that the detector has only one response to a single edge. We use the criteria in numerical optimization to derive detectors for several common image features, including step edges. On specializing the analysis to step edges, we find that there is a natural uncertainty principle between detection and localization performance, which are the two main goals. With this principle we derive a single operator shape which is optimal at any scale. The optimal detector has a simple approximate implementation in which edges are marked at maxima in gradient magnitude of a Gaussian-smoothed image. We extend this simple detector using operators of several widths to cope with different signal-to-noise ratios in the image. We present a general method, called feature synthesis, for the fine-to-coarse integration of information from operators at different scales. Finally we show that step edge detector performance improves considerably as the operator point spread function is extended along the edge.

28,073 citations


"Shape recognition with edge-based f..." refers methods in this paper

  • ...To find the local features we first extract edges with a multi-scale Canny edge detector [6] using Gaussian derivatives at several pre-selected scales, with the scale interval of 1....


Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations


"Shape recognition with edge-based f..." refers background or methods in this paper

  • ...Several authors use the Laplacian operator for this purpose [11, 12, 14, 20]....


  • ...The widely used Harris [9] and DoG [12] detectors are not suitable for our purpose as the first one detects corner-like structures and the second one mostly blobs....

  • ...Third, the descriptor generalizes Lowe’s SIFT method [12] to edges....

  • ...A very important property of our recognition approach is scale invariance [12, 14]....

  • ...Numerous recent approaches to object recognition [2, 12, 13, 14, 15, 20, 24] represent the object by a set of colour or grey-level textured local patches....

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
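The feature extraction referred to here is the Harris corner detector. A minimal sketch of its response, computed from the second-moment matrix of central-difference gradients summed over a small window, follows; the 3x3 window and k = 0.04 are common defaults, not details taken from this paper:

```python
def harris_response(img, k=0.04):
    h, w = len(img), len(img[0])
    R = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sum the second-moment matrix over a 3x3 window of gradients.
            sxx = sxy = syy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    ix = (img[yy][min(xx + 1, w - 1)] - img[yy][max(xx - 1, 0)]) / 2.0
                    iy = (img[min(yy + 1, h - 1)][xx] - img[max(yy - 1, 0)][xx]) / 2.0
                    sxx += ix * ix
                    sxy += ix * iy
                    syy += iy * iy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            # Large positive at corners, negative on edges, ~zero in flat regions.
            R[y][x] = det - k * trace * trace
    return R

# A bright square whose top-left corner sits at (3, 3) in an 8x8 image.
img = [[1.0 if (y >= 3 and x >= 3) else 0.0 for x in range(8)] for y in range(8)]
R = harris_response(img)
```

This distinction between corner-like and edge-like responses is exactly why the excerpt below notes that Harris is unsuitable for features that must lie on edges.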

13,993 citations


"Shape recognition with edge-based f..." refers methods in this paper

  • ...The widely used Harris [9] and DoG [12] detectors are not suitable for our purpose as the first one detects corner-like structures and the second one mostly blobs....

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
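The evaluation criterion, recall with respect to precision, can be sketched as follows: sort putative matches by descriptor distance and sweep a threshold over them, counting correct matches against a known number of ground-truth correspondences. The input format is an assumption for illustration:

```python
def recall_vs_one_minus_precision(matches, num_correspondences):
    # matches: (descriptor_distance, is_correct) per putative match.
    # Sweeping a distance threshold over the sorted matches traces the
    # recall / (1 - precision) curve used to compare descriptors.
    curve = []
    correct = 0
    for total, (dist, ok) in enumerate(sorted(matches), start=1):
        correct += ok
        recall = correct / num_correspondences
        curve.append(((total - correct) / total, recall))
    return curve

toy = [(0.1, True), (0.2, True), (0.3, False), (0.4, True)]
curve = recall_vs_one_minus_precision(toy, num_correspondences=4)
```

A better descriptor reaches high recall while keeping 1 - precision low, which is how the ranking reported in this paper is obtained.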

7,057 citations


"Shape recognition with edge-based f..." refers background in this paper

  • ...A comparative evaluation of descriptors in [16] showed that SIFT descriptors [12] perform significantly better than...

Journal ArticleDOI
TL;DR: This paper presents work on computing shape models that are computationally fast and invariant to basic transformations like translation, scaling and rotation, and proposes shape detection using a feature called the shape context, which is descriptive of the shape of the object.
Abstract: We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by: (1) solving for correspondences between points on the two shapes; (2) using the correspondences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descriptor, the shape context, to each point. The shape context at a reference point captures the distribution of the remaining points relative to it, thus offering a globally discriminative characterization. Corresponding points on two similar shapes will have similar shape contexts, enabling us to solve for correspondences as an optimal assignment problem. Given the point correspondences, we estimate the transformation that best aligns the two shapes; regularized thin-plate splines provide a flexible class of transformation maps for this purpose. The dissimilarity between the two shapes is computed as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning transform. We treat recognition in a nearest-neighbor classification framework as the problem of finding the stored prototype shape that is maximally similar to that in the image. Results are presented for silhouettes, trademarks, handwritten digits, and the COIL data set.
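The shape context descriptor can be sketched as a log-polar histogram of the positions of the remaining points relative to a reference point. The bin counts, radial range, and the normalization below (mean distance to the reference point, rather than the mean pairwise distance typically used) are simplifications for illustration:

```python
import math

def shape_context(points, ref, n_r=3, n_theta=4, r_min=0.125, r_max=2.0):
    # Log-polar histogram of the positions of `points` relative to `ref`.
    others = [p for p in points if p != ref]
    mean_d = sum(math.dist(ref, p) for p in others) / len(others)
    hist = [0] * (n_r * n_theta)
    for p in others:
        d = math.dist(ref, p) / mean_d          # scale-normalized radius
        if not (r_min <= d <= r_max):
            continue
        # Logarithmic radial bins make the descriptor more sensitive
        # to nearby points than to distant ones.
        frac = (math.log(d) - math.log(r_min)) / (math.log(r_max) - math.log(r_min))
        r_bin = min(int(frac * n_r), n_r - 1)
        theta = math.atan2(p[1] - ref[1], p[0] - ref[0]) % (2 * math.pi)
        t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        hist[r_bin * n_theta + t_bin] += 1
    return hist

# Four points at equal distance from the reference, one per angular quadrant.
hist = shape_context([(0, 0), (1, 1), (-1, 1), (-1, -1), (1, -1)], (0, 0))
```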

6,693 citations

Frequently Asked Questions (10)
Q1. What contributions have the authors mentioned in the paper "Shape recognition with edge-based features" ?

In this paper the authors describe an approach to recognizing poorly textured objects that may contain holes and tubular parts, in cluttered scenes under arbitrary viewing conditions. First, the authors introduce a new edge-based local feature detector that is invariant to similarity transformations. A final contribution of this work is to allow sufficient flexibility in the geometric representation that objects in the same visual class can be recognized.

The scale invariance can locally approximate affine deformations, thereby additionally providing some immunity to out of plane rotations for planar objects. 

Since strong edges often appear on the boundaries, they can be used to split the support regions before computing the descriptors.

Edge-based methods with affine [10] or projective [19] invariance were successful in the early nineties, but fell out of favour partly because of the difficulty of correctly segmenting long edge curves.

A descriptor that captures the shape of the edges and is robust to small geometric and photometric transformations is needed for this approach. 

The authors use $1 + d_E$ in the denominator to avoid division by zero (cf. equation 1), which can happen when the distance between descriptor vectors equals zero.

The first stage is filtering matches by taking into account the similarity of their histogram descriptors and the local geometric consistency of a similarity transformation between spatially neighbouring matches.

Many authors have developed affine-invariant features based on the second moment matrix [2, 15, 20] or other methods [13, 24].

The matching score for a given pair of points is:

$$v(x_a, x_b) = \frac{1}{1 + d_E(x_a, x_b)} \sum_{i,j} \frac{\beta_{ij}\,\alpha_{ij}}{1 + d_E(x_i, x_j)} \qquad (1)$$

where $\alpha$ and $\beta$ are the penalizing functions defined by

$$\alpha_{ij} = 1 - \frac{1}{0.1}\left|\phi_{a,b} - \phi_{i,j}\right|, \qquad \beta_{ij} = \begin{cases} \sigma_{a,b}/\sigma_{i,j} & \text{if } \sigma_{a,b}/\sigma_{i,j} \le 1 \\ \sigma_{i,j}/\sigma_{a,b} & \text{otherwise.} \end{cases}$$

Points $x_i, x_j$ are spatial neighbours of points $x_a, x_b$ (cf. figure 6), within distances of $5\sigma_a$ and $5\sigma_b$ respectively.
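Such a neighbourhood-consistency score could be computed as in the following sketch. The 0.1 orientation tolerance, the clamping of the orientation penalty at zero, and the variable names are assumptions for illustration, not stated details of the paper:

```python
def match_score(d_ab, neighbours):
    """Score for a candidate match (x_a, x_b), in the spirit of equation (1).

    d_ab       -- descriptor distance d_E(x_a, x_b)
    neighbours -- list of (d_ij, dphi, sigma_ratio) for neighbouring matches:
                  descriptor distance d_E(x_i, x_j), the difference between
                  the relative orientations phi_{a,b} and phi_{i,j}, and the
                  ratio of the relative scales sigma_{a,b} / sigma_{i,j}.
    """
    total = 0.0
    for d_ij, dphi, sigma_ratio in neighbours:
        # Orientation-consistency penalty, clamped at zero (an assumption).
        alpha = max(0.0, 1.0 - abs(dphi) / 0.1)
        # Scale-consistency penalty: always the ratio that is <= 1.
        beta = sigma_ratio if sigma_ratio <= 1.0 else 1.0 / sigma_ratio
        # 1 + d_E in the denominator avoids division by zero.
        total += beta * alpha / (1.0 + d_ij)
    return total / (1.0 + d_ab)
```

A perfectly consistent neighbour (identical orientation and scale changes, zero descriptor distance) contributes a full unit of support; inconsistent neighbours contribute little or nothing.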

For a perfect step-edge the scale parameter for which the Laplacian attains an extremum is in fact equal to the distance to the step-edge.
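This property can be checked numerically: for a unit step edge, the Gaussian-smoothed second derivative at distance $d$ from the edge has the closed form $g'_\sigma(d)$, and the scale-normalized response $\sigma^2 g'_\sigma(d)$ attains its extremum over $\sigma$ at $\sigma = d$. A small sketch (illustrative only):

```python
import math

def norm_laplacian_at(dist, sigma):
    # For a unit step edge, the smoothed second derivative at distance `dist`
    # is the Gaussian derivative g'_sigma(dist); normalize by sigma^2.
    g = math.exp(-dist ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return sigma ** 2 * (-dist / sigma ** 2) * g

dist = 5.0
sigmas = [0.5 * k for k in range(1, 41)]   # scales 0.5 .. 20.0
best = max(sigmas, key=lambda s: abs(norm_laplacian_at(dist, s)))
# the maximizing scale equals the distance to the edge
```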